Research article

Distributed Bayesian posterior voting strategy for massive data

  • The emergence of massive data has driven recent interest in developing statistical learning and large-scale algorithms for analysis on distributed platforms. One widely used statistical approach is split-and-conquer (SaC), originally performed by aggregating all local solutions through a simple average to reduce the computational burden caused by communication costs. Aiming at lower computation cost and acceptable accuracy, this paper extends SaC to Bayesian variable selection for ultrahigh-dimensional linear regression and builds BVSaC for aggregation. Suppose ultrahigh-dimensional data are stored in a distributed manner across multiple computing nodes, with each computing resource containing a disjoint subset of data. On each node machine, we perform variable selection and coefficient estimation through a hierarchical Bayes formulation. Then, a weighted majority voting method, BVSaC, is used to combine the local results to retain good performance. The proposed approach requires only a small portion of the computation cost on each local dataset and therefore eases the computational burden, especially in Bayesian computation, while sacrificing little accuracy, which in turn increases the feasibility of analyzing extraordinarily large datasets. Simulations and a real-world example show that the proposed approach performed as well as the whole-sample hierarchical Bayes method in terms of the accuracy of variable selection and estimation.

    Citation: Xuerui Li, Lican Kang, Yanyan Liu, Yuanshan Wu. Distributed Bayesian posterior voting strategy for massive data[J]. Electronic Research Archive, 2022, 30(5): 1936-1953. doi: 10.3934/era.2022098




    As the most complex and important organ in the human body, the brain plays an indispensable role in regulating the body's motor functions and maintaining higher cognitive activities such as consciousness, language, memory, sensation and emotion. Studying the brain therefore not only gives us a better understanding of it, but can also provide auxiliary methods for the diagnosis and treatment of brain diseases. In recent years, many countries around the world have attached great importance to brain science research; researchers have begun to study the brain systematically and have formed a distinctive brain science knowledge system. The brain-computer interface (BCI) [1,2] is a communication and control technology that directly translates the perceptual thinking generated by the brain into corresponding actions of external devices; it is of high scientific research value and is currently being widely applied in medical health, vehicle driving [3], daily life entertainment [4], etc. However, BCI systems still have problems that need to be solved urgently, such as long user training times and limited online performance [5].

    Currently, many advanced machine learning algorithms have been proposed and used in electroencephalography (EEG) classification, including Bayesian classifiers [6], support vector machines [7], linear discriminant analysis (LDA) [8], adaptive classifiers and deep learning classifiers [9]. In recent years, because transfer learning can take advantage of similarities between data, tasks or models, it has been utilized to analyze EEG signals that exhibit individual differences and non-stationarity, making it possible to apply models and knowledge learned in an old domain to a new domain. In this way, not only can the classification performance on unlabeled data be improved by using labeled data, but the model training time can also be drastically reduced [10,11,12]. In transfer learning, a domain is defined as follows [13,14]: a domain $\mathcal{D}$ consists of a feature space $\mathcal{X}$ and its associated marginal probability distribution $P(X)$, i.e., $\mathcal{D}=\{\mathcal{X},P(X)\}$, where $X\in\mathcal{X}$. A source domain $\mathcal{D}_S$ and a target domain $\mathcal{D}_T$ are different if they have different feature spaces, i.e., $\mathcal{X}_S\neq\mathcal{X}_T$, and/or different marginal probability distributions, i.e., $P_S(X)\neq P_T(X)$.

    However, most of the existing transfer learning methods train classifiers by batch learning in an offline manner, where all source and target data are pre-given [15,16], whereas this assumption may not be practical in real application scenarios where collecting enough data is very time-consuming. In addition, data are often transferred in a stream and cannot be collected in their entirety. Therefore, researchers have introduced online learning into the field of transfer learning and proposed an online transfer learning (OTL) framework. Unlike online learning, which merely considers the dynamic changes of data on a data domain, OTL also considers the changes in the data distributions in the source domain and target domain. OTL has been used in areas such as online feature selection [17] and graphical retrieval [18]. It combines the advantage of dynamically updating classification models from online learning with the ability of transfer learning to effectively exploit knowledge from source domains [19], aiming to apply online learning tasks to target domains by transferring knowledge from the source domain. Existing OTL approaches focus on how to use knowledge from the source domain for online learning in the target domain. Most of them use strategies based on ensemble learning, i.e., directly combining source and target classifiers. Zhao et al. [20] proposed a single-source-domain OTL algorithm for homogeneous and heterogeneous spaces that dynamically weights the combination of source and target classifiers to form the final classifier. Kang et al. [21] proposed an OTL algorithm for multi-class classification, utilizing a new loss function and updating method to extend binary-classification OTL to multi-class tasks. Based on Zhao's work, Ge et al. [22] first proposed a multi-source online transfer framework. Zhou et al. [23] proposed an online transfer strategy that reuses the features extracted from a pre-trained model; it achieves automatic feature extraction and good classification accuracy. However, the OTL algorithm can only handle single-source-domain transfer, which can easily lead to negative transfer [13] when facing multiple source domains. Negative transfer may happen when the source domain data and task contribute to reduced performance of learning in the target domain.

    Nevertheless, single-source transfer learning requires a high similarity between the source domain samples and the target domain samples during training. For example, in EEG signal classification, the class features of the source subjects and target subjects are required to be as similar as possible. Since the knowledge that a single source domain can provide is limited, researchers naturally turn to applying transfer learning to the target domain by exploiting knowledge from multiple source domains. To utilize knowledge from multiple source domains, researchers have developed boosting-based algorithms to adjust the weights of different domains or samples [24,25,26]. Eaton and DesJardins [24] proposed a method that uses AdaBoost to assign higher weights to source domains that are more similar to the target domain and to adjust the weights of individual samples in each source domain. Yao and Doretto [25] proposed a multi-source transfer method that first integrates the knowledge from multiple source domains and then migrates it to the target domain; it achieved better performance on some benchmark datasets. Tan et al. [26] utilized different views from different source domains to assist the target domain task. Jiang et al. [27] proposed a general multi-source transfer framework that preserves independent information between different tasks. However, these studies assume that the target domain data are already available, and they conducted transfer only via offline batch learning. Recently, after noticing these shortcomings of OTL, Wu et al. [17] proposed the HomOTLMS algorithm, which trains a classifier in each source domain and classifies samples by weighting the classifiers of the source and target domains. Du et al. [28] proposed the HomOTL-ODDM algorithm, which updates a mapping matrix in an online manner to further reduce the differences between domains; the experimental results showed that the transfer effect can be significantly improved once differences in data distributions are considered. Li et al. [29] proposed an online EEG classification method based on instance transfer (OECIT); it aligns the EEG data online and, combined with HomOTL, greatly reduces computational and memory costs.

    To address the negative transfer problem in online single-source transfer and to further reduce the individual differences between subjects, this paper presents a multi-source online transfer EEG classification algorithm based on source domain selection (SDS). Different from the HomOTLMS and HomOTL-ODDM algorithms, the proposed algorithm dynamically selects several suitable source domains and uses the multi-source online transfer algorithm to train a classifier on each selected source domain; it then weights the trained classifiers together with the target domain classifier, which not only reduces the training time but also achieves better classification results in the target domain. The algorithm was applied to two publicly available motor imagery EEG datasets and compared with multi-source online transfer algorithms of the same type to confirm its superiority.

    Suppose a subject has n trials; we can have

    $\bar{E} = \frac{1}{n}\sum_{i=1}^{n} X_i X_i^T$ (1)

    where $\bar{E}$ is the Euclidean mean of all EEG trials from a subject and $X_i \in \mathbb{R}^{c \times t}$ denotes the trial segments of EEG signals, where $c$ is the number of EEG channels and $t$ is the number of time samples. Then, we perform alignment by using

    $\tilde{X}_i = \bar{E}^{-1/2} X_i$ (2)
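    As a concrete sketch, the alignment in Eqs (1) and (2) can be implemented in a few lines of NumPy. The function name and array layout below are our own illustrative choices, not from the paper:

```python
import numpy as np

def euclidean_alignment(trials):
    """Euclidean alignment (Eqs 1-2): whiten each trial by the inverse
    square root of the Euclidean-mean covariance, so that the aligned
    trials have a mean covariance equal to the identity matrix.

    trials: array of shape (n_trials, c_channels, t_samples)."""
    covs = np.array([X @ X.T for X in trials])
    E_bar = covs.mean(axis=0)                          # Eq (1)
    # Inverse matrix square root via eigendecomposition (E_bar is SPD)
    w, V = np.linalg.eigh(E_bar)
    E_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return np.array([E_inv_sqrt @ X for X in trials])  # Eq (2)
```

    After alignment, the mean of $\tilde{X}_i\tilde{X}_i^T$ over a subject's trials equals the identity matrix, which is what makes trials from different subjects comparable.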

    Given a spatiotemporal EEG signal $X$ from an $N$-channel recording, where $X$ is an $N \times T$ matrix and $T$ denotes the number of samples per channel, the normalized covariance matrix of the EEG signal is given below.

    $C = \frac{XX^T}{\operatorname{trace}(XX^T)}$ (3)

    The covariance matrices C1 and C2 for each category can be calculated from the sample means. The projection matrix of CSP is

    $W_{\mathrm{csp}} = U^T P$ (4)

    where $U$ is an orthogonal matrix and $P$ is the whitening feature matrix.

    After filtering with the projection matrix Wcsp, the feature matrix is obtained:

    $Z_0 = W_{\mathrm{csp}}^T X$ (5)

    The first $m$ rows and the last $m$ rows of the feature matrix are taken to construct the matrix $Z = (z_1\; z_2\; \cdots\; z_{2m}) \in \mathbb{R}^{N \times 2m}$. The feature vector $F = (f_1\; f_2\; \cdots\; f_{2m})^T \in \mathbb{R}^{2m \times 1}$ can be obtained after normalization.
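    The CSP pipeline in Eqs (3)-(5) can be sketched as follows; the eigendecomposition route to the whitening matrix $P$ and the log-variance normalization are standard CSP choices assumed here rather than spelled out in the text:

```python
import numpy as np

def csp_features(trials, labels, m=2):
    """Sketch of CSP feature extraction (Eqs 3-5) for two classes (+1/-1).
    Returns one 2m-dimensional feature vector per trial."""
    def norm_cov(X):                                   # Eq (3)
        C = X @ X.T
        return C / np.trace(C)
    C1 = np.mean([norm_cov(X) for X, y in zip(trials, labels) if y == 1], axis=0)
    C2 = np.mean([norm_cov(X) for X, y in zip(trials, labels) if y == -1], axis=0)
    # Whitening matrix P from the composite covariance C1 + C2
    w, V = np.linalg.eigh(C1 + C2)
    P = np.diag(w ** -0.5) @ V.T
    # U jointly diagonalizes the whitened class covariances
    _, U = np.linalg.eigh(P @ C1 @ P.T)
    W = U.T @ P                                        # Eq (4); rows act as spatial filters
    feats = []
    for X in trials:
        Z0 = W @ X                                     # Eq (5)
        Z = np.vstack([Z0[:m], Z0[-m:]])               # first and last m rows
        var = Z.var(axis=1)
        feats.append(np.log(var / var.sum()))          # normalized log-variance features
    return np.array(feats)
```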

    The SDS method can select the nearest source domain to reduce computational cost and potentially improve classification performance.

    Supposing that there are $Z$ different source domains, for the $z$-th source domain, the average feature vector of each class $m_{z,c}$ $(c=1,2)$ is calculated first. After a small number of target domain labels have been obtained, the known label information is used to calculate the average feature vector of each target domain class $m_{t,c}$. Then, the distance between the two domains is expressed as

    $d(z,t) = \sum_{c=1}^{2} \left\| m_{z,c} - m_{t,c} \right\|$ (6)

    The next step is to cluster these distances $\{d(z,t)\}_{z=1,\ldots,Z}$ by $k$-means. Assuming that the clusters are divided into $(C_1, C_2, \ldots, C_k)$, the objective is to minimize the squared error $E$, which can be expressed as

    $E = \sum_{i=1}^{k} \sum_{d(z,t) \in C_i} \left\| d(z,t) - \mu_i \right\|_2^2$ (7)

    where μi is the mean vector of cluster Ci, which is also known as a centroid, with the following expression:

    $\mu_i = \frac{1}{|C_i|} \sum_{d(z,t) \in C_i} d(z,t)$ (8)

    Finally, the cluster with the smallest centroid, i.e., the source domains closest to the target domain, is selected. In this way, OTL is performed for only $Z/l$ source domains on average, where $l$ is the number of clusters in the $k$-means step. The larger the value of $l$, the lower the computational cost. However, when $l$ is too large, there may not be enough source domains left for classification, resulting in unstable classification performance. Therefore, to balance computational cost and classification performance, $l$ was set to 2.
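    The SDS step in Eqs (6)-(8) amounts to a one-dimensional $k$-means over the domain-to-target distances. A minimal NumPy sketch (the naive centroid initialization is our own simplification):

```python
import numpy as np

def select_source_domains(source_means, target_means, l=2, iters=50):
    """SDS sketch (Eqs 6-8): distance of each source domain to the target,
    l-means clustering of the distances, keep the cluster with the
    smallest centroid. source_means[z][c] and target_means[c] are the
    per-class average feature vectors (c = 0, 1 for the two classes)."""
    d = np.array([sum(np.linalg.norm(mz[c] - target_means[c]) for c in range(2))
                  for mz in source_means])               # Eq (6)
    # Simple 1-D k-means on the scalar distances
    centroids = np.linspace(d.min(), d.max(), l)
    for _ in range(iters):
        assign = np.argmin(np.abs(d[:, None] - centroids[None, :]), axis=1)
        for i in range(l):
            if np.any(assign == i):
                centroids[i] = d[assign == i].mean()     # Eq (8)
    keep = assign == np.argmin(centroids)                # smallest centroid wins
    return np.flatnonzero(keep)
```

    With $l=2$, the source domains falling in the far cluster are simply discarded before any online training takes place.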

    In this section, a multi-source OTL with SDS (MSOTL-SDS) algorithm is proposed. The algorithm flow is shown in Figure 1.

    Figure 1.  Flowchart for the multi-source OTL algorithm based on SDS.

    We set $n$ source domains $\mathcal{D}_S=\{\mathcal{D}_{S_1},\mathcal{D}_{S_2},\ldots,\mathcal{D}_{S_n}\}$ and a certain number of labeled samples $\{(x_t,y_t)\,|\,t=1,2,\ldots,m\}$ from the target domain $\mathcal{D}_T$. For the $i$-th source domain $\mathcal{D}_{S_i}$, $\mathcal{X}_{S_i}\times\mathcal{Y}$ denotes the source domain data space, $x_{S_i}\in\mathbb{R}^{d_i}$ the feature space and $\mathcal{Y}=\{-1,+1\}$ the label space. $f_{S_i}$ denotes the classifier learned on the $i$-th source domain. $\mathcal{X}\times\mathcal{Y}$ denotes the target domain data space, the feature space is $x\in\mathbb{R}^{d}$, and the target domain and source domains share the same label space $\mathcal{Y}$.

    After the source domain data have been aligned and CSP features extracted, the source domains differing greatly from the target domain samples are eliminated by SDS, leaving only $k$ ($1<k<n$) source domains; a classifier is trained on each of the $k$ retained source domains $\mathcal{D}_{S'}=\{\mathcal{D}_{S_1},\mathcal{D}_{S_2},\ldots,\mathcal{D}_{S_k}\}$ and assigned a weight accordingly. In online mode, after domain alignment and feature extraction by online Euclidean space data alignment (OEA) and CSP, each sample $(x_t,y_t)$ is sent to the ensemble classifier $F_t(\cdot)$, formed by weighting all classifiers, for label prediction. After the true label is obtained, the weights of the classifiers and the online target domain classifier are updated via the loss function.

    Assume that the target domain classifier is given as follows:

    $f_T(x) = \sum_{i=1}^{t} \alpha_i y_i k(x_i, x)$ (9)

    where αi is the coefficient of the i-th target sample.

    Set the weight vector of the source domain classifiers as $u_t = (u_t^1, u_t^2, \ldots, u_t^n)^T$, construct the target domain weight variable $v$ and apply the Hedge algorithm to dynamically update the weights of the source and target domain classifiers as follows:

    $u_{t+1}^i = u_t^i \beta^{Z_t^i}, \quad Z_t^i = I\left(\operatorname{sign}\left(y_t f_{S_i}(x_t)\right) < 0\right), \quad i = 1, 2, \ldots, n$ (10)
    $v_{t+1} = v_t \beta^{Z_t^v}, \quad Z_t^v = I\left(\operatorname{sign}\left(y_t f_{T_t}(x_t)\right) < 0\right)$ (11)

    Finally, its class label is predicted by the following prediction function:

    $\hat{y}_t = \operatorname{sign}\left( \sum_{i=1}^{n} p_t^i f_{S_i}(x_t) + p_t^v f_{T_t}(x_t) \right)$ (12)

    where $p_t^i$ and $p_t^v$ correspond to the weights of the classifiers in the source domain and target domain, respectively, and are calculated as follows:

    $p_t^i = \frac{u_t^i}{\sum_{j=1}^{n} u_t^j + v_t}, \qquad p_t^v = \frac{v_t}{\sum_{j=1}^{n} u_t^j + v_t}$ (13)
    Algorithm 1 MSOTL-SDS algorithm
    Input: source domain classifiers $f_S=(f_{S_1},f_{S_2},\ldots,f_{S_n})$, trade-off parameter $C$, discount weight $\beta\in(0,1)$
    Initialization: online classifier $f_{T_1}=0$, weights $u_1^i=\frac{1}{n+1}\ (i=1,\ldots,n)$, $v_1=\frac{1}{n+1}$
    // SDS section
    for $s=1,2,\ldots,n$ do
        Calculate the distance $d(s,t)$ from the $s$-th source domain to the target domain by using Eq (6)
    end for
    Perform $k$-means clustering of $\{d(s,t)\}_{s=1,\ldots,n}$ and select the source domains in the cluster with the smaller centroid to form $S'=\{S_1,S_2,\ldots,S_k\}\ (k<n)$
    // Multi-source online transfer
    for $t=1,2,\ldots,m$ do
        Receive the test sample $x_t\in\mathcal{X}$
        Calculate the classifier weights by using Eq (13)
        Obtain the predicted label by using Eq (12)
        Receive the true label $y_t\in\{-1,+1\}$
        Update the weights of each classifier by using Eqs (10) and (11)
        Calculate the loss $l_t=[1-y_t f_{T_t}(x_t)]_+$
        if $l_t>0$ then
            $f_{T_{t+1}}=f_{T_t}+\tau_t y_t x_t$, where $\tau_t=\min\{C,\,l_t/\|x_t\|^2\}$
        end if
    end for
    Output: $f_t(x)=\operatorname{sign}\left(\sum_{i=1}^{k} p_t^i f_{S_i}(x_t)+p_t^v f_{T_t}(x_t)\right)$
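    Putting Eqs (9)-(13) and the loop of Algorithm 1 together, the online stage can be sketched as below. We assume linear classifiers (so $f_T$ reduces to a weight vector) and the passive-aggressive update matching the $\tau_t$ step; the class name and interface are our own illustrative choices:

```python
import numpy as np

class MSOTLSDS:
    """Minimal sketch of the MSOTL-SDS online loop (Algorithm 1).
    `source_clfs` are functions x -> real-valued score, one per
    source domain retained after SDS."""
    def __init__(self, source_clfs, dim, C=1.0, beta=0.9):
        self.fs = source_clfs
        self.w = np.zeros(dim)                 # online target classifier f_T
        n = len(source_clfs)
        self.u = np.full(n, 1.0 / (n + 1))     # source weights u
        self.v = 1.0 / (n + 1)                 # target weight v
        self.C, self.beta = C, beta

    def predict(self, x):                      # Eqs (12)-(13)
        s = self.u.sum() + self.v
        score = sum(ui / s * f(x) for ui, f in zip(self.u, self.fs))
        score += self.v / s * (self.w @ x)
        return 1 if score >= 0 else -1

    def update(self, x, y):                    # Eqs (10), (11) and the PA step
        for i, f in enumerate(self.fs):
            if y * f(x) <= 0:                  # source classifier erred
                self.u[i] *= self.beta
        if y * (self.w @ x) <= 0:              # target classifier erred
            self.v *= self.beta
        loss = max(0.0, 1 - y * (self.w @ x))  # hinge loss l_t
        if loss > 0:
            tau = min(self.C, loss / (x @ x))
            self.w += tau * y * x
```

    Every mistake of a source or target classifier multiplies its weight by $\beta$, so persistently wrong sources are discounted exponentially; this is what limits negative transfer.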


    Here, the algorithm was evaluated on two publicly available datasets of motor imagery EEG signals, where the first dataset was Dataset Ⅱa from BCI Competition Ⅳ [30] and the second dataset was Dataset 2 from BNCI Horizon 2020 [31]. These two datasets have multiple subjects and are suitable for multi-source classification.

    1) Dataset Ⅱa of BCI Competition Ⅳ. It comprises 22-channel EEG signals obtained from two different sessions of nine healthy subjects, and the sampling rate was 250 Hz. Each subject was instructed to perform four motor imagery tasks, including movements of the left hand, right hand, feet and tongue. Each task had 72 trials in one session.

    2) Dataset 2 (BNCI Horizon 2020 Dataset). It consists of EEG data from 14 healthy subjects, eight of whom were children. The data for each subject consists of two categories of motor imagery EEG of the subject's right hand and foot, with 50 samples in each category. Each experimental signal was recorded by using 15 electrodes, with electrode positions following the International 10–20 system; 15 channels of EEG signals were recorded and sampled at 512 Hz.

    The preprocessing part is introduced first. EEG signals from 2.5 to 5.5 s were selected for BCI Competition Ⅳ Dataset Ⅱa, and, for Dataset 2 from BNCI Horizon 2020, the EEG signals ranged from 5.5 to 8.5 s. Bandpass filtering was performed by using a 5th-order Butterworth filter with the frequency ranging from 8 to 30 Hz. The samples in the target domain were randomly arranged 20 times and the online experiment was repeated 20 times; finally, the average results of the 20 repetitions were recorded.
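    The band-pass step described above can be reproduced with SciPy; a minimal sketch (function name ours):

```python
import numpy as np
from scipy import signal

def bandpass_8_30(eeg, fs):
    """5th-order Butterworth band-pass filter, 8-30 Hz, applied along the
    time axis; eeg has shape (channels, samples), fs is the sampling rate."""
    b, a = signal.butter(5, [8, 30], btype="bandpass", fs=fs)
    # filtfilt gives zero-phase filtering, avoiding phase distortion
    return signal.filtfilt(b, a, eeg, axis=-1)
```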

    To verify its effectiveness, the proposed MSOTL-SDS algorithm was compared against seven state-of-the-art algorithms for EEG classification, listed as follows:

    ⅰ) OECIT-Ⅰ [29]: After aligning the sample domains of the target domain using OEA, the classification was performed with an OTL algorithm.

    ⅱ) OECIT-Ⅱ [29]: After aligning the sample domain of the target domain using OEA, classification was performed with an OTL algorithm; unlike OECIT-Ⅰ, a different weight update strategy is utilized.

    ⅲ) HomOTLMS [17]: Transfers knowledge from multiple source domains during OTL by constructing the final classifier through a weighted integration of the source and target classifiers.

    ⅳ) HomOTL-ODDM [28]: A multi-source online transfer algorithm for the simultaneous reduction of marginal distribution and conditional distribution differences between domains via a linear feature transformation process.

    ⅴ) EA-CSP-LDA [32]: EA was used to align the data from different domains; then, the source domain data were used to design the filters and an LDA classifier was employed for classification.

    ⅵ) CA-JDA [33]: The data are aligned before applying the joint distribution adaptation (JDA) algorithm.

    ⅶ) Riemannian alignment-minimum distance to the Riemannian mean (RA-MDRM) [34]: A Riemannian-space method that recenters each covariance matrix relative to a reference covariance matrix.

    A comparison of the online classification results for the two datasets is given in Tables 1 and 2. The MSOTL-SDS algorithm achieved the highest average accuracies, 79.29% and 70.86% on the two datasets, respectively.

    Table 1.  Comparison of online classification accuracy on BCI Competition Ⅳ Dataset Ⅱa.
    Subject 1 2 3 4 5 6 7 8 9 Avg
    RA-MDRM 72.22 56.94 84.03 65.97 60.42 67.36 61.81 86.81 82.64 70.91
    CA-JDA 66.75 47.45 65.52 59.34 54.55 54.27 54.27 73.01 65.18 60.04
    EA-CSP-LDA 86.08 56.84 97.74 72.26 51.56 65.56 68.51 89.10 72.43 73.34
    OECIT-Ⅰ 88.91 56.31 97.56 73.57 58.44 67.01 71.04 92.88 75.87 75.73
    OECIT-Ⅱ 89.51 55.49 97.88 74.06 58.03 66.64 71.09 92.98 75.94 75.74
    HomOTLMS 89.25 56.36 97.84 74.22 58.90 68.17 73.92 93.69 76.51 76.54
    HomOTL-ODDM 90.91 57.31 97.54 75.57 58.84 68.01 74.04 93.88 75.87 76.89
    MSOTL-SDS 90.88 60.93 97.80 76.78 60.88 73.64 78.09 94.69 79.94 79.29

    Table 2.  Comparison of online classification accuracy on Dataset 2.
    Subject OECIT-Ⅰ OECIT-Ⅱ HomOTLMS HomOTL-ODDM MSOTL-SDS
    1 54.96 55.48 56.28 59.29 62.31
    2 63.31 64.03 67.43 68.49 71.77
    3 62.74 63.21 64.33 66.53 72.24
    4 76.35 76.36 76.98 78.43 77.86
    5 55.88 55.83 56.46 57.98 60.61
    6 47.29 48.68 47.36 48.26 49.67
    7 58.46 58.59 58.33 59.44 60.79
    8 84.79 83.57 84.64 85.68 84.96
    9 81.73 82.45 82.92 83.79 85.93
    10 74.63 76.44 77.38 78.29 78.16
    11 70.96 71.08 72.68 73.46 72.76
    12 57.46 58.44 58.69 59.78 62.21
    13 74.33 73.29 74.33 75.33 78.38
    14 68.26 68.78 70.26 71.33 74.35
    Avg 66.51 66.87 67.72 69.01 70.86


    For BCI Competition Ⅳ Dataset Ⅱa, we mainly selected 22-channel EEG data from the left and right hands. The classification accuracy of the MSOTL-SDS algorithm was higher than those of the other seven algorithms for more than half of the subjects. The HomOTLMS and HomOTL-ODDM algorithms had slightly lower accuracies, and HomOTL-ODDM was better than HomOTLMS. More importantly, except for Subject 3, the classification accuracies of the multi-source online transfer algorithms were higher than that of the single-source online transfer algorithm OECIT, indicating that the multi-source online transfer algorithm can effectively eliminate the effect of negative transfer when dealing with multiple source domains. The classification accuracy of the MSOTL-SDS algorithm was 2.4% higher than that of the best multi-source online transfer algorithm, HomOTL-ODDM, and 3.55% higher than the single-source online learning algorithm OECIT-Ⅱ.

    For Dataset 2, the MSOTL-SDS algorithm achieved the best classification accuracy on EEG data for subjects other than Subjects 4, 8, 10 and 11, while HomOTL-ODDM performed best on these four subjects, indicating that, for multi-source online classification, it is necessary to consider the conditional and marginal distributions of the samples. Similarly, the accuracies of the MSOTL-SDS algorithm were 1.85 and 3.99% higher than HomOTL-ODDM and OECIT-Ⅱ, respectively.

    Dataset Ⅱa and Dataset 2 are both multi-subject datasets collected from healthy subjects. The difference is that Dataset Ⅱa is from the earlier BCI Competition Ⅳ (2008) and has a lower sampling rate but a higher number of EEG channels, so it contains more EEG feature information. Dataset 2 is from BNCI Horizon 2020 and has a higher sampling rate but fewer channels; its classification task is also more difficult, which is why all eight algorithms, including MSOTL-SDS, achieved better performance on Dataset Ⅱa.

    In Tables 1 and 2, it is noticeable that the MSOTL-SDS algorithm showed significant improvement in accuracy for some subjects with poorly differentiated class features, such as Subjects 2, 6 and 7 in BCI Competition Ⅳ Dataset Ⅱa and Subjects 1, 2, 3, 5, 9, 12 and 14 in Dataset 2 from BNCI Horizon 2020; for these subjects, it improved on the single-source OECIT and on HomOTL-ODDM by averages of 5.85% and 3.69%, respectively. This indicates that MSOTL-SDS can effectively discover subjects whose characteristics are highly similar to those of the target subject by applying SDS when dealing with a multi-source online classification task, thus reducing individual differences and improving classification accuracy. Finally, as shown in Table 3, we compare the average online classification accuracies of several transfer learning methods on the two datasets [35].

    Table 3.  Average online classification accuracies of different transfer learning approaches on Datasets Ⅱa and 2.
    Algorithm Dataset Ⅱa Acc. (%) Dataset 2 Acc. (%)
    EA-RCSP-LDA [35] 73.72 64.09
    EA-RCSP-OwAR [35] 74.78 65.71
    OECIT-Ⅰ [29] 75.73 66.51
    OECIT-Ⅱ [29] 75.74 66.87
    HomOTLMS [17] 76.54 67.72
    HomOTL-ODDM [28] 76.89 69.01
    MSOTL-SDS 79.29 70.86


    Table 4 gives the computing time consumption of the algorithms on the two datasets. Dataset 2 is less time-consuming because it has fewer EEG signal channels than Dataset Ⅱa. OECIT handles an online single-source transfer task and thus takes the least time. HomOTLMS and HomOTL-ODDM cost more time due to their greater algorithmic complexity. MSOTL-SDS uses a multi-source online transfer framework that requires less source domain data to train the classifiers, so it has an advantage in time consumption over the other multi-source algorithms. The reduction in computing time relative to HomOTLMS was 4.96 and 2.34 s on BCI Competition Ⅳ Dataset Ⅱa and Dataset 2 from BNCI Horizon 2020, respectively. Additionally, compared with HomOTL-ODDM, MSOTL-SDS was 36.56 and 31.28 s faster on the two datasets, respectively.

    Table 4.  Comparison of time cost using different algorithms.
    Dataset Ⅱa Dataset 2
    Mean (s) Std (s) Mean (s) Std (s)
    EA-CSP-LDA 9.9918 0.9250 - -
    RA-MDRM 172.7034 27.7032 - -
    CA-JDA 68.0582 1.2880 - -
    OECIT-Ⅰ 7.9428 0.8685 4.6735 0.7435
    OECIT-Ⅱ 7.8275 0.6577 4.7823 0.6421
    HomOTLMS 40.6568 1.9549 20.8283 1.8543
    HomOTL-ODDM 72.2592 4.9532 49.7724 4.7286
    MSOTL-SDS 35.6982 1.9368 18.4931 1.9254


    Figure 2(a),(b) show the average online classification accuracy as a function of the number of samples for Datasets Ⅱa and 2, respectively. The poor performance of the OECIT algorithm on a multi-source online classification task is mainly because it treats multiple source subjects as a single individual during training, which can cause negative transfer given the large individual differences among source subjects. For both datasets, MSOTL-SDS demonstrated the best performance in more than half of the subject accuracy variation plots. The HomOTLMS and HomOTL-ODDM algorithms present similar trends in most of the subject accuracy variation curves, which can be explained by their similar multi-source OTL frameworks.

    Figure 2.  Variation curves of mean online classification accuracy according to the number of samples for the two datasets.

    In order to compare the differences between the proposed MSOTL-SDS algorithm and the other algorithms, a paired t-test was applied to the per-subject classification accuracies, with the significance level set to α = 0.05. The test results are shown in Table 5. The results show that MSOTL-SDS was significantly better than the other algorithms on Dataset Ⅱa, but there was no significant difference between MSOTL-SDS and HomOTL-ODDM on Dataset 2. This indicates that the multi-source online transfer algorithm is effective in eliminating the effect of negative transfer when dealing with multiple source domains. Moreover, for multi-source online classification, reducing the differences in the conditional and marginal distributions of the samples is an effective strategy.

    Table 5.  P-values of different algorithms according to paired t-test.
    MSOTL-SDS
    Dataset Ⅱa Dataset 2
    EA-CSP-LDA 0.0004 -
    RA-MDRM 0.0081 -
    CA-JDA 0.0001 -
    OECIT-Ⅰ 0.0010 0.0003
    OECIT-Ⅱ 0.0005 0.0004
    HomOTLMS 0.0034 0.0021
    HomOTL-ODDM 0.0018 0.0627

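    For reference, the paired t-test can be reproduced with SciPy. The rows below are the per-subject accuracies of MSOTL-SDS and HomOTL-ODDM on Dataset Ⅱa, copied from Table 1; the p-values in Table 5 were computed by the authors from their own runs, so this snippet only illustrates the procedure:

```python
from scipy import stats

# Per-subject accuracies on Dataset IIa, copied from Table 1
msotl_sds   = [90.88, 60.93, 97.80, 76.78, 60.88, 73.64, 78.09, 94.69, 79.94]
homotl_oddm = [90.91, 57.31, 97.54, 75.57, 58.84, 68.01, 74.04, 93.88, 75.87]

# Two-sided paired t-test at significance level alpha = 0.05
t_stat, p_value = stats.ttest_rel(msotl_sds, homotl_oddm)
significant = p_value < 0.05
```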

    This study addressed the negative transfer problem in online single-source transfer and sought to reduce the individual differences between subjects. We proposed a multi-source OTL method based on SDS, which was applied to the online classification of motor imagery EEG signals. By utilizing a small number of target domain sample labels in advance, the SDS method selects the source domain data that are similar to the target domain data while eliminating the source domain data that differ from them. Then, by combining the weighting coefficients with the online classifier, each selected source domain is trained into a separate classifier, which is employed to predict sample labels. The proposed method was validated on two motor imagery EEG datasets and compared with other online transfer methods. The experimental results show that the MSOTL-SDS algorithm achieved the best classification accuracy and the lowest time cost among the multi-source online transfer algorithms in the online scenarios. However, this study still has some limitations: the difference in conditional distribution between the source domain and the target domain was not fully considered. Hence, in future work, we can utilize advanced transfer learning algorithms, such as balanced distribution adaptation [37] and manifold embedded distribution alignment [37], to align the conditional distributions between domains and improve the proposed method.

    This work was supported by Zhejiang Provincial Natural Science Foundation of China (Grant No. LZ22F010003) and the National Natural Science Foundation of China (Grant Nos. 61871427 and 62071161). The authors also gratefully acknowledge the research funding supported by the National Students' Innovation and Entrepreneurship Training Program (No. 202210336034).

    The authors declare that there is no conflict of interest.



    © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
