


    1. Introduction

    Conventional machine learning problems require the learning model to learn from all of the collected data, and all of the data are stored. In practice, however, the data are updated continually, and new information must be learned from the new data [19]. Re-learning with access to all previous data is usually time consuming, and storing all of the learned data is also expensive. In this situation, the learning model is required to learn new information from new data while preserving the previously learned information without accessing the previous data. Such a learning model is called incremental learning [8], [21].

    In incremental learning, the whole data set is not available at once. In other words, only part of the whole data set is available at any time. Suppose the whole data set $S$ is divided into $T$ subsets, i.e., $S_{1},S_{2},\ldots,S_{T}$. The rules (e.g., classification boundaries in classification problems) of $S$ and $S_{t}$ are denoted as $R$ and $R_{t}$, respectively. The aim of the learning model is to learn $R$ by learning each $R_{t}$ from $S_{t}$ in turn. The main difficulty is that previously learned rules may be forgotten when the model learns new rules from new data subsets, especially when the rules of different data subsets differ. This phenomenon is called catastrophic forgetting. If $R_{1}=R_{2}=...=R_{T}$, the learning model can learn $R_{1}$ from $S_{1}$, and $R_{1}$ will not be forgotten when new data subsets are learned; in this case, incremental learning is not really challenging. In practice, however, the $R_{t}$ usually differ between data subsets, so catastrophic forgetting may occur.
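The setting above can be made concrete with a small sketch that splits a data set into $T$ subsets arriving one at a time (the helper name and round-robin scheme are illustrative, not part of the paper's method):

```python
import random

def make_incremental_subsets(dataset, T, seed=0):
    """Split a data set into T disjoint subsets S_1..S_T that arrive
    one at a time, simulating the incremental learning setting."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    # Distribute examples round-robin so every subset is non-empty.
    subsets = [[] for _ in range(T)]
    for i, example in enumerate(shuffled):
        subsets[i % T].append(example)
    return subsets

S = [(x, x % 3) for x in range(30)]   # toy (feature, label) pairs
subsets = make_incremental_subsets(S, T=5)
```

An incremental learner would then see `subsets[0]`, `subsets[1]`, ... in order, never the whole of `S` at once.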

    In our setting, even though the rules differ between data subsets, the target rules (i.e., $R$) do not change. This phenomenon is also called virtual concept drift [28], and it differs from real concept drift, in which the target concept changes as new data subsets become available. Virtual concept drift was called sampling shift in [22], and that term is used in this paper. Some additive models can easily be adapted to learn incrementally when sampling shift occurs. For example, in Bayes decision theory, the rules can be represented by parameters, and the parameters of the whole data set can be combined from those of all the data subsets. In this way, such models can learn the data subsets separately and still form the same learner as one trained on the whole data set. However, these kinds of methods often require assumptions about the data distribution, and their decision boundaries are usually simple. Neural networks have a strong ability to learn complex classification boundaries. Unfortunately, they are not additive: after training with new data subsets, the model tends to perform well on the new data subsets but poorly on the previous ones [8]. In other words, the model forgets the previously learned rules. Therefore, it is a challenge to employ neural networks for incremental learning in this situation.

    To exploit neural networks for incremental learning, several ensemble-based approaches have been proposed. In our previous work, Selective Negative Correlation Learning (SNCL) [26], a selective ensemble method was employed to prevent the model from forgetting previously learned information. There are also other ensemble-based methods for incremental learning, such as Fixed Size Negative Correlation Learning (FSNCL), Growing Negative Correlation Learning (GNCL) [15], and the Learn++ methods [21], [17], [5]. In SNCL and FSNCL (size-fixed methods), the model is able to learn new information from new data subsets while the size of the model stays fixed. However, their ability to preserve previously learned information is not as good as that of the Learn++ methods and GNCL (size-grown methods), in which the models grow larger as new data subsets are learned. Since new data subsets keep arriving in practice, the models in the Learn++ methods and GNCL will eventually become too large. Therefore, it is worthwhile to design a method that combines the benefits of size-fixed and size-grown methods.

    Besides sampling shift, there is another issue in incremental learning: class imbalance. For conventional learning models, the class imbalance problem has been studied by many researchers, and there is a substantial literature addressing it [11], [4], [9]. Class imbalance may also occur in incremental learning, and this issue has also been investigated [5], [6]. There are mainly two cases of class-imbalanced incremental learning:

    (1) If the class distribution of the whole data set $S$ is imbalanced, the class distribution of each data subset $S_{t}$ will usually be imbalanced as well. Moreover, samples of the minority classes may be entirely absent from some data subsets.

    (2) Even though the class distribution of $S$ is balanced, $S_{t}$ may still be class imbalanced. In a typical case, every data subset $S_{t}$ is class imbalanced but the combined data set $S$ is class balanced.

    In this paper, we focus on class imbalance cases in which sampling shift also occurs. Specifically, when sampling shift occurs, new classes may appear in a new data subset and some previous classes may be missing from it. When the class distribution of the whole data set is imbalanced, this is especially likely to happen to the minority classes. This is the main issue addressed in this paper.

    The rest of the paper is organized as follows. Section 2 briefly reviews existing methods for incremental learning. Our method, Selective Further Learning (SFL), is described in Section 3. Section 4 presents the experimental studies. Finally, Section 5 concludes the paper and discusses future work.


    2. Related work

    Some neural-network-based methods, such as the Adaptive Resonance Theory map (ARTMAP) [3], [23], [29], [2] and the Evolving Fuzzy Neural Network (EFuNN) [12], have been proposed for incremental learning. Both ARTMAP and EFuNN can learn new rules by changing the architecture of the model, such as self-organizing new clusters (in ARTMAP) or creating new neurons (in EFuNN) when new data are sufficiently different from previous data. However, it is usually a non-trivial task to estimate the difference between new data and previous data. Moreover, both methods are very sensitive to their algorithm parameters.

    Research has shown the good performance of ARTMAP and EFuNN in incremental learning. However, their abilities to learn incrementally under class imbalance have not been well investigated. In [5], where Learn++.UDNC was proposed for class-imbalanced incremental learning, fuzzy ARTMAP [2] was shown to perform poorly when class imbalance occurs. Learn++.UDNC is an ensemble-based method; it is one of the Learn++ family of methods [21], [20], which are based on AdaBoost [7]. Besides Learn++.UDNC, many other versions of Learn++ have been proposed, such as Learn++.MT [16], Learn++.MT2 [16], Learn++.NC [18] and Learn++.SMOTE [6]. Among these, Learn++.MT and Learn++.NC were proposed to handle the "out-voting" problem when learning new classes, and Learn++.MT2 was proposed to handle imbalance in the number of examples between data subsets; these versions did not consider class imbalance. Class imbalance in incremental learning was addressed only in Learn++.UDNC and Learn++.SMOTE. Learn++.UDNC assumed that no real concept drift would happen, while Learn++.SMOTE investigated real concept drift in class-imbalanced data. Therefore, the former matches the issue in this paper but the latter does not.

    Besides Learn++, another family of ensemble-based methods, based on Negative Correlation Learning (NCL) [14], has also been proposed for incremental learning [26], [15]. NCL is a method for constructing neural network ensembles; it improves the generalization performance of the ensemble by simultaneously decreasing the error of every neural network and increasing the diversity between the networks. In [15], two NCL-based methods, FSNCL and GNCL, were proposed. In FSNCL, the size of the ensemble is fixed and all of the neural networks are trained when new data subsets become available. In GNCL, the size of the ensemble grows as the data subsets are learned incrementally, and only the newly added neural networks are trained when new data subsets become available. In our previous work [26], SNCL was proposed: new neural networks are added and trained when new data subsets become available, and a pruning method is then employed to keep the size of the ensemble fixed. Compared to the Learn++ methods and GNCL, FSNCL and SNCL keep the ensemble size fixed as more and more data subsets arrive, but their ability to preserve previously learned information is poorer.

    There are also other methods with incremental learning ability. Self-Organizing Neural Grove (SONG) [10] is an ensemble-based method with Self-Generating Neural Networks (SGNNs) [27] as the individual learners. Incremental Backpropagation Learning Networks (IBPLN) [8] employ neural networks for incremental learning by bounding the weights of the network and adding new nodes. However, these methods do not consider class imbalance in incremental learning.


    3. Our method


    3.1. Framework

    In this paper, class imbalance is considered in incremental learning. In existing work, Learn++.UDNC [5] was proposed to address this issue and has been shown to be more effective than other incremental learning methods that do not consider class imbalance. However, as a size-grown method, the ensemble in Learn++.UDNC keeps growing as new data sets become available, and its size may become too large. Our method is also ensemble based, but at the same time we aim to keep the size of the ensemble at an acceptable level.

    In our previous work, SNCL [26], selective ensemble learning was used to keep the size of the ensemble fixed. When a new data subset arrives, it is used to train a copy of the previous ensemble. The two ensembles are then combined, and half of the individuals are pruned to keep the ensemble size fixed. However, in this model, loss of previous information may easily occur because the pruning is based on the latest data subset, so the ensemble becomes biased toward it. Furthermore, if the rules of the latest data subset are quite different from those of the previous data subsets, i.e., severe sampling shift occurs, all of the individuals of the previous ensemble might be pruned. In addition, since SNCL was designed without considering class imbalance, it may not handle class-imbalanced incremental problems well.

    To overcome the above drawbacks, we propose a new ensemble-based approach for incremental learning: Selective Further Learning (SFL). In SFL, a hybrid ensemble with two kinds of base classifiers is used. First, a group of Multi-Layer Perceptrons (MLPs) is used. When a new data subset becomes available, half of the MLPs in the current ensemble are selected to be trained with it. After training, the selected MLPs are put back into the ensemble. No pruning is performed, so the risk of forgetting previous information is reduced. At the same time, as an additive model, Naive Bayes (NB) is used as a component of the ensemble to learn incrementally from new data subsets. In this way, the strong incremental learning ability of NB helps the ensemble preserve previous information if severe sampling shift occurs.

    In addition, a group of weights (namely, impact weights) is maintained for every individual (the MLPs and NB). The weights and outputs of the $i$th individual are denoted as $\{w_{ik}|k=1,2,...,C\}$ and $\{o_{ik}|k=1,2,...,C\}$, respectively, where $C$ is the number of classes. The impact weight $w_{ik}$ indicates the 'confidence' of the output produced by the $i$th individual on class $k$. At the testing stage, for a given example, the output of the ensemble is the weighted average of all individuals:

    $ y_{k}=\sum_{i=1}^{M}w_{ik}o_{ik}\Big/\sum_{i=1}^{M}w_{ik},\quad k=1,2,...,C,
    $
    (1)

    where $y_{k}$ is the $k$th output of the ensemble and indicates the probability that the example belongs to class $k$, and $M$ is the number of individuals in the ensemble. Equation (1) is used only at the testing stage; at the training stage, the output of the ensemble is the arithmetic average of the individuals' outputs.
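The weighted combination of Eq. (1) can be sketched as follows (a minimal illustration; `outputs[i][k]` and `weights[i][k]` correspond to $o_{ik}$ and $w_{ik}$, and a zero weight sum is mapped to 0 as an added convention):

```python
def ensemble_output(outputs, weights):
    """Combine M individual outputs into the ensemble output of Eq. (1):
    y_k = sum_i(w_ik * o_ik) / sum_i(w_ik) for each class k."""
    M = len(outputs)
    C = len(outputs[0])
    y = []
    for k in range(C):
        num = sum(weights[i][k] * outputs[i][k] for i in range(M))
        den = sum(weights[i][k] for i in range(M))
        y.append(num / den if den > 0 else 0.0)
    return y
```

For example, with two individuals and two classes, an individual whose impact weight on a class is 0 contributes nothing to that class's ensemble output.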

    $w_{ik}$ is initialized to 0 and updated as every new data subset is learned. When updating $w_{ik}$, two issues should be considered.

    On one hand, the degree to which the $i$th individual has learned the $k$th class is considered, i.e., the recall and the precision of the $i$th individual on class $k$. $w_{ik}$ should be high when both the recall and the precision are high. To this end, the multi-class F-measure [24] is introduced:

    $ F_{ik}=\frac{2R_{ik}P_{ik}}{R_{ik}+P_{ik}}
    $
    (2)

    where $F_{ik}$, $R_{ik}$ and $P_{ik}$ are the F-measure, recall and precision of the $i$th individual on class $k$, respectively. According to [24], $R_{ik}$ and $P_{ik}$ are defined as

    $ R_{ik}=\frac{N_{kk}^{(i)}}{\sum_{m=1}^{C}N_{km}^{(i)}}
    $
    (3)

    and

    $ P_{ik}=\frac{N_{kk}^{(i)}}{\sum_{m=1}^{C}N_{mk}^{(i)}}
    $
    (4)

    where $N_{km}^{(i)}$ is the number of examples of class $k$ that were classified as class $m$ by the $i$th individual.

    On the other hand, since MLPs can easily become biased toward the latest data subset, if some classes from previous data subsets do not appear in the new data subset, the outputs of the MLPs trained with the new data subset should be treated with suspicion. Therefore, a coefficient $\mu_{i}$ is defined for every individual $i$ to degrade its impact weights:

    $ \mu_{i}=\frac{n_{t}}{n_{c}},
    $
    (5)

    where $n_{t}$ is the number of classes contained in the new data subset and $n_{c}$ is the number of classes seen in all data subsets so far.

    By considering both of the above issues, $w_{ik}$ is updated as:

    $ w_{ik}=F_{ik}\mu_{i}.
    $
    (6)

    For the NB model, $\mu_i$ always equals 1, since NB is not biased toward the latest data subset. For an MLP, $\mu_i$ is updated whenever that MLP is selected for training. For the MLPs that were not selected for training, and for the NB model, the counts $N_{km}$ from the new data subset can be accumulated onto the previous ones to update $R_{ik}$ and $P_{ik}$ and then $w_{ik}$. In this way, $w_{ik}$ is not updated according to the current data subset alone, which helps preserve previous information.
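Equations (2)-(6) can be computed together from one individual's confusion-count matrix, as in the following sketch (the function name is illustrative; a 0/0 ratio is treated as 0, an added convention for empty classes):

```python
def impact_weights(N, n_t, n_c):
    """Compute impact weights w_ik = F_ik * mu_i from a confusion-count
    matrix N, where N[k][m] is the number of class-k examples classified
    as class m by this individual (Eqs. (2)-(6))."""
    C = len(N)
    mu = n_t / n_c                                     # Eq. (5)
    w = []
    for k in range(C):
        row = sum(N[k][m] for m in range(C))           # examples of class k
        col = sum(N[m][k] for m in range(C))           # examples predicted as k
        R = N[k][k] / row if row else 0.0              # Eq. (3), recall
        P = N[k][k] / col if col else 0.0              # Eq. (4), precision
        F = 2 * R * P / (R + P) if (R + P) else 0.0    # Eq. (2), F-measure
        w.append(F * mu)                               # Eq. (6)
    return w
```

Accumulating the counts `N` across data subsets before calling this function realizes the update scheme described above for NB and the unselected MLPs.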

    The pseudo-code for the approach is presented in Fig. 1. In the pseudo-code, Select is the process for selecting MLPs from the ensemble to be trained with the new data subset, and MLPs-Training and NB-Training are the processes for training the MLPs and NB in the ensemble. The details of these processes are described in the following subsections.

    Figure 1. The Pseudo-Code For SFL.

    3.2. Some details inside SFL


    3.2.1. Selecting Process

    The selecting process is based on the current data subset $S_{t}$. Individuals are added to $Ens_{sel}$ one by one with a greedy strategy. In each step, every MLP in $Ens_{res}$ is temporarily added to $Ens_{sel}$ to estimate the performance (i.e., the arithmetic mean of the F-measures of all classes) on the current data subset. The MLPs in $Ens_{res}$ are tested one by one, and the MLP that makes $Ens_{sel}$ perform the worst is the one finally added to $Ens_{sel}$.

    If the current data subset does not contain some classes that appeared in previous data subsets, the selection process should ensure that not all of the MLPs that have been trained with the data of the lost classes are added to $Ens_{sel}$. Therefore, when an MLP is added to $Ens_{sel}$, the following constraint should be satisfied by the MLPs remaining in $Ens_{res}$:

    $ \prod_{k\in L}\Big(\sum_{i}w_{ik}\Big)\neq 0.
    $
    (7)

    where $L=\{k \mid \mbox{class} \ k \ \mbox{is not contained in} \ S_{t}\}$. If no MLP can be added to $Ens_{sel}$ without violating this constraint, a newly initialized MLP is generated and added to $Ens_{sel}$.

    In this way, the MLPs that are not yet well trained are selected for further training, while the MLPs reserved in $Ens_{res}$ preserve the previously learned information.
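The greedy, worst-first selection can be sketched as follows. This is an illustration only: the "MLPs" are toy stand-ins, `evaluate` is a caller-supplied scoring function (mean F-measure on $S_t$ in the paper), and the lost-class constraint of Eq. (7) is omitted for brevity:

```python
def select_for_training(ens_res, evaluate, n_select):
    """Greedily move n_select individuals from the reserved set into the
    selected set: at each step, pick the candidate whose temporary
    addition makes the selected set perform WORST on the current data
    subset, so poorly trained individuals get retrained."""
    ens_sel, remaining = [], list(ens_res)
    for _ in range(n_select):
        scored = [(evaluate(ens_sel + [m]), m) for m in remaining]
        _, worst = min(scored, key=lambda t: t[0])  # worst-performing candidate
        ens_sel.append(worst)
        remaining.remove(worst)
    return ens_sel, remaining
```

With toy "MLPs" represented by their individual scores and `evaluate` as the mean score, the lowest-scoring individuals are the ones selected for further training, matching the intent described above.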


    3.2.2. Training the model of Naive Bayes

    According to Bayes decision theory, the probability that a testing example $x=\{x_{i}|i=1,2,...,d\}$ belongs to class $k$ is

    $ P(k|x)=\frac{P(x|k)P(k)}{P(x)}=\frac{P(x|k)P(k)}{\sum_{k'=1}^{C}P(x|k')P(k')}
    $
    (8)

    where $P(k|x)$ is the posterior probability of an example $x$ belonging to class $k$, $C$ is the number of classes and $P(k)$ is the prior probability of class $k$. In the class imbalance situation, we assume that $P(k)$ is equal for all classes. In addition, all of the features of the examples are assumed to be independent of each other. Therefore, (8) becomes

    $ P(k|x)=\frac{\prod_{i=1}^{d}P(x_{i}|k)}{\sum_{k'=1}^{C}\prod_{i=1}^{d}P(x_{i}|k')}.
    $
    (9)

    In incremental learning mode, $P(x_{i}|k)$ is updated as every new data subset arrives. $P(x_{i}|k)$ can be estimated as $n(x_{i},k)/n(k)$, where $n(x_{i},k)$ is the number of examples that belong to class $k$ whose $i$th feature has value $x_{i}$, and $n(k)$ is the number of examples that belong to class $k$. Both $n(x_{i},k)$ and $n(k)$ can be counted in each data subset and then accumulated to estimate $P(x_{i}|k)$. In this way, NB can learn from new data subsets without any loss of previous information.
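The count-accumulation scheme can be sketched as a minimal incremental Naive Bayes (class and method names are illustrative; features are assumed already discrete, priors are assumed equal as in the text, and the small probability floor is an added smoothing choice, not part of the paper):

```python
class IncrementalNB:
    """Count-based Naive Bayes that learns from data subsets one at a
    time without revisiting old data (sketch of Eqs. (8)-(9))."""
    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.n_xk = {}               # n(x_i, k): (feature index, value, class) counts
        self.n_k = [0] * n_classes   # n(k): per-class counts

    def partial_fit(self, X, y):
        """Accumulate counts from one data subset."""
        for x, k in zip(X, y):
            self.n_k[k] += 1
            for i, v in enumerate(x):
                key = (i, v, k)
                self.n_xk[key] = self.n_xk.get(key, 0) + 1

    def predict(self, x):
        """Return the class maximizing prod_i P(x_i | k), Eq. (9)."""
        scores = []
        for k in range(self.n_classes):
            p = 1.0
            for i, v in enumerate(x):
                # P(x_i | k) ~= n(x_i, k) / n(k), floored to avoid zeros.
                p *= max(self.n_xk.get((i, v, k), 0), 1e-6) / max(self.n_k[k], 1)
            scores.append(p)
        return scores.index(max(scores))
```

Because only counts are stored, calling `partial_fit` on $S_1, S_2, \ldots$ in sequence yields exactly the same model as one fit on the union, which is why NB loses no previous information.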

    The estimation of $P(k|x)$ in (9) requires the feature values to be discrete. For features with continuous values, equal-width partitioning is used to discretize the features when calculating $P(x_{i}|k)$.


    3.2.3. Training the ensemble of MLPs

    We previously proposed a Dynamic Sampling (DyS) method for class imbalance problems [13], which can be used for training the ensemble of MLPs. Following the approach proposed in [13], the main process of DyS for an ensemble is as follows (in one epoch):

    step1. Randomly fetch an example $x$ from the training set;

    step2. Estimate the probability $p$ that the example should be used to update the ensemble;

    step3. Generate a uniform random real number $\mu$ between 0 and 1;

    step4. If $\mu <p$, use $x$ to update the ensemble with Negative Correlation Learning (NCL) [14], making every MLP negatively correlated with the other individuals (including the MLPs in $Ens_{res}$ and the NB model);

    step5. Repeat steps 1 to 4 until no examples remain in the training set.

    The above steps are repeated until the stop criterion is satisfied. The following describes the method for estimating $p$, which is the main issue in DyS.

    In a problem with $nc$ classes, we set $nc$ output nodes for all of the MLPs, and for an example belonging to class $k$, we set the target output as $t=\{t_{j}|t_{k}=1,\ t_{j}=0 \ \mbox{for} \ j\neq k\}$. The actual output for the example is denoted as $y=\{y_{i}|i=1,2,...,nc\}$, and the node with the highest output designates the class. Both the hidden-node and output-node activation functions of all MLPs are the logistic function $\varphi (x)=1/(1+e^{-x})$, so that $y_{i}\in (0,1)$.

    As in [13], the probability that an example belonging to class $k$ will be used to update the ensemble is estimated as:

    $ p=\begin{cases}1, & \mbox{if} \ \delta <0,\\ e^{-\delta\, r_{k}/\min_{i}\{r_{i}\}}, & \mbox{otherwise}\end{cases}
    $
    (10)

    where $\delta =y_{k}-\max_{i\neq k}\{y_{i}\}$ is the confidence of the current ensemble in correctly classifying the example, and $r_{k}$ is the proportion of examples of class $k$ seen so far. For more details of DyS, please refer to [13].
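Steps 2-4 of DyS can be sketched as follows. Note that the displayed form of Eq. (10) is reconstructed from a garbled original, so this sketch should be read as an interpretation: misclassified examples ($\delta<0$) are always used, and otherwise the update probability decays with both the confidence $\delta$ and the relative size of the example's class:

```python
import math
import random

def dys_update_probability(y, k, r):
    """Reconstructed Eq. (10): p = 1 if the ensemble misclassifies the
    example (delta < 0), else p = exp(-delta * r_k / min_i r_i), so
    confidently classified majority-class examples are rarely used.
    y: ensemble outputs, k: true class, r: per-class example counts."""
    delta = y[k] - max(v for i, v in enumerate(y) if i != k)
    if delta < 0:
        return 1.0
    return math.exp(-delta * r[k] / min(r))

def should_update(y, k, r, rng=random):
    """Steps 3-4 of DyS: draw u ~ U(0,1) and update when u < p."""
    return rng.random() < dys_update_probability(y, k, r)
```

Under this reading, an example of the smallest class with $\delta \geq 0$ gets probability $e^{-\delta}$, while examples of larger classes get exponentially smaller probabilities, which is what lets the MLPs accommodate class imbalance.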

    By employing DyS, the MLPs in the ensemble are able to accommodate class imbalance.


    3.3. The reason for the success of SFL

    In general, several ingredients of SFL contribute to its success: the selective training of MLPs, the use of NB, the impact weights for combining the individuals in the ensemble, and the consideration of class imbalance in the training process.

    To analyze the reasons for the success of SFL, two special cases in incremental learning are considered: new classes in the new data subsets, and the loss of previous classes in the new data subsets. The ensemble is divided into three parts: $MLP_a$, $MLP_b$ and NB, where $MLP_a$ denotes the MLPs selected to learn the new data subset and $MLP_b$ denotes the remaining MLPs.

    After learning a new data subset that contains new classes, $MLP_a$ and NB have learned the new classes while $MLP_b$ has not. In SFL, the impact weights of $MLP_b$ are updated by accumulating the performance of $MLP_b$ on the previous data subsets and the current one. Examples of the new classes will be misclassified as other classes by $MLP_b$, which degrades the precision of $MLP_b$ on those classes and finally degrades its impact weights on them. In this way, a wrong prediction by $MLP_b$ on a testing example of a new class has less impact on the prediction of the whole ensemble. Therefore, $MLP_a$ and NB, which have learned the new classes, play the leading role in the ensemble when predicting examples of new classes.

    After learning a new data subset in which some previous classes are missing, $MLP_a$ will be biased toward the classes contained in the current data subset. However, the impact weights of $MLP_a$ are degraded by the coefficient in (5). Therefore, NB and the MLPs that still retain the missing classes play the leading role in the ensemble when making predictions.

    Moreover, as discussed above, NB is able to learn incrementally without forgetting previous information, so its use helps prevent the ensemble from catastrophic forgetting. Furthermore, class imbalance is considered in the training of both NB and the MLPs. Therefore, SFL is able to deal with class imbalance in the new data subsets.


    4. Experimental study

    To assess the performance of SFL, experiments were conducted on synthetic and real-world data sets. First, three types of synthetic data sets were generated to simulate the incremental learning process. Then, 5 real-world data sets with imbalanced class distributions from the UCI repository [1] were used to simulate incremental learning by randomly dividing the data sets. Finally, another 5 real-world data sets from the UCI repository, including 3 class-imbalanced data sets and 2 class-balanced data sets, were used to simulate the incremental learning process by dividing the data sets; here the division took into account new classes and the loss of previous classes in the new data subsets. The purpose of this part of the experiment is to assess SFL's ability to learn from new classes and to preserve previous information when some classes are missing from the new data subsets. Learn++.UDNC [5], a recently proposed approach that also addresses class-imbalanced incremental learning, was used for comparison. In addition, to examine the contributions of the MLPs and NB to SFL, an ensemble with only MLPs (referred to as SFL.MLP) and the NB model alone were also compared with SFL. The recall of every class and the arithmetic mean over the recalls of all classes are used as the metrics.
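The evaluation metric just described (per-class recall and its arithmetic mean, the "overall recall") can be computed as in this small sketch (the function name is illustrative; a class with no test examples is scored 0, an added convention):

```python
def overall_recall(y_true, y_pred, n_classes):
    """Per-class recall and its arithmetic mean over all classes,
    the metric used in the experiments. Unlike plain accuracy, every
    class contributes equally, which matters under class imbalance."""
    correct = [0] * n_classes
    total = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if p == t:
            correct[t] += 1
    recalls = [c / n if n else 0.0 for c, n in zip(correct, total)]
    return recalls, sum(recalls) / n_classes
```

A classifier that labels everything with the majority class scores high on accuracy but low on this metric, which is exactly the bias the comparisons below are designed to expose.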

    Table 1. Class distributions of every data subset for the three types of synthetic data sets.

    4.1. Experiments on synthetic data sets

    The synthetic data were generated as follows. Data from four 2-dimensional normal distributions were generated for four classes. The means were $\mu_1=(0,0)$, $\mu_2=(0,1)$, $\mu_3=(1,1)$ and $\mu_4=(1,0)$; the two features are independent, with variances $\sigma_1 =\sigma_2 =\sigma_3 =\sigma_4 =0.2$. Three types of synthetic data sets were generated. Table 1 presents the class distributions of every data subset for the three types. In Type A, there are three majority classes (classes 1 to 3) and one minority class (class 4). Class 4 appears as a new class in $S_2$. Classes 1 to 3 each appear as an additional minority class in training subsets $S_3$ to $S_5$, respectively. This experiment examines the performance of SFL on problems with multiple majority classes (which sometimes appear as minority classes) and a single minority class (which also appears as a new class). In Type B, there are one majority class and three minority classes. Class 2 is present at the beginning but is missing from the last two training subsets. Class 3 appears as a new class in $S_2$ and is missing from the last training subset. Class 4 appears as a new class in $S_3$. This experiment examines the performance of SFL on problems with a single majority class and multiple minority classes, some of which appear as new classes and are missing from some data subsets. In Type C, the class distribution of the whole training set (i.e., the union of all the training subsets) is balanced, but the training subsets are class imbalanced and every training subset contains only two classes. This experiment examines the performance of SFL on problems whose class distributions are balanced overall but imbalanced within data subsets. In all three types, the distributions differ considerably between data subsets.
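The four-class Gaussian generator can be sketched as follows (illustrative only; the text's $\sigma$ values are read here as standard deviations, which is an assumption, since the sentence calls them variances):

```python
import random

def generate_synthetic(n_per_class, seed=0):
    """Generate the four-class 2-D Gaussian data described above:
    class means (0,0), (0,1), (1,1), (1,0), with the two features
    drawn independently with spread 0.2."""
    rng = random.Random(seed)
    means = [(0, 0), (0, 1), (1, 1), (1, 0)]
    data = []
    for k, (mx, my) in enumerate(means):
        for _ in range(n_per_class):
            point = (rng.gauss(mx, 0.2), rng.gauss(my, 0.2))
            data.append((point, k))
    return data
```

The Type A/B/C subsets would then be formed by sampling from this pool according to the per-subset class distributions in Table 1.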

    Table 2. Results on the Type A data sets.
    Table 3. Results on the Type B data sets.
    Table 4. Results on the Type C data sets.
    Table 5. Class distributions of the 5 randomly divided class-imbalanced data sets.
    Table 6. Data distributions of all training subsets and testing sets.

    In SFL and SFL.MLP, 10 MLPs with 20 hidden nodes each were used. The training stop error was 0.05, and the coefficient of the penalty term of NCL (referred to as $\lambda$) was 0.5. The data sets were generated 30 times independently, and the means and standard deviations over the 30 executions on the three types of data sets are presented in Tables 2, 3 and 4, respectively. The Wilcoxon signed-rank test with significance level $\alpha =0.05$ was employed for the comparison between SFL and the other methods. In the results of the other methods, underlined (or bold) values denote that SFL performed significantly better (or worse) than them on those values, and values in normal type denote no significant difference. The results on the Type A data set are presented in Table 2. Compared to Learn++.UDNC, SFL obtains better overall recalls (i.e., the average of the recalls of all classes). The recalls of SFL on classes 1 to 3, the majority classes, are not as good as those of Learn++.UDNC. However, Learn++.UDNC is biased too strongly toward the majority classes and performs very poorly on the only minority class, while SFL performs in a more balanced way over all the classes. Therefore, SFL outperforms Learn++.UDNC on this data set. Compared to SFL.MLP and NB, there is little statistical difference in average recall, especially after training with $S_3$, $S_4$ and $S_5$. After training with $S_2$, where class 4 appears as a new class, SFL learns class 4 better than SFL.MLP and as well as NB. At the same time, SFL does not degrade on class 1 as much as NB does. Although SFL degrades more on classes 1 and 3 than SFL.MLP, it performs better than SFL.MLP on class 2. Therefore, after training with $S_2$, SFL has a better average recall than both SFL.MLP and NB. This observation indicates that SFL is capable of combining the advantages of both MLPs and NB into a better model.

    The results on the Type B data set are presented in Table 3. Compared to Learn++.UDNC, similar observations can be made, and we can again conclude that SFL outperforms Learn++.UDNC on this data set. Compared to SFL.MLP and NB, some values of SFL lie between those of SFL.MLP and NB (always closer to the larger one), and some values of SFL are significantly larger than both. Similar observations can be made for the results on the Type C data set in Table 4. All of these results show that SFL outperforms Learn++.UDNC and is capable of combining the advantages of both MLPs and NB into a better model.


    4.2. Experiments on real-world data sets

    The experiments on real-world data sets include three parts. First, 5 class-imbalanced data sets were divided randomly to simulate the incremental learning process. Second, 3 class-imbalanced data sets were divided so that new classes appear and previously seen classes disappear in some of the new data subsets. Finally, 2 class-balanced data sets were divided into class-imbalanced subsets to simulate the incremental learning process; the appearance of new classes and the loss of classes in the new data subsets were also considered.

    The class distributions of the 5 class-imbalanced data sets that were randomly divided are presented in TABLE Ⅴ. Each of these data sets was first divided by stratified sampling into a training set ($80\%$) and a testing set ($20\%$), and the training set was then randomly divided into 5 training subsets. The other real-world data sets, including 3 class-imbalanced data sets and 2 class-balanced data sets, were divided according to predefined data distributions. The data distributions of all training subsets and testing sets are presented in TABLE Ⅵ. It can be observed from TABLE Ⅵ that, for all the data sets, the data distributions differ considerably between training subsets, and new classes appear or previously seen classes disappear in some training subsets.
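
    The protocol above (stratified 80/20 train/test split, then a random division of the training set into 5 incremental subsets) can be sketched as follows; the function name and seed handling are ours:

```python
import random
from collections import defaultdict

def stratified_then_split(y, test_frac=0.2, n_subsets=5, seed=0):
    """Stratified train/test split, then random incremental subsets.

    Stratification preserves the per-class proportions in the testing
    set, while the training subsets are split purely at random, so
    their class distributions can drift between subsets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label, idx in by_class.items():
        rng.shuffle(idx)
        k = int(round(test_frac * len(idx)))  # per-class test quota
        test_idx.extend(idx[:k])
        train_idx.extend(idx[k:])
    rng.shuffle(train_idx)
    # deal the shuffled training indices round-robin into n_subsets piles
    subsets = [train_idx[j::n_subsets] for j in range(n_subsets)]
    return subsets, test_idx
```

    scikit-learn's `StratifiedShuffleSplit` provides the same stratified split off the shelf; the plain-Python version is shown only to make the two-stage protocol explicit.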

    [TABLE Ⅶ]

    For all the data sets, an ensemble of 10 MLPs, each with 20 hidden nodes, was used in SFL, and the coefficient $\lambda$ of NCL was 0.5. An independent execution was run for every data set to set the stop criterion for training the MLPs, so as to ensure the convergence of the training process. All the data sets were divided 30 times independently, and each time all the compared methods were executed once. The means and standard deviations of the overall recalls after the arrival of each data subset, over the 30 executions on all the real-world data sets, are presented in TABLE Ⅶ. The Wilcoxon signed-rank test with significance level $\alpha = 0.05$ was again employed, with the same underline/bold conventions as above.

    [TABLE Ⅷ]
    [TABLE Ⅸ]

    It can be observed from TABLE Ⅶ that SFL outperforms Learn++.UDNC on most of the data sets, including Soybean, Splice, Thyroid-allrep, Car, Nursery, Optdigits and Vehicle. On the other data sets, SFL does not perform significantly worse than Learn++.UDNC. Compared with SFL.MLP and NB, the performance of SFL usually leans toward the better of the two, and sometimes SFL outperforms both, as on Soybean, Nursery, Optdigits and Vehicle. These observations further support that SFL is capable of combining the advantages of MLPs and NB into a better model.

    [TABLE Ⅹ]

    On Car, Nursery, Page-blocks, Optdigits and Vehicle, the data sets were divided according to the distributions presented in TABLE Ⅵ, where new classes appear or previously seen classes disappear in the new data subsets. It is worthwhile to examine the per-class results on these data sets. Therefore, the detailed results on two of them, i.e., Nursery (class-imbalanced) and Optdigits (class-balanced), are presented below.

    The means and standard deviations over the 30 executions on Nursery are presented in TABLE Ⅷ, with the Wilcoxon signed-rank test ($\alpha = 0.05$) and the same underline/bold conventions as above. It can be observed from TABLE Ⅷ that the performance of Learn++.UDNC on class 3 is much worse than that of SFL. Class 3 is a minority class: it appears in $S_2$ as a new class and is lost in $S_4$. These observations indicate that SFL can handle this kind of problem. To see the respective effects of the MLPs and NB in SFL, we pay closer attention to the comparison with SFL.MLP and NB. It can be observed that the MLPs learn better than NB when no class is lost. However, when class 3 is lost in $S_4$, the MLPs degrade the recall on class 3 much more than NB does. As the combination of MLPs and NB, SFL does not lose much recall on class 3 and, at the same time, learns the other classes better than NB, which leads to better overall recalls. Therefore, on this data set, the MLPs help SFL to learn better, and NB helps SFL to preserve previously learned information, especially when class loss occurs.

    [TABLE Ⅺ]
    [TABLE Ⅻ]

    The means and standard deviations over the 30 executions on Optdigits are presented in TABLE Ⅸ, with the Wilcoxon signed-rank test ($\alpha = 0.05$) and the same underline/bold conventions as above. In $S_2$, class 4 first appears as a minority class, class 8 first appears as a majority class, and classes 1, 6 and 10 are lost. After learning from $S_2$, the recall of SFL on class 4 is larger than those of Learn++.UDNC and NB, though not as large as that of SFL.MLP; the recall of SFL on class 8 is the best among all the methods. At the same time, the degradation of SFL on classes 1, 6 and 10 is much smaller than that of Learn++.UDNC and SFL.MLP and only slightly larger than that of NB. This observation indicates that SFL can learn new classes with little performance degradation on the lost classes. Even though SFL.MLP performs better on classes 2 and 4 when they first appear, it degrades much more on the other classes. Therefore, it is not surprising that SFL obtains the best overall recalls.

    The experimental results indicate that the performance of Learn++.UDNC usually leans toward the majority classes. Even when it performs better on the minority classes, its performance on the other classes is usually degraded too much. In contrast, SFL usually achieves more balanced performance across classes and better overall performance. This stems from the different ways SFL and Learn++.UDNC handle class imbalance. In SFL, class imbalance is considered when training the model, and the method for training the MLPs has been shown to be effective for class imbalance problems. In Learn++.UDNC, the training process does not consider class imbalance; instead, a transfer function accounting for class imbalance is applied to the outputs, and the effectiveness of this method has not been convincingly demonstrated. Even in the results presented in [5], the performance on the minority classes was much worse than that on the majority classes. Therefore, it is not surprising that SFL outperforms Learn++.UDNC on most of the data sets.


    4.3. Computational time

    The computational time of SFL and Learn++.UDNC on all the data sets is presented in TABLE Ⅹ. It can be observed from TABLE Ⅹ that SFL usually takes less computational time than Learn++.UDNC. In the experiments, the structures of the MLPs and the stop criterion were the same for SFL and Learn++.UDNC. However, more MLPs are trained in Learn++.UDNC for every new data subset, and the training process of SFL usually meets the stop criterion earlier than that of Learn++.UDNC. Therefore, SFL is usually faster than Learn++.UDNC.


    4.4. Analyses about the components of SFL

    In SFL, two kinds of base classifiers, i.e., MLPs and NB, are employed to construct the ensemble. The results have shown that SFL is capable of outperforming the models with only MLPs and the models with only NB. To find out the reason, the differences between SFL and its components (MLPs and NB), and the influence of those differences, are investigated in detail.

    After every data subset is learned, the following counts are computed on the testing data set: the number of examples correctly classified only by the MLPs ($\#_1$), the number correctly classified only by NB ($\#_2$), the number correctly classified by exactly one of the MLPs and NB and also correctly classified by SFL ($\#_3$), and the total number of testing examples ($\#_t$). Four ratios are then estimated:

    $ \left\{ \begin{aligned} \rho_1 &= (\#_1 + \#_2)/\#_t \\ \rho_2 &= \#_1/(\#_1 + \#_2) \\ \rho_3 &= \#_2/(\#_1 + \#_2) \\ \rho_4 &= \#_3/(\#_1 + \#_2) \end{aligned} \right. $
    (11)

    where $\#_t$ is the number of examples in the testing data set. The ratios were estimated for all the data sets, and the average values over the 30 executions are presented in TABLE Ⅺ. $\rho_1$ indicates the diversity (in making correct classification decisions) between the MLPs and NB, and $\rho_4$ indicates the benefit that SFL gains from the difference between the MLPs and NB. It can be observed from TABLE Ⅺ that the values of $\rho_4$ are always closer to the larger of $\rho_2$ and $\rho_3$ and sometimes exceed both of them. These observations partly explain why SFL always performs toward the better of the MLPs and NB and sometimes exceeds both.
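
    Given per-example correctness indicators for the MLPs, NB and SFL, the ratios in (11) can be computed directly; a NumPy sketch (function name ours):

```python
import numpy as np

def diversity_ratios(correct_mlp, correct_nb, correct_sfl):
    """Ratios (11): rho1 measures how often exactly one of MLPs/NB is
    right (diversity), rho2/rho3 split that disputed set between the
    two components, and rho4 is the fraction of the disputed examples
    that SFL still classifies correctly.

    Inputs are boolean arrays over the testing set, one entry per example.
    """
    correct_mlp = np.asarray(correct_mlp, bool)
    correct_nb = np.asarray(correct_nb, bool)
    correct_sfl = np.asarray(correct_sfl, bool)
    only_mlp = correct_mlp & ~correct_nb              # counted by #1
    only_nb = correct_nb & ~correct_mlp               # counted by #2
    n1, n2 = only_mlp.sum(), only_nb.sum()
    n3 = ((only_mlp | only_nb) & correct_sfl).sum()   # counted by #3
    nt = len(correct_mlp)                             # #t
    return (n1 + n2) / nt, n1 / (n1 + n2), n2 / (n1 + n2), n3 / (n1 + n2)
```

    By construction $\rho_2 + \rho_3 = 1$, so $\rho_4$ exceeding the larger of the two means SFL resolves disputed examples better than either component alone.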


    4.5. Analyses of parameters

    SFL has several parameters: the number of MLPs, the number of hidden nodes in every MLP, the stop criterion for training the MLPs, and the coefficient $\lambda$ in NCL. In our experimental studies, the number of MLPs and the number of hidden nodes were set by experience. An independent execution was run for every data set to set the stop criterion so as to ensure the convergence of the training process. The coefficient $\lambda$ in NCL controls the diversity between the individuals in the ensemble (a larger $\lambda$ leads to larger diversity). In the study of NCL [14], $\lambda$ was suggested to be between 0 and 1; in our experimental studies, it was set to 0.5 for all the data sets. Since diversity is very important for the success of ensemble learning methods [25], it is worth examining the performance of SFL with different values of $\lambda$.
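
    The role of $\lambda$ can be made concrete with the standard NCL error, $e_i = \frac{1}{2}(F_i - y)^2 + \lambda p_i$ with penalty $p_i = (F_i - \bar{F})\sum_{j \neq i}(F_j - \bar{F}) = -(F_i - \bar{F})^2$. A sketch of the resulting per-network output gradient (treating the ensemble mean $\bar{F}$ as a constant, as in the usual formulation; the function name is ours):

```python
import numpy as np

def ncl_gradients(outputs, target, lam=0.5):
    """Gradient of the NCL error w.r.t. each network's output F_i.

    d e_i / d F_i = (F_i - y) - lam * (F_i - Fbar),
    using sum_{j != i}(F_j - Fbar) = -(F_i - Fbar).
    With lam = 0 each network is trained on its own squared error;
    larger lam pushes the networks' errors to be negatively correlated.
    """
    outputs = np.asarray(outputs, float)  # F_i, one entry per network
    fbar = outputs.mean()
    return (outputs - target) - lam * (outputs - fbar)
```

    The penalty term shrinks the gradient of networks that deviate from the ensemble mean in the direction of the error, discouraging all networks from making the same mistake.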

    Extra executions of SFL with $\lambda = 0, 0.25, 0.75$ and 1 were conducted on all the data sets. The Wilcoxon signed-rank test with significance level $\alpha = 0.05$ was employed to compare the overall recalls after training with each data subset. The results for every setting of $\lambda$ were compared with those of the other four settings, and the win-draw-lose counts are presented in TABLE Ⅻ. It can be observed from TABLE Ⅻ that $\lambda$ affects the performance on most data sets. On some data sets, such as Synthetic Type A, Synthetic Type B, Nursery, Page-blocks, Optdigits and Vehicle, performance improves as $\lambda$ decreases. In SFL, $\lambda$ is not the only source of diversity: on one hand, the model built by NB may be quite different from the MLPs; on the other hand, in incremental learning, different MLPs may be trained with different data subsets, which also produces diversity, especially when the data subsets differ substantially. Therefore, a large $\lambda$ (such as 1) may put too much emphasis on producing diversity, so that performance is degraded.


    5. Conclusions and future work

    This paper investigates incremental learning in class-imbalanced situations. An ensemble-based method, SFL, which is a hybrid of MLPs and NB, was proposed. A group of impact weights (one per class) is updated for every individual of the ensemble to indicate the confidence of that individual on each class; the weights affect the outputs of the ensemble through a weighted average of all individuals' outputs. The training of the MLPs and NB takes class imbalance into account so that the ensemble can adapt to class-imbalanced situations.
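
    The per-class weighted averaging of individual outputs can be illustrated as follows; this is only a sketch of the combination step (function name ours), and the update rule for the impact weights is not reproduced here:

```python
import numpy as np

def ensemble_output(individual_outputs, impact_weights):
    """Combine individual class scores by per-class weighted averaging.

    individual_outputs[k, c]: output of individual k for class c.
    impact_weights[k, c]: confidence of individual k on class c, so an
    individual that has never seen class c can be given low weight there.
    """
    P = np.asarray(individual_outputs, float)  # shape (n_models, n_classes)
    W = np.asarray(impact_weights, float)      # shape (n_models, n_classes)
    # normalize per class; the combined scores need not sum to 1,
    # the predicted class is their argmax
    return (W * P).sum(axis=0) / W.sum(axis=0)
```

    Raising one individual's weight on a class pulls the combined score for that class toward that individual's output, which is how the ensemble can lean on NB for classes the MLPs have forgotten.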

    The experimental studies on 3 synthetic data sets and 10 real-world data sets have shown that SFL performs better than a recently proposed approach for class-imbalanced incremental learning, Learn++.UDNC [9]. The results have also shown that SFL can combine the advantages of both MLPs and NB to make a better model.

    SFL has successfully combined MLPs and NB, and the experimental studies have shown that combining additive models can make progress in incremental learning. However, this is only an initial attempt; other additive models, such as parameter estimation models, might also help to improve SFL. This is a direction of our future work.


    [1] Lomo T (1966) Frequency potentiation of excitatory synaptic activity in the dentate area of the hippocampal formation. Acta Physiol Scand 68: 128.
    [2] Shors TJ, Matzel LD (1997) Long-term potentiation: What's learning got to do with it? Behav Brain Sci 20: 597-614;
    [3] Baudry M, Bi X, Gall C, et al. (2011) The biochemistry of memory: The 26 year journey of a 'new and specific hypothesis'. Neurobiol Learn Mem 95: 125-133. doi: 10.1016/j.nlm.2010.11.015
    [4] Buzsaki G (2010) Neural syntax: Cell assemblies, synapsembles, and readers. Neuron 68: 362-385. doi: 10.1016/j.neuron.2010.09.023
    [5] Squire LR, Wixted JT (2011) The cognitive neuroscience of human memory since H.M. Annu Rev Neurosci 34: 259-288. doi: 10.1146/annurev-neuro-061010-113720
    [6] Rissman J, Wagner AD (2012) Distributed representations in memory: Insights from functional brain imaging. Annu Rev Psychol 63: 101-128. doi: 10.1146/annurev-psych-120710-100344
    [7] Rudy JW (2014) The neurobiology of learning and memory. Sunderland, MA: Sinauer Associates.
    [8] Mitsushima D (2015) Contextual learning requires functional diversity at excitatory and inhibitory synapses onto CA1 pyramidal neurons. AIMS Neurosci 2: 7-17. doi: 10.3934/Neuroscience.2015.1.7
    [9] Mitsushima D, Sano A, Takahashi T (2013) A cholinergic trigger drives learning-induced plasticity at hippocampal synapses. Nat Commun 4: 2760.
    [10] Takeuchi T, Duszkiewicz AJ, Morris RG (2014) The synaptic plasticity and memory hypothesis: Encoding, storage and persistence. Philos Trans R Soc Lond B Biol Sci 369: 20130288.
    [11] Whitlock JR, Heynen AJ, Shuler MG, et al. (2006) Learning induces long-term potentiation in the hippocampus. Science 313: 1093-1097. doi: 10.1126/science.1128134
    [12] Bakin JS, Weinberger NM (1996) Induction of a physiological memory in the cerebral cortex by stimulation of the nucleus basalis. Proc Natl Acad Sci U S A 93: 11219-11224. doi: 10.1073/pnas.93.20.11219
    [13] Mercado E, III, Bao S, Orduna I, et al. (2001) Basal forebrain stimulation changes cortical sensitivities to complex sound. Neuroreport 12: 2283-2287. doi: 10.1097/00001756-200107200-00047
    [14] Froemke RC, Carcea I, Barker AJ, et al. (2013) Long-term modification of cortical synapses improves sensory perception. Nat Neurosci 16: 79-88.
    [15] Froemke RC, Merzenich MM, Schreiner CE (2007) A synaptic memory trace for cortical receptive field plasticity. Nature 450: 425-429. doi: 10.1038/nature06289
    [16] Marrosu F, Portas C, Mascia MS, et al. (1995) Microdialysis measurement of cortical and hippocampal acetylcholine release during sleep-wake cycle in freely moving cats. Brain Res 671: 329-332. doi: 10.1016/0006-8993(94)01399-3
    [17] Newman EL, Gupta K, Climer JR, et al. (2012) Cholinergic modulation of cognitive processing: Insights drawn from computational models. Front Behav Neurosci 6: 24.
    [18] Rokers B, Mercado E, III, Allen MT, et al. (2002) A connectionist model of septohippocampal dynamics during conditioning: Closing the loop. Behav Neurosci 116: 48-62. doi: 10.1037/0735-7044.116.1.48
    [19] Manseau F, Goutagny R, Danik M, et al. (2008) The hippocamposeptal pathway generates rhythmic firing of GABAergic neurons in the medial septum and diagonal bands: An investigation using a complete septohippocampal preparation in vitro. J Neurosci 28: 4096-4107. doi: 10.1523/JNEUROSCI.0247-08.2008
    [20] Zhang H, Lin SC, Nicolelis MA (2010) Spatiotemporal coupling between hippocampal acetylcholine release and theta oscillations in vivo. J Neurosci 30: 13431-13440. doi: 10.1523/JNEUROSCI.1144-10.2010
    [21] Vandecasteele M, Varga V, Berenyi A, et al. (2014) Optogenetic activation of septal cholinergic neurons suppresses sharp wave ripples and enhances theta oscillations in the hippocampus. Proc Natl Acad Sci U S A 111: 13535-13540. doi: 10.1073/pnas.1411233111
    [22] Ajemian R, D'Ausilio A, Moorman H, et al. (2013) A theory for how sensorimotor skills are learned and retained in noisy and nonstationary neural circuits. Proc Natl Acad Sci U S A 110: E5078-5087. doi: 10.1073/pnas.1320116110
    [23] Routtenberg A (2013) Lifetime memories from persistently supple synapses. Hippocampus 23: 202-206. doi: 10.1002/hipo.22088
    [24] Routtenberg A (2008) Long-lasting memory from evanescent networks. Eur J Pharmacol 585: 60-63. doi: 10.1016/j.ejphar.2008.02.047
    [25] Routtenberg A (2008) The substrate for long-lasting memory: If not protein synthesis, then what? Neurobiol Learn Mem 89: 225-233. doi: 10.1016/j.nlm.2007.10.012
    [26] Richards BA, Frankland PW (2013) The conjunctive trace. Hippocampus 23: 207-212. doi: 10.1002/hipo.22089
    [27] Chen N, Sugihara H, Sharma J, et al. (2012) Nucleus basalis-enabled stimulus-specific plasticity in the visual cortex is mediated by astrocytes. Proc Natl Acad Sci U S A 109: E2832-2841. doi: 10.1073/pnas.1206557109
    [28] McKenzie IA, Ohayon D, Li H, et al. (2014) Motor skill learning requires active central myelination. Science 346: 318-322. doi: 10.1126/science.1254960
    [29] Patel J, Fujisawa S, Berenyi A, et al. (2012) Traveling theta waves along the entire septotemporal axis of the hippocampus. Neuron 75: 410-417. doi: 10.1016/j.neuron.2012.07.015
    [30] Lubenov EV, Siapas AG (2009) Hippocampal theta oscillations are travelling waves. Nature 459: 534-539. doi: 10.1038/nature08010
    [31] Benchenane K, Peyrache A, Khamassi M, et al. (2010) Coherent theta oscillations and reorganization of spike timing in the hippocampal-prefrontal network upon learning. Neuron 66: 921-936. doi: 10.1016/j.neuron.2010.05.013
    [32] Siapas AG, Lubenov EV, Wilson MA (2005) Prefrontal phase locking to hippocampal theta oscillations. Neuron 46: 141-151. doi: 10.1016/j.neuron.2005.02.028
    [33] Sirota A, Montgomery S, Fujisawa S, et al. (2008) Entrainment of neocortical neurons and gamma oscillations by the hippocampal theta rhythm. Neuron 60: 683-697. doi: 10.1016/j.neuron.2008.09.014
    [34] Buzsaki G (2005) Theta rhythm of navigation: Link between path integration and landmark navigation, episodic and semantic memory. Hippocampus 15: 827-840. doi: 10.1002/hipo.20113
    [35] Landfield PW, McGaugh JL, Tusa RJ (1972) Theta rhythm: A temporal correlate of memory storage processes in the rat. Science 175: 87-89. doi: 10.1126/science.175.4017.87
    [36] Miller R (1989) Cortico-hippocampal interplay: Self-organizing phase-locked loops for indexing memory. Psychobiol 17: 115-128.
    [37] Palmer JH, Gong P (2014) Associative learning of classical conditioning as an emergent property of spatially extended spiking neural circuits with synaptic plasticity. Front Comput Neurosci 8: 79.
    [38] Battaglia FP, Benchenane K, Sirota A, et al. (2011) The hippocampus: Hub of brain network communication for memory. Trends Cogn Sci 15: 310-318.
    [39] Lansdell B, Ford K, Kutz JN (2014) A reaction-diffusion model of cholinergic retinal waves. PLoS Comput Biol 10: e1003953. doi: 10.1371/journal.pcbi.1003953
    [40] Wester JC, Contreras D (2012) Columnar interactions determine horizontal propagation of recurrent network activity in neocortex. J Neurosci 32: 5454-5471. doi: 10.1523/JNEUROSCI.5006-11.2012
    [41] Pavlov IP (1927) Conditioned reflexes. London: Oxford University Press.
    [42] Griffen TC, Wang L, Fontanini A, et al. (2012) Developmental regulation of spatio-temporal patterns of cortical circuit activation. Front Cell Neurosci 6: 65.
    [43] Wang L, Fontanini A, Maffei A (2011) Visual experience modulates spatio-temporal dynamics of circuit activation. Front Cell Neurosci 5: 12.
    [44] Blankenship AG, Feller MB (2010) Mechanisms underlying spontaneous patterned activity in developing neural circuits. Nat Rev Neurosci 11: 18-29. doi: 10.1038/nrn2759
    [45] Palagina G, Eysel UT, Jancke D (2009) Strengthening of lateral activation in adult rat visual cortex after retinal lesions captured with voltage-sensitive dye imaging in vivo. Proc Natl Acad Sci U S A 106: 8743-8747. doi: 10.1073/pnas.0900068106
    [46] Ponulak F, Hopfield JJ (2013) Rapid, parallel path planning by propagating wavefronts of spiking neural activity. Front Comput Neurosci 7: 98.
    [47] Mercado E, III (2014) Relating cortical wave dynamics to learning and remembering. AIMS Neurosci 1: 185-209. doi: 10.3934/Neuroscience.2014.3.185
    [48] Heitmann S, Gong P, Breakspear M (2012) A computational role for bistability and traveling waves in motor cortex. Front Comput Neurosci 6: 67.
    [49] Gong P, van Leeuwen C (2009) Distributed dynamical computation in neural circuits with propagating coherent activity patterns. PLoS Comput Biol 5: e1000611. doi: 10.1371/journal.pcbi.1000611
  • © 2015 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)