1. Introduction
Local-area information is important for developing and evaluating local-area policy. The problem is that up-to-date information is often lacking at the local-area level, and the information that is available is updated infrequently. The reason for this dearth is that obtaining a sufficient sample for each local area is very expensive. Policy makers therefore usually have to rely on data collated over many years in order to achieve a sufficient sample size [1], use estimates based on larger regional areas where surveys can provide estimates with reasonable standard errors [2], or present results only for local areas with sufficient sample size, suppressing the remainder [3],[4].
An alternative is to use model-based small area estimation (SAE) methods [5], but these can be seen as too complex to integrate easily into current data processes. That is, they cannot be operationalised in the routine analysis of a large repeated population health survey to produce timely estimates the way that direct estimates can. For methods to be accepted and adopted as a routine approach, they must be able to be understood and applied by the analysts or epidemiologists working in the population health survey program.
This paper describes a practical example of the development and evaluation of robust and readily applied SAE methods, using data from a state-based population health survey to provide local area estimates via the empirical best predictor (EBP) associated with a logistic mixed model. We consider the key decisions and evaluations that have to be made along the way, such as whether the random effect in the EBP model is needed and how to determine the covariates in the model. In addition, we assess the reliability of the resulting estimates and their bias compared with the unbiased but unreliable direct estimates. This paper aims to make SAE methods accessible so that they can be applied more regularly in routine analysis of large-scale population health surveys to produce current and timely estimates for local areas.
We demonstrate the methods using data from the NSW Population Health Survey (PHS), a dual-frame computer assisted telephone interview (CATI) survey run by the Ministry of Health NSW. We focus on four key dichotomous outcome variables that differ in prevalence and in the level of between-area variance: "Current Smoking (SMK)", "Risk Alcohol Consumption (ALC)", "Overweight or obese (BMI)" and "Have difficulties getting health care when needed (HDIFF)". At the time of this study, when estimates at the local area level were requested, direct estimates based on seven years of data were supplied, with suppression of results for local areas where there were fewer than 300 respondents in that 7-year period. Therefore, we concentrate on comparing model-based estimates with direct estimates based on seven years of data, as well as comparing aggregated model-based estimates with direct estimates based on one year of data at a level where direct estimates are regularly published.
2. Materials and method
2.1. NSW population health survey
The study used unit-record data from the NSW PHS. This survey is designed to produce annual estimates for the health administrative areas. The number of administrative areas has changed over time; between 2005 and 2010 there were eight such areas [6]. During this same period there were 153 local government areas (LGAs), and the aim was to create estimates at the LGA level.
Although data were available for the period 2002 to 2008 (inclusive), the modelling was undertaken on the data from 2006 to 2008. The full seven years of data were used as comparators to the model-based results, as this was the usual method of providing LGA-level results at the time of this study. Detailed information regarding sample selection, weighting, stratification and other aspects of the survey are provided elsewhere [7]–[10].
During the telephone-based interview, respondents are asked to name their LGA, postcode and/or the suburb in which they live. If the LGA is not provided, it is assigned using the postcode or suburb data. To align with the 2006 census data, the 2006 LGA boundary definitions were used for this study [11]. The unincorporated area in far west NSW was included with the 152 LGAs so that full coverage of NSW was obtained, resulting in 153 areas. Over the study period, 2,800 respondents (3.6%) could not be assigned to an LGA. These were excluded from the study, leaving 75,175 observations, with between 7,783 and 12,736 in individual years. Not all questions were asked in 2007, which reduces the number of respondents for some questions.
2.1.1. Direct estimates and estimates of variance
Direct LGA-level estimates were obtained for each of the four outcome variables, by sex, for each of the three most recent years of data available (2006–2008). Direct estimates based on small sample sizes are unreliable; however, they are asymptotically unbiased. Direct estimates were also obtained for each individual LGA by aggregating all seven years of survey data, which was the method used to provide LGA-level estimates at the time. The SAS SURVEYMEANS procedure was used to calculate the direct estimates and associated estimated standard errors accounting for the sample design. LGA was included as a domain; health area, sex and age group were included as strata variables; and the post-stratification weight was applied. Estimates were adjusted so that the population-weighted average agreed with the published estimate for that year for each outcome variable, by sex.
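The analyses in this paper were run in SAS; as an illustration of what the domain calculation involves, the following Python sketch computes a weighted prevalence with its SE and RSE for a single domain. It is a simplified, hypothetical example: it ignores stratification and post-stratification and uses a with-replacement Taylor linearisation, not the full design-based variance of PROC SURVEYMEANS.

```python
import numpy as np

def weighted_domain_estimate(y, w):
    """Weighted prevalence, SE and RSE (%) for one survey domain.

    Simplified illustration only: ignores stratification and
    post-stratification, and uses a with-replacement Taylor
    linearisation of the ratio estimator."""
    y = np.asarray(y, float)
    w = np.asarray(w, float)
    n = len(y)
    p = np.sum(w * y) / np.sum(w)        # weighted proportion (ratio estimator)
    z = w * (y - p) / np.sum(w)          # linearised residuals
    se = np.sqrt(n / (n - 1) * np.sum(z ** 2))
    return p, se, 100.0 * se / p

# toy LGA domain: 8 positive responses among 40 respondents, equal weights
y = np.array([1] * 8 + [0] * 32, dtype=float)
w = np.ones(40)
p, se, rse = weighted_domain_estimate(y, w)
```

With equal weights this reduces to the familiar standard error of a sample proportion; unequal weights inflate the variance in the usual way.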
This paper makes the assumption that estimates of variability based on direct estimates are unbiased and therefore we use the term standard error (SE) and relative standard error (RSE) when assessing variability. Model-based estimates may be biased, and therefore, to indicate that the estimate of variability includes both variance and bias terms, we refer to estimates of variability as mean square errors (MSEs), estimated root mean square errors (RMSEs) and relative root mean square errors (RRMSEs).
2.1.2. Explanatory variables available from the NSW population health survey
Model-based SAE methods exploit the relationship between the outcome variables and covariate information estimated from the survey, in combination with population level covariate information from a population census or other sources. The following variables were available from the survey data for possible inclusion as explanatory variables in the models to create LGA level estimates:
- Variables derived from the sampling process: health area, number of people in the household;
- Demographic variables collected in the survey: age group, sex, highest level of education, country of birth, language usually spoken at home, marital status, employment status, pension status (for over 65-year-olds), income in broad bands, private health cover status;
- Contextual variables for the LGA derived from variables collected in the survey: quintile of relative socioeconomic disadvantage (IRSD) and remoteness index (ARIA score). These were based on ABS data for these indices, with data provided by NSW Health.
2.1.3. Source of LGA-level covariate data
While the coefficients of model parameters are estimated solely using survey data, model-based estimates require population proportions for each covariate for each small area. Gender-specific LGA population proportions were sourced from the Basic Community Profile of the 2006 Census of Population and Housing [12] for the majority of covariates. Person-level information on pensions was obtained by combining data from the national regional profile [13] and the Social Health Atlas [14]. Person-level estimates of private health insurance cover by LGA were obtained from the Social Health Atlas [14]. See [15] for more information.
2.2. Small area estimator—the empirical best predictor
Although other options were tested in early work [15], we concentrate on the most promising model: a unit (person) level generalised linear mixed model with a logit link, see [5]:

logit{Pr(y_ig = 1 | υ_g)} = x_ig′β + υ_g,   (1)

where the υ_g are independent N(0, σ²_υ), i = 1, …, n_g and g = 1, …, G. In model (1), y_ig is the response of the ith person in the gth LGA, n_g is the sample size in the gth LGA, x_ig is the vector of covariate values from the survey, β is the vector of regression coefficients and υ_g is a random effect reflecting the area-level effect.
When the sampling fraction is small, the estimator of the mean for area g, θ̃_g, is

θ̃_g = exp(X_g′β̂ + υ̂_g) / {1 + exp(X_g′β̂ + υ̂_g)},   (2)

where X_g is the vector of population proportions of the covariates, υ̂_g is the estimated random effect for the gth LGA and β̂ is the vector of estimated regression coefficients [15].
The estimator in equation (2) is known as the Empirical Best Predictor (EBP). The synthetic estimator is obtained by omitting υ̂_g. Inclusion of the random effect term, when important, should improve the estimates and the validity of the estimated RMSEs. The estimated regression coefficients β̂ and random effects υ̂_g can be obtained from the unit-level survey data with area indicators using any statistical software that can fit model (1). This study used the GLIMMIX procedure in SAS because routine survey analysis at the NSW Ministry of Health is undertaken in SAS.
To create small area estimates using the EBP, the population means of the covariates, X_g, need to be available for each area. When covariates can be expressed as population proportions, the small area estimates are easily obtained using (2). The unit-level model is fitted with each covariate as a set of indicator variables (0 if not applicable, 1 if applicable), and the small area estimate is obtained from (2) by replacing the indicator variables with the appropriate area-level proportions. Determining which covariate variables are available, and deciding which to use in the model, is a key practical step in developing and applying model-based SAE methods; it is discussed in section 2.2.4.
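As a sketch of the mechanics (Python used for illustration; the covariates, coefficients and area effect below are hypothetical, not fitted values from this study), the small area estimate in (2) is the inverse logit of the area-level linear predictor built from population proportions:

```python
import numpy as np

def ebp_estimate(X_g, beta_hat, u_hat_g=0.0):
    """EBP for one area in the spirit of equation (2): the inverse logit of
    X_g'beta + u_g, where X_g holds population proportions in place of the
    unit-level indicator variables. Setting u_hat_g = 0 gives the
    synthetic estimator."""
    eta = np.dot(X_g, beta_hat) + u_hat_g
    return 1.0 / (1.0 + np.exp(-eta))

# hypothetical area: intercept plus two covariate proportions
X_g = np.array([1.0, 0.55, 0.30])        # [intercept, prop. in group A, prop. in group B]
beta_hat = np.array([-1.8, 0.6, 0.9])    # illustrative fitted coefficients
synthetic = ebp_estimate(X_g, beta_hat)              # random effect omitted
ebp = ebp_estimate(X_g, beta_hat, u_hat_g=0.2)       # with estimated area effect
```

A positive estimated area effect pulls the EBP above the synthetic estimate for that area, and a negative one pulls it below.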
Synthetic estimates and EBPs were obtained for each of the four outcome variables, by sex, using six covariate specifications. As is typical in survey analysis, the variables included in the survey weighting were included as covariates in the model-building process where statistically significant, so the survey weights themselves were not included in the analyses [17]. The synthetic estimator was obtained in order to assess bias (see section 2.3) and to examine estimated RMSEs when the random effect is zero, in which case the EBP reduces to the synthetic estimator.
2.2.1. Estimated MSE of the EBP
Although it is relatively easy to create a small area estimate, the estimation of the MSE of the EBP estimator is complicated, usually requiring Monte Carlo methods or iterative steps using Maximum Penalised Quasi-Likelihood (MPQL) together with REML [5]. However, the variance created by the GLIMMIX procedure includes the most important components of the MSE, and preliminary work showed that the estimated MSEs from the SAS GLIMMIX procedure were similar to those obtained using a parametric bootstrap procedure [15]. Therefore, we used the variance from SAS GLIMMIX to obtain estimated RMSEs and associated RRMSEs. Out-of-sample areas, where n_g = 0, were allocated the maximum RMSE for in-sample areas, based on preliminary work showing that the only situation in which the SAS-based RMSE was under-estimated was when the sample size for an area was zero [15],[18].
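The rule for out-of-sample areas is easy to implement: any area with n_g = 0 receives the maximum estimated RMSE among in-sample areas. A minimal sketch with illustrative values:

```python
import numpy as np

def fill_out_of_sample_rmse(rmse, n_g):
    """Assign each out-of-sample area (n_g == 0) the maximum estimated
    RMSE among in-sample areas -- the conservative fallback described
    in the text. Values here are illustrative only."""
    rmse = np.array(rmse, dtype=float)
    n_g = np.asarray(n_g)
    rmse[n_g == 0] = np.nanmax(rmse[n_g > 0])
    return rmse

# hypothetical RMSEs for four LGAs; the third LGA has no sample
filled = fill_out_of_sample_rmse([0.02, 0.05, np.nan, 0.03], [40, 12, 0, 25])
```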
2.2.2. Estimation of intraclass correlation coefficient
The intraclass correlation (ICC) for the logistic model was estimated using the latent variable formulation [19]:

ICC = σ̂²_υ / (σ̂²_υ + π²/3),   (3)

where σ̂²_υ is the estimated variance of the LGA random effects and π²/3 ≈ 3.29 is the variance of the standard logistic distribution.
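The latent-variable ICC is a one-line calculation; the denominator adds the variance of the standard logistic distribution, π²/3 ≈ 3.29, to the estimated area-level variance. A minimal sketch:

```python
import math

def latent_icc(sigma2_u):
    """ICC of a logistic mixed model via the latent-variable formulation:
    sigma2_u / (sigma2_u + pi^2 / 3), where pi^2 / 3 (about 3.29) is the
    variance of the standard logistic distribution."""
    return sigma2_u / (sigma2_u + math.pi ** 2 / 3)

icc = latent_icc(0.1)   # even a small area-level variance gives a non-trivial ICC
```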
2.2.3. Estimating model fit at unit- and small area level
The level of variability explained by the models at the individual level is measured using the adjusted R2 [20]. The more relevant aspect of fit when estimating at the small area level is the variability explained at the area level. However, unlike when undertaking simulation studies, with real data there is no ‘truth’. Therefore, we devised an area-level pseudo-R2 (4) as a way of measuring area-level fit. It aims to capture the reduction in unexplained variation using the mean of the model-based predictions for each sampled individual in an area, when compared with a null model.
pseudo-R² = 1 − [Σ_g n*_g (ȳ_g − p̄̂_g)²] / [Σ_g n*_g (ȳ_g − p̄̂_g0)²],   (4)

where p̄̂_g is the mean over sampled individuals in area g of the model-based estimates p̂_ig for the ith person from the gth area, p̄̂_g0 is the corresponding mean of the null model estimates p̂_ig0, with the null model defined as the state mean for the particular year and sex, calculated without weighting, ȳ_g is the sample mean for the gth area (without weights) and n*_g is the actual sample size at the LGA level used in the calculation of the numerator. The sample size was included in the calculation of the pseudo-R² to allow for the fact that models could have been based on different sample sizes. Gelman et al. [21] have also derived a pseudo-R², although they do so to avoid an issue where the R² in the Bayesian paradigm may be greater than unity. In the current case it is used to estimate the fit at the level of reporting, rather than at the unit record level.
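To make the calculation concrete, the following sketch (hypothetical data; it assumes the numerator and denominator of (4) are both weighted by area sample size) computes an area-level pseudo-R² from unit-level predictions:

```python
import numpy as np

def area_pseudo_r2(y, p_hat, p_null, area):
    """Area-level pseudo-R2 in the spirit of equation (4): one minus the
    ratio of sample-size-weighted squared gaps between each area's sample
    mean and its mean model prediction, relative to the same gaps under
    the null model. All inputs are unit-level arrays; `area` gives labels."""
    num = den = 0.0
    for g in np.unique(area):
        idx = area == g
        n_g = idx.sum()
        ybar = y[idx].mean()
        num += n_g * (ybar - p_hat[idx].mean()) ** 2
        den += n_g * (ybar - p_null[idx].mean()) ** 2
    return 1.0 - num / den

# sanity check: predictions equal to each area's sample mean give R2 = 1
area = np.repeat(np.arange(3), 4)
y = np.array([1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=float)
p_null = np.full(12, y.mean())
p_perfect = np.concatenate([np.full(4, y[area == g].mean()) for g in range(3)])
r2 = area_pseudo_r2(y, p_perfect, p_null, area)
```

A model whose area-mean predictions coincide with the area sample means scores 1, while predictions no better than the state mean score 0.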
2.2.4. Model development
We considered six covariate specifications (see Table 1). Models were estimated separately for males and females using data for each year from 2006 to 2008.
Four of the models have the same covariate specification for all outcomes while the other two were specific to the outcome and sex. While outcome-specific models may be the best by construction, they lead to different sets of covariates being used in different years for the same outcome variable, so the models with consistent variables were included to see if a more parsimonious model could be used.
The variables in the outcome-specific (Spec) models were based on the results of a backward selection process using the LOGISTIC procedure in SAS. This was undertaken separately for each outcome variable-sex-year combination. The statistical significance level used to keep variables in the model was set at 0.05. Whilst the LOGISTIC procedure does not include a random effect term, it was considered suitable for the purposes of covariate selection.
The covariates included in the common model were the covariates that occurred in at least half of the Specific models for that outcome-sex grouping when the results for the 7 years were considered, including logistic and linear models. Results for linear models are not included in this paper (see [15]).
The models were restricted to main effects because of the challenge of accessing population data at the LGA level of the quality required to include interaction terms. Cross-tabulated data, by LGA and sex will include small cell sizes and the ABS modifies small numbers in such cells to assure confidentiality [14],[22].
As mentioned in section 2.2., LGA level estimates are obtained by substituting the proportions for the indicator variables in the EBP (2). This is easily achieved in SAS and other software by appending LGA-level population proportions to the dataset. As these records do not have observed values for the outcome variables, they are not included in the model creation. Despite this, because they are in the dataset, predicted values and associated measures of variability will be created. In SAS both synthetic and EBP estimates and their associated measures of variability can be obtained from a single run of the procedure. See Supplementary files for more information.
LGA-level estimates of health risk factors presented in the Social Health Atlas [14] are based on a synthetic estimator. In order to determine whether the inclusion of the random effect is necessary, we include an analysis in section 3.3.3. that considers the impact of the random effect on the RMSEs. This may be considered unnecessary by those with strong familiarity with SAE methods; however, it is worthwhile addressing the issue rather than simply assuming that the random effects are necessary.
All estimates were adjusted so that the weighted average of the 153 LGA-based estimates agreed with the published NSW rates for the appropriate year.
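This benchmarking step can be implemented as a simple ratio adjustment, one common choice, sketched below with hypothetical LGA figures:

```python
import numpy as np

def benchmark(estimates, populations, state_rate):
    """Ratio-adjust area estimates so that their population-weighted
    average equals the published state rate. This is one common
    benchmarking choice; an additive adjustment is an alternative."""
    est = np.asarray(estimates, float)
    pop = np.asarray(populations, float)
    current = np.sum(pop * est) / np.sum(pop)
    return est * (state_rate / current)

# hypothetical LGA estimates and populations, benchmarked to a 20% state rate
adj = benchmark([0.18, 0.22, 0.25], [50_000, 120_000, 30_000], 0.20)
```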
2.3. Evaluation of bias and precision
Evaluation of the methods considers whether the resultant estimates are reliable enough to be useful to inform policy and assist in developing healthier communities in NSW.
If the model is correctly specified, the resulting estimates will be unbiased, but in practice the model may be inadequate in some way and the resulting estimates may be biased, albeit with lower variability compared with direct estimates.
For measures of variability, we concentrated on the median and maximum RMSEs at the LGA level, comparing them with comparable statistics based on direct estimates, by sex and outcome. We compared the model-based estimates in three ways, as follows.
2.3.1. Comparison 1: model-based estimates vs. direct estimates based on one year of data
Most direct estimates based on a single year of data are unreliable and unfit for reporting due to small sample size and therefore high SEs. Despite this, direct estimates are asymptotically unbiased [23], so they can still be used in assessing evidence of bias.
The first part of the comparison is to ensure that the difference between direct and modelled estimates decreased as sample size increased.
We used the method of Brown et al. [23] to assess bias. This method assesses whether the relationship between the synthetic estimates (as the independent variable) and the unbiased direct estimates (as the dependent variable) is compatible with the line of identity. According to [23], it is appropriate to use the synthetic estimator for this part of the evaluation because the expected value of the random effect is zero. We assessed this relationship on the original scale, weighted by sample size, and after applying the square root transformation to both sets of estimates. Another alternative is to assess correlation [25]; however, assessing conformity to the line of identity is a stronger test than testing that the two sets of estimates have a statistically significant correlation.
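In outline, the check regresses the direct estimates on the synthetic estimates, weighted by sample size, and compares the fitted intercept and slope with 0 and 1. The sketch below is a plain weighted least-squares version for illustration, not the exact Brown et al. test statistic:

```python
import numpy as np

def identity_line_check(direct, synthetic, n):
    """Weighted least-squares fit of direct on synthetic estimates,
    weighted by sample size. Returns (intercept, slope) and their
    standard errors for comparison with (0, 1). A simplified sketch
    of a line-of-identity check, not the published test."""
    x = np.asarray(synthetic, float)
    y = np.asarray(direct, float)
    w = np.asarray(n, float)
    X = np.column_stack([np.ones_like(x), x])
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    resid = y - X @ beta
    sigma2 = np.sum(w * resid ** 2) / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(XtWX)
    return beta, np.sqrt(np.diag(cov))

# if the direct estimates sit exactly on the line of identity,
# the fit recovers intercept 0 and slope 1
beta, se = identity_line_check([0.1, 0.2, 0.3, 0.4],
                               [0.1, 0.2, 0.3, 0.4],
                               [10, 20, 30, 40])
```

In practice one would compare |intercept/SE| and |(slope − 1)/SE| against a t critical value, or use a joint Wald test.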
2.3.2. Comparison 2: aggregated model-based estimates vs. direct estimates
We created a weighted mean of the model-based estimates, weighted by population size, at both the health area level and by quintile of IRSD. These are the geographic levels at which direct estimates are publicly reported.
Results for each of the four outcome variables were collated by gender and year and assessed as a single group for each outcome, as otherwise the comparison had insufficient power. Each comparison included between 30 and 48 observations. The relevant comparisons were as follows:
- The proportion of aggregated model-based estimates falling within the 95% confidence interval around the direct estimate calculated at health area level or by quintile of IRSD, and
- Whether the relationship between aggregated model-based estimates and direct estimates was consistent with the line of identity.
Both sets of estimates were transformed to the square root in order to take into account potential heteroscedasticity in estimated proportions.
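The coverage comparison reduces to counting how often the aggregated model-based estimate falls inside the direct estimate's 95% interval. A minimal sketch with hypothetical numbers, using a normal-approximation interval:

```python
import numpy as np

def ci_coverage(model_est, direct_est, direct_se, z=1.96):
    """Proportion of aggregated model-based estimates that fall inside the
    95% confidence interval around the corresponding direct estimate
    (normal-approximation interval; inputs are illustrative)."""
    m = np.asarray(model_est, float)
    d = np.asarray(direct_est, float)
    se = np.asarray(direct_se, float)
    return np.mean(np.abs(m - d) <= z * se)

cov = ci_coverage([0.20, 0.31, 0.18], [0.21, 0.25, 0.19], [0.010, 0.020, 0.015])
```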
2.3.3. Comparison 3: model-based estimates vs. LGA-level direct estimates based on 7 years of data
This comparison used the most appropriate covariate specification given the responses to the previous two comparisons. It compares the model-based estimates and associated measures of variability against what was usually provided when LGA-level estimates were requested. We were particularly interested in the range in estimates, RMSE and RRMSE of model-based estimates compared with the direct estimates and associated SE and RSE. Both sets of estimates were adjusted so that the weighted average agreed to the reported state average for that year.
3. Results and discussion
In this section, we provide information on the data source and direct estimates for one outcome variable to explain why direct estimates are not appropriate for all LGAs, even when based on data aggregated over seven years. We consider the development of the model-based estimates, including how model complexity affects the differences in estimated RMSE between the EBP and synthetic estimators. We then consider how the model-based estimates compare with direct estimates using the three comparisons described in section 2.3. Finally, we consider whether the model-based estimates and associated RMSEs and RRMSEs achieve a level suitable for use in practical settings.
3.1. NSW population health survey: sample sizes
The number of responses aggregated over the seven years ranged from 16 to 777 for males and 16 to 1178 for females (Table 2). It is not unusual for more females than males to respond to the NSW population health surveys [11].
The number of responses to individual questions was affected by a decision made in 2007 to reduce the burden on respondents [7]. Table 2 presents the sample sizes available to obtain direct estimates for the BMI outcome variable. It shows that the impact was a reduction of about 10% compared with the total number of respondents, reducing the number of LGAs with more than 300 responses from 38 to 31 for males and from 53 to 45 for females. The maximum sample sizes for SMK were slightly higher than for BMI, at 709 for males and 1080 for females (Table 3).
Although larger sample sizes are available if results are reported at the person level, policy development is strengthened when based on gender-based results [24]. While person-level results cannot be split by gender unless gender-based results are provided in the first place, gender-based results can always be combined to create person-level results.
3.1.1. NSW population health survey: direct estimates and estimates of variance
Summary statistics for LGA-level direct estimates (DEs) of current smoking (SMK) rates, by gender, based on data aggregated from 2002–2008, are presented in Table 3. This provides an example of the direct estimates for this aggregated period. The median standard error is less than 5%, with maxima between 13% and 14% for males and females respectively. The median relative standard errors (RSEs) are 23.1% and 20.8% for males and females respectively, but the maximum RSEs are over 85% for both males and females. Direct estimates from eight LGAs would have had estimated RSEs greater than 50% for males; another 58 LGAs had estimated RSEs of more than 25%. For females, 7 LGAs had estimated RSEs greater than 50% and 47 LGAs had RSEs of more than 25%.
The Australian Bureau of Statistics suppresses estimates with RSEs greater than 50% and urges caution when using estimates with RSEs of more than 25% [12]. Based on the results in Table 3, under these criteria the direct estimates for a considerable number of LGAs would be suppressed, or presented but flagged as being of insufficient quality, even when results are based on 7 years of survey data.
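These reporting rules amount to a simple classification of each estimate by its RSE; a sketch:

```python
def rse_flag(rse_percent):
    """Classify an estimate using the ABS reporting rules cited above:
    suppress if RSE > 50%, flag for caution if RSE > 25%, else report."""
    if rse_percent > 50:
        return "suppress"
    if rse_percent > 25:
        return "caution"
    return "report"

flags = [rse_flag(r) for r in (12.0, 30.0, 85.0)]
```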
3.2. Model-based small area estimates
We now turn to model-based methods. We provide information on the covariates that were included in the more complex model specifications before looking at the effect of model complexity and the effect of the inclusion of the random effect on the estimated RMSE.
3.2.1. Model development: determining covariates in the specific models
Whilst the null, age and ONS models use the same covariates for all outcome variables (Table 1), the Specific and Common models relied on the results of the backward selection process. Information on covariates included in specific models is provided in Supplementary Table A. Table 4 shows the terms that were included in the Common models. The Global specification also used the same covariates for all outcome variables: it included all covariates presented in Table 4.
Model selection for the Specific model did not include the random effect term; indeed, when the random effect is included, the statistically significant terms in the model can change. The inclusion of the random effect term is justified on the basis that, under the model, the effect is expected to be present and is needed to reflect local effects not captured by the available covariates.
It is useful to consider what variables ended up in common models (Table 4). The only covariate that is included in all of these models is age group. Health area is included in the models for all but SMK, whilst remoteness is only included for difficulties getting health care when needed, for which remoteness would be expected to be important. The models for females tended to have more terms than males. For instance, there is an effect of marital status in females for ALC, but not for males; pension and household size are only included in the Common models for females.
3.2.2. Effect of the random effect term on estimated RMSE: EBP vs. synthetic estimators
Table 5 shows the median and maximum estimated RMSEs for EBP and synthetic estimates to highlight the fact that the random effect term increases the estimated RMSE due to inclusion of the area-level variance. Effectively the random effect term protects against over-confidence in the model.
The estimated RMSE for the more complex synthetic models approaches that of the EBP but remains lower. It is well documented that the random effect term acts as a proxy for unknown between-area variability, so it is no surprise that the estimated RMSE of the EBP is higher than that of the synthetic estimator, which assumes no area effects. The impact of ignoring the area random effects reduces when a more complex model is fitted.
3.2.3. Estimated ICC of the logistic mixed model
Table 6 shows the estimated approximate ICC for the logistic mixed models for each of the outcome variables for 2008. A similar trend was observed in other years; those results are included in Supplementary Table B. The ICC is lower for the more complex models, but even small ICC values have a reasonable influence on the small area estimates, especially at small sample sizes [15]. Of the four outcome variables, HDIFF always had larger ICC values, suggesting that there is more between-area variability in this indicator than in the other three that were studied.
HDIFF was also the only outcome variable for which the area-level effect remained in the model every time. For the other outcomes, the area-level random effect term dropped out in approximately 10% of the models across the three years used in modelling. When the random effect term drops out, the model-based estimates simply revert to the synthetic form. As Table 5 shows, the estimated RMSE of the synthetic estimator is still lower, but much closer to the RMSE of the EBP for the more complex models.
The EBP model did not always converge. Non-convergence occurred on three occasions in 2007, once for each outcome variable except HDIFF (see Supplementary Table B); in all cases the other three complex models converged without any issues. In addition, at times the estimated random effect variance may be zero, as mentioned earlier. Both situations indicate the importance of checking output and scrutinising analysis logs. We noted earlier that not all questions were asked in 2007, which may have caused the lack of convergence.
3.2.4. Estimating the model fit of the EBP estimator
Small area estimation relies on exploiting strong relationships between the outcome and covariates in order to reduce the level of unexplained variability [5]. The unit-level adjusted R-square values in Table 7 show that the models fitted to the unit-level data explain only a small amount of variability in the outcome variables, but this measures model fit at the individual level, not the more relevant area level. At the area level there was approximately a 50% reduction in unexplained variation for HDIFF compared with the null model; the reduction for the other outcomes was about 25%.
These pseudo-R² values suggest that a large amount of area-level variability remains unexplained. It is desirable to assess model-based estimators against some form of standard, but in analyses involving real rather than simulated data there is no gold standard. This is why the next section assesses bias by analysing the difference between an asymptotically unbiased estimator and the expected value of the EBP.
3.3. Assessment of bias
As mentioned in section 2.3., the assessment of bias included three specific comparisons: against direct estimates based on the same year of data; model-based estimates aggregated to higher levels of geography against direct estimates at the same level of geography; and against direct estimates based on seven years of data.
3.3.1. Comparison of model-based results against direct estimates based on the same year of data
The Brown et al. method [23] showed that most covariate specifications could be considered consistent with the line of identity (i.e. unbiased) against the theoretically unbiased direct estimates based on a single year of data (see Supplementary Table C for further information). The Age-only model was least consistent with the line of identity, although it was one of only two models showing consistency for SMK in females in 2006; the other was the Common model. Agreement with the line of identity for BMI in males in 2008 required both direct and EBP estimates to be weighted; estimates were also unbiased for the Age-only and Specific covariate specifications when transformed to the square root scale.
While the direct estimates based on a single year of data are highly variable, they are unbiased and asymptotically approach the true value. Therefore, examining whether the comparison is consistent with the line of identity can provide one indication of bias. This comparison is the first of three that, together, help in deciding the appropriate model.
All model-based estimates approached the direct estimate as sample size increased. Figure 1 presents the results for the four outcome variables, by sex, for 2008. The 2006 and 2007 results showed a similar pattern.
3.3.2. Comparison of aggregated EBP vs. direct estimates
Table 8 shows the proportion of times the model-based EBP estimates aggregated to the health area level fell within the 95% confidence interval around the direct estimate. The null and age-only covariate specifications have poor coverage, with less than 85% for ALC and BMI. Estimates for all the outcome variables using any of the four more complex models have coverage of 94% or more, with the exception of BMI with the Global model.
This method of assessment is conservative as it ignores the variance associated with the model-based estimates; adding margins of error to the differences between the EBP and aggregated estimates is complex. Even so, this simple test can build confidence in the model-based estimates for analysts who are wary of model-based methods.
When aggregated model-based results for the more complex covariate specifications were compared with direct estimates at the level of quintile of IRSD, at least 90% of aggregated estimates for ALC and BMI fell within the 95% confidence interval (Table 8); in comparison, between 70% and 80% of aggregated model-based estimates for SMK and HDIFF fell within these limits, which is not as high as for the health area comparison. Data issues may have contributed to this result [15].
The relatively low coverage of the 95% confidence intervals for the null and age-only models makes them inappropriate as alternatives to the direct estimates, so they are not included in subsequent results.
When the aggregated model-based estimates were plotted against the direct estimates at these higher levels of geography, almost all were consistent with the line of identity. The only models that produced estimates inconsistent with the line of identity when assessed by quintile of IRSD were ONS and Common for SMK, and Global for ALC. When aggregated to health area, the Global and Specific models were not consistent with the line of identity for HDIFF, and the ONS model was not consistent for ALC.
3.3.3. Determination of most appropriate covariate specification
The results to this point indicate that models need more than age and a random effect term. The most appropriate covariate specification appears to differ between outcome variables. However, each outcome variable could be associated with at least two covariate specifications that provided unbiased estimates consistent with direct estimates at the health area level. Of these, one was more appropriate than the other(s). For instance, for HDIFF the two unbiased models at the AHS level are ONS and Common. The Common model did not converge for females in 2007, so the preference is the ONS model. For ALC and SMK the choice is between a model specification that may change over time (Specific) and a more general model (the Common model for ALC and the Global model for SMK). In both cases we suggest the more general model is the most useful option, as it means the model does not change over time, which is an important practical consideration for routine production of SAEs.
Based on these findings, the covariates included in the final models for the four outcome variables are shown in Table 9. ALC was the only outcome where the model differed between males and females.
3.3.4. Comparison of final EBP estimates against direct estimates based on aggregated data
Various statistics are presented in Table 10 to compare the model-based estimates for 2008 using the final models given in Table 9 against the direct estimates based on the seven years of data.
As mentioned in section 3.1., even when survey data are aggregated over seven years, many LGAs still have insufficient sample size to produce reliable direct estimates. These small sample sizes can produce extreme estimates, as shown in Table 10. On the other hand, there is very good agreement between the model-based estimates and the direct estimates aggregated over 2002–2008 (DE0208) for the upper and lower quartiles and the median. In all cases the range in model-based estimates indicates that there is considerable variability in the estimates, which is why it is important to be able to report estimates at the LGA level.
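The Table 10 style of comparison amounts to putting the same five-number summary beside each set of LGA estimates, so the extreme tails of the direct estimates stand out while the quartiles and medians agree. A minimal sketch, with hypothetical figures rather than the paper's data:

```python
import numpy as np

def five_number(ests):
    """Min, lower quartile, median, upper quartile and max of a set of
    LGA-level prevalence estimates (percent)."""
    e = np.asarray(ests, dtype=float)
    return tuple(np.percentile(e, [0, 25, 50, 75, 100]))

# Hypothetical LGA estimates: small-sample direct estimates show extreme
# tails, while model-based estimates agree in the middle of the distribution.
direct_0208 = [0.0, 12.0, 19.5, 27.0, 55.0]
model_based = [9.0, 12.5, 19.0, 26.5, 34.0]

print(five_number(direct_0208))
print(five_number(model_based))
```

The shrinkage of the extremes in the model-based column, with little movement in the quartiles, is the pattern the text describes.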
Ultimately, judging the suitability of the model-based estimates requires the input of a subject matter expert and may depend on the purpose for which the estimates are required.
3.3.5. Comparison of RMSE and RRMSEs of model-based estimates with SE and RSE of direct estimates based on seven years
Table 11 summarizes the SEs and RSEs for LGA-level direct estimates based on 2002–2008 data (DE0208) and the RMSEs and RRMSEs of model-based EBP estimates using the final models indicated in Table 9. For the model-based estimates the median RMSE ranges between 3.3% and 6.5% and the maximum varies between 5.2% and 11.3%. The median SEs of the DE0208 estimates are higher than the median RMSEs of the model-based estimates, with the exception of HDIFF (Table 11). In all cases the maximum RMSEs of the model-based estimates are appreciably smaller than the maximum SEs for the DE0208. This shows that the model-based approach has the greatest impact on LGAs with higher SEs.
The maximum RRMSEs are always considerably lower for the model-based estimates than for direct estimates based on 7 years' data. The median RRMSEs are lower than the RSEs of the aggregated direct estimates, with the exception of HDIFF, and of SMK for females only. For the model-based SAEs the median RRMSE is less than 25% for all except HDIFF for males, where it is 27.9%. Maximum RRMSEs slightly exceed 25% for ALC for females and are also more than 25% for SMK and HDIFF for both males and females. None of the RRMSEs exceed 50%, whereas the maximum RSEs are over 50% for SMK and HDIFF. In all cases the maximum RRMSEs are lower than the RSEs of the aggregated direct estimates.
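The RRMSE figures quoted above are simply each estimate's RMSE expressed relative to the estimate itself, summarized by median and maximum across LGAs. A minimal sketch with hypothetical values:

```python
import numpy as np

def rrmse_percent(estimates, rmse):
    """Relative RMSE (%) of model-based estimates: 100 * RMSE / estimate."""
    est = np.asarray(estimates, dtype=float)
    rmse = np.asarray(rmse, dtype=float)
    return 100.0 * rmse / est

# Hypothetical LGA prevalence estimates (percent) and their RMSEs
# (percentage points); low-prevalence outcomes yield larger RRMSEs
# even at the same absolute RMSE.
est  = [20.0, 10.0, 5.0]
rmse = [ 2.0,  2.5, 2.0]
rel = rrmse_percent(est, rmse)
print(np.median(rel), rel.max())  # → 25.0 40.0
```

This is the same calculation as the RSE of a direct estimate, with RMSE replacing the SE, which is what makes the two columns of Table 11 directly comparable.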
If the ABS criteria were used [12], some of the model-based estimates would be flagged for caution; however, none would be suppressed.
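The ABS convention referenced here is commonly applied as relative-error thresholds: publish below 25%, flag for caution between 25% and 50%, and suppress above 50%. Those thresholds match the 25% and 50% cut-points discussed in the text, but the exact rule should be checked against [12]; the function name is illustrative.

```python
def abs_reliability_flag(rrmse_pct):
    """Classify an estimate by its relative error (percent) using
    ABS-style thresholds: <25% publish, 25-50% caution, >50% suppress."""
    if rrmse_pct < 25.0:
        return "publish"
    if rrmse_pct <= 50.0:
        return "caution"
    return "suppress"

# Since no model-based RRMSE in this study exceeds 50%, every estimate
# would be either published outright or flagged for caution.
for r in (11.7, 27.9, 41.5):
    print(r, abs_reliability_flag(r))
```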
Whereas multiple years of data are required to publish direct estimates for all LGAs, annual model-based estimates could be published for all LGAs, including for out-of-sample areas and areas where direct estimates would not be acceptable due to small sample sizes or a high RSE. Model-based methods were able to create unbiased results compared with the direct estimates at LGA level. These estimates also agreed with direct estimates when aggregated to higher levels of geography.
The most acceptable covariate specification differed between the outcome variables, so it is necessary to undertake model building for each outcome variable separately.
4.
Conclusions
Small area estimation methods exploit relationships between the outcome variable and covariates [5] to reduce the level of unexplained between-area variability. This study demonstrates a practical development and application of small area estimation methods where there are no strongly correlated covariates at the individual level. The empirical best predictor, with appropriate covariates, provided usable model-based estimates based on one year of data for the outcome variables tested in this study. The model-based LGA estimates based on a single year of data show considerably improved median and maximum RMSEs and RRMSEs compared with direct estimates based on a single year, and appreciable improvements when compared with direct estimates aggregated over 7 years. They are also unbiased.
The main conclusions are as follows:
It is necessary to go beyond a simple null or age model: The null and age-only models were found to be inappropriate in the assessment of bias at the aggregated level, whereas models with more covariates were able to create unbiased results. Other researchers have found that models require either a spatial random area-level term [26] or more covariates [27]. We did not include a spatial random area-level term; Lawson et al. [28] have noted that any random area-level term is a proxy for covariates that are not in the model.
Model-based estimates are sufficiently reliable: The final covariate specification chosen for each outcome variable led to model-based estimates with median (max) RMSE for the four variables by gender of between 3.0% (5.2%) and 5.5% (11.3%) and median (max) RRMSE of between 6.8% (11.7%) and 24.5% (41.5%). Three of the four outcome variables fulfilled an aspirational goal that the maximum RMSE be less than 10%, with the fourth outcome variable (HDIFF) having maximum RMSEs of less than 11.3%. These were appreciable improvements on the maximum RMSEs for the direct estimates based on the 7 years of data.
An important step in acceptance of model-based estimates is to assess them against direct estimates using methods of assessment that take into account that there is no gold standard available. The model-based estimates are generally consistent with direct estimates when aggregated to higher geographic levels. The use of methods to test bias by aggregating to levels where direct estimates have reasonable precision was very useful in this situation, where the parameters are unknown and direct estimates too inaccurate to use methods such as the average empirical bias. The bias test at the LGA level used the synthetic estimator as suggested by Brown et al. [23], and the comparison at aggregated levels used aggregated EBP estimates.
At the AHS level at least 94% of estimates lie within the 95% confidence interval around the direct estimates for all the outcome variables when the more complex model specifications are used. Whilst at least 90% of aggregated estimates for ALC and BMI lie within the 95% confidence interval around the direct estimates at the quintile of IRSD level, only between 70% and 80% of aggregated model-based estimates for SMK and HDIFF are located within these 95% confidence limits. These two outcome variables have the lower prevalence rates. Creating modelled estimates for outcomes with low prevalence is known to be more difficult than when the prevalence is higher [29]. Even though the aggregated model-based estimates have lower proportions within the 95% confidence intervals than desired for SMK, Hindmarsh [15] shows that the trend associated with socioeconomic status is as expected, with smoking rates increasing with increasing disadvantage, based on quintile of IRSD. The null and age models did not show as strong a socioeconomic gradient as the more complex models. This approach to the assessment of bias is conservative, since it does not take into account the RMSE of the model-based estimate.
On all but two occasions the suggested model created unbiased estimates at the LGA level when compared with the direct estimates. The ONS specification in 2008 for BMI estimates in males is biased, although when the bias test is applied to weighted results it is unbiased; the Global model for SMK in females was biased in 2006. Even if estimates are slightly biased, their consistency at the aggregated level with direct estimates means that, given the lack of other information available about these outcome variables at the small area level, their publication and use will be of great benefit.
Model-based SAEs can be implemented easily: The model-based estimates and the associated RMSEs were created using standard procedures in SAS, which allows them to be easily added to the current analytical processes developed for routine reporting.
RMSEs can be estimated: The estimated RMSE produced by the SAS GLIMMIX procedure includes the major error components for the EBP-based estimator. Although there are algorithms that include additional terms, those methods add a level of complexity and prevent the methods from being easily implemented. Using RMSEs created by the GLIMMIX procedure opens up small area estimation to more practical application. In the case of the NSW PHS, the routine publication of model-based estimates at the LGA level would greatly enhance the level of information available about health risk factors at the local area for policy development and evaluation.
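For readers outside SAS, the structure of the EBP-type prediction can be illustrated independently of GLIMMIX: given the fitted fixed effects and the predicted area random effect from a logistic mixed model, the area prevalence is a population-weighted average of predicted probabilities over the area's covariate cells. The sketch below assumes those fitted quantities are already available; all names and values are illustrative, and the paper's actual computation is the GLIMMIX procedure.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def ebp_area_prevalence(X_pop, counts, beta, u_area):
    """Plug-in EBP-style prevalence for one area.

    X_pop:  covariate rows for each population cell in the area
    counts: population count in each cell (used as weights)
    beta:   fitted fixed-effect coefficients
    u_area: predicted area random effect; setting it to zero gives the
            synthetic estimate used for out-of-sample areas
    """
    eta = X_pop @ np.asarray(beta, dtype=float) + u_area
    return float(np.average(expit(eta), weights=counts))

# Hypothetical LGA with two age-group cells (intercept + age indicator).
X = np.array([[1.0, 0.0],
              [1.0, 1.0]])
counts = [4000, 6000]
beta = [-1.2, 0.5]          # illustrative fitted coefficients
print(ebp_area_prevalence(X, counts, beta, u_area=0.3))
```

As the sample size in an area grows, the predicted random effect pulls this prediction toward the area's direct estimate, which is the shrinkage behaviour described in the conclusions.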
Implementation in practice: This study provides a practical example of the use of small area estimation methods in the creation of annual estimates at the LGA level from a survey that is not designed to provide results at that level. The estimates and associated estimates of RMSEs were easily produced using SAS, making the process easily implemented within the current analysis process for the NSW health survey. Estimates of the rates of health risk factors at the LGA level would provide policy makers with greater confidence in their decisions, compared with using estimates over a longer period of time or much larger geographical areas.
Real data do not respond as perfectly as simulated data. We suggest that the assessment of bias in the absence of a gold standard, as presented in this paper, provides a practical alternative to the use of simulated data. It would strengthen the use of the model-based estimates if any report in which they are presented is accompanied by metadata summarising the bias assessment.
The advantage of the empirical best predictor is that it provides unbiased estimates for LGAs that are not sampled, or where the sample size is insufficient to provide reasonable direct survey estimates, while the estimates approach the direct survey estimates as the sample size increases.
The model-based small area estimation method worked best in estimating the prevalence of the three outcomes measuring health risk factors: risk alcohol drinking, regular smoking, and overweight or obese. It also created plausible estimates for the proportion having difficulty getting health care when needed (HDIFF). HDIFF had the largest intraclass correlation of the four outcomes assessed and was the only one with a statistically significant effect of remoteness. This is not surprising, as this outcome variable is likely to depend on the local characteristics of the health system. Inclusion of measures of the availability of health services at the LGA level could improve the estimates.
An obvious extension of this work would be to model the rates at the small area level over time. This was considered beyond the scope of the current work.
This paper deliberately used a frequentist approach in order to show that SAE methods can provide reasonable estimates using similar methods to those used for direct survey methodology. Zhang et al. use a similar process [30]. Loux presented a similar concept under the title of Multilevel regressions with poststratification for local estimation at a recent statistical meeting [31].
Bayesian SAE methods [32],[33] are available and can have advantages over frequentist methods in some situations. For instance, it is possible to include a spatial random effect as well as the area random effect [28]. In addition, it is easy to obtain proportions above a particular value using information from the posterior distribution. However, a higher level of statistical sophistication is required to implement Bayesian methods, particularly the specification of Bayesian priors when sample sizes are small [34]. We would encourage considering fully hierarchical Bayes methods in environments where Bayesian methods are well understood, and considered robust and easy to implement. This paper shows that frequentist methods produce reasonable small area estimates.