1. Introduction
According to statistics from the World Health Organization (WHO), approximately 17.9 million people die from cardiovascular diseases every year [1]. Arrhythmia is a common clinical manifestation of cardiovascular diseases, and electrocardiogram (ECG) monitoring is an effective method for detecting and recording arrhythmias. Therefore, real-time and high-precision automatic ECG monitoring can effectively predict arrhythmias.
Currently, arrhythmia classification relies mainly on traditional machine learning methods and deep learning-based methods. Traditional machine learning methods usually classify ECG signals from hand-crafted sample features, using classifiers such as support vector machine (SVM) [2], random forest [3], k-nearest neighbor (k-NN) [4], decision tree [5], and Bayesian classifier [6]. For example, Faziludeen and Sabiq [7] used Daubechies 4 wavelet decomposition for heartbeat feature selection, extracted 25 features from each heartbeat through wavelet analysis, and then classified them with an SVM. Li and Zhou [8] used the entropy of wavelet packet decomposition (WPD) coefficients as representative features and subsequently employed random forest (RF) for ECG classification. Venkatesan et al. [9] proposed an adaptive filter with the delay error normalized least mean squares (LMS) algorithm to address the prolonged processing time of ECG signals, achieving high speed and good classification performance. However, the classification performance of these traditional machine learning methods depends heavily on the quality of feature engineering, and the normal beat (N) and supraventricular premature beat (S) classes often present similar waveforms with similar features. Therefore, it is difficult to classify them accurately using conventional approaches.
Among deep learning-based methods, the one-dimensional convolutional neural network (1-D CNN) is widely used for the classification of cardiac arrhythmias. Acharya et al. [10] proposed a 9-layer deep convolutional neural network (CNN) that successfully classified 5 different categories of ECG signals after eliminating high-frequency noise and applying data augmentation. Hannun et al. [11] conducted a comprehensive evaluation of end-to-end deep learning methods for ECG classification and achieved diagnostic performance similar to that of cardiologists. Mousavi and Afghah [12] proposed an ECG-based automatic heartbeat classification method that addresses the challenge of ECG data imbalance by combining a sequence-to-sequence deep learning method with an oversampling technique. Sabut et al. [13] proposed a deep neural network (DNN) approach based on mixed time-frequency features, which can enhance the diagnostic efficiency of ventricular tachycardia (VT) and ventricular fibrillation (VF). In addition, recurrent neural networks (RNNs) [14] are a promising way to capture the forward and backward dependencies of time series. Lynn et al. [15] proposed a bidirectional gated recurrent unit (GRU) method for ECG classification. Yildirim [16] combined deep bidirectional long short-term memory (LSTM) networks with wavelet sequences for ECG classification, achieving good classification performance.
Most ECG classification methods have been evaluated under an intra-patient assessment paradigm. This paradigm often neglects the differences between ECG signals from different patients, thereby limiting the generalization ability of the resulting models. The inter-patient assessment, which reflects clinical application scenarios, faces problems such as data imbalance and inter-individual variability, which can degrade the classification accuracy and generalization capability of the model. To address these issues, we propose a novel method for ECG classification, with the following contributions:
1) A multi-path parallel deep convolutional neural network is proposed for arrhythmia classification by extracting the features of P wave, QRS wave group, and T wave simultaneously, and a global average RR interval is introduced to address the issue of data similarity between normal beat (N) and supraventricular premature beat (S).
2) A weighted loss function has been developed to dynamically adjust weights based on the proportion of each class in the input batch, so as to overcome the data imbalance problem.
3) Both intra-patient and inter-patient evaluation paradigms are used to validate the classification performances of the proposed method, and the experimental results showed that the proposed method can achieve the most accurate classification performances among these state-of-the-art methods.
The rest of this paper is organized as follows: Section 2 introduces the ECG datasets and the proposed method. Section 3 outlines the experimental setup and compares the proposed method with state-of-the-art methods. Section 4 discusses the performance of the proposed model and presents ablation experiments to verify its effectiveness. Section 5 concludes the paper and outlines future work.
2. Materials and methods
2.1. ECG database
In this paper, the MIT-BIH arrhythmia database [17] is used to validate the proposed method for ECG classification. The MIT-BIH arrhythmia database is an open-source dataset provided by the Massachusetts Institute of Technology (MIT) and recommended by the Association for the Advancement of Medical Instrumentation (AAMI) [18]. It contains 48 two-channel ECG records collected from 47 patients, each about half an hour long. Each record consists of two leads (in most records, the modified limb lead II (MLII) and a precordial lead, usually V1), digitized at 360 Hz per channel over a 10 mV range, and each record was independently annotated by two cardiologists (about 110,000 annotations in total). In this paper, the Lead Ⅱ (MLII) data were used for training and testing the proposed method.
According to the AAMI standard, ECG beats can be grouped into five categories for arrhythmia classification, as shown in Table 1. Because AAMI defines the Q-type beat as an unknown category, which is for reference only and has no specific application value, we primarily use the first four categories as experimental data in this paper.
In this paper, intra-patient and inter-patient experiments are used to validate the classification performance of the proposed method. In the intra-patient evaluation approach, a total of 100,501 heartbeats from all patients are pooled, and the training and testing samples are partitioned at the heartbeat level. We use 10-fold cross-validation to validate the model: all heartbeats are divided into 10 equal parts, and in each round 9 parts serve as the training set and the remaining part as the testing set, so that the process is repeated 10 times. This evaluation method has a limitation, however, because heartbeats from the same patient may appear in both the training and testing sets. In the inter-patient evaluation approach, we followed the dataset partition defined by De Chazal et al. [19] and divided the records into the DS1 and DS2 datasets, as shown in Table 2. DS1 was used for model training and DS2 for model testing. In this scheme, different numerical identifiers represent the ECG records of different patients. It fully accounts for individual diversity and can effectively evaluate the generality and robustness of the proposed method.
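The difference between the two evaluation paradigms can be illustrated with a minimal Python sketch. The function names (`kfold_beat_indices`, `split_by_record`) and the toy record identifiers are hypothetical and chosen only for illustration; the real DS1 and DS2 record lists are those of Table 2 and are not reproduced here.

```python
import numpy as np

def kfold_beat_indices(n_beats, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a beat-level k-fold split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_beats)      # shuffle beats regardless of patient
    folds = np.array_split(order, k)      # k nearly equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

def split_by_record(record_ids, train_records):
    """Return boolean (train, test) masks for a record-level split."""
    record_ids = np.asarray(record_ids)
    train_mask = np.isin(record_ids, list(train_records))
    return train_mask, ~train_mask

# Toy data: 4 hypothetical records with 25 beats each.
record_ids = np.repeat([1, 2, 3, 4], 25)

# Intra-patient style: beats of one record can land in both train and test.
splits = list(kfold_beat_indices(len(record_ids), k=10))

# Inter-patient style: whole records go to either train or test.
train_mask, test_mask = split_by_record(record_ids, train_records={1, 2})
```

In the beat-level split, a record usually contributes beats to both sides of every fold, which is the limitation noted above; in the record-level split, no record appears on both sides.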
2.2. ECG data preprocessing
For a given ECG signal, the preprocessing procedure consists of the following steps:
1) ECG heartbeat segmentation. According to the annotations made by experts on the R-peak and heartbeat categories of the ECG signal, the R-peak position of each heartbeat in the data is obtained. Since the sampling frequency of the ECG acquisition equipment is 360 Hz, and the cardiac cycle is about 0.8 s, we use a fixed window of length 255 to capture the heartbeat waveform. For each heartbeat, we take the preceding 128 points and the following 127 points from the R peak as a sample.
2) Eliminating data instability. Because attaching and removing the instrument causes severe heartbeat fluctuations, we removed the first 10 and last 10 heartbeats to ensure the stability of the heartbeats.
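The two preprocessing steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name `segment_beats` is hypothetical, and the exact window alignment is an assumption (samples from R - 128 up to, but not including, R + 127, so that the R peak sits at index 128 of a 255-sample window).

```python
import numpy as np

def segment_beats(signal, r_peaks, labels, before=128, after=127, skip=10):
    """Cut fixed-length windows around annotated R peaks.

    Assumption: each window covers samples [r - before, r + after),
    giving before + after = 255 samples with the R peak at index `before`.
    """
    signal = np.asarray(signal, dtype=float)
    r_peaks = list(r_peaks)
    labels = list(labels)
    # Drop the first and last `skip` annotated beats.
    keep = slice(skip, len(r_peaks) - skip)
    beats, kept_labels = [], []
    for r, lab in zip(r_peaks[keep], labels[keep]):
        start, stop = r - before, r + after
        if start < 0 or stop > len(signal):
            continue                      # window would run off the record
        beats.append(signal[start:stop])
        kept_labels.append(lab)
    return np.array(beats).reshape(-1, before + after), kept_labels

# Toy record: one minute at 360 Hz with a synthetic R peak every 300 samples.
signal = np.arange(360 * 60, dtype=float)
r_peaks = np.arange(300, 21300, 300)      # 70 synthetic R-peak positions
labels = ["N"] * len(r_peaks)
beats, kept_labels = segment_beats(signal, r_peaks, labels)
```

With 70 annotated beats, 50 windows of 255 samples remain after the first and last 10 beats are discarded.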
2.3. Dynamic weighted focal loss
The distribution of beats across the categories of the MIT-BIH arrhythmia database is presented in Table 3. The heartbeat distribution is severely imbalanced, and imbalanced data often lead to overfitting to the categories with a large number of samples. Therefore, a dynamic weighted focal loss is proposed to address the data imbalance problem; it adjusts the class weights dynamically according to the number of heartbeats of each category in the input batch.
First, we define a label set Batch_label_i, as shown in Eq (1), representing the X heartbeats contained in the i-th training batch, where y_j denotes the j-th beat label in Batch_label_i, y_j ∈ {N, S, V, F}, and each category corresponds to a different one-hot encoding, such as N = [1,0,0,0] and S = [0,1,0,0].
We then define the weight of the i-th batch as α_i, as shown in Eq (2), which assigns different weights based on the number of beats of each category in the batch. The larger the proportion of a category in the batch, the lower its weight; conversely, the smaller the proportion, the greater its weight. The constant 0.05 prevents the weight from reaching 0.
The weighted focal loss of the i-th batch is then defined as L_i, as shown in Eq (3), where ŷ represents the predicted probability of each class and y is the one-hot encoding of the ground truth label.
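The idea of a batch-dependent focal loss can be sketched as follows. Because Eqs (2) and (3) are not reproduced in this text, the concrete formulas below are assumptions chosen to match the verbal description only: the class weight is taken as 1 minus the class proportion in the batch plus 0.05, and the loss uses the standard focal form with a focusing parameter `gamma` of 2, which the text does not specify. The function names are hypothetical.

```python
import numpy as np

def dynamic_weights(batch_onehot, offset=0.05):
    """Per-class weights from the class proportions in the current batch.

    Assumed form (not the paper's Eq (2)): w_c = 1 - n_c / X + offset,
    so frequent classes get small weights and no weight reaches 0.
    """
    batch_onehot = np.asarray(batch_onehot, dtype=float)
    proportions = batch_onehot.mean(axis=0)     # n_c / X for each class c
    return 1.0 - proportions + offset

def dynamic_weighted_focal_loss(batch_onehot, probs, gamma=2.0, offset=0.05):
    """Mean focal loss over a batch with batch-dependent class weights.

    Assumed form (not the paper's Eq (3)):
    L = mean_j sum_c -w_c * (1 - p_jc)^gamma * y_jc * log(p_jc)
    """
    y = np.asarray(batch_onehot, dtype=float)
    p = np.clip(np.asarray(probs, dtype=float), 1e-7, 1.0)  # avoid log(0)
    w = dynamic_weights(y, offset)
    per_class = -w * (1.0 - p) ** gamma * y * np.log(p)
    return per_class.sum(axis=1).mean()

# Toy batch of X = 4 beats: three N beats and one S beat.
onehot = np.array([[1, 0, 0, 0]] * 3 + [[0, 1, 0, 0]])
weights = dynamic_weights(onehot)               # N gets the smallest weight
uniform = np.full((4, 4), 0.25)                 # uninformative predictions
confident = 0.96 * onehot + 0.01                # mostly correct predictions
loss_uniform = dynamic_weighted_focal_loss(onehot, uniform)
loss_confident = dynamic_weighted_focal_loss(onehot, confident)
```

Under these assumptions the N class, which fills 75% of the toy batch, receives a weight of 0.30, while the S class receives 0.80, so errors on the rarer class contribute more to the loss.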
2.4. Global average RR interval feature extraction
When performing classification tasks, convolutional networks often struggle to learn the differences between N-class and S-class waveforms because the waveforms are so similar. Additionally, N-class data are much more abundant than S-class data, which can further degrade the generalization of the model. However, compared with N-class beats, S-class beats typically exhibit incomplete compensatory intervals, which appear as shorter forward RR intervals. The forward RR interval is the time interval between the R-wave of the current heartbeat and the R-wave of the previous heartbeat. Therefore, in this paper, we introduce RR intervals to assist the network in classification. The definition of the forward RR interval is given in Eq (4), where i represents the i-th RR interval and j represents the j-th patient.
In this paper, we adopt an overall average normalization method, because the ventricular rate varies among patients and RR intervals vary accordingly. First, the average of all RR intervals for the j-th patient is defined as RRaver,j, as shown in Eq (5). Taking RRaver,j as the reference for the j-th patient, the forward RR interval divided by RRaver,j is defined as RRglobal,j(i), the global average RR interval, as shown in Eq (6).
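The normalization described above can be sketched in NumPy as follows. The function name `global_average_rr` is hypothetical, and the conversion from sample indices to seconds assumes the 360 Hz sampling frequency of the database; the first beat of a record has no preceding R peak and therefore no forward RR interval in this sketch.

```python
import numpy as np

def global_average_rr(r_peaks, fs=360):
    """Forward RR intervals of one patient, divided by that patient's mean RR.

    r_peaks: R-peak positions in samples, in temporal order.
    Returns (rr_global, rr_aver), where rr_aver is in seconds.
    """
    r_peaks = np.asarray(r_peaks, dtype=float)
    rr = np.diff(r_peaks) / fs       # forward RR interval of beats 2..n, in s
    rr_aver = rr.mean()              # patient-level average RR interval
    return rr / rr_aver, rr_aver

# Toy patient: regular 1 s rhythm with one premature beat (0.5 s interval).
r_peaks = [0, 360, 720, 900, 1260]
rr_global, rr_aver = global_average_rr(r_peaks)
```

In this toy example the premature beat has a normalized RR interval well below 1, while the regular beats lie slightly above 1, independent of the patient's baseline heart rate.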
The global average normalization method standardizes RR intervals across different patients, facilitating data comparison. Additionally, due to the limited number of features, four transformations are used to expand the global average RR interval feature space, as defined in Eqs (7)–(10). In this study, five features (RRglobal,j(i), RR1, RR2, RR3, RR4) are used to train the network for ECG classification tasks.
To present the data distribution more intuitively, we use Gaussian kernel density estimation to analyze the distribution of the global average RR interval and its variants, as shown in Figure 1. In the figure, the horizontal axis represents the global average RR interval, and the vertical axis represents the probability density. Figure 1 illustrates that, in both the training set and the testing set, the global average RR interval makes the N class and the S class linearly separable.
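For readers unfamiliar with the technique, a one-dimensional Gaussian kernel density estimate can be written directly in NumPy. This is a generic sketch, not the authors' plotting code: the function name `gaussian_kde_1d` is hypothetical, the bandwidth rule (a normal-reference rule of thumb) is an assumption because the text does not state one, and the samples are synthetic rather than taken from the database.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed; not stated in the text).
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

# Synthetic stand-in for normalized RR intervals clustered around 1.0.
rng = np.random.default_rng(0)
samples = rng.normal(1.0, 0.1, 500)
grid = np.linspace(0.0, 2.0, 2001)
density = gaussian_kde_1d(samples, grid)
area = density.sum() * (grid[1] - grid[0])   # should be close to 1
```

Plotting one such curve per class on a shared horizontal axis gives the kind of comparison described for Figure 1.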
2.5. The multi-path parallel deep convolutional neural network
We propose a multi-path parallel deep convolutional neural network with strong generalization ability, as shown in Figure 2. The network is mainly composed of 20 convolutional layers, 20 activation function layers, 20 batch normalization layers [20], 8 dropout layers, 3 MaxPool layers, 2 fully connected layers, 1 LSTM layer, and 1 softmax layer. The following are details about the model structure.
Input: We divide each ECG heartbeat into 3 segments, each with a length of 85, and feed them into three identical parallel networks to extract the features of the P wave, QRS wave group, and T wave, respectively. The network can therefore focus on feature extraction from different periods simultaneously. In addition, we use the global average RR interval feature and its four variants as the fourth input of the network to assist classification. A convolutional network is applied to this input as well, mainly to increase the dimension of the feature and thereby its weight in the network's classification decision.
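The input splitting can be illustrated with a short NumPy sketch. The function name `split_into_paths` is hypothetical, and the remark about where the R peak falls relies on the assumption that it sits at index 128 of the 255-sample window.

```python
import numpy as np

def split_into_paths(beats, n_paths=3):
    """Split beats of shape (..., 255) into equal consecutive segments."""
    beats = np.asarray(beats)
    if beats.shape[-1] % n_paths != 0:
        raise ValueError("beat length must be divisible by the number of paths")
    return np.split(beats, n_paths, axis=-1)

# A toy batch of 16 beats, each 255 samples long.
batch = np.tile(np.arange(255), (16, 1))
first, middle, last = split_into_paths(batch)   # each has shape (16, 85)
```

Under the stated assumption, index 128 lies in the middle segment (indices 85 to 169), so the three segments roughly cover the samples before, around, and after the R peak.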
Convolutional Blocks: In our network, we define the initial three layers as convolutional blocks for extracting initial features at different time periods. This module consists of a convolutional layer, a batch normalization layer, and a ReLU activation function layer. The convolutional layer consists of 16 convolutions with a kernel size of 5 × 1.
Layer structure: Compared with the traditional convolutional neural network (CNN), the residual network achieves higher classification accuracy and overcomes the issue of information loss [21]. In this paper, we design a residual structure. The structure first conducts feature extraction through 32 1-D convolutions with a kernel size of 5 × 1 and a stride of 1. Batch normalization is then applied to the extracted features, and the ReLU activation function is used to prevent gradient vanishing and saturation. Next, a dropout layer [22] randomly deactivates 20% of the neurons to prevent overfitting. Finally, feature extraction is performed again through 32 1-D convolutions with a kernel size of 7 × 1 and a stride of 1, and the extracted features are again normalized and activated. Note that in the residual block used to extract RR interval features, the kernel size is 1 × 1.
LSTM layer: LSTM boasts a distinctive gate mechanism internally, endowing it with long-term memory capabilities and addressing concerns related to gradient vanishing and exploding [23]. Hence, following the concatenation of the extracted features, we employed LSTM to discern critical information among features, with the LSTM units set to 64.
Fully connected layer: After learning the dependencies between sequences, we use a fully connected layer with 64 units and a ReLU activation function for feature mapping, followed by another fully connected layer containing 5 units. We then concatenate the RR interval features with the output of this 5-unit layer to strengthen the classification decision. Finally, we map the resulting 10 features to a fully connected layer with 4 units and use the softmax activation function to compute the probability of each class.
3. Experimental results
3.1. Experiment setting
In this paper, the experiment was conducted on a server with an NVIDIA GeForce RTX 3090 graphics card and Ubuntu 20.04 operating system. The loss function adopts dynamic weighted focal loss.
The network is trained with the Adam optimizer [24] with a learning rate of 0.001. The batch size is set to 16 for the multi-path parallel deep convolutional neural network.
3.2. Metrics
To evaluate the performance of the proposed method, we applied several widely used metrics. The detailed definitions are given as follows:
In these definitions, TP is the number of positive samples predicted as positive, TN is the number of negative samples predicted as negative, FN is the number of positive samples predicted as negative, and FP is the number of negative samples predicted as positive.
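For a multi-class problem, these four counts are obtained per class by treating that class as positive and all others as negative. The sketch below shows one way to derive them from a confusion matrix. The set of metrics computed (sensitivity, specificity, positive predictive value, and accuracy) is an assumption based on what is commonly reported for this task, since the formulas themselves are not reproduced in this text; the function name `per_class_metrics` is hypothetical.

```python
import numpy as np

def per_class_metrics(confusion):
    """One-vs-rest metrics from a confusion matrix.

    confusion[i, j] is the number of beats of true class i predicted as j.
    A class that is never predicted yields a division by zero in "ppv".
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp         # true class i, predicted as another class
    fp = cm.sum(axis=0) - tp         # predicted as class i, truly another class
    tn = cm.sum() - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "accuracy": (tp + tn) / cm.sum(),
    }

# Toy two-class confusion matrix (rows: true class, columns: predicted class).
toy_cm = [[8, 2],
          [1, 9]]
metrics = per_class_metrics(toy_cm)
```

For the first class of the toy matrix, TP = 8, FN = 2, FP = 1, and TN = 9, which gives a sensitivity of 0.8 and a specificity of 0.9.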
3.3. Experiment results
Most arrhythmia classification methods are evaluated on the MIT-BIH arrhythmia database. However, some researchers use only an intra-patient evaluation paradigm, neglecting the differences in ECG signals among individuals. In this paper, both intra-patient and inter-patient paradigms are used to evaluate the proposed method for ECG classification.
3.3.1. Intra-patient based model evaluation
In the intra-patient evaluation method, the training and testing sets are drawn by random sampling without replacement from the heartbeats of all patients. The training and testing sets may therefore contain heartbeats of the same category from the same patient, and 10-fold cross-validation is used to evaluate the model's generalization ability. The 100,501 heartbeats in the DS1 + DS2 dataset were randomly shuffled and divided into 10 parts, with 9 parts used for model training and 1 part used for model testing. After repeating this process 10 times, the classification results of all folds were aggregated. The resulting confusion matrix is presented in Table 4.
As shown in Table 5, compared with state-of-the-art methods evaluated under the intra-patient paradigm, the proposed method achieves better classification performance, especially in the minority categories (S, V, F). Acharya et al. [10] achieved the best performance in the F category because data augmentation was used to generate F samples. The methods of Romdhane and Pr [26], Pandey and Janghel [27], and Shoughi and Dowlatshahi [28] perform better than the proposed method in the N category. However, the sensitivities of the proposed method in the S, V, and F categories are higher than those of these three methods, which is of greater value in clinical arrhythmia classification. These results indicate that the weighted focal loss addresses the data imbalance problem effectively and improves the classification performance on the minority classes.
3.3.2. Inter-patient based model evaluation
In the inter-patient evaluation method, the training set and the testing set are divided by patient, so the two sets come from different patients and are completely independent. Because of data imbalance and the waveform similarity between the N class and the S class, ECG classification is more challenging under inter-patient assessment than under intra-patient assessment.
Because classification is difficult under the inter-patient evaluation method, most approaches focus only on normal beats (N), supraventricular ectopic beats (S), and ventricular ectopic beats (V), without considering the minority class of fusion beats (F). To allow comparison with other methods using this evaluation paradigm, we likewise restrict the classification experiments to N, S, and V. The confusion matrix resulting from the experiments is shown in Table 6.
As shown in Table 7, compared with other methods, the proposed method achieves the best performance in the minority S and V classes and can therefore detect abnormal ECG signals accurately. In terms of overall accuracy, the methods proposed by Garcia et al. [29] and Siouda et al. [32] obtain slightly better results than the proposed method, because they achieve high sensitivity in the N category, which dominates the overall accuracy. However, these methods exhibit a high misdiagnosis rate for the S and V categories. Overall, the proposed method effectively addresses both the data imbalance and the waveform similarity between the N class and the S class, and thereby improves the classification performance on minority classes such as S and V.
4. Discussion
To validate the effectiveness of the global average RR interval feature and the dynamic weighted focal loss function, we designed ablation experiments under the inter-patient paradigm. The experimental variables were the global average RR interval feature, the focal loss (FL), the dynamic weighted focal loss (DFL), and the cross-entropy loss (CE). We conducted six comparison experiments: 1) using the CE loss function, 2) using the global average RR interval feature with the CE loss function, 3) using the FL loss function, 4) using the global average RR interval feature with the FL loss function, 5) using the DFL loss function, and 6) using the global average RR interval feature with the DFL loss function.
As shown in Table 8, when DFL is combined with the global average RR interval features, the proposed method effectively mitigates the overfitting caused by the data imbalance and by the similarity between the N and S classes. In terms of overall accuracy, the proposed method is not the best: it reaches 91.22%, lower than the 94.35% obtained with the CE loss. Because the N class accounts for 90.63% of the total training data, a network trained with the CE loss tends to overfit to the N class, which yields a high accuracy for the N class and directly raises the overall accuracy. However, this does not help detect abnormal ECG signals and results in lower classification accuracy for the S and V classes.
Table 8 also shows that the classification performance on the S class improves markedly when the global average RR interval features are used, which demonstrates that these features help to distinguish between the N and S classes. Compared with the CE loss function, both FL and DFL address the data imbalance problem effectively and achieve better classification accuracy on the minority classes. However, selecting the parameters of the FL method is difficult and requires a parameter sweep. In contrast, the DFL adjusts the weight factors dynamically based on the number of beats of each class in the input batch, without multiple experiments. With the optimal weight factors, [0.25, 8, 1], the FL method achieves classification accuracy close to that of the DFL method. Overall, the DFL effectively suppresses the data imbalance problem and gives greater emphasis to the minority categories through larger weight factors.
To better illustrate the advantages of DFL and RR intervals, we plotted the corresponding electrocardiograms, as shown in Figure 3. Figure 3(a) shows that the waveforms of the N class and S class are highly similar. Therefore, without RR intervals, the network finds it challenging to learn the differences between them during classification. As shown in Figure 3(b), compared with the N class, the waveform of the V class tends to exhibit morphological abnormalities. However, because of data imbalance, the classification network is dominated by the N class. With the DFL method, the proposed method addresses the data imbalance effectively and can thus detect the V class.
To evaluate the impact of the global average RR interval features on classification performance under different input conditions, we conducted further ablation experiments. As shown in Figure 2, the proposed network feeds in the global average RR interval features in two ways simultaneously: 1) extracting global average RR interval features using convolutional networks, and 2) concatenating global average RR interval features directly at the final fully connected layer. Accordingly, we defined three experimental conditions in addition to the proposed method: A1, without global average RR interval features; A2, with the global average RR interval features fed through convolutional networks only; and A3, with the global average RR interval features concatenated directly at the final fully connected layer only. The proposed method uses both input methods.
It can be seen from Table 9 that under different input methods, using the global average RR interval feature can improve the decision-making ability of the network. Using two input methods at the same time can significantly improve the sensitivity, accuracy, and specificity of the S-class classification. Therefore, using global average RR interval features can solve the problem of similarity between N-class and S-class data effectively.
5. Conclusions
In this paper, we propose a multi-path parallel deep convolutional neural network (MPP-CNN) and a dynamic weighted focal loss function for arrhythmia classification. The global average RR interval features are introduced through two input methods to solve the problem of data similarity, and the dynamic weighted focal loss addresses the imbalance problem by adjusting weights dynamically based on the proportion of each class in the input batch. Intra-patient and inter-patient experimental results show that the proposed method achieves good classification performance, especially for the minority classes. In future work, the proposed method will be applied to clinical diagnosis for arrhythmia classification.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgements
This work is supported in part by the National Key R & D Program of China (2023YFE0205600), the National Natural Science Foundation of China (62272415), the Key Research and Development Program of Zhejiang Province (2023C01041), and the Key Research and Development Program of Ningxia Province (2023BEG02065).
Conflict of interest
Ling Xia is a guest editor for [Mathematical Biosciences and Engineering] and was not involved in the editorial review or the decision to publish this article. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.