
Shiga toxin-producing E. coli (STEC) are diarrheagenic strains that can cause bloody diarrhea and hemolytic-uremic syndrome. Their main virulence factor, the Shiga toxin (Stx), is encoded by phages integrated into the bacterial chromosome. Stx phages are widely diverse and carry many genes with limited or unknown function. As the toxin subtype Stx2a is associated with highly pathogenic strains, this study was mainly focused on the characterization of the stx flanking region of Stx2a phages. Of particular interest was a sialate O-acetylesterase (NanS-p), which has been described previously to be encoded downstream stx in some phage genomes and may confer a growth advantage for STEC. Complete DNA sequences of Stx2a phages and prophages were retrieved from the GenBank database, and the genomic regions from anti-terminator Q to holin S genes were bioinformatically analyzed. Predicted NanSp sequences from phages encoding other Stx subtypes were also studied. Additionally, expression of nanS-p was quantified by qPCR in strains selected from our laboratory collection. The analysis of Stx2a phage genomes showed that all carried the Q, stx2a, nanS-p and S genes, but with allele diversity and other sequence differences. In particular, sequence differences were detected in each of the three domains of NanS-p esterases encoded by Stx2a phages and other Stx phages; however, nanS-p was not identified in the Stx2e, Stx2f and Stx2g phages analyzed. The expression of nanS-p increased in most stx2a-positive strains under phage inducing conditions, as was previously shown for stx2a. As the present work showed diversity at the Q-S region among Stx phages, and particularly in the encoded NanS-p enzyme, future studies will be necessary to evaluate if NanS-p variants differ in their activity and to assess the impact of the absence of nanS-p in certain Stx phages.
Citation: Stefanía B. Pascal, Ramiro Lorenzo, María Victoria Nieto Farías, John W.A. Rossen, Paula M. A. Lucchesi, Alejandra Krüger. Characterization of the flanking region of the Shiga toxin operon in Stx2a bacteriophages reveals a diversity of the NanS-p sialate O-acetylesterase gene[J]. AIMS Microbiology, 2023, 9(3): 570-590. doi: 10.3934/microbiol.2023030
Heart rate variability (HRV) is the measure of fluctuations between successive heartbeats, reflecting the interplay between the sympathetic and parasympathetic branches of the autonomic nervous system (ANS) [1]. These fluctuations are critical markers of the physiological factors influencing heart rhythm, serving as indicators of an individual's health status [2]. Studies have shown that higher HRV levels are indicative of robust health, suggesting a well-balanced autonomic regulation. In contrast, reduced HRV is associated with an imbalance in autonomic function, leaning towards either diminished parasympathetic activity or heightened sympathetic drive, which correlates with an increased risk of adverse health outcomes [3,4]. HRV is also applied in diagnosing and monitoring cardiovascular conditions, which are leading contributors to global mortality [5]. This underscores the value of HRV as a parameter for evaluating both physiological well-being and pathological conditions, particularly those related to cardiovascular health.
HRV data analysis can be implemented through various signal processing techniques. In the time domain, HRV is assessed by calculating statistical measures such as the average of normal-to-normal (NN) intervals, the root mean square of successive differences between NN intervals (RMSSD), and the standard deviation of NN intervals (SDNN) [6]. These metrics provide insights into the heart rhythm and its variability over time. On the other hand, frequency domain analysis of HRV offers a perspective on the distribution of power across different frequency bands within the HRV data, serving as a more detailed reflection of autonomic nervous system activity. Spectral analysis has proven effective in distinguishing between sympathetic and parasympathetic nervous system influences on heart rate variability [7]. Specifically, the low-frequency (LF) component of the HRV spectrum is influenced by both sympathetic and parasympathetic activities, whereas the high-frequency (HF) component is predominantly a result of parasympathetic activity. Consequently, the LF/HF ratio is commonly utilised as a measure of the balance between sympathetic and parasympathetic influences, offering valuable insights into autonomic nervous system dynamics [8].
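The time-domain measures named above can be sketched in a few lines. This is a minimal illustration rather than the processing used in the study: the function name `time_domain_hrv`, the choice of the sample standard deviation for SDNN, and the toy NN intervals are all assumptions.

```python
import numpy as np

def time_domain_hrv(nn_ms):
    """Return mean NN, SDNN and RMSSD (in ms) for a 1-D sequence of NN intervals."""
    nn = np.asarray(nn_ms, dtype=float)
    diffs = np.diff(nn)                      # successive differences between NN intervals
    return {
        "mean_nn": nn.mean(),
        "sdnn": nn.std(ddof=1),              # sample standard deviation (assumed convention)
        "rmssd": np.sqrt(np.mean(diffs ** 2)),
    }

# Toy NN intervals in milliseconds (hypothetical values).
metrics = time_domain_hrv([800, 810, 790, 805, 795])
```

SDNN summarises overall variability, whereas RMSSD depends only on beat-to-beat changes, so the two can differ markedly on the same recording.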
The most commonly used method to derive HRV data is via electrocardiogram (ECG) signals. With appropriate QRS detectors, R-peaks in the ECG morphology can be identified, and then the HRV data can be obtained by computing the RR intervals [9]. Other than using ECGs, some studies suggest using photoplethysmography (PPG) signals to derive the HRV data, where PPG is a non-invasive technique that utilises optical principles to obtain pulse waves from the microcirculation in peripheral tissue [10]. In particular, the utilisation of PPG signals for detecting pulse-wave related HRV, known as pulse rate variability (PRV), has attracted considerable attention in recent years [11]. This is primarily attributed to the convenience and affordability of acquiring PPG signals through wearable devices.
While electrocardiogram (ECG) or photoplethysmography (PPG) signals are commonly employed for HRV analysis, ensuring the reliability of HRV data extracted from these signals presents significant challenges. These difficulties are largely due to uncertainties inherent in data collection and computational methodologies, which may compromise data integrity. In particular, motion artefacts represent a substantial challenge, often overlapping in frequency with legitimate HRV signals (0.5 to 5 Hz) as they typically manifest within a 0.01 to 10 Hz range [12]. This overlap complicates the task of artefact removal without inadvertently altering the PPG signal. Moreover, the processes involved in HRV data analysis, such as the identification of R-peaks, interpolation of RR intervals, and data resampling, introduce additional complexity. These steps are crucial for accurate HRV measurement but can potentially distort the HRV data, highlighting the intricate balance required in processing and analysing HRV signals.
A variety of computational methods have been devised to mitigate uncertainties or inaccuracies in HRV data analysis, including adaptive filtering, blind source separation, and advanced deep learning techniques [13,14,15]. Despite the sophistication of these approaches, they often overlook the unique properties of the HRV spectrum's frequency bands, which limits their effectiveness in accurately estimating uncertainties. For instance, the low-frequency (LF) component of the HRV spectrum is influenced by both sympathetic and parasympathetic nervous system activities, whereas the high-frequency (HF) component is primarily regulated by the parasympathetic nervous system (PNS) [8]. Research has shown that measurement noise impacts these frequency bands; for example, the LF component exhibits substantial sensitivity to increasing noise levels, while the impact is not obvious in the HF component of the HRV spectrum [16]. This differential impact highlights the necessity of accounting for the distinct characteristics of HRV frequency bands in the development and application of computational techniques for HRV data analysis.
In this study, we introduce a novel approach for estimating uncertainties in the HRV spectrum by leveraging a refined low-rank matrix completion (MC) technique, which incorporates the distinct characteristics of HRV frequency components to improve the precision of uncertainty estimation. The concept of low-rank MC emerges as a powerful strategy for reconstructing missing or imprecise entries in observed data, grounded in the principle that many real-world datasets exhibit a low-rank structure due to their inherent low-dimensional characteristics or underlying patterns [18]. The application of low-rank properties to data is widely recognised in various domains, indicating the existence of underlying trends or dimensions within extensive datasets []. The MC method has demonstrated its efficacy in various fields, including enhancing image quality in compressive sampling photoacoustic microscopy [20], optimising signal channel selection in electroencephalogram analysis [21], and facilitating the identification of protein-protein interactions [22]. While there have been initial explorations of MC techniques in analysing HRV data [23,24], its potential in healthcare remains largely untapped, with a significant gap in detailed model development and comprehensive validation of its performance in medical applications.
By leveraging the low-rank characteristic of HRV data, we demonstrated the effective inference of uncertainties within the HRV spectrum using partial entries from the data matrix. This approach is further enhanced by a detailed analysis of HRV data characteristics, particularly focusing on the low-frequency (LF) and high-frequency (HF) components of the spectrum. Drawing on these insights, we developed a refined matrix completion (RMC) framework specifically designed to improve the estimation of uncertainties. This framework prioritises critical data entries within the HRV matrix, thereby enhancing the method's efficiency and robustness in estimating uncertainties. To assess the efficacy of our proposed RMC method in HRV spectrum estimation, we conducted comprehensive experimental studies across five benchmark datasets. Our method was evaluated against both the traditional MC approach and five regression-based machine learning models. The findings from these experiments and comparative analyses underscore the superior performance and robustness of our RMC method in HRV spectrum estimation. The key contributions of this study include:
● The introduction of an innovative method for estimating uncertainties in the HRV spectrum, utilising the matrix completion technique to exploit the low-rank nature of the HRV data.
● The development of an advanced matrix completion framework that enhances estimation accuracy and computational efficiency by focusing on the modelled spectrum of HRV data.
● The implementation of extensive experimental validation on five renowned benchmark datasets to evaluate the effectiveness of our model in HRV spectrum estimation.
● A series of comparative analyses with various neural network models and the traditional MC method, illustrating the significant advantages of our approach in the context of uncertainty estimation.
The remainder of this paper is organised as follows. Section 2 formulates the problem of HRV spectrum estimation. Section 3 develops the MC method and refinement. Section 4 performs the analysis of HRV data. The results of spectrum estimation and comparison study are presented in Section 5 and, finally, conclusions are drawn in Section 6.
In this paper, we denote the set of real numbers by $\mathbb{R}$ and the set of natural numbers by $\mathbb{N}$. A matrix is represented by a bold uppercase letter $\mathbf{A}$, which consists of elements $a_{i,j}$ for $i=1,\ldots,n$ and $j=1,\ldots,p$, and belongs to the space $\mathbb{R}^{n\times p}$, i.e., it has $n$ rows and $p$ columns. The Frobenius norm of $\mathbf{A}$ is $\|\mathbf{A}\|_F=\left(\sum_{i=1}^{n}\sum_{j=1}^{p}|a_{i,j}|^2\right)^{1/2}$, the square root of the sum of the absolute squares of its elements. The nuclear norm of $\mathbf{A}$ is defined as $\|\mathbf{A}\|_*=\sum_i\sigma_i(\mathbf{A})$, where $\sigma_i(\mathbf{A})$ is the $i$th singular value of $\mathbf{A}$. The 2-norm of $\mathbf{A}$ is $\|\mathbf{A}\|_2=\sigma_{\max}(\mathbf{A})$, with $\sigma_{\max}$ being its largest singular value.
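As a quick numerical check of these three norms, the sketch below computes each one from its definition and compares it with NumPy's built-in `np.linalg.norm`. The 2×2 matrix is an arbitrary example chosen so that the values are easy to verify by hand.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 4.0]])

sigma = np.linalg.svd(A, compute_uv=False)   # singular values of A

fro = np.sqrt(np.sum(np.abs(A) ** 2))        # Frobenius norm from its definition
nuc = sigma.sum()                            # nuclear norm: sum of singular values
spec = sigma.max()                           # 2-norm: largest singular value

# The same quantities via NumPy's norm routine.
fro_np = np.linalg.norm(A, "fro")
nuc_np = np.linalg.norm(A, "nuc")
spec_np = np.linalg.norm(A, 2)
```

For this diagonal matrix the singular values are 4 and 3, so the three norms are 5, 7 and 4 respectively.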
Assuming that $z(t)\in\mathbb{R}$, for $t\in[0,T_s]$, represents the preprocessed HRV data over a time span of $T_s$, its frequency components can be expressed as follows:
$$P(f,\tilde{f})=\mathcal{F}(z(t)), \tag{2.1}$$
where $\tilde{f}=[f_l,f_r]$ specifies the frequency range of interest within the HRV data, with $f_l$ and $f_r$ denoting the left and right boundaries, respectively. Here, $\mathcal{F}(\cdot)$ denotes the spectrum operation restricted to the selected frequency range $\tilde{f}$, and $P(f,\tilde{f})$ represents the power spectral density (PSD) function of the HRV data, from which various frequency-related HRV metrics can be extracted. More precisely,
$$h_k(f,\tilde{f}_k)=\phi_k(\tilde{f}_k,P(f,\tilde{f})), \tag{2.2}$$
where $k=1,2,\cdots,N_k$ indicates different types of HRV variables, such as the power of LF or HF, or the ratio between LF and HF. The nonlinear mapping $\phi_k(\cdot,\cdot)$ computes the frequency variable $h_k$ from the spectrum $P(\cdot,\cdot)$ for the $k$th type of HRV variable in the frequency range $\tilde{f}_k$.
It is important to note that the accuracy of HRV measurements is influenced by various uncertainties, including noise and missing values. Additional factors, such as respiration rate, circadian rhythms, and physiological states, can also affect the data quality, therefore impacting the frequency analysis of HRV data. Consequently, these uncertainties pose challenges in precisely estimating the HRV spectrum.
Given data collected from $N_s$ subjects, we construct a matrix $\mathbf{H}=\{h_{i,k}\}_{i=1,\ldots,N_s,\ k=1,\ldots,N_k}\in\mathbb{R}^{N_s\times N_k}$, derived from the measurements, where each element is defined by
$$h_{i,k}(f,\tilde{f}_k)=\phi_k(\tilde{f}_k,P_i(f,\tilde{f})), \tag{2.3}$$
where $P_i$ is the PSD of the $i$th subject.
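One way to read the mapping $\phi_k$ is as a band-power computation on each subject's PSD. The sketch below builds a small feature matrix with LF power, HF power and the LF/HF ratio as columns. The band limits (0.04 to 0.15 Hz for LF, 0.15 to 0.40 Hz for HF) are commonly used values assumed here because the text does not state them, and the function names and flat toy PSDs are hypothetical.

```python
import numpy as np

LF_BAND = (0.04, 0.15)   # Hz, assumed conventional limits
HF_BAND = (0.15, 0.40)   # Hz, assumed conventional limits

def band_power(f, psd, band):
    """Integrate the PSD over [band[0], band[1]) with the trapezoidal rule."""
    lo, hi = band
    in_band = (f >= lo) & (f < hi)
    fb, pb = f[in_band], psd[in_band]
    return float(np.sum(0.5 * (pb[1:] + pb[:-1]) * np.diff(fb)))

def hrv_features(f, psd):
    """One row of H: [LF power, HF power, LF/HF ratio]."""
    lf = band_power(f, psd, LF_BAND)
    hf = band_power(f, psd, HF_BAND)
    return np.array([lf, hf, lf / hf])

# Two toy subjects with flat PSDs (hypothetical data, not from the study).
f = np.linspace(0.0, 0.5, 501)
psd_subjects = [np.ones_like(f), 2.0 * np.ones_like(f)]
H = np.vstack([hrv_features(f, p) for p in psd_subjects])   # shape (N_s, N_k) = (2, 3)
```

Scaling a PSD by a constant scales the LF and HF powers but leaves the LF/HF ratio unchanged, which is why the ratio column is identical for the two toy subjects.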
The PSD often reveals predominant frequency components, with many of the non-dominant frequencies having small amplitudes that are considered negligible, resulting in a sparse matrix. In addition, the HRV spectrum matrix is derived from the time series of intervals between heartbeats. As the ECG waveform has similar repeating patterns [25], the spectral components derived from ECG data are not completely independent. The rank of the matrix is therefore lower than its dimensions, which suggests that the HRV spectrum matrix is low-rank. This observation motivates applying low-rank MC to approximate the matrix and estimate uncertainties, and further motivates the RMC approach, which builds on low-rank MC while incorporating adjustments to improve the precision and efficiency of uncertainty estimation in HRV spectrum analysis.
This section describes the application of the low-rank MC technique to sparse matrices. The core principle is to find an approximation of the matrix that is optimal with respect to a predefined cost function; one common approach seeks the approximation of minimal rank. The section first revisits low-rank MC, then presents matrix approximation with an interest zone, and finally introduces the RMC approach.
Assuming matrix $\mathbf{H}\in\mathbb{R}^{N_s\times N_k}$ is sparse, i.e., only a limited number of entries in the matrix are non-zero, we denote by $\Omega$ the set of indices of the non-zero entries of $\mathbf{H}$, which is represented as
$$\Omega=\{(i,k)\,|\,h_{i,k}\neq 0,\ i=1,\ldots,N_s,\ k=1,\ldots,N_k\}. \tag{3.1}$$
The spectra H, derived from segmenting HRV data sequences, often contain redundant information due to the recursive slicing approach used to generate these segments. This redundancy suggests that not all information within the matrix is equally valuable for analysis. To address this, we employ a low-rank matrix completion technique, which is particularly adept at preserving essential information while minimising redundancy [17,26]. It can be formulated by approximating matrix H in terms of the smallest rank, i.e.,
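$$\min\ \operatorname{rank}(\mathbf{X})\quad\text{s.t.}\quad x_{i,k}=h_{i,k},\ \text{for}\ (i,k)\in\Omega, \tag{3.2}$$
where matrix $\mathbf{X}=\{x_{i,k}\}_{i=1,\ldots,N_s,\ k=1,\ldots,N_k}\in\mathbb{R}^{N_s\times N_k}$ is an optimal approximation matrix that retains most of the information in $\mathbf{H}$.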
The rank-minimisation problem in Eq (3.2) is generally computationally intractable. Instead of minimising the rank directly, a convex relaxation based on the nuclear norm is given by [17],
$$\min\ \|\mathbf{X}\|_*\quad\text{s.t.}\quad x_{i,k}=h_{i,k},\ \text{for}\ (i,k)\in\Omega, \tag{3.3}$$
where $\|\cdot\|_*$ denotes the nuclear norm of matrix $\mathbf{X}$, defined as the sum of its singular values. In particular, it has been shown that a smaller nuclear norm indicates a lower rank of the matrix [17]. However, requiring the estimate to match every observed entry exactly is generally too strict, so a more practical approximation of Eq (3.3) can be formulated by relaxing the constraint [27,28],
$$\min\ \|\mathbf{X}\|_*\quad\text{s.t.}\quad \|\mathcal{A}_{\Omega,\mathbf{H}}(\mathbf{X})-\mathcal{A}_{\Omega,\mathbf{H}}(\mathbf{H})\|_F\le\delta, \tag{3.4}$$
where $\|\cdot\|_F$ denotes the Frobenius norm, $\delta$ is a small value that bounds the estimation error, and $\mathcal{A}_{\Omega,\mathbf{H}}(\cdot)$ is a projection operator that is defined as
$$\mathcal{A}_{\Omega,\mathbf{H}}(\mathbf{X})=\begin{cases}x_{i,k}=h_{i,k}, & (i,k)\in\Omega,\\ 0, & \text{otherwise}.\end{cases} \tag{3.5}$$
It is noted that Eq (3.4) can be formulated as a penalised least-squares convex program [29], which can be solved by the singular value thresholding (SVT) technique as described in [30].
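A minimal sketch of a singular-value-thresholding iteration for nuclear-norm completion is given below. It follows the generic SVT scheme (soft-threshold the singular values, then correct on the observed entries) rather than the exact configuration of [30]; the function names, the threshold `tau`, the unit step size, the stopping rule and the synthetic rank-2 test matrix are all assumptions made for illustration.

```python
import numpy as np

def shrink(Y, tau):
    """Soft-threshold the singular values of Y by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def svt_complete(H, mask, tau, step=1.0, n_iter=500, tol=1e-4):
    """Estimate a low-rank X that agrees with H where mask is True."""
    Y = np.zeros_like(H)
    X = np.zeros_like(H)
    norm_obs = np.linalg.norm(H[mask])
    for _ in range(n_iter):
        X = shrink(Y, tau)
        residual = np.where(mask, H - X, 0.0)    # misfit on observed entries only
        if np.linalg.norm(residual) <= tol * norm_obs:
            break
        Y += step * residual                     # dual update on the observed entries
    return X

# Synthetic rank-2 matrix with about 60% of the entries observed (hypothetical data).
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))
mask = rng.random(M.shape) < 0.6
X_hat = svt_complete(np.where(mask, M, 0.0), mask, tau=50.0)
```

A larger `tau` pushes the iterate towards lower rank at the cost of slower agreement with the observed entries; in practice it would be tuned to the scale of the data.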
The method outlined above assumes that the observed entries, including those missing, are random variables that follow a uniform distribution [17]. This assumption does not fully account for the effects of measurement uncertainties on different frequency components of the data, which often deviate from a uniform distribution pattern [16]. Therefore, the traditional low-rank MC techniques described above cannot be applied directly. As an alternative, the interest-zone matrix-approximation (IZMA) method [26] has been proposed to solve the low-rank MC problem when some columns/sub-columns or rows/sub-rows are missing. More specifically, IZMA solves the following optimisation problem,
$$\min\ \|\mathcal{A}_{\Omega,\mathbf{H}}(\mathbf{X})-\mathcal{A}_{\Omega,\mathbf{H}}(\mathbf{H})\|_F\quad\text{s.t.}\quad p(\mathbf{X})\le 0, \tag{3.6}$$
where the nonlinear mapping $p(\cdot):\mathbb{R}^{N_s\times N_k}\to\mathbb{R}^{n}$ is a smooth function for some $n\in\mathbb{N}$. For example, $p(\mathbf{X})$ can encode constraints on the estimated matrix, such as $p(\mathbf{X})=\|\mathbf{X}\|_2-\lambda$, $\lambda>0$. Another example of this nonlinear mapping is $p(\mathbf{X})=\|\mathbf{X}\|_*-\lambda$ for some $\lambda>0$, as described in [26]. The optimisation problem in Eq (3.6) can be solved by the following steps.
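It first approximates the following optimisation for the constrained full matrix,
$$\min\ \|\mathbf{X}-\mathbf{H}\|_F\quad\text{s.t.}\quad p(\mathbf{X})\le 0. \tag{3.7}$$
The solution to the optimisation in Eq (3.7) with the spectral norm constraint can be estimated by
$$\begin{cases}\tilde{\mathbf{X}}=\mathbf{U}\tilde{\boldsymbol{\Sigma}}\mathbf{V}^*,\\ \tilde{\boldsymbol{\Sigma}}=\operatorname{diag}(\tilde{\sigma}_i),\end{cases} \tag{3.8}$$
where matrices $\mathbf{U}$ and $\mathbf{V}$ are calculated from $\mathbf{H}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^*$, with $\boldsymbol{\Sigma}=\operatorname{diag}(\sigma_1,\sigma_2,\cdots,\sigma_n)$. Next, the diagonal entries are obtained as $\tilde{\sigma}_i=\min(\sigma_i,\lambda)$, $i=1,2,\cdots,k$, and the approximation $\tilde{\mathbf{X}}$ is obtained by reconstructing the matrix from these factors.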
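Then, the estimation of matrix $\mathbf{H}$ with entries in the interest zone can be updated as follows,
$$\mathcal{H}(\mathbf{X}):=\mathcal{I}(\tilde{\mathbf{X}})-\mathcal{A}_{\Omega,\mathbf{H}}(\tilde{\mathbf{X}})+\mathcal{A}_{\Omega,\mathbf{H}}(\mathbf{H}), \tag{3.9}$$
where $\mathcal{I}$ is the identity operator, and $\mathcal{H}(\cdot)$ denotes the updating process, which combines the known entries in matrix $\mathbf{H}$ and the estimated entries in matrix $\tilde{\mathbf{X}}$.
Next, the estimated matrix is replaced with $\mathcal{H}(\mathbf{X})$ and the iteration is repeated starting from Eq (3.7). The estimation is evaluated by $\epsilon(\mathbf{X})\triangleq\|\mathcal{A}_{\Omega,\mathbf{H}}(\mathbf{X})-\mathcal{A}_{\Omega,\mathbf{H}}(\mathbf{H})\|_F$, and the iteration continues until $\epsilon(\mathbf{X})$ converges, at which point the missing entries in matrix $\mathbf{H}$ have been approximated. For convenience, we denote the whole iteration process of the constrained approximation as $\mathcal{T}(\mathbf{H},\lambda)$, which will be used in the next section for the HRV spectrum estimation.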
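The iteration described by Eqs (3.7) to (3.9) can be sketched as follows for the spectral-norm constraint. This is an illustrative reading of the steps in the text, not the reference implementation of [26]: the function names, the zero initialisation of unknown entries, the stopping rule on the change in $\epsilon$, and the toy rank-1 matrix are assumptions.

```python
import numpy as np

def clip_singular_values(H, lam):
    """Eq (3.8): cap every singular value of H at lam and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return (U * np.minimum(s, lam)) @ Vt

def constrained_approximation(H, mask, lam, n_iter=200, tol=1e-8):
    """Iterate Eqs (3.7)-(3.9); mask is True where an entry of H is known."""
    known = np.where(mask, H, 0.0)
    current = known.copy()                 # unknown entries start at zero (assumption)
    prev_eps = np.inf
    for _ in range(n_iter):
        X = clip_singular_values(current, lam)                 # Eqs (3.7)-(3.8)
        eps = np.linalg.norm(np.where(mask, X - known, 0.0))   # misfit on known entries
        current = np.where(mask, known, X)                     # Eq (3.9): keep known, fill the rest
        if abs(prev_eps - eps) < tol:
            break
        prev_eps = eps
    return X, eps

# Toy rank-1 matrix with two entries treated as missing (hypothetical data).
M = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 2.0])
mask = np.ones(M.shape, dtype=bool)
mask[0, 3] = False
mask[2, 1] = False
lam = 0.9 * np.linalg.norm(M, 2)
X_hat, eps = constrained_approximation(M, mask, lam)
```

Because the returned matrix is the direct output of the clipping step, its spectral norm never exceeds `lam`, so the constraint $p(\mathbf{X})\le 0$ holds by construction.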
It is noted that the matrix approximation in Eq (3.6) differs from the low-rank MC presented in Eq (3.2): the low-rank MC fixes the known entries during estimation, whereas the matrix approximation in Eq (3.6) minimises a Frobenius-norm objective subject to the constraint function. In particular, the technique of matrix approximation with an interest zone can also be extended to matrix completion, which yields an iteration process distinct from the traditional SVT technique; more details of the iteration process can be found in [26] and are also discussed in Section 3.3.
The process of approximating missing entries in this context is driven by the data itself through an iterative approach. It is important to acknowledge that in HRV data analysis, certain mathematical models can be leveraged to represent the spectrum effectively. One such model, the Gaussian model [31], has been highlighted for its capability to accurately characterise HRV spectra and elucidate the relationships between various spectrum components. In the subsequent subsection, we will apply the Gaussian model to refine the RMC process by constructing a new matrix that possesses significantly reduced dimensions. This approach is anticipated to not only enhance the efficiency of the estimation process, but also to improve its overall performance by providing a more focused and computationally manageable framework for addressing missing data within the HRV spectrum analysis.
The HRV data spectrum is characterised by distinct frequency bands, each with unique physiological implications. As illustrated in Figure 1, the HF band captures the influence of respiration on heart rate, a phenomenon known as respiratory sinus arrhythmia (RSA); conversely, the LF band encompasses oscillations associated with blood pressure regulation and vasomotor tone, often referred to as Mayer waves. These distinctions provide a foundation for modelling the HRV spectrum, allowing for a nuanced understanding of the underlying physiological processes. With these considerations, we model the spectrum of HRV data as follows.
We assume the matrix $\mathbf{S}=\{s_{i,\ell}\}_{i=1,\ldots,n_r,\ \ell=1,\ldots,n_{fr}}\in\mathbb{R}^{n_r\times n_{fr}}$ consists of $n_r$ spectra that are derived from the HRV data, where $s_i=[s_{i,1},\cdots,s_{i,n_{fr}}]^T\in\mathbb{R}^{n_{fr}}$ is the vector of the $i$th spectrum. In particular, each $s_i\in\mathbf{S}$ can be modelled with Gaussian functions [31], and the entry $s_{i,\ell}$ can be represented as
$$s^m_{i,\ell}(f,A_{i,\ell},\sigma_{i,\ell},f_{i,\ell})=\frac{A_{i,\ell}}{\sqrt{2\pi}\,\sigma_{i,\ell}}\exp\left(-\frac{(f-f_{i,\ell})^2}{2\sigma_{i,\ell}^2}\right), \tag{3.10}$$
where $f\in[f_{l,i},f_{r,i}]$ indicates the interval of the frequency band of interest, i.e., the LF or HF spectrum. Here, $A_{i,\ell}>0$ is the amplitude weight, $f_{i,\ell}>0$ is the peak position of the frequency component in the spectrum, and $\sigma_{i,\ell}>0$ is its standard deviation. For simplicity of notation, $s^m_{i,\ell}$ is used to represent $s^m_{i,\ell}(f,A_{i,\ell},\sigma_{i,\ell},f_{i,\ell})$ when no confusion arises, and $\tilde{s}^m_i=\{s^m_{i,\ell}\}_{\ell=1,\ldots,n_{fr}}$ indicates the modelled spectrum.
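Eq (3.10) translates directly into a small function. The sketch below evaluates one Gaussian component for an LF-like peak and one for an HF-like peak and sums them into a modelled spectrum. The peak positions, widths and amplitudes are illustrative assumptions, not fitted values from the study.

```python
import numpy as np

def gaussian_component(f, amplitude, sigma, f_peak):
    """Eq (3.10): scaled Gaussian centred at f_peak with standard deviation sigma."""
    return amplitude / (np.sqrt(2.0 * np.pi) * sigma) * np.exp(
        -((f - f_peak) ** 2) / (2.0 * sigma ** 2)
    )

f = np.linspace(0.0, 0.5, 501)                    # frequency grid in Hz
lf = gaussian_component(f, 1.0, 0.02, 0.10)       # LF-like peak (assumed parameters)
hf = gaussian_component(f, 0.5, 0.03, 0.25)       # HF-like peak (assumed parameters)
modelled = lf + hf                                # combined modelled spectrum
```

With this normalisation the amplitude weight equals the area under each component, so it can be read as the power carried by that peak.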
It is noted that the HRV spectrum can be characterised with different frequency components, and they can be simulated by changing the frequency intervals of the modelling, after which the complete spectrum can be obtained by combining these different frequency components. Next, we use the modelled information to define a new matrix for the estimation of missing entries. For the convenience of presentation, it is assumed that some entries of matrix S are missing, and the set Sm={sj}j=1,…,nq contains the rows in S with missing values.
For any $s_j\in S_m$, we model the spectrum and obtain $\tilde{s}^m_j$ with Eq (3.10), and then calculate the correlation between the $\tilde{s}^m_j$ and $\tilde{s}^m_q$ spectra, where $\tilde{s}^m_q$ is the modelled spectrum of another row in $\mathbf{S}$, and the correlation is computed as follows,
$$\gamma(s^m_{j,\ell_t},s^m_{q,\ell_t})=\frac{\sum_{\ell_t=1}^{N_1}(s^m_{j,\ell_t}-\bar{s}^m_j)(s^m_{q,\ell_t}-\bar{s}^m_q)}{\sqrt{\sum_{\ell_t=1}^{N_1}(s^m_{j,\ell_t}-\bar{s}^m_j)^2}\sqrt{\sum_{\ell_t=1}^{N_1}(s^m_{q,\ell_t}-\bar{s}^m_q)^2}}, \tag{3.11}$$
where $\bar{s}^m_j=\frac{1}{N_1}\sum_{\ell_t=1}^{N_1}s^m_{j,\ell_t}$ and $\bar{s}^m_q=\frac{1}{N_1}\sum_{\ell_t=1}^{N_1}s^m_{q,\ell_t}$ are the means of the two spectra, respectively, $\ell_t=1,\ldots,N_1$ indexes the known components of the spectrum with $N_1\le n_{fr}$, and $\gamma_{j,q}=\gamma(s^m_{j,\ell_t},s^m_{q,\ell_t})$ indicates the relationship between the two spectra.
After calculating $\gamma_{j,q}$ for the $s_j$ spectrum, a new vector $t_j=[\gamma_{j,1}\ \cdots\ \gamma_{j,n_r-1}]$ is obtained. We sort $t_j$ and select the $K$ highest-ranked elements for $s_j$, and the set that contains the identified indices is formed as
$$G_K(s_j)=\{g_k\,|\,t_{j,g_k}\in t_j,\ g_k\in J,\ k=1,\ldots,K\}, \tag{3.12}$$
where $J=\{1,\ldots,n_r-1\}$ denotes the index set of spectra in matrix $\mathbf{S}$.
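The selection step in Eqs (3.11) and (3.12) amounts to computing a Pearson correlation on the known components and keeping the $K$ best-matching rows. A minimal sketch, assuming hypothetical function names and a toy matrix of modelled spectra, is:

```python
import numpy as np

def pearson(a, b):
    """Eq (3.11): Pearson correlation between two equally long vectors."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float(np.sum(a_c * b_c) / (np.sqrt(np.sum(a_c ** 2)) * np.sqrt(np.sum(b_c ** 2))))

def top_k_indices(S_model, j, known_cols, K):
    """Eq (3.12): indices of the K rows most correlated with row j on the known columns."""
    target = S_model[j, known_cols]
    others = [q for q in range(S_model.shape[0]) if q != j]
    gammas = np.array([pearson(target, S_model[q, known_cols]) for q in others])
    order = np.argsort(gammas)[::-1][:K]          # highest correlation first
    return [others[o] for o in order]

# Toy modelled spectra (hypothetical values); row 0 plays the role of s_j.
S_model = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with row 0
    [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with row 0
    [1.0, 3.0, 2.0, 4.0],    # correlation 0.8 with row 0
])
selected = top_k_indices(S_model, j=0, known_cols=np.arange(4), K=2)
```

A constant row has zero variance and would make the denominator of Eq (3.11) vanish, so such rows would need to be excluded before ranking.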
Next, a refined matrix $\mathbf{S}_{G_K}\in\mathbb{R}^{K\times n_{fr}}$ can be formulated by combining $s_j$ and the identified HRV spectra. Consequently, the optimisation in Eq (3.6) can be updated as
$$\min\ \|\mathcal{A}_{\tilde{\Omega},\mathbf{S}_{G_K}}(\mathbf{X}_{G_K})-\mathcal{A}_{\tilde{\Omega},\mathbf{S}_{G_K}}(\mathbf{S}_{G_K})\|_F\quad\text{s.t.}\quad p(\mathbf{X}_{G_K})\le 0, \tag{3.13}$$
where $\tilde{\Omega}$ is the set of observed entries in the refined matrix, $\mathcal{A}_{\tilde{\Omega},\mathbf{S}_{G_K}}(\cdot)$ is the updated operator to model the missing values, $\mathbf{X}_{G_K}$ indicates the estimated matrix, and the spectral-norm constraint $p(\mathbf{X}_{G_K})=\|\mathbf{X}_{G_K}\|_2-\lambda$ can be used.
For the estimation of missing entries in the refined matrix, we first approximate the following constrained full matrix,
$$\min\ \|\mathbf{X}_{G_K}-\mathbf{S}_{G_K}\|_F\quad\text{s.t.}\quad p(\mathbf{X}_{G_K})\le 0, \tag{3.14}$$
where the soft imputing method as described in Eq (3.8) can be used to solve the estimation in Eq (3.14). Then, by updating the parameter $\lambda$ in the constraint function, missing entries in the refined matrix can be estimated using the iteration process described in Section 3.2. Algorithm 1 presents the pseudocode of the proposed RMC method. The tolerance hyperparameters $e_{tol}$ and $\lambda_{tol}$ are set to $10^{-8}$ and $10$, respectively, according to the configuration in [26].
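Algorithm 1: Estimation of HRV spectrum using our developed RMC method.
Input: Matrix $\mathbf{S}\in\mathbb{R}^{n_r\times n_{fr}}$, $\mathcal{A}_{\Omega,\mathbf{H}}$, $S_m$, tolerance $e_{tol}$ and $\lambda_{tol}$.
Output: Estimated spectrum matrix $\mathbf{X}$.
1 Initialisation: $s_j\in S_m$, $\lambda\leftarrow 0$, $\lambda_{prev}\leftarrow 10^6$;
2 for $i\leftarrow 1$ to $n_r-1$ do
3-4 [loop body not recoverable from the source]
5 end
6 Sort $\gamma(s^m_{i,\ell_t},s^m_{j,\ell_t})$ and obtain $G_K(s_j)$;
7 Update $\mathbf{S}\leftarrow\mathbf{S}_{G_K}$, and $\mathcal{A}_{\Omega,\mathbf{H}}\leftarrow\mathcal{A}_{\tilde{\Omega},G_K}$;
8 Assign $\mathbf{S}\leftarrow\mathcal{A}_{\Omega,\mathbf{H}}\mathbf{S}$, $\lambda_{min}\leftarrow 0$, $\lambda_{max}\leftarrow\|\mathbf{S}\|_*$;
9 while $\epsilon(\mathbf{X})>e_{tol}$ or $|\lambda-\lambda_{prev}|>\lambda_{tol}$ do
10-18 [loop body not recoverable from the source]
19 end
20 Output the estimated matrix $\mathbf{X}$.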
As an illustration, Figure 1 demonstrates the concept of estimating the HRV spectrum using the RMC technique, where the matrix is constructed from m spectra derived from HRV data segments. The x-axis of the matrix indicates the frequency band of interest, and the y-axis shows the indices of the data segments. Suppose the HF band of the s3 spectrum is affected by measurement uncertainties and is to be estimated using the LF band. With the RMC model, we first model the spectra in the matrix, such as the Mayer wave and the RSA component, and use the LF spectrum as a reference to identify relevant elements in the matrix. Then, the identified spectra, such as vectors s2, s5, and sm, are combined with s3 to form a new matrix, in which the HF band of s3 can be estimated using the RMC iteration procedure described above.
The developed RMC method for HRV spectrum estimation is used to analyse a variety of datasets, as discussed in the next section. To evaluate the model performance on spectrum estimation across different subjects and datasets, we use the normalised root mean square error (NRMSE) as the performance indicator. For the estimation of missing entries in the sj spectrum, the NRMSE index can be calculated as follows [23],
$$\mathrm{NRMSE}=\sqrt{\frac{\frac{1}{N_f}\sum_{k=1}^{N_f}(\hat{s}_{j,k}-s_{j,k})^2}{\frac{1}{N_f-1}\sum_{k=1}^{N_f}(s_{j,k}-\bar{s}_j)^2}}, \tag{3.15}$$
where $s_{j,k}\in\mathbf{S}$ is the original spectrum, $j=1,2,\cdots,n_r$ is the index of the spectrum, and $k=1,2,\cdots,N_f$ indexes the data points of the missing entries in the spectrum. $\hat{s}_{j,k}$ is the estimated spectrum, and $\bar{s}_j=\frac{1}{N_f}\sum_{k=1}^{N_f}s_{j,k}$ is the mean value of the spectrum. This index normalises the root mean square error by the standard deviation of the spectrum, which makes it possible to compare model performance across different ECG datasets on the same scale.
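Eq (3.15) can be checked numerically with a short function. The sketch below assumes the true and estimated values at the missing positions are available as 1-D arrays; the function name and the toy values are hypothetical.

```python
import numpy as np

def nrmse(s_true, s_hat):
    """Eq (3.15): RMSE normalised by the sample standard deviation of the true values."""
    s_true = np.asarray(s_true, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    mse = np.mean((s_hat - s_true) ** 2)
    var = np.sum((s_true - s_true.mean()) ** 2) / (s_true.size - 1)
    return float(np.sqrt(mse / var))

s_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = nrmse(s_true, s_true + 0.5)    # constant offset of 0.5 on every point
```

Here the mean squared error is 0.25 and the sample variance is 2.5, so the score is $\sqrt{0.1}\approx 0.316$. A value near 1 would mean the estimate is roughly as far from the truth as the spectrum's own spread.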
Generally, HRV data can be derived from ECG signals, and in the current study, we retrieved five widely used benchmark ECG datasets to evaluate the performance of our developed RMC model. These datasets are summarised as follows.
(ⅰ) Combined measurement of ECG, breathing and seismocardiograms (CEBSDB) [32]: This dataset consists of ECG recordings sampled from 20 healthy volunteers. After excluding a low-quality recording ('m018') due to the issue of electrode contact, the ECG signals collected from 19 subjects were used for the study, which have a sampling duration of approximately 50 minutes.
(ⅱ) Supraventricular arrhythmia database (SADB) [33]: This dataset includes 78 ECG recordings sampled from subjects with supraventricular arrhythmias, and each signal has a recording time duration of 30 minutes.
(ⅲ) St Petersburg INCART 12-lead Arrhythmia Database (SPIADB) [34]: This dataset consists of 75 ECG recordings with a duration of 30 minutes, which were collected from patients undergoing tests for coronary artery disease.
(ⅳ) European ST-T Database (ESTDB) [35]: This database is widely used for the evaluation of ST and T-wave changes in the ECG morphologies, and includes 90 annotated excerpts of ambulatory ECG recordings with two-hour sampling durations.
(ⅴ) Apnea-ECG Database (APEDB) [36]: This dataset consists of 70 ECG recordings with a set of apnea annotations, and each of the recordings has an approximate duration of 7 to 10 hours.
These datasets include ECG recordings with diverse characteristics, such as data sampled from healthy volunteers and from patients with arrhythmias, coronary artery disease, ST and T-wave changes, and apnea. Therefore, the five ECG datasets generate different types of HRV data, and in combination allow a comprehensive evaluation of our developed method for spectrum estimation. The ECG signals in these datasets have varying durations; for long recordings, we use only the first hour of the signal.
To generate HRV data for this study, we first identify R-peaks from these ECG recordings by analysing the QRS complex. Figure 2(a) shows the identified R-peaks in the ECG recording using the QRS detector developed in [37]. After identifying the R-peaks, a sequence of RR intervals can be calculated from the timestamps of consecutive peaks. Next, we implement outlier detection on the derived RR intervals using an open-source benchmarked toolbox [38], where the outliers are defined as data points that are too close together, large RR intervals caused by gaps, or artefact annotations of the signal. Figure 2(c) illustrates the detected outliers, which are removed and replaced by interpolation to generate the preprocessed sequence.
We note that although a variety of techniques have been developed to detect R-peaks [39,40], it is still challenging to accurately identify the peaks for different types of ECG recordings. For example, as shown in Figure 2(a), the R-peaks are correctly identified for the ECG signal sampled from a healthy subject. However, as shown in Figure 2(b), the R-peaks in waves A and B are wrongly identified for the ECG signal collected from an abnormal subject. This comparison indicates that a given detection technique may not be reliable for all types of ECGs.
To obtain high-quality HRV data, this study utilises two widely recognised detectors for identifying peak values: the 'jqrs' detector [37] and the 'wqrs' detector [40]. Subsequently, a signal quality index (SQI) is computed for each detected R-peak by contrasting annotations from both detectors [41]. Following this, the average SQI value of R-peaks within 5-minute HRV data segments is calculated, serving as a quality indicator for each segment. For instance, a segment is deemed high-quality if its SQI indicator exceeds 0.9; otherwise, it is considered low-quality. As illustrated in Figure 2(d), two data segments exhibit SQI values of 0.813 and 0.737, categorising them as low-quality. By calculating the average SQI values, the low-quality and high-quality data segments can be efficiently identified, and the low-quality segments will not be used for further analysis.
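The segment-level quality check described above can be sketched as follows. This is only an illustration under stated assumptions: the per-beat SQI values are taken as already computed, beat times are in seconds from the start of the recording, and the function name `segment_quality` is hypothetical.

```python
import numpy as np

def segment_quality(beat_times, beat_sqi, segment_s=300.0, threshold=0.9):
    """Average per-beat SQI within fixed-length segments (5 minutes by default).

    Returns {segment_index: (mean_sqi, is_high_quality)}; a segment is
    flagged high-quality when its mean SQI exceeds the threshold.
    """
    beat_times = np.asarray(beat_times, dtype=float)
    beat_sqi = np.asarray(beat_sqi, dtype=float)
    # Assign each beat to a segment by its timestamp.
    seg_idx = np.floor(beat_times / segment_s).astype(int)
    results = {}
    for k in np.unique(seg_idx):
        mean_sqi = float(beat_sqi[seg_idx == k].mean())
        results[int(k)] = (mean_sqi, mean_sqi > threshold)
    return results
```

With this rule, segments with mean SQI values such as 0.813 or 0.737 would be flagged as low-quality and excluded from further analysis.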
The preprocessed HRV data is then resampled at a frequency of 4 Hz, and a 4th-order Butterworth filter with a pass band of 0.03 to 0.9 Hz is used to remove noise [42]. The filtered data is used to derive the HRV spectrum in the frequency domain, calculated with Welch's algorithm using an overlap of 50% [42]. Next, the HRV spectrum matrix is obtained by stacking all the calculated spectra. Figure 3(a) shows three-dimensional plots of the PSD spectra derived from different data segments. It can be seen from Figure 3(a) that the PSD spectra have similar waveform patterns, i.e., LF and HF components, indicating the low-rank property of the HRV matrix derived from the PSD spectra.
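A rough Python sketch of this preprocessing chain is given below, using SciPy rather than the toolbox used in the study. The linear interpolation for resampling, the zero-phase filtering with `sosfiltfilt`, the Welch window length `nperseg`, and the function name `hrv_psd` are all assumptions made for illustration; only the 4 Hz rate, the filter order, the 0.03 to 0.9 Hz pass band, and the 50% overlap come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch

def hrv_psd(rr_times, rr_intervals, fs=4.0, band=(0.03, 0.9), nperseg=256):
    """Resample an RR series to fs Hz, band-pass filter it, and estimate its PSD."""
    rr_times = np.asarray(rr_times, dtype=float)
    rr_intervals = np.asarray(rr_intervals, dtype=float)
    # Resample onto a uniform grid (linear interpolation is an assumption).
    t_uniform = np.arange(rr_times[0], rr_times[-1], 1.0 / fs)
    rr_uniform = np.interp(t_uniform, rr_times, rr_intervals)
    # Butterworth band-pass designed with order parameter 4; applied forward
    # and backward (zero phase), which is a choice made for this sketch.
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    rr_filtered = sosfiltfilt(sos, rr_uniform)
    # Welch's method with 50% overlap between windows.
    freqs, psd = welch(rr_filtered, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    return freqs, psd
```

Stacking the `psd` vectors of all segments row by row would then give a spectrum matrix of the kind analysed here.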
In a further step, we calculate the singular values of the derived HRV matrix, and compute the ratio between singular values and the nuclear norm of the spectrum matrix, which is used as an indicator of the rank property of the matrix [23]. As shown in Figure 3(b), we demonstrate the low-rank property of the HRV data by calculating the cumulative ratios of singular values. It can be seen from Figure 3(b) that the summation of the top ten singular values accounts for approximately 80% of the nuclear norm, indicating the low-rank properties of the HRV spectrum matrices that are derived from these five ECG datasets.
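The low-rank indicator described above, i.e., the cumulative ratio of singular values to the nuclear norm, can be sketched in a few lines. The function name `cumulative_singular_ratio` is an illustrative assumption.

```python
import numpy as np

def cumulative_singular_ratio(S):
    """Cumulative sum of singular values divided by the nuclear norm of S.

    Entry i is the share of the nuclear norm (sum of all singular values)
    explained by the i + 1 largest singular values.
    """
    sv = np.linalg.svd(np.asarray(S, dtype=float), compute_uv=False)
    return np.cumsum(sv) / np.sum(sv)
```

For a spectrum matrix, reading off the tenth entry of the returned vector gives the quantity reported in Figure 3(b); a value near 0.8 means the top ten singular values carry about 80% of the nuclear norm.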
The derived HRV matrix is then used to evaluate the performance of our developed model for uncertainty estimation. We note that the uncertainties or distortions of the HRV spectrum can result from many different factors, such as noise, artefacts, missing values, and a variety of computational approaches (e.g., QRS detection and data interpolation). Without loss of generality, we simulate the uncertainties as missing values in the HRV spectrum, and evaluate the performance of our developed method for uncertainty estimation. As an example, Figure 4 shows the estimation of uncertainty in the HRV spectrum, where the uncertainty is represented as missing values located mostly in the HF band. We use the rest of the spectrum (i.e., the LF band) to estimate the HF band. It can be seen from Figure 4 that the estimated HF spectrum matches the original PSD spectrum well and captures the HF waveform, whereas the estimation with the MC model introduces large variations around 0.25 Hz.
We use the developed RMC method to estimate uncertainties in the spectrum. We simulate the spectrum with the model described in Eq (3.10), and then use the modelled signal to identify relevant data segments in the HRV matrix. As an illustration, Figure 4 shows the signal modelling for the PSD spectrum of the 31st data segment of the matrix. It can be seen that the modelled signal efficiently represents the characteristics of the spectrum, such as the LF and HF components.
Next, we calculate correlation coefficients between different data segments using the modelled signals, and select the segments with high coefficients to reconstruct the HRV matrix. Figure 5(a) shows the calculated coefficients for segments in the original data matrix, with the selected segments highlighted by square markers, and Figure 5(b) illustrates the estimation errors for different combinations of selected data segments. It can be seen from Figure 5(b) that the RMC model obtains its lowest estimation error of 0.222 when using nine data segments, which is smaller than the estimation error of 0.830 obtained with the traditional MC method. Figure 5(a) shows that the selected segments include five neighbours of the target data segment; this is consistent with the intuition that missing values can generally be imputed from nearest neighbours. We also note that the model selects four segments that are not neighbours of the target segment, indicating that segments far from the target in the sequence may also provide valuable information for the estimation of missing entries.
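The segment selection step can be illustrated with the following sketch, which ranks segments by the Pearson correlation of their modelled signals with that of the target segment. The use of Pearson correlation via `np.corrcoef`, the fixed count `n_select`, and the function name `select_segments` are assumptions for illustration; the original selection rule may differ in detail.

```python
import numpy as np

def select_segments(modelled, target_idx, n_select):
    """Pick the n_select segments most correlated with the target segment.

    modelled: 2-D array with one modelled spectrum per row.
    Returns (indices of selected segments, correlations with the target).
    """
    modelled = np.asarray(modelled, dtype=float)
    # Row-wise Pearson correlation; keep the row belonging to the target.
    corr = np.corrcoef(modelled)[target_idx]
    # Sort by decreasing correlation and drop the target itself.
    order = np.argsort(corr)[::-1]
    order = order[order != target_idx]
    return order[:n_select], corr
```

The rows returned by this function, together with the target row, would form the reduced matrix on which the completion is run; sweeping `n_select` corresponds to the kind of comparison shown in Figure 5(b).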
In a further step, we evaluate the model performance on uncertainty estimation under different conditions by applying different masking ratios to the HRV spectrum. As shown in Figures 6 and 7, our developed RMC method efficiently estimates missing values for the PSD spectra with masking ratios of 30% (Figure 6(g)) and 70% (Figure 7(g)), with estimation errors of 0.221 and 0.426, respectively. The two experiments show the robustness of our developed RMC method for uncertainty estimation in the HRV spectrum.
The current study focuses on estimating uncertainties of the HRV spectrum, and we compare our developed method with different types of regression machine learning models. In particular, deep recurrent neural networks have demonstrated excellent performance in prediction and regression analysis. Specifically, we use the gated recurrent unit (GRU) model, the long short-term memory (LSTM) model, and the bidirectional LSTM (BiLSTM) model for the comparison study. We note that convolutional neural networks (CNN) have shown efficiency in extracting features from data sequences [43]. Therefore, in addition to the GRU, LSTM, and BiLSTM, we use two hybrid models for the comparison study.
The three recurrent neural networks, i.e., the GRU, LSTM, and BiLSTM, each have 120 hidden units for the regression analysis [44]. For the two hybrid models, the hyperparameters are tuned by trial-and-error search. The first one (CNN_LSTM_3) uses three 1D-CNN layers with 32, 64, and 64 filters, each filter with a size of 3 [45]. Each convolutional layer is followed by a rectified linear unit (ReLU) function and a batch normalisation layer; the output is then flattened and connected to an LSTM layer for the regression analysis. The second hybrid model (CNN_LSTM_5) consists of five CNN layers with 32, 32, 64, 64, and 64 filters, and also uses an LSTM layer for the regression analysis. The deep learning models are trained with the Adam optimizer with an initial learning rate of 0.005, a maximum of 100 epochs, and a batch size of 20. For the prediction analysis, we use data points from the previous steps, i.e., 10 steps [46], to predict the current value in the data sequence.
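The 10-step prediction setup can be illustrated by the way input windows and targets are built from a data sequence. This sketch covers only the data preparation, not the networks themselves; the function name `make_windows` is an assumption, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np

def make_windows(sequence, n_steps=10):
    """Build (inputs, targets) so that each target is predicted from the
    n_steps values immediately preceding it in the sequence."""
    x = np.asarray(sequence, dtype=float)
    # All windows of length n_steps; the last one has no following target.
    X = np.lib.stride_tricks.sliding_window_view(x, n_steps)[:-1]
    # Target i is the value right after window i.
    y = x[n_steps:]
    return X, y
```

Each row of `X` would be fed to a regression model such as the GRU or LSTM, with the matching entry of `y` as the value to predict.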
The estimation results of these regression models are shown in Figures 6 and 7, which correspond to the masking ratios of 30% and 70%, respectively. For comparison, we also include the estimation results from the traditional MC method. It can be seen from Figures 6 and 7 that these machine learning models have small errors in the spectrum estimation; in particular, the BiLSTM and hybrid models efficiently estimate the peak values around 0.3 Hz for the two masking ratios. Compared with the deep learning models and the MC method, our developed RMC method performs better, obtaining the lowest estimation errors of 0.221 and 0.426 for the 30% and 70% masking ratios, respectively.
We estimate the missing entries for all ECG recordings in the five benchmark datasets using our developed RMC method and the regression models. Tables 1–3 show the mean value and standard deviation of estimation errors for the five benchmark datasets with 30%, 50%, and 70% masking ratios. The minimum estimation error for each dataset is bold-faced. It can be seen from Tables 1–3 that, compared with the five machine learning models and the traditional MC method, our developed RMC method obtains the lowest estimation error for all five ECG datasets. The estimation errors increase with the masking ratio; for example, the errors for the CEBSDB are 0.286 ± 0.194, 0.312 ± 0.203, and 0.379 ± 0.228 for the three masking ratios.
Table 1. Estimation errors (NRMSE, mean ± standard deviation) for the five benchmark datasets with a 30% masking ratio.
Datasets | GRU | LSTM | BiLSTM | CNN_LSTM_3 | CNN_LSTM_5 | MC | RMC |
CEBSDB | 1.379 ± 1.112 | 1.346 ± 0.970 | 0.671 ± 0.355 | 0.672 ± 0.459 | 0.770 ± 0.692 | 0.393 ± 0.291 | 0.286 ± 0.194 |
SADB | 1.389 ± 1.751 | 1.318 ± 1.280 | 0.733 ± 0.901 | 0.746 ± 0.902 | 0.902 ± 1.808 | 0.332 ± 0.253 | 0.304 ± 0.199 |
SPIADB | 1.241 ± 0.776 | 1.211 ± 0.609 | 0.647 ± 0.289 | 0.655 ± 0.375 | 0.709 ± 0.378 | 0.361 ± 0.539 | 0.305 ± 0.327 |
ESTDB | 1.317 ± 1.096 | 1.297 ± 1.043 | 0.687 ± 0.430 | 0.691 ± 0.456 | 0.753 ± 0.549 | 0.494 ± 0.464 | 0.329 ± 0.251 |
APEADB | 1.283 ± 0.740 | 1.268 ± 0.646 | 0.644 ± 0.263 | 0.663 ± 0.337 | 0.727 ± 0.392 | 0.426 ± 0.357 | 0.290 ± 0.228 |

Table 2. Estimation errors (NRMSE, mean ± standard deviation) for the five benchmark datasets with a 50% masking ratio.
Datasets | GRU | LSTM | BiLSTM | CNN_LSTM_3 | CNN_LSTM_5 | MC | RMC |
CEBSDB | 1.061 ± 0.234 | 1.091 ± 0.275 | 0.621 ± 0.163 | 0.619 ± 0.164 | 0.663 ± 0.175 | 0.432 ± 0.192 | 0.312 ± 0.203 |
SADB | 1.067 ± 0.149 | 1.093 ± 0.190 | 0.670 ± 0.153 | 0.673 ± 0.129 | 0.714 ± 0.136 | 0.386 ± 0.254 | 0.338 ± 0.226 |
SPIADB | 1.076 ± 0.168 | 1.097 ± 0.201 | 0.657 ± 0.150 | 0.662 ± 0.115 | 0.706 ± 0.128 | 0.388 ± 0.529 | 0.310 ± 0.290 |
ESTDB | 1.113 ± 0.707 | 1.138 ± 0.692 | 0.680 ± 0.306 | 0.704 ± 0.427 | 0.745 ± 0.443 | 0.590 ± 0.446 | 0.336 ± 0.276 |
APEADB | 1.059 ± 0.292 | 1.085 ± 0.334 | 0.642 ± 0.216 | 0.648 ± 0.168 | 0.687 ± 0.175 | 0.547 ± 0.422 | 0.312 ± 0.256 |

Table 3. Estimation errors (NRMSE, mean ± standard deviation) for the five benchmark datasets with a 70% masking ratio.
Datasets | GRU | LSTM | BiLSTM | CNN_LSTM_3 | CNN_LSTM_5 | MC | RMC |
CEBSDB | 1.350 ± 0.591 | 1.446 ± 0.676 | 0.926 ± 0.563 | 0.906 ± 0.451 | 1.030 ± 0.622 | 0.633 ± 0.202 | 0.379 ± 0.228 |
SADB | 1.131 ± 0.361 | 1.247 ± 0.422 | 0.983 ± 0.652 | 0.819 ± 0.238 | 0.873 ± 0.284 | 0.539 ± 0.318 | 0.397 ± 0.262 |
SPIADB | 1.179 ± 0.398 | 1.325 ± 0.544 | 1.019 ± 0.674 | 0.828 ± 0.230 | 0.890 ± 0.308 | 0.502 ± 0.406 | 0.338 ± 0.247 |
ESTDB | 1.193 ± 0.418 | 1.335 ± 0.543 | 1.028 ± 0.743 | 0.849 ± 0.304 | 0.903 ± 0.369 | 0.728 ± 0.356 | 0.337 ± 0.252 |
APEADB | 1.144 ± 0.368 | 1.266 ± 0.470 | 0.958 ± 0.689 | 0.811 ± 0.256 | 0.864 ± 0.315 | 0.694 ± 0.409 | 0.315 ± 0.246 |

Additionally, we provide violin plots in Figure 8 to show the distributions of estimation errors for the seven models. It can be seen from Figure 8 that our developed RMC method obtains the lowest estimation errors, with median values of 0.330, 0.337, 0.260, 0.259, and 0.235 for the five benchmark datasets. In particular, compared with the five machine learning models and the traditional MC method, the RMC method has much smaller variations in the estimation errors for all the ECG datasets, which indicates the stability and robustness of our developed RMC method for spectrum estimation.
We also compare the computation cost of missing entry estimation for the RMC method and the other six methods. All estimation tasks are implemented in MATLAB R2022a on an Intel(R) Core(TM) i7-1165G7 @ 2.8 GHz processor with 32 GB RAM. We calculate the mean value and standard deviation of the computation time over all HRV data segments in the benchmark datasets. Table 4 presents the comparison of computation cost for all the estimation methods. We note that the computation cost of the five deep learning models increases with model complexity. From Table 4 it can be seen that among these deep learning models, the GRU model has the lowest computation time of 1.117 ± 0.460 s. Compared with the deep learning models and the MC method, our developed RMC method has the lowest computation time of 0.105 ± 0.096 s. The results demonstrate the efficiency of our developed RMC method for HRV spectrum estimation.
Table 4. Computation cost (mean ± standard deviation) of the estimation methods.
Methods | Computation cost (s) |
GRU | 1.117 ± 0.460 |
LSTM | 1.357 ± 0.704 |
BiLSTM | 2.298 ± 2.091 |
CNN_LSTM_3 | 2.484 ± 2.218 |
CNN_LSTM_5 | 2.889 ± 0.909 |
MC | 0.669 ± 0.340 |
RMC | 0.105 ± 0.096 |

Spectrum metrics of HRV data are important indicators for monitoring physiological and pathological conditions. However, HRV data are sensitive to a variety of uncertainties in the measurements, such as motion artefacts and missing values. Although numerous methods have been developed to address uncertainties of HRV analysis, it is difficult to remove artefacts without affecting the signal waveforms. We propose to estimate uncertainties of HRV data from a new view in the spectrum analysis; in particular, we consider the characteristics of LF and HF bands of HRV spectrum. With extensive experimental studies, we show that the PSD data demonstrate similarities between spectrum waveforms, indicating the low-rank property of the spectrum matrix, which enables us to use the MC technique for the uncertainty estimation.
A variety of computational techniques have been developed for missing value estimation, for example, different types of interpolation methods [47] and machine learning models [48], including linear interpolation, k-nearest neighbour (kNN), multiple imputation by chained equation (MICE), neural networks, and random forest. However, we note that deep neural networks (DNNs), particularly with recent advancements, have shown superior performance in estimating sequential data [49]. Therefore, we compared our proposed RMC method with different types of DNNs. We also acknowledge more advanced predictive models that have been developed recently, such as Transformer [50], Informer [51], and TimesNet [52], which have not been used for HRV uncertainty estimation, motivating us to implement more comprehensive comparisons in our future study.
Estimating missing entries in sparse matrices is a challenging task. The performance of the low-rank MC method in handling sparse matrices has been demonstrated in previous literature [53,54,55]. In this study, we proposed a refined version of the traditional MC method, and our developed RMC method was used for HRV uncertainty estimation. The results indicated that, compared with different types of DNN models, our RMC method demonstrated promising performance across all benchmark datasets. The improved model performance can be attributed to the incorporation of specific characteristics of the HRV spectrum into our modelling approach. For instance, the LF band of the HRV spectrum exhibits a peaked Mayer waveform, while the HF band displays a dominant RSA waveform. These peaked waveforms, observed within a limited number of data points, can pose challenges for DNN models in uncertainty estimation, and therefore lead to reduced performance.
There are many different types of uncertainties in the analysis of HRV data, such as measurement noise and artefacts in data acquisition and computational uncertainties in data processing (e.g., data interpolation and data resampling). Without loss of generality, we simulate the uncertainties of the HRV spectrum as missing values, which can also represent distorted entries affected by motion artefacts or data interpolation. We show the advantages and robustness of our developed RMC method for uncertainty estimation on a range of benchmark datasets. To perform a fair comparison, the NRMSE indicator was used to evaluate the performance of different models on five ECG datasets; we will explore other metrics for evaluating different types of uncertainties in our future studies. In addition, we note that the HRV spectrum is derived from NN intervals of ECG signals, and NN interval outliers caused by missed or false beat detections could affect the HRV spectrum. In our next step, we aim to thoroughly investigate these uncertainties and develop robust MC with other advanced techniques, such as ℓp minimisation [27] and adaptive weighting strategies [56], to mitigate the effects of outliers.
This paper presents a new approach to address the challenges of uncertainty estimation in the HRV spectrum using the low-rank MC method. In particular, we propose a new RMC method that takes into account the characteristics of frequency components in the HRV spectrum for more accurate uncertainty modelling. We evaluate the performance of our developed RMC method on five benchmark datasets with varying masking ratios. To assess its effectiveness, we compare our method with five deep regression models and the traditional MC method. The results demonstrate that our RMC method achieves the lowest estimation error and the lowest computational cost, which highlights the advantages and efficiency of our developed model for HRV spectrum estimation.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
DAC was supported by the Pandemic Sciences Institute at the University of Oxford; the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC); an NIHR Research Professorship; a Royal Academy of Engineering Research Chair; and the InnoHK Hong Kong Centre for Cerebro-cardiovascular Engineering (COCHE). TZ was supported by the Royal Academy of Engineering under the Research Fellowship scheme. This research was funded in whole, or in part, by the Wellcome Trust (217650/Z/19/Z). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
The authors declare there are no conflicts of interest.
[1] Kaper JB, Nataro JP, Mobley HLT (2004) Pathogenic Escherichia coli. Nat Rev Microbiol 2: 123-140. https://doi.org/10.1038/nrmicro818
[2] Brzuszkiewicz E, Gottschalk G, Ron E, et al. (2009) Adaptation of pathogenic E. coli to various niches: genome flexibility is the key. Microbial Pathogenomics. Basel: KARGER 110-125. https://doi.org/10.1159/000235766
[3] EFSA BIOHAZ Panel, Koutsoumanis K, Allende A, et al. (2020) Pathogenicity assessment of Shiga toxin-producing Escherichia coli (STEC) and the public health risk posed by contamination of food with STEC. EFSA J 18: e05967. https://doi.org/10.2903/j.efsa.2020.5967
[4] Lee MS, Koo S, Jeong D, et al. (2016) Shiga toxins as multi-functional proteins: induction of host cellular stress responses, role in pathogenesis and therapeutic applications. Toxins (Basel) 8: 77. https://doi.org/10.3390/toxins8030077
[5] Melton-Celsa AR (2014) Shiga toxin (Stx) classification, structure, and function. Microbiol Spectr 2. https://doi.org/10.1128/microbiolspec.EHEC-0024-2013
[6] Krüger A, Lucchesi PMA (2015) Shiga toxins and stx phages: Highly diverse entities. Microbiol (United Kingdom) 161: 451-462. https://doi.org/10.1099/mic.0.000003
[7] Pinto G, Sampaio M, Dias O, et al. (2021) Insights into the genome architecture and evolution of Shiga toxin encoding bacteriophages of Escherichia coli. BMC Genomics 22: 366. https://doi.org/10.1186/s12864-021-07685-0
[8] Mondal SI, Islam MR, Sawaguchi A, et al. (2016) Genes essential for the morphogenesis of the Shiga toxin 2-transducing phage from Escherichia coli O157:H7. Sci Rep 6: 39036. https://doi.org/10.1038/srep39036
[9] Llarena AK, Aspholm M, O'Sullivan K, et al. (2021) Replication region analysis reveals non-lambdoid Shiga toxin converting bacteriophages. Front Microbiol 12: 640945. https://doi.org/10.3389/fmicb.2021.640945
[10] Rodríguez-Rubio L, Haarmann N, Schwidder M, et al. (2021) Bacteriophages of Shiga toxin-producing Escherichia coli and their contribution to pathogenicity. Pathogens 10. https://doi.org/10.3390/pathogens10040404
[11] Smith DL, Rooks DJ, Fogg PC, et al. (2012) Comparative genomics of Shiga toxin encoding bacteriophages. BMC Genomics 13: 311. https://doi.org/10.1186/1471-2164-13-311
[12] Krüger A, Burgán J, Friedrich AW, et al. (2018) ArgO145, a Stx2a prophage of a bovine O145:H- STEC strain, is closely related to phages of virulent human strains. Infect Genet Evol 60: 126-132. https://doi.org/10.1016/j.meegid.2018.02.024
[13] Steyert SR, Sahl JW, Fraser CM, et al. (2012) Comparative genomics and stx phage characterization of LEE-negative Shiga toxin-producing Escherichia coli. Front Cell Infect Microbiol 2: 133. https://doi.org/10.3389/fcimb.2012.00133
[14] Schmidt H (2001) Shiga-toxin-converting bacteriophages. Res Microbiol 152: 687-695. https://doi.org/10.1016/s0923-2508(01)01249-9
[15] Wagner PL, Neely MN, Zhang X, et al. (2001) Role for a phage promoter in Shiga toxin 2 expression from a pathogenic Escherichia coli strain. J Bacteriol 183: 2081-2085. https://doi.org/10.1128/jb.183.6.2081-2085.2001
[16] Yin S, Rusconi B, Sanjar F, et al. (2015) Escherichia coli O157:H7 strains harbor at least three distinct sequence types of Shiga toxin 2a-converting phages. BMC Genomics 16: 733. https://doi.org/10.1186/s12864-015-1934-1
[17] Delannoy S, Mariani-Kurkdjian P, Webb HE, et al. (2017) The mobilome; a major contributor to Escherichia coli stx2-positive O26:H11 strains intra-serotype diversity. Front Microbiol 8: 1625. https://doi.org/10.3389/fmicb.2017.01625
[18] Nakamura K, Murase K, Sato MP, et al. (2020) Differential dynamics and impacts of prophages and plasmids on the pangenome and virulence factor repertoires of Shiga toxin-producing Escherichia coli O145:H28. Microb Genomics 6. https://doi.org/10.1099/mgen.0.000323
[19] Rasko DA, Rosovitz MJ, Myers GSA, et al. (2008) The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol 190: 6881-6893. https://doi.org/10.1128/JB.00619-08
[20] Su LK, Lu CP, Wang Y, et al. (2010) Lysogenic infection of a Shiga toxin 2-converting bacteriophage changes host gene expression, enhances host acid resistance and motility. Mol Biol 44. https://doi.org/10.1134/s0026893310010085
[21] Holt GS, Lodge JK, McCarthy AJ, et al. (2017) Shigatoxin encoding Bacteriophage φ24B modulates bacterial metabolism to raise antimicrobial tolerance. Sci Rep 7: 40424. https://doi.org/10.1038/srep40424
[22] Veses-Garcia M, Liu X, Rigden DJ, et al. (2015) Transcriptomic Analysis of Shiga-Toxigenic Bacteriophage Carriage Reveals a Profound Regulatory Effect on Acid Resistance in Escherichia coli. Appl Environ Microbiol 81: 8118-8125. https://doi.org/10.1128/AEM.02034-15
[23] Berger P, Kouzel IU, Berger M, et al. (2019) Carriage of Shiga toxin phage profoundly affects Escherichia coli gene expression and carbon source utilization. BMC Genomics 20: 504. https://doi.org/10.1186/s12864-019-5892-x
[24] Mitsunaka S, Sudo N, Sekine Y (2018) Lysogenisation of Shiga toxin-encoding bacteriophage represses cell motility. J Gen Appl Microbiol 64: 34-41. https://doi.org/10.2323/jgam.2017.05.001
[25] Saile N, Voigt A, Kessler S, et al. (2016) Escherichia coli O157:H7 strain EDL933 harbors multiple functional prophage-associated genes necessary for the utilization of 5-N-acetyl-9-O-acetyl neuraminic acid as a growth substrate. Appl Environ Microbiol 82: 5940-5950. https://doi.org/10.1128/AEM.01671-16
[26] Saile N, Schwarz L, Eißenberger K, et al. (2018) Growth advantage of Escherichia coli O104:H4 strains on 5-N-acetyl-9-O-acetyl neuraminic acid as a carbon source is dependent on heterogeneous phage-Borne nanS-p esterases. Int J Med Microbiol 308: 459-468. https://doi.org/10.1016/j.ijmm.2018.03.006
[27] McGinnis S, Madden TL (2004) BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 32. https://doi.org/10.1093/nar/gkh435
[28] Arndt D, Grant JR, Marcu A, et al. (2016) PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res 4. https://doi.org/10.1093/nar/gkw387
[29] Joensen KG, Scheutz F, Lund O, et al. (2014) Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli. J Clin Microbiol 52: 1501-1510. https://doi.org/10.1128/JCM.03617-13
[30] Siguier P, Perochon J, Lestrade L, et al. (2006) ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res 34: D32-D36. https://doi.org/10.1093/nar/gkj014
[31] Marchler-Bauer A, Derbyshire MK, Gonzales NR, et al. (2015) CDD: NCBI's conserved domain database. Nucleic Acids Res 43: D222-D226. https://doi.org/10.1093/nar/gku1221
[32] Jones P, Binns D, Chang HY, et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236-1240. https://doi.org/10.1093/bioinformatics/btu031
[33] Saier MHJ, Tran C V, Barabote RD (2006) TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res 34: D181-D186. https://doi.org/10.1093/nar/gkj001
[34] Darling AE, Mau B, Perna NT (2010) progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5: e11147. https://doi.org/10.1371/journal.pone.0011147
[35] Kumar S, Stecher G, Tamura K (2016) MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 33: 1870-1874. https://doi.org/10.1093/molbev/msw054
[36] Sievers F, Wilm A, Dineen D, et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7: 539. https://doi.org/10.1038/msb.2011.75
[37] Burgán J, Krüger A, Lucchesi PMA (2020) Comparable stx2a expression and phage production levels between Shiga toxin-producing Escherichia coli strains from human and bovine origin. Zoonoses Public Health 67: 44-53. https://doi.org/10.1111/zph.12653
[38] de Sablet T, Bertin Y, Vareille M, et al. (2008) Differential expression of stx2 variants in Shiga toxin-producing Escherichia coli belonging to seropathotypes A and C. Microbiology 154: 176-186. https://doi.org/10.1099/mic.0.2007/009704-0
[39] Pfaffl MW (2001) A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 29: e45. https://doi.org/10.1093/nar/29.9.e45
[40] Scheutz F, Teel LD, Beutin L, et al. (2012) Multicenter evaluation of a sequence-based protocol for subtyping Shiga toxins and standardizing Stx nomenclature. J Clin Microbiol 50: 2951-2963. https://doi.org/10.1128/JCM.00860-12
[41] Tyler JS, Beeri K, Reynolds JL, et al. (2013) Prophage induction is enhanced and required for renal disease and lethality in an EHEC mouse model. PLoS Pathog 9: e1003236. https://doi.org/10.1371/journal.ppat.1003236
[42] Balasubramanian S, Osburne MS, BrinJones H, et al. (2019) Prophage induction, but not production of phage particles, is required for lethal disease in a microbiome-replete murine model of enterohemorrhagic E. coli infection. PLoS Pathog 15: e1007494. https://doi.org/10.1371/journal.ppat.1007494
[43] EFSA Panel BH (BIOHAZ) (2013) Scientific Opinion on VTEC-seropathotype and scientific criteria regarding pathogenicity assessment. EFSA J 11: 3138. https://doi.org/10.2903/j.efsa.2013.3138
[44] Fuller CA, Pellino CA, Flagler MJ, et al. (2011) Shiga toxin subtypes display dramatic differences in potency. Infect Immun 79: 1329-1337. https://doi.org/10.1128/IAI.01182-10
[45] Forde BM, McAllister LJ, Paton JC, et al. (2019) SMRT sequencing reveals differential patterns of methylation in two O111:H- STEC isolates from a hemolytic uremic syndrome outbreak in Australia. Sci Rep 9: 9436. https://doi.org/10.1038/s41598-019-45760-5
[46] Fagerlund A, Aspholm M, Węgrzyn G, et al. (2022) High diversity in the regulatory region of Shiga toxin encoding bacteriophages. BMC Genomics 23: 230. https://doi.org/10.1186/s12864-022-08428-5
[47] Shaaban S, Cowley LA, McAteer SP, et al. (2016) Evolution of a zoonotic pathogen: investigating prophage diversity in enterohaemorrhagic Escherichia coli O157 by long-read sequencing. Microb Genomics 2: e000096. https://doi.org/10.1099/mgen.0.000096
[48] Scheutz F (2014) Taxonomy Meets Public Health: The case of Shiga toxin-producing Escherichia coli. Microbiol Spectr 2. https://doi.org/10.1128/microbiolspec.EHEC-0019-2013
[49] Lejeune JT, Abedon ST, Takemura K, et al. (2004) Human Escherichia coli O157:H7 genetic marker in isolates of bovine origin. Emerg Infect Dis 10: 1482-1485. https://doi.org/10.3201/eid1008.030784
[50] Olavesen KK, Lindstedt BA, Løbersli I, et al. (2016) Expression of Shiga toxin 2 (Stx2) in highly virulent Stx-producing Escherichia coli (STEC) carrying different anti-terminator (q) genes. Microb Pathog 97: 1-8. https://doi.org/10.1016/j.micpath.2016.05.010
[51] Løbersli I, Haugum K, Lindstedt BA (2012) Rapid and high resolution genotyping of all Escherichia coli serotypes using 10 genomic repeat-containing loci. J Microbiol Methods 88: 134-139. https://doi.org/10.1016/j.mimet.2011.11.003
[52] Cahill J, Young R (2019) Phage lysis: multiple genes for multiple barriers. Adv Virus Res 103: 33-70. https://doi.org/10.1016/bs.aivir.2018.09.003
[53] Pang T, Savva C, Fleming K, et al. (2009) Structure of the lethal phage pinhole. Proc Natl Acad Sci U S A 106: 18966-18971. https://doi.org/10.1073/pnas.0907941106
[54] Nübling S, Eisele T, Stöber H, et al. (2014) Bacteriophage 933W encodes a functional esterase downstream of the Shiga toxin 2a operon. Int J Med Microbiol 304: 269-274. https://doi.org/10.1016/j.ijmm.2013.10.008
[55] Unkmeir A, Schmidt H (2000) Structural analysis of phage-borne stx genes and their flanking sequences in Shiga toxin-producing Escherichia coli and Shigella dysenteriae Type 1 Strains. Infect Immun 68: 4856-4864. https://doi.org/10.1128/iai.68.9.4856-4864.2000
[56] Vimr ER (2013) Unified theory of bacterial sialometabolism: how and why bacteria metabolize host sialic acids. ISRN Microbiol 2013: 1-26. https://doi.org/10.1155/2013/816713
[57] Amigo N, Zhang Q, Amadio A, et al. (2016) Overexpressed proteins in hypervirulent clade 8 and clade 6 Strains of Escherichia coli O157:H7 compared to E. coli O157:H7 EDL933 clade 3 Strain. PLoS One 11: e0166883. https://doi.org/10.1371/journal.pone.0166883
[58] Polzin S, Huber C, Eylert T, et al. (2013) Growth media simulating ileal and colonic environments affect the intracellular proteome and carbon fluxes of Enterohemorrhagic Escherichia coli O157:H7 strain EDL933. Appl Environ Microbiol 79: 3703-3715. https://doi.org/10.1128/AEM.00062-13
[59] Herold S, Siebert J, Huber A, et al. (2005) Global expression of prophage genes in Escherichia coli O157:H7 strain EDL933 in response to norfloxacin. Antimicrob Agents Chemother 49: 931-944. https://doi.org/10.1128/aac.49.3.931-944.2005
[60] Feuerbaum S, Saile N, Pohlentz G, et al. (2018) De-O-Acetylation of mucin-derived sialic acids by recombinant NanS-p esterases of Escherichia coli O157:H7 strain EDL933. Int J Med Microbiol 308: 1113-1120. https://doi.org/10.1016/j.ijmm.2018.10.001
[61] Franke B, Veses-Garcia M, Diederichs K, et al. (2020) Structural annotation of the conserved carbohydrate esterase vb_24B_21 from Shiga toxin-encoding bacteriophage Φ24B. J Struct Biol 212: 107596. https://doi.org/10.1016/j.jsb.2020.107596
[62] Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275-282. https://doi.org/10.1093/bioinformatics/8.3.275
[63] Kumar S, Stecher G, Li M, et al. (2018) MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 35: 1547-1549. https://doi.org/10.1093/molbev/msy096
[64] Schwartz RM (1978) Matrices for detecting distant relationships. Atlas Protein Seq Struct: 353-359.