
Glucose trend prediction based on continuous glucose monitoring (CGM) data is a crucial step in the implementation of an artificial pancreas (AP). A glucose trend prediction model with high accuracy in real-time can greatly improve the glycemic control effect of the artificial pancreas and effectively prevent the occurrence of hyperglycemia and hypoglycemia. In this paper, we propose an improved wavelet transform threshold denoising algorithm for the non-linearity and non-smoothness of the original CGM data. By quantitatively comparing the mean square error (MSE) and signal-to-noise ratio (SNR) before and after the improvement, we prove that the improved wavelet transform threshold denoising algorithm can reduce the degree of distortion after the smoothing of CGM data and improve the extraction effect of CGM data features at the same time. Based on this finding, we propose a glucose trend prediction model (IWT-GRU) based on the improved wavelet transform threshold denoising algorithm and gated recurrent unit. We compared the root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2) of Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Support vector regression (SVR), Gated Recurrent Unit (GRU) and IWT-GRU on the original CGM monitoring data of 80 patients for 7 consecutive days with different prediction horizon (PH). The results showed that the IWT-GRU model outperformed the other four models. At PH = 45 min, the RMSE was 0.5537 mmol/L, MAPE was 2.2147%, R2 was 0.989 and the average runtime was only 37.2 seconds. Finally, we analyze the limitations of this study and provide an outlook on the future direction of blood glucose trend prediction.
Citation: Tao Yang, Qicheng Yang, Yibo Zhou, Chuanbiao Wen. Glucose trend prediction model based on improved wavelet transform and gated recurrent unit[J]. Mathematical Biosciences and Engineering, 2023, 20(9): 17037-17056. doi: 10.3934/mbe.2023760
High-speed rail possesses the advantages of fast speed, high efficiency, low energy consumption, and low pollution, and currently plays a vital backbone role in China's transportation system [1,2]. With the advent of the era of intelligence and the vigorous development of 5G technology, high-speed rail control systems that integrate new technologies such as big data and intelligent management are also thriving [3,4]. However, the communication environment faced by the high-speed rail control system has become increasingly complex, and the risk of external attacks and security threats has grown over the years. Therefore, how to build a solid defensive barrier for the high-speed rail control system and protect it from outside threats has become a hot research issue in both railway transit optimization and the cybersecurity field [5,6].
To guard against external malicious attacks and ensure traffic safety, China, the country with the longest high-speed railway network, has deployed the China Train Control System level 3 (CTCS-3). In CTCS-3, an abnormal data detection algorithm based on deterministic finite automata (DFA) proactively identifies anomalous data in the data stream sent to the high-speed rail control system [7]. Nevertheless, with the rapid development of China's information network, an increasing number of high-speed trains have been put into operation, and the corresponding data volume has grown explosively [8,9]. As a result, the traditional serial DFA-based detection algorithm finds it increasingly difficult to detect abnormal data and potential threats in train control data in time. Especially during the Spring Festival and other holidays, the inadequate capacity to handle the newly generated train control data has become the primary factor limiting the scheduling of extra trains. Facing the rapid development of the high-speed rail train control system, many researchers have devoted themselves to optimizing CTCS-3 to improve the detection efficiency of train control data and ensure the smooth, healthy operation of the high-speed rail system.
With the development of communication and information technologies, some researchers have tried to solve such problems with big data and blockchain technology [10–15]. Inspired by this idea, researchers in the train control field also employ the computing power of distributed platforms to design multi-task parallel detection algorithms and improve the efficiency of train control data detection [16,17]. Even though some progress has been made, the tight coupling dependencies inside the train control data prevent the detection work from being performed concurrently. As a result, the detection efficiency for the control data remains low, and resource utilization is poor.
In response to the above problems, this paper employs the speculative parallelization technique commonly used in multi-core computing and proposes a speculative parallel detection algorithm for high-speed train control data. First, the structure of the train control data is analyzed, and the algorithm divides the data into small segments, eliminating the internal dependencies hidden in the data. Second, based on the divided data, the traditional DFA-based algorithm is parallelized with speculative execution so that the inline control dependencies that constrain the algorithm flow are removed. Finally, the parallel algorithm is implemented on Apache Spark, a widely used distributed platform, to detect the divided data simultaneously, making full use of the platform's computing power to improve the detection efficiency of high-speed rail train control data. The main contributions of this article are as follows.
1) The internal dependencies that limit the algorithm's parallelism are identified through a modeling analysis of the conventional data detection process.
2) The speculation technique is introduced to overcome the internal dependencies and parallelize the conventional data detection algorithm.
3) The proposed algorithm with the speculation mechanism is implemented on a distributed platform, improving abnormal data detection efficiency.
High-speed train control data refers to the data generated by the control system during the operation of high-speed trains and is related to operational safety. For example, it includes synthetic platform signal data, monitoring level conversion data, driver operation data, electronic control equipment operation data, central control system failure data, emergency braking command data, and so on. Unlike the equipment operation data collected by onboard sensors, the train control data is recorded by the dedicated onboard Automatic Train Supervision (ATS) system and transmitted through a remote wireless network, as shown in Figure 1. In CTCS-3, the train control data is collected at set intervals. When collection ends, the collected information is either manually dumped to a laptop and then sent to the railway bureau via WLAN or the wireless network GSM-R (Global System for Mobile Communications-Railway), or the onboard computer automatically stores the collected information and transmits it through GSM-R [18].
However, due to the lack of security audits and intrusion prevention measures, GSM-R, although an enclosed network system, is vulnerable to outside security issues such as hacking, extortion, and virus attacks. The railway bureau is therefore also endangered by the unsafe network. Even though countermeasures against SIM cloning, jamming, and other IoT-related threats have been introduced to enhance transmission security, there is still a risk of damage or tampering [19].
Some researchers attempt to detect abnormal data with novel machine learning techniques to solve this problem and boost data detection efficiency. For instance, Belgrana and Maruno improved network intrusion detection rates and reduced processing time with neural networks [20,21], while other researchers proposed a multi-stage optimized ML-based NIDS framework or a fuzzy detection system that reduces computational complexity while maintaining detection performance [22,23]. However, these methods cannot be adopted by the high-speed train intrusion prevention system, because the safe operation of high-speed trains depends on accurate train control data, and deep learning-based methods cannot guarantee 100% reliability of the train control data [24].
By contrast, CTCS-3 builds the DFA-based data detection algorithm into the communication server to eliminate risks to data transmission safety. In this way, abnormal data can be identified, and attacks and threats can be defended against. Specifically, when the communication server receives the train control data, the data is compared against the abnormal characteristics predefined by a DFA, revealing whether there is abnormal information in the control data.
A DFA is a five-tuple (S, Σ, δ, s0, F), where S is the set of states; Σ is the alphabet, the set of finite input symbols; s0 is the initial state; F ⊆ S is the set of accepting states; and δ: S × Σ → S is the transition function, which maps a state in S and a letter from Σ to another state in S. The DFA-based data detection process begins from the start state s0, reads the train control data ω character by character (ω ∈ Σ*, where * denotes the concatenation of symbols from Σ), and changes its state according to the transition function δ. Once the DFA reaches an accepting state (a state in F), an abnormality has been found in the train control data ω; otherwise, no abnormality is present. The pseudo-code is shown in Table 1.
Algorithm 1: DFA-based data detection algorithm
Input 1: Train control data ω and its length I
Input 2: Specific DFA that recognizes the abnormality
Output: Whether an abnormality is found
1 state = start_state;
2 for i = 0 to I-1 do
3   input_char = ω[i];
4   state = δ[state][input_char];
5   if state ∈ F then
6     return AbnormalityFound;
7   end
8 end
9 return NotFound;
As Table 1 shows, Algorithm 1 consists of a series of iterations: in each one, the current character ω[i] is combined with the current state through the transition function, and the DFA state is updated accordingly. Assuming that the time to process a single character is M and the length of the train control data is I, the time and space complexity of detecting the whole train control data are M·I and I, respectively. Overall, the traditional algorithm is inefficient: when the data volume grows or the number of states increases, the time and space overhead rise dramatically, and the efficiency of the detection algorithm drops significantly.
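As a concrete illustration, the scan of Algorithm 1 can be written as a short Python function. This is a sketch under our own assumptions: the dictionary-based transition table, the fallback to the start state for unlisted symbols, and all names here are illustrative, not part of CTCS-3.

```python
def detect(data, delta, start_state, accepting):
    """Scan `data` character by character with a DFA (Algorithm 1).

    delta maps state -> {symbol: next_state}; symbols not listed for a
    state fall back to the start state (an assumed reset semantics).
    Returns True as soon as an accepting (abnormal) state is reached.
    """
    state = start_state
    for ch in data:
        state = delta[state].get(ch, start_state)
        if state in accepting:
            return True   # abnormality found
    return False          # no abnormality after scanning all of data
```

For a toy DFA that recognizes the substring AB, `detect("XXAB", {0: {'A': 1}, 1: {'B': 2, 'A': 1}}, 0, {2})` returns True, while a string without AB returns False; the early return on an accepting state mirrors line 6 of Algorithm 1.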
Most researchers use parallel or distributed computing technology to solve this problem and raise data detection efficiency. For example, Ding proposed a multi-step regular expression matching algorithm based on a parallel character index, which improves matching throughput by 48.5% on average [25]. Liu proposed a multi-DFA parallel detection algorithm based on a distributed computing platform, improving query response time by up to 43.9% [14]. Lu pointed out that GPU platforms can also implement large-scale detection [26]. These measures focus on porting traditional algorithms onto new platforms and improving detection efficiency by running multiple DFAs concurrently.
However, these solutions rely mainly on the cooperation of multiple DFAs, and their overall efficiency is limited as the data volume grows; parallel detection tasks with high latency can make the whole procedure time-consuming. A deeper analysis of the underlying logic of these algorithms shows that the DFA-based detection procedure proceeds iteration by iteration, yet none of the current solutions can parallelize these iterative computations. When the data scale becomes large, the detection procedure becomes a long iteration chain that can hardly be shortened or parallelized by any of the optimization measures above. Thus, the efficiency of the detection algorithm on high-speed train control data remains low.
In summary, to solve the problem that the time and space overhead generated by the iterative operation limits the detection efficiency for high-speed train control data, this paper proposes a highly effective detection algorithm based on speculative parallel optimization. First, the high-speed train control data is divided into small blocks of the same size. Second, by introducing the speculative parallel optimization technique widely used on multi-core platforms, this paper reforms the traditional DFA-based detection algorithm with a speculation mechanism to avoid the control dependencies between iterations and make the iterative computations run in parallel. Finally, the reformed algorithm is implemented on Apache Spark, a popular distributed computing platform with high concurrency, to detect the small divided blocks in parallel. In this way, not only is the data detection efficiency improved, but the computing resource utilization of the distributed platform is raised at the same time.
Based on the description of existing problems and solutions, the speculative parallel detection algorithm for high-speed train control data is designed in three parts: high-speed train control data partitioning, speculative parallel detection, and speculation result validation. The design ideas of each part are described in detail below.
First and foremost, following the typical speculative parallel optimization process, the input data should be divided into blocks and then gathered into a dataset [27]. Specifically, in the proposed algorithm, the control data is cut into segments of the same size. Besides, to support parallel execution, every segment except the first is also assigned a predicted initial DFA state. In this way, all the segments can be detected independently, without waiting for the final state of the preceding segment, and the goal of detecting multiple data segments in parallel can be achieved.
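The partitioning step described above might be sketched as follows (illustrative Python; the function name and the choice of state 0 as the predicted state are our assumptions):

```python
def partition(data, granularity, predicted_state=0):
    """Cut `data` into equal-size segments and pair each with an
    initial DFA state: the first segment gets the true start state 0,
    every later segment gets the speculated `predicted_state`."""
    segments = []
    for offset in range(0, len(data), granularity):
        chunk = data[offset:offset + granularity]
        start = 0 if offset == 0 else predicted_state
        segments.append((chunk, start))
    return segments
```

With granularity 8, `partition("AVOIDS_VIRULENCE", 8)` yields the two segments `("AVOIDS_V", 0)` and `("IRULENCE", 0)`, matching the worked example later in this section.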
Beyond the data dividing principle, the division granularity also needs to be considered. In the speculative parallel detection algorithm, if the division size is too small, the number of derived parallel tasks increases, and the coordination between the different speculative tasks becomes complicated and hard to adjust. In the worst case, fine-grained partitioning also increases the communication frequency between parallel tasks and degrades the overall performance of the proposed algorithm. Conversely, if the granularity is too coarse, the load imbalance and data skew problems common in distributed computing arise, increasing the running time of individual parallel tasks and again hurting overall performance. Therefore, it is necessary to establish a theoretical model revealing the relationship between the data partitioning granularity and the time cost of the parallel algorithm, and then select a suitable granularity through calculation.
The model is built as follows. The total time cost is given in Eq (1): the time cost T of the speculative parallel algorithm consists of two parts. Ts is the time for distributing parallel tasks and dividing the high-speed train control data into equal-length segments; compared with the time for task distribution and transmission, the time for data division is short enough to be ignored. Te is the execution time of the parallel tasks.
T = Ts + Te    (1)
Assume that the total amount of data is s, the data division granularity is n, the number of processors in the distributed cluster is k, and the data transmission speed in the cluster is p. Besides, let the expected execution time of a single task be t; the number of data segments is then s/n, and correspondingly the number of parallel tasks is also s/n. Substituting these parameters, Eq (1) becomes Eq (2).
T = n/p + (s/n)·t/k = n/p + s·t/(n·k)    (2)
The first term in Eq (2) is the distribution time of the parallel tasks, which is determined by the division granularity. The second term is the execution time of the parallel tasks, determined by the total data amount and the processing time of a single task. In a practical running environment, the amount of high-speed train control data s, the number of cluster processors k, the data transmission speed p, and the expected single-task execution time t are all known. Eq (2) therefore captures the relationship between the data partitioning granularity and the time cost of the parallel algorithm; a schematic diagram of Eq (2) is shown in Figure 2.
Since neither the partitioning granularity nor the execution time can be negative, only the first quadrant is considered. Figure 2 shows that as the granularity tends to 0, the number of divided segments tends to infinity, and the execution time of the parallel algorithm increases significantly. In an actual operating environment, this corresponds to overly fine granularity making task distribution, message communication, and the coordination of parallel tasks complicated, which slows down the overall performance of the proposed algorithm. Conversely, as the granularity increases, the execution time rises gradually because the running time of each single parallel task grows. In the extreme case where the granularity equals s, the proposed algorithm degenerates into the traditional, serial algorithm.
Thus, choosing a proper division granularity reduces to finding the minimum of Eq (2). Taking the derivative of Eq (2) and setting it to 0 shows that when the granularity satisfies Eq (3), the execution time of the speculative parallel algorithm is minimal, i.e., the parallel algorithm takes the shortest time.
n = √(s·t·p/k)  (taking the positive root)    (3)
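The cost model of Eqs (2) and (3) can be checked numerically. The parameter values below are made up for illustration; the check confirms only that the granularity from Eq (3) minimizes the cost of Eq (2).

```python
import math

def cost(n, s, t, p, k):
    # Eq (2): distribution time n/p plus execution time s*t/(n*k)
    return n / p + (s * t) / (n * k)

def optimal_granularity(s, t, p, k):
    # Eq (3): the positive root that minimizes Eq (2)
    return math.sqrt(s * t * p / k)

# assumed values: 1 MB of data, 2 ms per task, speed p, 8 processors
s, t, p, k = 1_000_000, 0.002, 10_000.0, 8
n_star = optimal_granularity(s, t, p, k)
# the cost at n_star is no larger than at finer or coarser granularities
assert cost(n_star, s, t, p, k) <= cost(0.5 * n_star, s, t, p, k)
assert cost(n_star, s, t, p, k) <= cost(2.0 * n_star, s, t, p, k)
```

Because Eq (2) is convex in n for positive n, perturbing the granularity in either direction away from n* can only increase the total time, which is what the two assertions verify.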
Next, the corresponding speculative parallel detection mechanism is designed to ensure that the divided data can be detected simultaneously. The proposed algorithm applies the speculative parallel optimization technique to restructure the traditional detection algorithm and speed up the detection of large-scale data.
First, it is necessary to analyze the inline dependencies of the traditional DFA-based detection process. From Algorithm 1 in Table 1, it can be seen that in the first iteration, when the first letter of the high-speed train control data is read, the algorithm migrates the DFA from the initial state using the transition function. Depending on the resulting state, the algorithm either continues detecting the following letters or reports the abnormality. Analyzing the relationship between iterations shows that no iteration after the first can know its predecessor's DFA state in advance; equally, no iteration after the first can execute until its predecessors finish. Thus, although the DFA state is the critical factor driving the detection procedure, the lack of a method for predicting DFA states makes it challenging to parallelize the traditional detection procedure with a simple divide-and-conquer strategy.
Under this circumstance, the speculative parallel optimization technique, common in multi-core parallelism research, is borrowed to resolve these difficulties. It is an optimistic parallelization technology: by temporarily ignoring the dependencies between concurrent units, threads that initially appear non-parallelizable can be tentatively executed in parallel, and performance improves whenever such speculative execution succeeds. The speculative execution model is shown in Figure 3.
As Figure 3(a) shows, T1 and T2 can only be executed serially in the sequential procedure because of the control dependence between the two parts. As shown in Figure 3(b), however, the speculative parallel optimization technique eliminates the influence of this control dependence through pre-calculation, so that T2 can execute in advance instead of waiting for the result of T1. When T1 finishes, its result validates the pre-calculated input, and thereby the speculative result of T2: if the pre-calculated input turns out to be correct, the result of T2 is also correct, and the speculative execution succeeds; if the validation fails, the speculative execution of T2 fails. As shown in Figure 3(c), the speculative result is then discarded, and T2 is re-executed with the parameter produced by T1.
Eliminating the dependencies between iterations is the key to optimizing the conventional DFA-based detection algorithm. Analysis of Algorithm 1 shows that the dependency is that the output of the previous iteration is the input of the current one: the current iteration cannot execute independently because it lacks the DFA state from its predecessor. Therefore, once the division of the high-speed train control data finishes, every segment (except the first) in the detection procedure is assigned a predicted DFA state so that it can run independently instead of waiting for the state from its predecessor; this assignment corresponds to the pre-calculation step of the speculative parallel optimization technique. In this way, all segments can be processed independently, and concurrent detection of the divided segments is achieved.
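Putting the pieces together, the speculative detection stage might look like the following sketch. The names are illustrative, a thread pool merely stands in for the Spark executors used in the actual implementation, and the per-position state trace is cached for the later validation stage.

```python
from concurrent.futures import ThreadPoolExecutor

def run_segment(segment, start_state, delta, accepting):
    """Detect one segment from a (possibly speculated) initial state,
    caching the state after every character for later validation."""
    state, trace = start_state, []
    for ch in segment:
        state = delta[state].get(ch, 0)  # assumed reset-to-state-0 fallback
        trace.append(state)
        if state in accepting:
            return True, trace           # abnormality found, stop early
    return False, trace

def speculative_detect(segments, delta, accepting):
    """segments: list of (chunk, predicted_start_state) pairs."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(
            lambda s: run_segment(s[0], s[1], delta, accepting), segments))
```

Note that Python threads do not give true CPU parallelism because of the interpreter lock; in the paper's setting the segments would instead be distributed across Spark partitions, but the control flow per segment is the same.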
However, since the parallel iterations execute with predicted states, their results may be incorrect. A validation procedure is therefore placed after the speculative parallel detection to verify the correctness of the parallel iterations. The specific step is as follows: if no abnormal data is detected after the speculative parallel detection, every data segment is re-detected starting from the actual final state produced by the detection of its preceding segment. If the re-detection result matches the speculative result, the speculation is proved correct, and there is no anomaly in the control data. If the results conflict, incorrect speculation occurred during the parallel detection, and the re-detection result prevails.
For instance, a DFA M that detects the abnormal string VIRUS can be defined as in Eq (4).
M = ({0, 1, 2, 3, 4, 5}, {V, I, R, U, S, OTHERS}, δ, 0, {5})    (4)
where the transition function δ is given in Eq (5):
δ = { f(0, V) = 1, f(1, I) = 2, f(2, R) = 3, f(3, U) = 4, f(4, S) = 5, f({1, 2, 3, 4}, V) = 1, f({0, 1, 2, 3, 4}, OTHERS) = 0 }    (5)
The corresponding state transition diagram of δ is shown in Figure 4. The dotted line in Figure 4 indicates the transition taken when a letter other than V, I, R, U, S is read by DFA(M).
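Written out as a transition table, DFA(M) of Eqs (4) and (5) might look like this (a sketch; the helper name is ours, and unlisted (state, letter) pairs are taken to fall back to state 0, playing the role of OTHERS):

```python
def make_virus_dfa():
    """Build the transition table of DFA(M) recognizing VIRUS."""
    pattern = "VIRUS"
    # on the expected next letter of VIRUS, advance one state
    delta = {state: {pattern[state]: state + 1} for state in range(5)}
    for state in (1, 2, 3, 4):
        delta[state].setdefault('V', 1)  # V restarts a partial match
    return delta  # accepting set is {5}; other letters fall back to 0
```

Scanning AVOIDS_VIRULENCE with this table never reaches the accepting state 5, while any string containing VIRUS does, which is the behavior the example below relies on.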
Suppose that a part of the high-speed train control data, I = AVOIDS_VIRULENCE, needs to be checked with DFA(M). Following the proposed algorithm, the data is first separated into two parts of equal length, I1 = AVOIDS_V and I2 = IRULENCE. On this basis, the predicted state for I2 is 0, so I1 and I2 can be checked for abnormal data simultaneously; the specific process is shown in Figure 5.
Figure 5 shows the parallel detection process of data segments I1 and I2. Figure 5(a) represents the definitive, non-speculative task based on the initial state 0; the execution result for detecting I1 must be correct. Figure 5(b) illustrates the speculative task based on the pre-calculated state 0; the execution result for detecting I2 may be incorrect. In addition, every state transition of the speculative task is recorded in a cache so that the speculative result can be validated after the parallel detection.
Figure 5 also shows that for the high-speed train control data I with 16 letters, the conventional algorithm would need 16 rounds of iterations to complete the detection, whereas only eight iterations (s1 to s8 in Figure 5) are required by the speculative parallel detection procedure. With sufficient computing resources and a reasonable division granularity, the detection efficiency is even higher.
If abnormal data is found when the detection ends, the abnormality is reported, and the proposed algorithm terminates without validation. However, as shown in Figure 5, if no potential exception is found, we cannot be sure the result is correct, so the speculative result in Figure 5(b) must be validated.
The validation of speculative results occurs only when no abnormality is detected by the parallel detection. The process is as follows: the current data segment takes as its true initial state the final state produced by the detection of the previous segment, and the current segment is then re-detected. Take data segment I2 in Figure 5(b) as an example; Figure 6 illustrates the validation process. After the parallel detection of segments I1 and I2, the validation of the speculative task begins with state 1, the final state of the definitive task on segment I1. During validation, every state produced is compared with the state generated at the same position by the earlier speculative task.
As shown in Figure 6, the speculative result for data segment I2 is stored in the cache. The validation runs from s9 to s12, four rounds of iterations in total. Whenever a new state is obtained during validation, it is immediately compared with the cached result. If the two states are the same, the validation procedure, and with it the whole parallel algorithm, finishes; there is no need to validate the remaining speculative results, because from that point onward the verification would necessarily reproduce the speculative result, and the high-speed train control data I contains no abnormality.
If a state generated during validation differs from the speculative result, the validation continues, stopping in one of two situations. In the first, an accepting state appears: the speculative result is proved wrong, and the abnormality is recorded. In the second, the validation of the speculative result completes without a match: although the speculative result was incorrect, the fault does not affect the outcome, and the high-speed train control data indeed contains no abnormal data. In this case, the efficiency of the speculative parallel algorithm is similar to that of the conventional, serial algorithm.
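The validation rules above can be condensed into a short sketch (illustrative names; `cached_trace` stands for the per-position state list recorded during the speculative run, and the reset-to-state-0 fallback for unlisted symbols is our assumption):

```python
def validate(segment, true_start, cached_trace, delta, accepting):
    """Re-detect `segment` from the true initial state and compare each
    new state with the cached speculative one. Stops early on the first
    match (the remaining states must then coincide) or on an accepting
    state (a misspeculation hid a real abnormality)."""
    state = true_start
    for i, ch in enumerate(segment):
        state = delta[state].get(ch, 0)  # assumed reset-to-state-0 fallback
        if state in accepting:
            return "abnormality"
        if state == cached_trace[i]:
            return "confirmed"
    return "confirmed"  # re-detection finished without finding an anomaly
```

For segment I2 = IRULENCE with true start state 1 and a cached all-zero speculative trace, this function confirms the speculation at the fourth character, mirroring steps s9 to s12 in Figure 6.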
In summary, the essence of the validation stage is a duplicate detection of the divided data against the speculative results. Therefore, whether this additional detection affects the efficiency of the parallel algorithm is an issue worth discussing. Moreover, the validation of a speculative result relies on an assumed value (the last state produced by the previous detection), so the correctness of the validation must also be considered. We address these problems with the following analysis.
First, by clarifying the logical relation between the speculative parallel detection procedure and the validation procedure, it can be seen that both procedures run on the divided data concurrently, rather than scanning the data from beginning to end. Thus, once an acceptance state is met, an anomaly has been detected; the parallel algorithm reports the anomaly and stops immediately, and in such a case it is highly effective compared with the conventional algorithm. Alternatively, if a state in the validation stage is consistent with the speculative result (for example, step s12 in Figure 6), the algorithm also finishes early, and the performance is still better than that of the non-parallel algorithm. In the worst case, the proposed algorithm finds no abnormality after the two rounds of parallel data detection; its execution time is then only equivalent to the traditional algorithm's running time plus some parallel overhead, so the detection efficiencies of the speculative parallel algorithm and the conventional algorithm are almost the same. Overall, it can be concluded that the additional detection does not markedly affect the performance of our algorithm.
Secondly, it is necessary to argue the correctness of the validation procedure. The argument can be stated as follows: for any divided data segment uk, is the validation of uk based on the speculative result of its previous segment uk−1 correct? The precise statement and the corresponding proof are given as Theorem 1.
Theorem 1: Let DFA(N) be the automaton that detects the abnormal data T in the high-speed train control data, and let s0 be its initial state. For any consecutive data segments uk−1, uk, uk+1, if φ(s0,uk−1ukuk+1)=φ(s0,uk+1), then φ(s0,uk−1ukuk+1)=φ(s0,ukuk+1).
Proof: Assume that Theorem 1 is false; that is, φ(s0,uk−1ukuk+1)=φ(s0,uk+1)=s1 and φ(s0,ukuk+1)=s2 with s1≠s2. Besides, since T is recognized by the (minimal) DFA(N), the two distinct states must be distinguishable: there exists a segment u such that exactly one of σ(s1,u) and σ(s2,u) is an acceptance state. The following two assumptions can then be made from the above conditions.
1st assumption: σ(s1,u) is an acceptance state while σ(s2,u) is not. By the fundamental state transfer law, σ(s1,u)=σ(φ(s0,uk+1),u)=σ(s0,uk+1u), so σ(s0,uk+1u) equals σ(s1,u), and it is apparent that uk+1u∈T. Then, by the conversion principle between the DFA and its regular-expression rules (a string containing an abnormal pattern still contains it after data is prepended), uk+1u∈T⇒ukuk+1u∈T, so σ(s0,ukuk+1u) is also an acceptance state. Based on this deduction, σ(s0,ukuk+1u)=σ(φ(s0,ukuk+1),u)=σ(s2,u), which means σ(s2,u) is an acceptance state, the same as σ(s1,u). This is a contradiction, so the 1st assumption is invalid.
2nd assumption: σ(s2,u) is an acceptance state while σ(s1,u) is not. By the state transfer law, σ(s2,u)=σ(φ(s0,ukuk+1),u)=σ(s0,ukuk+1u), so σ(s0,ukuk+1u) equals σ(s2,u), and it is apparent that ukuk+1u∈T. By the same closure property, ukuk+1u∈T⇒uk−1ukuk+1u∈T, so σ(s0,uk−1ukuk+1u) is an acceptance state, and σ(s0,uk−1ukuk+1u)=σ(φ(s0,uk−1ukuk+1),u)=σ(s1,u). Hence σ(s1,u) is an acceptance state, the same as σ(s2,u). This is again a contradiction, so the 2nd assumption is also invalid.
In conclusion, both assumptions lead to contradictions; therefore s1=s2, and Theorem 1 holds.
Theorem 1 and its proof show that even though the validation procedure starts from a speculative execution result, the outcome of the validation is still correct. To sum up, the accuracy of the validation stage is guaranteed.
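As a sanity check, Theorem 1 can be verified exhaustively on a small example. The sketch below is our illustration, not from the paper: it uses the minimal DFA for strings containing the substring "aba" (the same "contains an abnormal pattern" shape as the detection rules, with an absorbing accepting state) and brute-forces every triple of short segments.

```python
import itertools

# Minimal DFA accepting every string that contains the substring "aba";
# state 3 is the absorbing acceptance state.
DELTA = {0: {'a': 1, 'b': 0},
         1: {'a': 1, 'b': 2},
         2: {'a': 3, 'b': 0},
         3: {'a': 3, 'b': 3}}
START = 0

def phi(state, word):
    """Extended transition function phi(s, w)."""
    for ch in word:
        state = DELTA[state][ch]
    return state

# Check Theorem 1 over all segment triples of length <= 3: whenever
# phi(s0, u_{k-1} u_k u_{k+1}) == phi(s0, u_{k+1}), it must also
# equal phi(s0, u_k u_{k+1}).
words = [''.join(p) for n in range(4) for p in itertools.product('ab', repeat=n)]
for x, y, z in itertools.product(words, repeat=3):
    if phi(START, x + y + z) == phi(START, z):
        assert phi(START, x + y + z) == phi(START, y + z)
```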
The implementation platform for the proposed algorithm is Apache Spark, one of the most popular distributed computing platforms in both industry and research. With its high concurrency and reliability, Apache Spark is widely used as a general computing engine for data-intensive workloads in power, communications, and other areas [28–32]. According to the design philosophy of the CTCS-3 control system, Apache Spark would be deployed on the communication server; therefore, the proposed algorithm also runs on Apache Spark hosted by the communication server.
Before the implementation, some parameters should be adjusted. First, the task scheduling policy on Apache Spark is switched from the default FIFO to FAIR. Under this policy, the proposed algorithm can dynamically adjust its degree of parallelism according to the amount of available computing resources and ensure the smooth operation of every parallel task. Besides, the number of working threads per task, the memory size, and other parameters should be adjusted according to the standard tuning measures for Apache Spark. Finally, the data-locality wait is limited to 3000 ms so that parallel tasks with high latency do not slow down the overall performance of the proposed algorithm.
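As a sketch, the tuning described above could be expressed as a spark-defaults.conf fragment. The FAIR scheduler and the mapping of the 3000 ms limit to `spark.locality.wait` are our reading of the description, and the resource values are purely illustrative:

```properties
# Let concurrent detection/validation jobs share executors fairly
spark.scheduler.mode    FAIR
# Give up on data locality after 3000 ms so high-latency tasks do not stall the job
spark.locality.wait     3000ms
# Illustrative resource settings; tune per the standard Spark guide
spark.executor.cores    4
spark.executor.memory   8g
```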
With these parameters in place, Table 2 shows the pseudo-code of the proposed algorithm.
Algorithm 2: Speculative parallel detection algorithm for high-speed train control data |
Input 1: Train control data D1 |
Input 2: Specific DFA that recognizes the abnormality |
Output: Position j where abnormality is found |
1 n = ceil (Eq (3), D1.size) |
2 buffer [D1.size/n][n] = NULL |
3 In parallel:{ |
4 Specblk[D1.size/n] = divide (D1, n) |
5 } |
6 Parallel_for (i = 0; i < Specblk.length; i++):{ |
7 ω = Specblk[i] |
8 I = Specblk[i].length |
9 state = start_state |
10 for j = 0 to |I-1| do:{ |
11 input char =ω[j] |
12 state = δ[state][input char] |
13 if state∈F then |
14 return j |
15 else buffer[i][j] = state |
16 } |
17 } |
18 Parallel_for (i = 1; i < Specblk.length; i++):{ |
19 ω = Specblk[i] |
20 I = Specblk[i].length |
21 state = buffer[i-1][|I-1|] |
22 for j = 0 to |I-1| do:{ |
23 input char =ω[j] |
24 state = δ[state][input char] |
25 if state∈F then |
26 return j |
27 else if buffer[i][j] == state |
28 return NotFound |
29 } |
30 return NotFound |
31 } |
In Table 2, Algorithm 2 consists of four parts. The first part, lines 1 to 2, calculates the data division step according to Eq (3) and initializes the state cache. The second part, lines 3 to 5, divides the high-speed train control data into data segments of the same length. The third part, lines 6 to 17, is the speculative parallel detection: unlike the detection part in Algorithm 1, every state the DFA passes through must be recorded for the later validation. The last part, lines 18 to 31, is the validation, in which the second round of data detection based on the speculative results is performed in parallel. In particular, lines 27 and 28 state that if the current DFA state and the cached state are consistent, the speculative execution result is correct, no further validation is required, and the algorithm ends immediately.
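Putting the four parts together, Algorithm 2 can be sketched in Python with a thread pool standing in for Spark's parallel tasks. This is an illustrative sketch under our own naming assumptions (`speculative_detect`, the toy DFA), not the paper's Spark implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_detect(data, delta, accepting, start_state, n):
    """Speculative parallel detection followed by parallel validation.
    Returns the position of the first detected abnormality, or None."""
    # Part 2: split the data into equal-length segments (division step n).
    blocks = [data[i:i + n] for i in range(0, len(data), n)]
    cache = [[None] * len(b) for b in blocks]

    # Part 3: speculative detection; every block optimistically starts
    # from the DFA's initial state, caching each visited state.
    def detect(i):
        state = start_state
        for j, ch in enumerate(blocks[i]):
            state = delta[state][ch]
            if state in accepting:
                return i * n + j          # abnormality found speculatively
            cache[i][j] = state
        return None

    with ThreadPoolExecutor() as pool:
        hits = [h for h in pool.map(detect, range(len(blocks))) if h is not None]
    if hits:
        return min(hits)

    # Part 4: validation; block i restarts from the cached final state of
    # block i-1 and stops early as soon as it re-joins the cached states.
    def validate(i):
        state = cache[i - 1][len(blocks[i - 1]) - 1]
        for j, ch in enumerate(blocks[i]):
            state = delta[state][ch]
            if state in accepting:
                return i * n + j          # abnormality confirmed
            if state == cache[i][j]:
                return None               # speculation validated early
        return None

    with ThreadPoolExecutor() as pool:
        hits = [h for h in pool.map(validate, range(1, len(blocks))) if h is not None]
    return min(hits) if hits else None

# Toy DFA flagging any occurrence of the pattern "aba".
DELTA = {0: {'a': 1, 'b': 0}, 1: {'a': 1, 'b': 2},
         2: {'a': 3, 'b': 0}, 3: {'a': 3, 'b': 3}}
```

Note how a pattern spanning a block boundary (missed by the optimistic first round) is caught by the validation round, exactly the case Theorem 1 covers.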
Besides, Figure 7 shows the deployment of the speculative parallel detection algorithm for high-speed train control data on the distributed computing platform Apache Spark. Figure 7(a) shows the deployment of the conventional serial detection algorithm: since the data cannot be divided in the traditional algorithm, the execution process is restricted by the data composition order, so the conventional algorithm is time-consuming and unable to make full use of the computing resources. In Figure 7(b), by contrast, the speculative parallel optimization technique lets all computing nodes in the cluster, including the main node and the worker nodes, join the computation, and the high-speed train control data can be divided and conquered, which not only improves the detection efficiency of the train control data but also makes effective use of the computing resources of the distributed computing platform.
Last but not least, Figure 7(b) also shows the time cost of the proposed algorithm in the worst case; if the abnormal data is found, or the validation succeeds within a few iterations in the actual runtime environment, the algorithm ends earlier. In addition, data synchronization occurs after the speculative parallel detection finishes; its purpose is to submit the cached data of each parallel task to the master node so that the speculative results can be validated.
First of all, in cooperation with the Shaanxi Provincial Key Laboratory of Network Computing and Security Technology, we built our distributed cluster on the high-speed EMU network control platform. The operating system is Ubuntu Server 20.04, and the distributed computing framework is Apache Spark 2.2.0. The cluster is organized in the master-slave model with nine Lenovo ThinkServer TS80X servers.
The experimental schemes and specific configurations are shown in Table 3. Five schemes are designed in this experiment, covering the traditional serial detection algorithm and the speculative parallel detection algorithm. Scheme 1 is equipped with only one working node because the traditional algorithm cannot use additional computing resources to detect train control data in parallel. The other schemes in Table 3 are equipped with different numbers of working nodes to test the performance of the proposed algorithm under different scales of computing resources.
Name | Experimental subject | Specific configuration |
Scheme 1 | Traditional serial algorithm | 1 main node + 1 working node |
Scheme 2 | Speculative parallel algorithm | 1 main node + 2 working nodes |
Scheme 3 | Speculative parallel algorithm | 1 main node + 4 working nodes |
Scheme 4 | Speculative parallel algorithm | 1 main node + 6 working nodes |
Scheme 5 | Speculative parallel algorithm | 1 main node + 8 working nodes |
With the help of Xi'an Railway Bureau, we randomly selected eight pieces of high-speed train control data from all trains running from July to December 2020; all train models and the corresponding marshaling orders are covered. The experimental data are split into two sets according to the marshaling order, with the details shown in Table 4.
Dataset | Train number | Time | Model | Marshaling | Data size |
Dataset 1 | D1913 | 4 h 7 min | CRH380B | 8 | 10.43 GB |
Dataset 1 | G858 | 4 h 28 min | CR400AF | 8 | 12.05 GB |
Dataset 1 | G2231 | 5 h 31 min | CRH380B | 8 | 14.69 GB |
Dataset 1 | D1903 | 7 h 40 min | CRH380B | 8 | 19.79 GB |
Dataset 2 | D1927 | 3 h 32 min | CRH380B | 16 | 18.11 GB |
Dataset 2 | G2233 | 5 h 56 min | CRH380B | 16 | 30.19 GB |
Dataset 2 | D1925 | 7 h 7 min | CRH380B | 16 | 38.54 GB |
Dataset 2 | G98 | 7 h 42 min | CRH380AL | 16 | 43.27 GB |
As for the detection rules, 217 regular expression rules from the well-known intrusion detection system Bro are selected to construct a complex DFA with 8094 states in our experiment. In addition, the data division granularity is set to 32 MB by Eq (3) during the experiment.
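The many-rules-into-one-machine construction used here can be sketched for plain string rules with an Aho-Corasick automaton. This is an illustrative stand-in of our own: the actual Bro rules are regular expressions and need a full regex-to-DFA pipeline, but the idea of merging many rules into a single scanning automaton is the same.

```python
from collections import deque

def build_automaton(patterns):
    """Merge several literal rules into one Aho-Corasick automaton."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                      # trie construction
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    queue = deque(goto[0].values())         # BFS for failure links
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][ch] if ch in goto[f] else 0
            out[t] |= out[fail[t]]          # inherit matches from the fallback
    return goto, fail, out

def scan(text, goto, fail, out):
    """Report (position, matched rules) for every rule hit in the data."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        if out[s]:
            hits.append((i, sorted(out[s])))
    return hits
```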
Finally, every experimental scheme is run ten times on each data set to ensure the accuracy of the results, and the average of the ten executions is adopted as the experimental result. If the variance of the ten executions turns out to be large, the experiment is repeated.
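This averaging rule can be expressed as a small helper. The sketch is illustrative: the paper does not quantify how large a variance triggers a re-run, so the 5% coefficient-of-variation threshold below is our assumption.

```python
import statistics

def summarize_runs(times_s, cv_threshold=0.05):
    """Average the repeated timing runs of one scheme; flag the scheme for
    re-execution when the spread (coefficient of variation) is large.
    The 5% threshold is an illustrative assumption, not from the paper."""
    mean = statistics.mean(times_s)
    cv = statistics.stdev(times_s) / mean
    return mean, cv > cv_threshold
```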
The experimental results of the five schemes on the two sets of experimental data are shown in Figure 8, where the abscissa represents the experimental data and the ordinate represents the time. Each column represents the time a particular scheme takes to process one piece of experimental data. Figure 8(a) shows the performance of the different experimental schemes on data set 1, while Figure 8(b) shows their performance on data set 2.
It can be seen from Figure 8 that, compared with the traditional algorithm, the speculative parallel algorithm with multiple working nodes significantly reduces the detection time for the train control data. Meanwhile, comparing the different schemes on the same dataset shows that the execution efficiency of the speculative parallel algorithm grows gradually as the computing resources increase.
Moreover, by summarizing the performance of schemes 1 and 2 in Figure 8, it can be seen that, for large-scale high-speed train control data, the average time consumption of the speculative parallel algorithm with two working nodes is reduced by about 35% compared with the traditional serial algorithm. Similarly, comparing scheme 1 with schemes 3, 4 and 5 in Figure 8 shows that, relative to the traditional non-parallel algorithm, the speculative parallel algorithm with 4/6/8 working nodes reduces the detection time by 62%, 70% and 80%, respectively. In total, the experiment shows that compared with the traditional DFA-based abnormal data detection algorithm, the speculative parallel detection algorithm can significantly improve the detection efficiency for high-speed train control data.
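These time reductions translate directly into speedup factors via s = T1/Tn = 1/(1 − reduction):

```python
# Reported detection-time reductions (working nodes -> fraction saved),
# converted into speedup factors via s = T1 / Tn = 1 / (1 - reduction).
reductions = {2: 0.35, 4: 0.62, 6: 0.70, 8: 0.80}
speedups = {nodes: round(1 / (1 - r), 2) for nodes, r in reductions.items()}
print(speedups)  # {2: 1.54, 4: 2.63, 6: 3.33, 8: 5.0}
```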
The analysis of the experimental results is intended to give a deeper understanding of the growth law of the proposed algorithm. The speedup of each experimental scheme is calculated by Eq (6), where s is the speedup, T1 is the execution time of the conventional algorithm (i.e., the time cost of experimental scheme 1), and Tn is the execution time of the speculative parallel algorithm.
s = T1/Tn | (6)
After calculation, the results are collected, and Figure 9 illustrates the speedup curves. In this figure, the abscissa represents the experimental scheme, and the ordinate represents the speedup value.
Specifically, the speedup changes of the parallel experimental schemes (Scheme 2 to Scheme 5) on data sets 1 and 2 are illustrated in Figure 9(a), (b), respectively. Observing the changing patterns of the different speedups, it can be concluded that our algorithm shows good scalability: the speedup grows steadily and regularly as the computing resources increase.
Besides, after calculating the slope of each speedup curve in Figure 9(a), it can be observed that the slopes are large in Schemes 2 and 3 and decrease in Schemes 4 and 5. This happens for every speedup curve in Figure 9(a), meaning that the algorithm's scalability at the beginning is better than at the end. In a real environment, this indicates that in Schemes 2 and 3 the investment of computing resources leads to a significant improvement in algorithm efficiency, whereas in Schemes 4 and 5 an obvious diminishing marginal utility appears: with the continuous investment of computing resources, the performance improvement of the parallel algorithm becomes increasingly limited.
The same pattern is also observed in Figure 9(b), showing that this phenomenon is common to the proposed algorithm. It arises because the continuous investment of computing resources increases the network communications and data exchanges between the computing nodes in the cluster, so the performance of the proposed algorithm no longer maintains the same growth rate.
In addition, comparing the curves in Figure 9(a), (b), it can be seen that under the same experimental scheme, the curve slopes for small data (for instance, D1913 and G858 in dataset 1) are always greater than those for large data (for example, D1925 and G98 in dataset 2). This is because the increase in the amount of data also affects the algorithm's inner processes, such as task scheduling, process communication, and data exchange, which in turn affect its performance. Therefore, the focus of subsequent research will be optimizing the scheduling strategy of the parallel units in the algorithm and reducing data exchange and communication delay. Furthermore, since the edge computing model and IoT computing devices are more flexible than the conventional centralized computing model [33–35], implementing the proposed algorithm on edge computing equipment is also a future work direction.
This paper proposes a parallel data detection algorithm for high-speed rail control systems based on the speculative parallel optimization technique and Apache Spark. The speculative parallel optimization technique overcomes the complex dependencies that constrain the detection order of high-speed train control data, so that the data can be detected in a parallel and speculative way, while Apache Spark provides support for large-scale concurrency. The experiments and the corresponding analysis show that the proposed algorithm outperforms the conventional data detection algorithm. Besides, the proposed algorithm possesses outstanding scalability, and its detection efficiency can be markedly improved by adding computing resources. At present, with the support of Xi'an Railway Bureau, the algorithm has been deployed in the railway control center to help detect the control data generated by high-speed trains, and its overall performance has been very satisfactory. However, the experiments also showed that there is still room for optimization, and in-depth research on the scheduling strategy of the proposed algorithm in distributed systems will continue in the future.
This research work is supported in part by the National Joint Funds of China (U20B2050), National Key R & D Program of China (2018YFB1201500), National Natural Science Foundation of China (61801379), and Special Scientific Research Project of Shaanxi Provincial Department of Education (21JK0781).
All authors declare no conflict of interest in this paper.
[1] | T. T. Zhou, The Discovery and Mechanism of Anti-T2DM Lead Structure Based on Pancreatic β Cell Function Improvement/Liver Gluconeogenic Inhibition Strategy (in Chinese), Ph.D thesis, University of Chinese Academy of Sciences (Shanghai Institute of Materia Medica, CAS), 2017. Available from: https://kns.cnki.net/kcms/detail/detail.aspx?FileName = 1017820618.nh & DbName = CDFD2018. |
[2] |
R. Williams, S. Karuranga, B. Malanda, P. Saeedi, A. Basit, S. Besancon, et al., Global and regional estimates and projections of diabetes-related health expenditure: Results from the International Diabetes Federation Diabetes Atlas, 9th edition, Diabetes Res. Clin. Pract., 162 (2020), 108072. https://doi.org/10.1016/j.diabres.2020.108072 doi: 10.1016/j.diabres.2020.108072
[3] |
M. Khan, M. J. Hashim, J. K. King, R. D. Govender, H. Mustafa, J. Al Kaabi, Epidemiology of type 2 diabetes - global burden of disease and forecasted trends, J. Epidemiol. Global Health, 10 (2020), 107-111. https://doi.org/10.2991/jegh.k.191028.001 doi: 10.2991/jegh.k.191028.001
[4] |
N. H. Cho, J. E. Shaw, S. Karuranga, Y. Huang, R. F. J. Da, A. W. Ohlrogge, et al., IDF diabetes atlas: global estimates of diabetes prevalence for 2017 and projections for 2045, Diabetes Res. Clin. Pract., 138 (2018), 271-281. https://doi.org/10.1016/j.diabres.2018.02.023 doi: 10.1016/j.diabres.2018.02.023
[5] |
Z. Luo, G. Fabre, V. G. Rodwin, Meeting the challenge of diabetes in China, Int. J. Health Policy Manage., 9 (2020), 47-52. https://doi.org/10.15171/ijhpm.2019.80 doi: 10.15171/ijhpm.2019.80
[6] |
N. A. Elsayed, G. Aleppo, V. R. Aroda, R. R. Bannuru, F. M. Brown, D. Bruemmer, et al., 2. classification and diagnosis of diabetes: standards of care in diabetes-2023, Diabetes Care, 46 (2023), S19-S40. https://doi.org/10.2337/dc23-S002 doi: 10.2337/dc23-S002
[7] |
H. Sun, P. Saeedi, S. Karuranga, M. Pinkepank, K. Ogurtsova, B. B. Duncan, et al., IDF Diabetes Atlas: global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045, Diabetes Res. Clin. Pract., 183 (2022), 109119. https://doi.org/10.1016/j.diabres.2021.109119 doi: 10.1016/j.diabres.2021.109119
[8] |
J. Osorio, Severe hypoglycemia associated with risk of vascular events and death, Nat. Rev. Cardiol., 7 (2010), 666. https://doi.org/10.1038/nrcardio.2010.176 doi: 10.1038/nrcardio.2010.176
[9] | S. J. Dicken, R. L. Batterham, The role of diet quality in mediating the association between ultra-processed food intake, obesity and health-related outcomes: a review of prospective cohort studies, Nutrients, 14 (2021). https://doi.org/10.3390/nu14010023 |
[10] |
A. Consoli, G. Formoso, Patient perceptions of insulin therapy in diabetes self-management with insulin injection devices, Acta Diabetol., 60 (2023), 705-710. https://doi.org/10.1007/s00592-023-02054-7 doi: 10.1007/s00592-023-02054-7
[11] |
S. Reddy, C. C. Wu, A. Jose, J. L. Hsieh, S. D. Rautela, Personalized virtual care using continuous glucose monitoring in adults with type 2 diabetes treated with less intensive therapies, Clin. Diabetes, 41 (2023), 452-457. https://doi.org/10.2337/cd22-0128 doi: 10.2337/cd22-0128
[12] | A. T. Reenberg, T. K. S. Ritschel, B. Dammann, J. B. Jørgensen, High-performance uncertainty quantification in large-scale virtual clinical trials of closed-loop diabetes treatment, in 2022 American Control Conference (ACC), (2022), 1367-1372. https://doi.org/10.23919/ACC53348.2022.9867234 |
[13] |
J. Huang, A. M. Yeung, A. Y. Dubord, H. Wolpert, P. G. Jacobs, W. A. Lee, et al., Diabetes technology meeting 2022, J. Diabetes Sci. Technol., 17 (2023), 550757959. https://doi.org/10.1177/19322968221148743 doi: 10.1177/19322968221148743
[14] |
D. L. Rodriguez-Sarmiento, F. Leon-Vargas, M. Garcia-Jaramillo, Artificial pancreas systems: experiences from concept to commercialisation, Expert Rev. Med. Devices, 19 (2022), 877-894. https://doi.org/10.1080/17434440.2022.2150546 doi: 10.1080/17434440.2022.2150546
[15] |
S. L. Kang, Y. N. Hwang, J. Y. Kwon, S. M. Kim, Effectiveness and safety of a model predictive control (MPC) algorithm for an artificial pancreas system in outpatients with type 1 diabetes (T1D): systematic review and meta-analysis, Diabetol. Metab. Syndr., 14 (2022), 187. https://doi.org/10.1186/s13098-022-00962-2 doi: 10.1186/s13098-022-00962-2
[16] | L. N. Zhang, T. Y. Li, L. X. Guo, Q. Pan, Clinical progress and future prospect of continuous glucose monitoring (in Chinese), Chin. J. Clin. Healthcare, 25 (2022), 303-309. Available from: https://kns.cnki.net/kcms/detail/detail.aspx?FileName = LZBJ202203003 & DbName = CJFQ2022. |
[17] |
J. P. Anderson, J. R. Parikh, D. K. Shenfeld, V. Ivanov, C. Marks, B. W. Church, et al., Reverse engineering and evaluation of prediction models for progression to type 2 diabetes: an application of machine learning using electronic health records, J. Diabetes Sci. Technol., 10 (2015), 6-18. https://doi.org/10.1177/1932296815620200 doi: 10.1177/1932296815620200
[18] |
K. Saiti, M. Macas, L. Lhotska, K. Stechova, P. Pithova, Ensemble methods in combination with compartment models for blood glucose level prediction in type 1 diabetes mellitus, Comput. Methods Programs Biomed., 196 (2020), 105628. https://doi.org/10.1016/j.cmpb.2020.105628 doi: 10.1016/j.cmpb.2020.105628
[19] |
F. Tena, O. Garnica, J. Lanchares, J. I. Hidalgo, Ensemble models of cutting-edge deep neural networks for blood glucose prediction in patients with diabetes, Sensors, 21 (2021), 7090. https://doi.org/10.3390/s21217090 doi: 10.3390/s21217090
[20] |
R. Karim, I. Vassanyi, I. Kosa, Improved methods for mid-term blood glucose level prediction using dietary and insulin logs, Medicina, 57 (2021), 676. https://doi.org/10.3390/medicina57070676 doi: 10.3390/medicina57070676
[21] |
H. Xu, S. Bao, X. Zhang, S. Liu, W. Jing, Y. Ji, Blood glucose prediction method based on particle swarm optimization and model fusion, Diagnostics, 12 (2022), 3062. https://doi.org/10.3390/diagnostics12123062 doi: 10.3390/diagnostics12123062
[22] |
T. Koutny, M. Mayo, Predicting glucose level with an adapted branch predictor, Comput. Biol. Med., 145 (2022), 105388. https://doi.org/10.1016/j.compbiomed.2022.105388 doi: 10.1016/j.compbiomed.2022.105388
[23] |
G. Yang, S. Liu, Y. Li, L. He, Short-term prediction method of blood glucose based on temporal multi-head attention mechanism for diabetic patients, Biomed. Signal Process. Control, 82 (2023), 104552. https://doi.org/10.1016/j.bspc.2022.104552 doi: 10.1016/j.bspc.2022.104552
[24] |
Z. Nie, M. Rong, K. Li, Blood glucose prediction based on imaging photoplethysmography in combination with machine learning, Biomed. Signal Process. Control, 79 (2023), 104179. https://doi.org/10.1016/j.bspc.2022.104179 doi: 10.1016/j.bspc.2022.104179
[25] | S. Oviedo, J. Vehi, R. Calm, J. Armengol, A review of personalized blood glucose prediction strategies for T1DM patients, Int. J. Numer. Methods Biomed. Eng., 33 (2017). https://doi.org/10.1002/cnm.2833 |
[26] |
V. Felizardo, N. M. Garcia, N. Pombo, I. Megdiche, Data-based algorithms and models using diabetics real data for blood glucose and hypoglycaemia prediction - a systematic literature review, Artif. Intell. Med., 118 (2021), 102120. https://doi.org/10.1016/j.artmed.2021.102120 doi: 10.1016/j.artmed.2021.102120
[27] |
E. I. Georga, V. C. Protopappas, D. Polyzos, D. I. Fotiadis, Evaluation of short-term predictors of glucose concentration in type 1 diabetes combining feature ranking with regression models, Med. Biol. Eng. Comput., 53 (2015), 1305-1318. https://doi.org/10.1007/s11517-015-1263-1 doi: 10.1007/s11517-015-1263-1
[28] | T. E. Idriss, A. Idri, I. Abnane, Z. Bakkoury, Predicting blood glucose using an LSTM neural network, in 2019 Federated Conference on Computer Science and Information Systems (FedCSIS), (2019), 35-41. https://doi.org/10.15439/2019F159 |
[29] | J. L. Teng, Z. J. Rong, Y. Xu, B. B. Dan, Study on blood glucose prediction method based on GRU network (in Chinese), Comput. Appl. Software, 37 (2020), 107-112. Available from: https://kns.cnki.net/kcms/detail/detail.aspx?FileName = JYRJ202010018 & DbName = CJFQ2020. |
[30] |
S. L. Cichosz, T. Kronborg, M. H. Jensen, O. Hejlesen, Penalty weighted glucose prediction models could lead to better clinically usage, Comput. Biol. Med., 138 (2021), 104865. https://doi.org/10.1016/j.compbiomed.2021.104865 doi: 10.1016/j.compbiomed.2021.104865
[31] |
S. L. Cichosz, M. H. Jensen, O. Hejlesen, Short-term prediction of future continuous glucose monitoring readings in type 1 diabetes: development and validation of a neural network regression model, Int. J. Med. Inf., 151 (2021), 104472. https://doi.org/10.1016/j.ijmedinf.2021.104472 doi: 10.1016/j.ijmedinf.2021.104472
[32] |
M. F. Rabby, Y. Tu, M. I. Hossen, I. Lee, A. S. Maida, X. Hei, Stacked LSTM based deep recurrent neural network with kalman smoothing for blood glucose prediction, BMC Med. Inf. Decis. Making, 21 (2021), 101. https://doi.org/10.1186/s12911-021-01462-5 doi: 10.1186/s12911-021-01462-5
[33] |
J. Carrillo-Moreno, C. Pérez-Gandía, R. Sendra-Arranz, G. García-Sáez, M. E. Hernando, Á. Gutiérrez, Long short-term memory neural network for glucose prediction, Neural Comput. Appl., 33 (2021), 4191-4203. https://doi.org/10.1007/s00521-020-05248-0 doi: 10.1007/s00521-020-05248-0
[34] | C. Liang, Study on Methods of Blood Glucose Trend Prediction Based on Time Series Data (in Chinese), Master's thesis, Guilin University of Electronic Technology, 2022. Available from: https://kns.cnki.net/kcms/detail/detail.aspx?FileName = 1022783447.nh & DbName = CMFD2023. |
[35] | X. L. Peng, Blood Glucose Prediction and Hypoglycemia Warning Evaluation Based on LSTM-GRU Model (in Chinese), Master's thesis, School of Henan University, 2022. Available from: https://kns.cnki.net/kcms/detail/detail.aspx?FileName = 1022688198.nh & DbName = CMFD2023. |
[36] |
F. Uesugi, Novel image processing method inspired by wavelet transform, Micron, 168 (2023), 103442. https://doi.org/10.1016/j.micron.2023.103442 doi: 10.1016/j.micron.2023.103442
[37] | J. E. Oh, W. T. Kim, H. J. Sim, A. B. Abu, H. J. Lee, J. Y. Lee, Fault diagnosis using wavelet transform method for random signals, J. Korean Soc. Precis. Eng., 22 (2005), 80-89. Available from: https://www.dbpia.co.kr/Journal/articleDetail?nodeId = NODE00855112. |
[38] |
J. Zhao, P. Xu, X. Liu, X. Ji, M. Li, D. Sooranna, et al., Application of machine learning methods for the development of antidiabetic drugs, Curr. Pharm. Des., 28 (2022), 260-271. https://doi.org/10.2174/1381612827666210622104428 doi: 10.2174/1381612827666210622104428
[39] |
J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, PNAS, 79 (1982), 2554-2558. https://doi.org/10.1073/pnas.79.8.2554 doi: 10.1073/pnas.79.8.2554
Algorithm 1: DFA-based data detection algorithm |
Input 1: Train control data ω and its length I |
Input 2: Specific DFA that recognizes the abnormality |
Output: Whether the abnormality is found |
1 state = start_state; |
2 for i = 0 to |I-1| do |
3 input char =ω[i]; |
4 state = δ[state][input char]; |
5 if state∈F then |
6 return Abnormality Found; |
7 end |
8 end |
9 return NotFound; |
Algorithm 2: Speculative parallel detection algorithm for high-speed train control data |
Input 1: Train control data D1 |
Input 2: Specific DFA that recognizes the abnormality |
Output: Position j where abnormality is found |
1 n = ceil (formula3, D1.size) |
2 buffer [D1.size/n][n] = NULL |
3 In parallel:{ |
4 Specblk[D1.size/n] = divide (D1, n) |
5 } |
6 Parallel_for (i = 0; i < Specblk.length; i++):{ |
7 ω = Specblk[i] |
8 I = Specblk[i].length |
9 state = start_state |
10 for j = 0 to |I-1| do:{ |
11 input char =ω[j] |
12 state = δ[state][input char] |
13 if state∈F then |
14 return j |
15 else buffer[i][j] = state |
16 } |
17 } |
18 Parallel_for (i = 1; i < Specblk.length; i++):{ |
19 ω = Specblk[i] |
20 I = Specblk[i].length |
21 state = buffer[|I-1|][i-1] |
22 for j = 0 to |I-1| do:{ |
23 input char =ω[j] |
24 state = δ[state][input char] |
25 if state∈F then |
26 return j |
27 else if buffer[i][j] == state |
28 return NotFound |
29 } |
30 return NotFound |
31 } |
Name | Experimental subject | Specific configuration
--- | --- | ---
Scheme 1 | Traditional serial algorithm | 1 main node + 1 working node
Scheme 2 | Speculative parallel algorithm | 1 main node + 2 working nodes
Scheme 3 | Speculative parallel algorithm | 1 main node + 4 working nodes
Scheme 4 | Speculative parallel algorithm | 1 main node + 6 working nodes
Scheme 5 | Speculative parallel algorithm | 1 main node + 8 working nodes
Dataset | Train number | Duration | Model | Marshaling (cars) | Data size
--- | --- | --- | --- | --- | ---
Dataset 1 | D1913 | 4 h 7 min | CRH380B | 8 | 10.43 GB
Dataset 1 | G858 | 4 h 28 min | CR400AF | 8 | 12.05 GB
Dataset 1 | G2231 | 5 h 31 min | CRH380B | 8 | 14.69 GB
Dataset 1 | D1903 | 7 h 40 min | CRH380B | 8 | 19.79 GB
Dataset 2 | D1927 | 3 h 32 min | CRH380B | 16 | 18.11 GB
Dataset 2 | G2233 | 5 h 56 min | CRH380B | 16 | 30.19 GB
Dataset 2 | D1925 | 7 h 7 min | CRH380B | 16 | 38.54 GB
Dataset 2 | G98 | 7 h 42 min | CRH380AL | 16 | 43.27 GB