
In recent years, dynamic programming and reinforcement learning theory have been widely used to solve optimal control problems of nonlinear control systems (NCS). Much progress has been made in network model construction and system stability analysis, but there has been little research on establishing control strategies based on the detailed requirements of the control process. Motivated by this gap, this paper proposes a detail-reward mechanism (DRM) that constructs a reward function composed of individual detail evaluation functions to replace the utility function in the Hamilton-Jacobi-Bellman (HJB) equation. This method is then introduced into a wider range of deep reinforcement learning algorithms to solve optimization problems in NCS. After a mathematical description of the relevant characteristics of NCS, the stability of the iterative control law is proved with a Lyapunov function. With the inverted pendulum system as the experimental object, a dynamic environment is designed and the reward function is established using the DRM. Finally, three deep reinforcement learning models, based on Deep Q-Networks, policy gradient and actor-critic methods, are designed in this environment, and the effects of different reward functions on experimental accuracy are compared. The results show that, in NCS, replacing the utility function in the HJB equation with the DRM better reflects the designer's detailed requirements for the whole control process. By observing the characteristics of the system, designing the reward function and selecting an appropriate deep reinforcement learning model, the optimization problem of NCS can be solved.
Citation: Shixuan Yao, Xiaochen Liu, Yinghui Zhang, Ze Cui. An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning[J]. Mathematical Biosciences and Engineering, 2022, 19(9): 9258-9290. doi: 10.3934/mbe.2022430
Coronary artery disease is the leading cause of cardiovascular mortality. Cardiac imaging plays a pivotal role in preventing, diagnosing and treating ischemic heart disease. In recent years, non-invasive clinical cardiac imaging techniques have developed rapidly; they are commonly used to assess myocardial ischemia and quantitative perfusion parameters and include single-photon emission computed tomography (SPECT), magnetic resonance imaging, computed tomography and myocardial contrast echocardiography (MCE). Compared with the other cardiac imaging techniques, MCE has the advantages of being radiation-free, convenient and inexpensive (about 3–4 times less expensive than SPECT) [1]. MCE has been validated as an effective myocardial perfusion imaging method to evaluate myocardial perfusion and infarction size [2,3], the microvascular changes after coronary revascularization [4] and the outcome in patients undergoing a heart transplant [5].
In the clinical application of MCE, ultrasound contrast agents (UCAs) consisting of gas cores and lipid shells are injected intravenously and perfuse the myocardium. Contrast imaging is generated from the signals produced by the resonance of the microbubbles; blood containing microbubbles appears as a bright white region [6]. As shown in Figure 1, after the distribution of the UCA reaches a stable state, high-mechanical-index impulses are applied to destroy all of the microbubbles, and then several end-systolic frames acquired during the destruction-replenishment of the UCA are selected to fit the time-intensity curve (shown in Figure 2), from which myocardial perfusion parameters such as blood volume and flux rate can be obtained.
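The paper does not spell out the curve model, but destruction-replenishment time-intensity data are commonly fitted with an exponential replenishment model of the form y(t) = A(1 − e^(−βt)), where A relates to blood volume and β to the flux rate. The sketch below, using purely hypothetical timestamps and intensity values, shows how such a fit could be done with SciPy; it illustrates the general approach rather than the authors' exact quantification pipeline.

```python
import numpy as np
from scipy.optimize import curve_fit

def replenishment_model(t, A, beta):
    """Exponential destruction-replenishment model: A ~ blood volume (plateau),
    beta ~ flux rate (speed of microbubble refill)."""
    return A * (1.0 - np.exp(-beta * t))

# t: acquisition times of the selected end-systolic frames (s)
# intensity: mean contrast intensity inside the segmented myocardium per frame
t = np.array([0.8, 1.6, 2.4, 3.2, 4.0, 4.8, 5.6, 6.4])                  # hypothetical timestamps
intensity = np.array([12.0, 21.5, 27.8, 31.9, 34.3, 35.9, 36.8, 37.4])  # hypothetical values

(A_hat, beta_hat), _ = curve_fit(replenishment_model, t, intensity, p0=(intensity.max(), 1.0))
print(f"blood volume parameter A = {A_hat:.2f}, flux rate beta = {beta_hat:.2f} 1/s")
```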
However, the analysis of MCE is very time-consuming because it includes two steps that must be done manually by experienced echocardiographers: several end-systolic frames need to be extracted, and the myocardium, i.e., the region of interest, then needs to be segmented in those frames. Therefore, automatic myocardial segmentation methods are desirable for the efficiency and operator-independence of MCE perfusion analysis. Nevertheless, myocardial segmentation faces the following challenges. First, the concentration of the UCA changes over time during destruction and replenishment, causing large intensity variations in the images [7]. Second, the shape and position of the myocardium vary with the chamber view, heart motion, individual patient differences, etc. Moreover, myocardial borders are often unclear, and misleading structures such as the papillary muscle have an appearance similar to that of the myocardium.
Existing myocardial segmentation methods can be broadly classified into traditional image segmentation algorithms and machine learning algorithms. Traditional image segmentation algorithms define the segmentation task as a contour-finding problem solved with optimization methods based on image information, such as active contours [8] and active shape models [9]. Malpica et al. [10] proposed a coupled active contour model guided by optical flow estimates to track the myocardium in MCE. Pickard et al. [11] applied principal component analysis with an active shape model algorithm to model the shape variability and proposed a specialized gradient vector flow field to guide the contours to the myocardial borders. Guo et al. [12] proposed an automatic myocardial segmentation method based on an active contour model and a neutrosophic similarity score; they applied a clustering algorithm to detect the initial ventricle region to speed up the evolution procedure and increase accuracy. However, due to their low model complexity, traditional image segmentation algorithms do not perform well on the MCE myocardial segmentation task with its large intensity variations [13], and they still need manual tracing of myocardial contours; in addition, the optimization easily gets stuck in a local optimum without a good manual initial contour. Machine learning approaches to myocardial segmentation are often formulated as pixel-level classification, known as semantic segmentation. Li et al. [13] combined a random forest with a shape model, achieving a notable improvement in segmentation accuracy over the classic random forest and active shape model. In recent years, deep learning has shown superior performance and great potential in medical image analysis; the majority of deep learning approaches in cardiac ultrasound focus on left ventricle segmentation. Azarmehr et al. [14] experimented with three deep learning left ventricle (LV) segmentation models (U-net, SegNet and fully connected DenseNets) on 992 echocardiograms; the U-net model outperformed the other models and achieved an average Dice coefficient of 0.93. Veni et al. [15] proposed a U-net combined with a shape-driven deformable model in the form of a level set: the U-net produces a segmentation of the LV, which is treated as a prior shape, and this prior shape then drives the level set to converge to the final shape; the model produced a Dice coefficient of 0.86 on a private 2D echocardiographic dataset. Hu et al. [16] proposed a segmentation model based on a bilateral segmentation network (BiSeNet); it consists of two paths, a spatial path for capturing low-level spatial features and a context path for exploiting high-level semantic features, together with a fusion module that fuses the features of the two paths, achieving Dice coefficients of 0.932 and 0.908 for the left ventricle and left atrium, respectively. To the best of our knowledge, Li et al. [17] were the first to apply deep learning methods to MCE segmentation; they proposed an encoder-decoder architecture based on a U-net, introduced a bidirectional training schema incorporating temporal information in MCE sequences and achieved higher segmentation precision than the traditional U-net model.
However, we believe that there is still considerable room to improve MCE segmentation accuracy, given the rapid development of new deep learning algorithms. Among them, DeepLabV3+ [18] has become an excellent algorithm in the field of medical image segmentation by virtue of its ability to extract multi-scale information and its encoder-decoder structure. Thus, in this paper, we propose a semantic segmentation method based on DeepLabV3+ to solve the segmentation problem in automatic MCE perfusion quantification.
Li et al. [17] have made their MCE dataset publicly available; it consists of MCE data from 100 patients at Guangdong Provincial People's Hospital. Apical two-chamber view (A2C), apical three-chamber view (A3C) and apical four-chamber view (A4C) MCE data were collected from each patient, and every MCE sequence has 30 end-systolic frames. In summary, there are 100 (patients) × 3 (chamber views) × 30 (end-systolic frames) = 9000 frames. The manual annotations of the myocardium were performed by an experienced echocardiographer. We split the dataset into training and test sets at a ratio of 7:3, and the segmentation models were trained for each chamber view separately; the detailed data information is given in Table 1.
| | Patient number | MCE sequence number | Frame number |
|---|---|---|---|
| Training data | 70 | 210 | 6300 |
| Testing data | 30 | 90 | 2700 |
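As a minimal illustration of the data organization described above, the following sketch performs a hypothetical 7:3 patient-level split and recovers the 6300/2700 frame counts of Table 1; the patient identifiers and the random seed are assumptions, not part of the released dataset.

```python
import random

PATIENTS = [f"patient_{i:03d}" for i in range(1, 101)]   # hypothetical patient IDs
VIEWS = ["A2C", "A3C", "A4C"]
FRAMES_PER_SEQUENCE = 30

random.seed(42)
shuffled = random.sample(PATIENTS, k=len(PATIENTS))
train_patients, test_patients = shuffled[:70], shuffled[70:]   # 7:3 split by patient

def count_frames(patients):
    # every patient contributes 3 chamber views x 30 end-systolic frames
    return len(patients) * len(VIEWS) * FRAMES_PER_SEQUENCE

print(count_frames(train_patients), count_frames(test_patients))  # 6300 2700
```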
As shown in Figure 3, the segmentation model was modified from DeepLabV3+; it consists of a dilated ResNet backbone to extract feature maps, an atrous spatial pyramid pooling (ASPP) module to convert the feature maps into multi-scale information and a decoder module to generate the final predictions.
The backbone network is based on a modified ResNet-101 [19]. First, we replaced the 7 × 7 convolution in the input stem with a 3 × 3 convolution to improve performance and accelerate training [20]. The standard ResNet uses downsampling operations, such as convolutional layers with a stride greater than 1, to enlarge the receptive field of the feature maps; however, this reduces their spatial resolution. DeepLabV3+ therefore utilizes dilated convolutions [21], also known as atrous convolutions, to alleviate the spatial information loss caused by downsampling. Atrous convolution is implemented by inserting zeros between the weights of a convolutional kernel with a stride of 1, as shown in Figure 4. In this way, features can be extracted across pixels, increasing the receptive field without introducing redundant parameters that need to be learned. Figure 5 compares the original ResNet and the dilated ResNet in the final two groups of the ResNet.
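A minimal PyTorch sketch of the idea: a 3 × 3 convolution with a dilation factor of 2 covers a 5 × 5 neighbourhood while keeping the spatial resolution of the feature map, provided the padding matches the dilation. The channel sizes are arbitrary, and this is not the authors' exact backbone configuration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 64, 64)  # (batch, channels, H, W)

standard = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
# atrous (dilated) convolution: zeros are effectively inserted between kernel
# weights, so a 3x3 kernel with dilation=2 covers a 5x5 neighbourhood
atrous = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=2, dilation=2)

print(standard(x).shape, atrous(x).shape)  # both keep the 64x64 resolution
```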
The ASPP module extracts multi-scale information by applying atrous convolutions with different dilation factors. It consists of one 1 × 1 standard convolution and several 3 × 3 atrous convolutions in parallel. The original DeepLabV3+ proposed dilation factors of 6, 12 and 18 for the atrous convolutions; however, this structure may not be suitable here because the myocardial border is not very clear due to the large intensity variations in MCE, so we added a 3 × 3 atrous convolution with a dilation factor of 4 to the ASPP module to obtain more detailed spatial information. In summary, the ASPP module obtains five feature maps from its five parallel branches, concatenates them together and sends the result to the decoder. The detailed architecture of the ASPP module is shown in Figure 3.
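The sketch below outlines an ASPP block with the added dilation factor of 4 (a 1 × 1 convolution plus branches with rates 4, 6, 12 and 18); the input/output channel sizes and the projection layer are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Minimal ASPP: one 1x1 conv plus four parallel 3x3 atrous convs
    (dilation 4, 6, 12, 18); the five outputs are concatenated and fused."""
    def __init__(self, in_ch=2048, out_ch=256, rates=(4, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)]
            + [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r, bias=False)
               for r in rates]
        )
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

aspp = ASPP()
y = aspp(torch.randn(1, 2048, 16, 16))
print(y.shape)  # torch.Size([1, 256, 16, 16])
```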
The decoder decodes the features aggregated by the encoder at multiple levels and generates a semantic segmentation mask from the high-dimensional feature vectors. The decoder is simple but effective, and it is the most significant improvement of DeepLabV3+ over its predecessor DeepLabV3 [22]; in this way, the detailed boundaries of the myocardium can be recovered more effectively [18]. In the decoder module, the feature map from the encoder is first upsampled bilinearly by a factor of 4 and then concatenated with the low-level feature map from the backbone after its channels are reduced by a 1 × 1 convolution; since the low-level feature map contains more spatial information, fusing the low-level and high-level feature maps improves the segmentation accuracy. After a 3 × 3 convolution, the segmentation prediction is obtained by another bilinear upsampling by a factor of 4.
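A minimal sketch of such a decoder path, assuming typical DeepLabV3+ channel sizes (a 256-channel ASPP output and 48 channels after the 1 × 1 reduction); it is meant to illustrate the data flow rather than reproduce the authors' exact layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """Fuse the ASPP output with a low-level backbone feature map and
    upsample back to the input resolution."""
    def __init__(self, low_ch=256, aspp_ch=256, reduced_ch=48, num_classes=2):
        super().__init__()
        self.reduce = nn.Conv2d(low_ch, reduced_ch, kernel_size=1, bias=False)  # channel reduction
        self.fuse = nn.Conv2d(aspp_ch + reduced_ch, 256, kernel_size=3, padding=1, bias=False)
        self.classify = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, aspp_out, low_level):
        x = F.interpolate(aspp_out, scale_factor=4, mode="bilinear", align_corners=False)
        x = torch.cat([x, self.reduce(low_level)], dim=1)
        x = self.classify(self.fuse(x))
        return F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)

dec = Decoder()
mask = dec(torch.randn(1, 256, 16, 16), torch.randn(1, 256, 64, 64))
print(mask.shape)  # torch.Size([1, 2, 256, 256])
```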
The model was implemented with the PyTorch 1.11.0 deep learning framework and trained on an NVIDIA RTX 2060 SUPER GPU with 8 GB of memory. All images were center-cropped to 256 × 256, and the RGB pixels were normalized with mean = [0.485, 0.456, 0.406] and standard deviation = [0.229, 0.224, 0.225].
For data augmentation during the training phase, all images were randomly scaled by a factor in [0.8, 1.2], rotated by an angle in [−5°, 5°] and randomly flipped with a probability of 0.5. During the testing phase, we did not apply any augmentation.
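A possible torchvision realization of this preprocessing and augmentation pipeline is sketched below; the use of RandomAffine to combine scaling and rotation is an assumption, and in a real segmentation dataset the geometric transforms would have to be applied identically to the annotation masks.

```python
import torchvision.transforms as T

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# training-time pipeline; in a real segmentation Dataset the geometric
# transforms must be applied identically to the myocardium masks
train_transform = T.Compose([
    T.CenterCrop(256),
    T.RandomAffine(degrees=5, scale=(0.8, 1.2)),   # rotation in [-5, 5] deg, scale in [0.8, 1.2]
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])

# test-time pipeline: cropping and normalization only, no augmentation
test_transform = T.Compose([
    T.CenterCrop(256),
    T.ToTensor(),
    T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])
```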
Every model was trained for 80 epochs, corresponding to 84,000 iterations. Stochastic gradient descent was used as the optimizer, with the momentum set to 0.9 and the weight decay set to 3e-5. The initial learning rate was set to 0.01 and the minimum learning rate to 0.001, following the polynomial decay policy defined as
$$ \mathrm{lr} = \mathrm{initial\_lr} \times \left(1 - \frac{\mathrm{iteration}}{\mathrm{num\_iteration}}\right)^{\mathrm{power}} \qquad (1) $$
where iteration denotes the current iteration, num_iteration denotes the total number of iterations, initial_lr = 0.01 and power = 0.9.
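One way to realize this schedule in PyTorch, under the SGD settings stated above, is a LambdaLR scheduler that reproduces Eq (1) and clips at the minimum learning rate of 0.001; the placeholder model and the bare training loop are illustrative only, not the authors' training code.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 2, kernel_size=3, padding=1)            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=3e-5)

NUM_ITERATIONS = 84000
INITIAL_LR, MIN_LR, POWER = 0.01, 0.001, 0.9

def poly_decay(iteration: int) -> float:
    # Eq (1) expressed as a multiplicative factor of the initial learning rate,
    # clipped so that the learning rate never falls below MIN_LR
    lr = INITIAL_LR * (1 - iteration / NUM_ITERATIONS) ** POWER
    return max(lr, MIN_LR) / INITIAL_LR

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=poly_decay)

for iteration in range(NUM_ITERATIONS):
    # ... forward pass, loss computation and loss.backward() would go here ...
    optimizer.step()                   # placeholder step
    scheduler.step()                   # update the learning rate every iteration
    if iteration % 20000 == 0:
        print(iteration, optimizer.param_groups[0]["lr"])
```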
Moreover, we used the dice loss [23] as our loss function because of the class imbalance problem: the myocardium occupies only a small region compared with the large heart chambers. It is defined as
$$ \mathrm{loss}_{\mathrm{dice}}(P,T) = 1 - \frac{2\,|P \cap T|}{|P| + |T|} \qquad (2) $$
where P is the predicted myocardium area and T is the ground truth of the myocardium area. In addition, we added an auxiliary loss [24] only in the training phase to help optimize the learning process.
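A soft Dice loss consistent with Eq (2) can be sketched as follows; computing it on foreground probabilities is a common implementation choice, and the auxiliary-loss weight of 0.4 in the usage example is an assumption, since the paper does not report the weighting.

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss of Eq (2): 1 - 2|P ∩ T| / (|P| + |T|).
    pred holds foreground probabilities in [0, 1]; target is a binary mask."""
    pred = pred.flatten(1)
    target = target.flatten(1).float()
    intersection = (pred * target).sum(dim=1)
    denom = pred.sum(dim=1) + target.sum(dim=1)
    return (1 - (2 * intersection + eps) / (denom + eps)).mean()

# hypothetical usage with an auxiliary prediction head; the 0.4 weight is an
# assumption, since the paper does not report the auxiliary-loss weighting
pred_main = torch.rand(4, 256, 256)
pred_aux = torch.rand(4, 256, 256)
target = (torch.rand(4, 256, 256) > 0.7).long()
loss = dice_loss(pred_main, target) + 0.4 * dice_loss(pred_aux, target)
print(loss.item())
```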
The Dice coefficient and the intersection over union (IoU) were used as the evaluation metrics for assessing the performance of the model; they are defined as
$$ \mathrm{dice}(P,T) = \frac{2\,|P \cap T|}{|P| + |T|} \qquad (3) $$

$$ \mathrm{IoU}(P,T) = \frac{|P \cap T|}{|P \cup T|} \qquad (4) $$
where P is the predicted myocardium area and T is the ground truth.
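For completeness, a small sketch computing both metrics from binary masks according to Eqs (3) and (4); the example masks are hypothetical.

```python
import numpy as np

def dice_and_iou(pred_mask: np.ndarray, gt_mask: np.ndarray):
    """Dice (Eq 3) and IoU (Eq 4) for binary prediction and ground-truth masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2 * intersection / (pred.sum() + gt.sum() + 1e-6)
    iou = intersection / (union + 1e-6)
    return dice, iou

# hypothetical masks: two overlapping squares standing in for myocardium regions
pred = np.zeros((256, 256), dtype=np.uint8); pred[60:180, 60:180] = 1
gt = np.zeros((256, 256), dtype=np.uint8);   gt[70:190, 70:190] = 1
print(dice_and_iou(pred, gt))
```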
The segmentation results are visualized in Figure 6; six apical four-chamber view MCE images were randomly selected from a subject in the test dataset and fed into our trained model. It can be seen from the figure that the proposed model correctly predicts the myocardium even in the presence of the misleading papillary muscle structure, and the boundaries of the predicted segmentation area closely match the ground truth.
Moreover, we compared the modified DeepLabV3+ with the original DeepLabV3+, the results of Li et al. [17] and other state-of-the-art models, i.e., a U-net [25] with a DeepLabV3 backbone and a PSPnet [24] with a ResNet-101 backbone; the results are shown in Table 2.
| | Modified DeepLabV3+ | Original DeepLabV3+ | | | |
|---|---|---|---|---|---|
| Dice | | | | | |
| A2C | 0.84 | 0.84 | 0.81 | 0.83 | 0.82 |
| A3C | 0.84 | 0.83 | 0.81 | 0.84 | 0.82 |
| A4C | 0.86 | 0.84 | 0.82 | 0.83 | 0.80 |
| IoU | | | | | |
| A2C | 0.74 | 0.72 | 0.69 | 0.73 | 0.69 |
| A3C | 0.72 | 0.71 | 0.65 | 0.72 | 0.70 |
| A4C | 0.75 | 0.75 | 0.71 | 0.72 | 0.72 |
Compared with the original DeepLabV3+, the modified DeepLabV3+ improved the Dice coefficient by 0.01 in A3C and by 0.02 in A4C, and the IoU by 0.02 in A2C and by 0.01 in A3C. Moreover, the modified DeepLabV3+ also outperformed the other state-of-the-art models on both metrics.
To examine the trade-off between model performance and complexity, we also tested the model with shallower ResNet backbones, i.e., depths of 18 and 50, and compared it with the PSPnet and U-net. The number of parameters and GFLOPs (giga floating-point operations) were selected to indicate model complexity, and the average IoU over all chamber views was selected to evaluate model performance. In addition, the number of MCE frames processed per second (FPS) was evaluated on a personal computer with an RTX 2060 SUPER GPU and an Intel® Core™ i5-9600 processor; the results are shown in Table 3, and the comparisons of the number of parameters and GFLOPs against the average IoU are illustrated in Figures 7 and 8, respectively.
| Model | No. of parameters (M) | GFLOPs | Average IoU (%) | FPS |
|---|---|---|---|---|
| Modified DeepLabV3+ (ResNet18) | 12.47 | 54.21 | 70.94 | 39.6 |
| Modified DeepLabV3+ (ResNet50) | 43.58 | 176.25 | 72.81 | 21.2 |
| Modified DeepLabV3+ (ResNet101) | 62.68 | 255.14 | 74.23 | 15.2 |
| PSPnet (ResNet101) | 68.07 | 256.44 | 72.68 | 15.7 |
| U-net | 29.06 | 203.43 | 71.02 | 20.5 |
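As an illustration of how such complexity figures might be obtained, the sketch below counts parameters (in millions) and times inference to estimate FPS for a stand-in network; GFLOPs would additionally require a profiling tool (e.g., thop or fvcore), and the tiny placeholder model is of course not the authors' modified DeepLabV3+.

```python
import time
import torch
import torch.nn as nn

def count_parameters_millions(model: nn.Module) -> float:
    return sum(p.numel() for p in model.parameters()) / 1e6

@torch.no_grad()
def measure_fps(model: nn.Module, input_size=(1, 3, 256, 256),
                n_warmup=5, n_runs=50, device="cpu") -> float:
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    for _ in range(n_warmup):          # warm-up passes excluded from timing
        model(x)
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    return n_runs / (time.perf_counter() - start)

# tiny stand-in network; the authors' modified DeepLabV3+ would be passed instead
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 2, kernel_size=1),
)
print(f"{count_parameters_millions(model):.3f} M parameters, "
      f"{measure_fps(model):.1f} frames per second")
```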
Although the modified DeepLabV3+ (ResNet50) has 36.87% fewer parameters and 31.27% fewer GFLOPs than PSPnet (ResNet101), it still outperformed PSPnet, which demonstrates the efficiency of the proposed model. Comparing the different ResNet backbone depths of the modified DeepLabV3+, the ResNet18 version has only 28.61% and 19.89% of the parameter count of the ResNet50 and ResNet101 versions, respectively, and 30.76% and 21.25% of their GFLOPs, respectively, while still retaining 97.43% and 95.56% of their IoU.
The feasibility of applying the model in practice depends on both performance and computational complexity; we believe this balance can be adjusted by choosing the depth of the ResNet backbone.
This paper proposed a modified DeepLabV3+ architecture for MCE segmentation. The model consists of three main modules: the backbone, the ASPP module and the decoder. The backbone utilizes a ResNet with atrous convolution, which allows the algorithm to balance a large receptive field against the spatial resolution of the feature maps. The modified ASPP module also applies atrous convolutions with different dilation factors to resample multi-scale patterns from the feature maps extracted by the backbone. The decoder combines the low-level feature map from the backbone with the high-level feature map from the encoder to generate the final prediction. A comparison between the proposed model and other state-of-the-art models, i.e., the PSPnet and U-net, was conducted; the proposed model achieved the best scores for both the Dice coefficient and the IoU. Moreover, we also performed a performance and complexity analysis of all of the models, including the proposed model with different backbone depths, the PSPnet and the U-net; the results show the efficiency of the proposed architecture, and the comparison of the different ResNet backbone depths illustrates the application feasibility of the proposed model. In future work, we will focus on the balance between the performance and complexity of the model to seek opportunities for application in clinical analysis. Moreover, the dataset of Li et al. [17] only provides MCE frames rendered by a color-mapping procedure; because the rendering scheme depends on the settings of different vendors and radiologists, it reduces objectivity and causes the segmentation to lose detail, which might introduce variation into the algorithm's performance. Thus, we hope to evaluate our model on original (unrendered) MCE frames to improve its robustness and generalization ability in the future.
This work was supported by the Natural Science Foundation of China (NSFC) under grant number 62171408, and the Key Research and Development Program of Zhejiang Province (2020C03060, 2020C03016, 2022C03111).
The authors declare that there is no conflict of interest.