In this study, we address a value function reconstruction problem in optimal output tracking of continuous-time linear time-invariant systems. The objective is to determine the state weight matrix in the value function that yields a specified optimal control law. First, by augmenting the system state with the desired tracking dynamics, the optimal tracking problem with a discounted value function is transformed into a linear quadratic regulator problem; inverse optimal output tracking thus reduces to inverse optimal control of the augmented system. Second, a model-based inverse reinforcement learning algorithm is proposed to compute the state weight matrix in the augmented value function. The algorithm updates the cost matrix via gradient descent and computes the weight matrix through inverse optimal control; through successive iterations, the weight matrix converges to a steady state. Third, the convergence of the algorithm is rigorously analyzed, and the stability of the corresponding closed-loop system is established. Finally, the effectiveness of the proposed algorithm is verified through simulation, showing that the system output asymptotically tracks a prescribed reference trajectory.
Citation: Yi Mo, Jingling Zhao, Kunyu Xiang, Dengguo Xu. Inverse optimal output tracking for continuous-time linear systems based on inverse reinforcement learning[J]. AIMS Mathematics, 2026, 11(4): 9284-9302. doi: 10.3934/math.2026383
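For context on the first step, the following is a sketch of the standard state augmentation that recasts discounted linear quadratic tracking as a regulation problem; the notation here (reference dynamics F, augmented weight Q_T, discount γ) is a common convention in the literature and is assumed, not taken from the paper.

```latex
% Sketch under assumed notation: plant \dot{x} = Ax + Bu, y = Cx,
% reference generator \dot{r} = Fr, augmented state X = [x^\top, r^\top]^\top.
\[
  \dot{X} = T X + B_1 u, \qquad
  T = \begin{bmatrix} A & 0 \\ 0 & F \end{bmatrix}, \qquad
  B_1 = \begin{bmatrix} B \\ 0 \end{bmatrix},
\]
\[
  V(X(t)) = \int_t^{\infty} e^{-\gamma(\tau - t)}
            \bigl( X^{\top} Q_T X + u^{\top} R u \bigr)\, d\tau,
\]
% where \gamma > 0 is the discount factor and Q_T weights the tracking
% error y - r through the augmented state, so the tracking problem
% becomes a discounted regulator problem in X.
```

To make the second step concrete, below is a minimal numerical sketch of a model-based inverse reinforcement learning loop of the kind the abstract describes, not the paper's algorithm verbatim: the inverse-optimal-control step recovers the gain for a candidate weight Q by solving a continuous-time algebraic Riccati equation (via `scipy.linalg.solve_continuous_are`), and Q is then updated by gradient descent on the mismatch with the demonstrated gain. The system matrices, discount factor, step size, and "expert" weight are illustrative assumptions.

```python
# A minimal sketch of a model-based IRL loop, assuming known dynamics
# (A, B), a fixed control weight R, a discount gamma, and a demonstrated
# optimal gain K_star; all numerical values are illustrative.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])        # assumed augmented system matrix T
B = np.array([[0.0],
              [1.0]])               # assumed augmented input matrix B_1
R = np.eye(1)                       # control weight (fixed, known)
gamma = 0.1                         # discount factor of the value function

# Discounting shifts the Riccati equation: solving the standard ARE with
# A - (gamma/2) I is equivalent to solving the discounted ARE.
A_d = A - 0.5 * gamma * np.eye(A.shape[0])

def gain(Q):
    """Inverse-optimal-control step: gain induced by a candidate weight Q."""
    P = solve_continuous_are(A_d, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

Q_expert = np.diag([5.0, 1.0])      # expert weight (unknown to the learner)
K_star = gain(Q_expert)             # demonstrated optimal control law

Q = np.eye(2)                       # initial guess of the state weight
alpha, eps = 0.5, 1e-6              # step size and finite-difference width
for _ in range(500):
    K = gain(Q)
    err = K - K_star
    if np.linalg.norm(err) < 1e-9:
        break
    # Gradient-descent step on 0.5 * ||K(Q) - K_star||_F^2, using a
    # symmetric finite-difference gradient so Q stays symmetric.
    G = np.zeros_like(Q)
    for i in range(Q.shape[0]):
        for j in range(i, Q.shape[1]):
            dQ = np.zeros_like(Q)
            dQ[i, j] = dQ[j, i] = eps
            G[i, j] = G[j, i] = np.sum(err * (gain(Q + dQ) - K)) / eps
    Q -= alpha * G

print("recovered Q:\n", Q)          # a weight matrix reproducing K_star
```

Since inverse optimal control is generally non-unique, the loop settles on a weight matrix that reproduces the demonstrated control law, not necessarily the expert's own Q; this is consistent with the abstract's statement that the weight matrix converges to a steady state.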