In this paper, a finite-time Q-Learning algorithm is designed and applied to the mountain car problem. Compared with traditional Q-Learning, the proposed finite-time algorithm achieves its learning objectives more rapidly. It is widely recognized that the operational efficiency of Q-Learning depends heavily on the capabilities of the underlying hardware, and its computational process often consumes considerable time. To reduce this time overhead, this study draws on the theoretical framework of finite-time stability to devise a novel Q-Learning algorithm, which is then applied to the mountain car problem. Simulation results show a significant reduction in training completion time and a substantial increase in the subsequent success rate.
Citation: Peng Miao. A finite-time Q-Learning algorithm with finite-time constraints[J]. AIMS Mathematics, 2025, 10(10): 23380-23393. doi: 10.3934/math.20251038
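For readers unfamiliar with the benchmark, the following is a minimal sketch of a standard tabular Q-Learning baseline on the mountain car task. It assumes the Gymnasium MountainCar-v0 environment and a simple grid discretization of the continuous state; the bin counts and the hyperparameters (alpha, gamma, epsilon) are illustrative assumptions, and the paper's finite-time variant would replace the standard temporal-difference update shown here, not this baseline.

```python
import numpy as np
import gymnasium as gym

# Baseline: standard tabular Q-Learning on MountainCar-v0 (not the paper's finite-time variant).
env = gym.make("MountainCar-v0")
n_bins = np.array([18, 14])           # assumed discretization of (position, velocity)
obs_low = env.observation_space.low
obs_high = env.observation_space.high

def discretize(obs):
    """Map a continuous observation to a grid cell index."""
    ratios = (obs - obs_low) / (obs_high - obs_low)
    idx = (ratios * (n_bins - 1)).astype(int)
    return tuple(np.clip(idx, 0, n_bins - 1))

q_table = np.zeros(tuple(n_bins) + (env.action_space.n,))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative hyperparameters

for episode in range(5000):
    obs, _ = env.reset()
    state = discretize(obs)
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_obs, reward, terminated, truncated, _ = env.step(action)
        next_state = discretize(next_obs)
        # standard one-step Q-Learning update; the finite-time algorithm
        # described in the paper modifies this update rule
        td_target = reward + gamma * np.max(q_table[next_state])
        q_table[state + (action,)] += alpha * (td_target - q_table[state + (action,)])
        state = next_state
        done = terminated or truncated
```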