Deep reinforcement learning algorithms are widely used in robot control. Sparse reward signals lead to blind exploration, degrading the efficiency of multi-axis manipulator path planning for arbitrary end-effector start and target positions. To address the problem of tracking randomly located targets in three-dimensional space, this paper proposes a proximal policy optimization (PPO) algorithm with a fused reward mechanism, which strengthens the manipulator's tracking and guidance capabilities across multiple dimensions and reduces blind, random behavior during exploration and sampling. The fused reward mechanism consists of four components: a trajectory correction reward, a core-area acceleration guidance reward, a ladder (stepped) adaptability reward, and an abnormal termination penalty. A 7-degree-of-freedom Kuka manipulator is then built on the PyBullet platform for simulation experiments. The results show that, compared with a sparse reward mechanism, the PPO algorithm with the fused reward mechanism achieves an average tracking success rate of up to 94.88%, effectively improving the tracking efficiency and accuracy of the spatial manipulator.
Citation: Ruyi Dong, Kai Yang, Tong Wang. Research on tracking strategy of manipulator based on fusion reward mechanism[J]. AIMS Electronics and Electrical Engineering, 2025, 9(1): 99-117. doi: 10.3934/electreng.2025006
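The four reward dimensions named in the abstract lend themselves to a simple additive shaping function evaluated at every control step. The sketch below is only a hedged illustration of how such a fused reward could be composed for a PyBullet tracking environment; the component formulas, weights, and thresholds (core_radius, step_thresholds, step_bonus, abort_penalty) are illustrative assumptions, not the paper's published definitions.

```python
# Hypothetical sketch of a fused reward combining the four components named in
# the abstract. All constants and formulas are illustrative assumptions.

def fused_reward(prev_dist, dist, in_workspace, core_radius=0.05,
                 step_thresholds=(0.30, 0.20, 0.10), step_bonus=0.5,
                 abort_penalty=-10.0):
    """Return a scalar reward for one control step of the manipulator.

    prev_dist     -- end-effector-to-target distance at the previous step (m)
    dist          -- end-effector-to-target distance at the current step (m)
    in_workspace  -- False if the arm left its valid workspace or hit joint limits
    """
    # Abnormal termination penalty: dominates all other terms when the episode
    # ends in an invalid state.
    if not in_workspace:
        return abort_penalty

    # Trajectory correction reward: positive when the step reduced the distance
    # to the target, negative when it increased it.
    r_traj = prev_dist - dist

    # Core-area acceleration guidance reward: an extra dense bonus once the
    # end-effector enters a small sphere around the target.
    r_core = (core_radius - dist) if dist < core_radius else 0.0

    # Ladder (stepped) adaptability reward: a fixed bonus for each coarse
    # distance threshold the end-effector has crossed, giving graded guidance
    # while still far from the target.
    r_step = step_bonus * sum(1 for t in step_thresholds if dist < t)

    return r_traj + r_core + r_step


# Example: the end-effector moved from 0.25 m to 0.18 m away from the target.
r = fused_reward(prev_dist=0.25, dist=0.18, in_workspace=True)
```

A dense, additive composition of this kind is one common way to replace a sparse success/failure signal: every step yields informative feedback, which is consistent with the abstract's claim that the fused mechanism reduces blind exploration.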