A Robotic Mobile Fulfillment System (RMFS) is a new type of parts-to-picker order fulfillment system where multiple robots coordinate to complete a large number of order picking tasks. The multi-robot task allocation (MRTA) problem in RMFS is complex and dynamic, and it cannot be well solved by traditional MRTA methods. This paper proposes a task allocation method for multiple mobile robots based on multi-agent deep reinforcement learning, which not only has the advantage of reinforcement learning in dealing with dynamic environment but also can solve the task allocation problem of large state space and high complexity utilizing deep learning. First, a multi-agent framework based on cooperative structure is proposed according to the characteristics of RMFS. Then, a multi agent task allocation model is constructed based on Markov Decision Process. In order to avoid inconsistent information among agents and improve the convergence speed of traditional Deep Q Network (DQN), an improved DQN algorithm based on a shared utilitarian selection mechanism and priority empirical sample sampling is proposed to solve the task allocation model. Simulation results show that the task allocation algorithm based on deep reinforcement learning is more efficient than that based on a market mechanism, and the convergence speed of the improved DQN algorithm is much faster than that of the original DQN algorithm.
Citation: Ruiping Yuan, Jiangtao Dou, Juntao Li, Wei Wang, Yingfan Jiang. Multi-robot task allocation in e-commerce RMFS based on deep reinforcement learning[J]. Mathematical Biosciences and Engineering, 2023, 20(2): 1903-1918. doi: 10.3934/mbe.2023087
[1] | Tyson Loudon, Stephen Pankavich . Mathematical analysis and dynamic active subspaces for a long term model of HIV. Mathematical Biosciences and Engineering, 2017, 14(3): 709-733. doi: 10.3934/mbe.2017040 |
[2] | A. M. Elaiw, N. H. AlShamrani . Stability of HTLV/HIV dual infection model with mitosis and latency. Mathematical Biosciences and Engineering, 2021, 18(2): 1077-1120. doi: 10.3934/mbe.2021059 |
[3] | Ting Guo, Zhipeng Qiu . The effects of CTL immune response on HIV infection model with potent therapy, latently infected cells and cell-to-cell viral transmission. Mathematical Biosciences and Engineering, 2019, 16(6): 6822-6841. doi: 10.3934/mbe.2019341 |
[4] | Yicang Zhou, Yiming Shao, Yuhua Ruan, Jianqing Xu, Zhien Ma, Changlin Mei, Jianhong Wu . Modeling and prediction of HIV in China: transmission rates structured by infection ages. Mathematical Biosciences and Engineering, 2008, 5(2): 403-418. doi: 10.3934/mbe.2008.5.403 |
[5] | Cameron Browne . Immune response in virus model structured by cell infection-age. Mathematical Biosciences and Engineering, 2016, 13(5): 887-909. doi: 10.3934/mbe.2016022 |
[6] | Nara Bobko, Jorge P. Zubelli . A singularly perturbed HIV model with treatment and antigenic variation. Mathematical Biosciences and Engineering, 2015, 12(1): 1-21. doi: 10.3934/mbe.2015.12.1 |
[7] | A. M. Elaiw, N. H. AlShamrani . Analysis of an HTLV/HIV dual infection model with diffusion. Mathematical Biosciences and Engineering, 2021, 18(6): 9430-9473. doi: 10.3934/mbe.2021464 |
[8] | B. M. Adams, H. T. Banks, Hee-Dae Kwon, Hien T. Tran . Dynamic Multidrug Therapies for HIV: Optimal and STI Control Approaches. Mathematical Biosciences and Engineering, 2004, 1(2): 223-241. doi: 10.3934/mbe.2004.1.223 |
[9] | Sophia Y. Rong, Ting Guo, J. Tyler Smith, Xia Wang . The role of cell-to-cell transmission in HIV infection: insights from a mathematical modeling approach. Mathematical Biosciences and Engineering, 2023, 20(7): 12093-12117. doi: 10.3934/mbe.2023538 |
[10] | Tinevimbo Shiri, Winston Garira, Senelani D. Musekwa . A two-strain HIV-1 mathematical model to assess the effects of chemotherapy on disease parameters. Mathematical Biosciences and Engineering, 2005, 2(4): 811-832. doi: 10.3934/mbe.2005.2.811 |
A Robotic Mobile Fulfillment System (RMFS) is a new type of parts-to-picker order fulfillment system where multiple robots coordinate to complete a large number of order picking tasks. The multi-robot task allocation (MRTA) problem in RMFS is complex and dynamic, and it cannot be well solved by traditional MRTA methods. This paper proposes a task allocation method for multiple mobile robots based on multi-agent deep reinforcement learning, which not only has the advantage of reinforcement learning in dealing with dynamic environment but also can solve the task allocation problem of large state space and high complexity utilizing deep learning. First, a multi-agent framework based on cooperative structure is proposed according to the characteristics of RMFS. Then, a multi agent task allocation model is constructed based on Markov Decision Process. In order to avoid inconsistent information among agents and improve the convergence speed of traditional Deep Q Network (DQN), an improved DQN algorithm based on a shared utilitarian selection mechanism and priority empirical sample sampling is proposed to solve the task allocation model. Simulation results show that the task allocation algorithm based on deep reinforcement learning is more efficient than that based on a market mechanism, and the convergence speed of the improved DQN algorithm is much faster than that of the original DQN algorithm.
[1] |
N. Boysen, R. D. Koster, F. Weidinger, Warehousing in the e-commerce era: A survey, Eur. J. Oper. Res., 277 (2019), 396–411. https://doi.org/10.1016/j.ejor.2018.08.023 doi: 10.1016/j.ejor.2018.08.023
![]() |
[2] |
R. D. Koster, T. Le-Duc, K. J. Roodbergen, Design and control of warehouse order picking: A literature review, Eur. J. Oper. Res., 182 (2007), 481–501. https://doi.org/10.1016/j.ejor.2006.07.009 doi: 10.1016/j.ejor.2006.07.009
![]() |
[3] | M. Wulfraat, Is Kiva systems a good fit for your distribution center? An unbiased distribution consultant evaluation, 2012. Available from: http://www.mwpvl.com/html/kiva_systems.html. |
[4] |
H. Hu, Q. Zhang, H. Hu, J. Chen, Z. Li, Q-learning based mobile swarm intelligence task allocation algorithm, Comput. Integr. Manuf. Syst., 24 (2018), 1774–1783. https://doi.org/10.13196/j.cims.2018.07.019 doi: 10.13196/j.cims.2018.07.019
![]() |
[5] |
L. Mo, L. Su, X. Li, Research and development of task allocation methods in multi-robot systems, Manuf. Autom., 35 (2013), 21–24+28. https://doi.org/10.3969/j.issn.1009-0134.2013.05.07 doi: 10.3969/j.issn.1009-0134.2013.05.07
![]() |
[6] |
Z. Shi, Q. C, Cooperative task allocation for multiple UAVs based on improved multi objective quantum behaved particle swarm optimization algorithm, J. Nanjing Univ. Sci. Technol., 36 (2012), 945–951. https://doi.org/10.14177/j.cnki.32-1397n.2012.06.013 doi: 10.14177/j.cnki.32-1397n.2012.06.013
![]() |
[7] |
B. H. Sun, H. Wang, B. F. Fang, Z. L. Ling, J. Lin, Task allocation in emotional robot pursuit based on self-organizing algorithm, Robot, 39 (2017), 680–687. https://doi.org/10.13973/j.cnki.robot.2017.0680 doi: 10.13973/j.cnki.robot.2017.0680
![]() |
[8] |
B. F. Fang, L. Chen, H. Wang, S. L. Dai, Q. B. Zhong, Research on multirobot pursuit task allocation algorithm based on emotional cooperation factor, Sci. World J., (2014), 864180. https://doi.org/10.1155/2014/864180 doi: 10.1155/2014/864180
![]() |
[9] |
L. Liu, X. C. Ji, Z. Q. Zheng, Multi-robot task allocation based on market and capability classification, Robot, 28 (2006), 337–343. https://doi.org/10.13973/j.cnki.robot.2006.03.019 doi: 10.3321/j.issn:1002-0446.2006.03.019
![]() |
[10] | F. Janati, F. Abdollahi, S. S. Ghidary, M. Jannatifar, J. Baltes, S. Sadeghnejad, Multi-robot task allocation using clustering method, in Robot Intelligence Technology and Applications 4, Springer, Cham, 447 (2017), 233–247. https://doi.org/10.1007/978-3-319-31293-4_19 |
[11] |
A. Farinelli, L. Iocchi, D. Nardi, Distributed on-line dynamic task assignment for multi-robot patrolling, Auton. Robots, 41 (2017), 1321–1345. https://doi.org/10.1007/s10514-016-9579-8 doi: 10.1007/s10514-016-9579-8
![]() |
[12] | B. Heap, M. Pagnucco, Repeated sequential single-cluster auctions with dynamic tasks for multi-robot task allocation with pickup and delivery, in German Conference on Multiagent System Technologies, 8076 (2013), 87–100. https://doi.org/10.1007/978-3-642-40776-5_10 |
[13] |
H. Zhao, C. Shi, Exploration of the unknown environment of multi-robot collaboration based on market method, Comput. Digital Eng., 45 (2017), 2085–2089. https://doi.org/10.3969/j.issn.1672-9722.2017.11.001 doi: 10.3969/j.issn.1672-9722.2017.11.001
![]() |
[14] |
R. Yuan, J. Li, X. Wang, L. He, Multirobot task allocation in e-commerce robotic mobile fulfillment systems, Math. Probl. Eng., (2021), 6308950. https://doi.org/10.1155/2021/6308950 doi: 10.1155/2021/6308950
![]() |
[15] |
B. P. Zou, Y. M. Gong, X. H. Xu, Z. Yuan, Assignment rules in robotic mobile fulfilment systems for online retailers, Int. J. Prod. Res., 55 (2017), 6175–6192. https://doi.org/10.1080/00207543.2017.1331050 doi: 10.1080/00207543.2017.1331050
![]() |
[16] |
D. Roy, S. Nigam, R. D. Koster, I. Adan, J. Resing, Robot-storage zone assignment strategies in mobile fulfillment systems, Transp. Res. Part E Logist. Transp. Rev., 122 (2019), 119–142. https://doi.org/10.1016/j.tre.2018.11.005 doi: 10.1016/j.tre.2018.11.005
![]() |
[17] |
Z. Yuan, Y. Y. Gong, Bot-in-time delivery for robotic mobile fulfillment systems, IEEE Trans. Eng. Manage., 64 (2017), 83–93. https://doi.org/10.1109/TEM.2016.2634540 doi: 10.1109/TEM.2016.2634540
![]() |
[18] | P. Ghassemi, S. Chowdhury, Decentralized task allocation in multi-robot systems via bipartite graph matching augmented with fuzzy clustering, in The ASME 2018 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, 2018. https://doi.org/10.1115/DETC2018-86161 |
[19] |
K. R. Gue, K. Furmans, Z. Seibold, U. Onur, Grid store a puzzle-based storage system with decentralized control, IEEE Trans. Autom. Sci. Eng., 11 (2014), 429–438. https://doi.org/10.1109/TASE.2013.2278252 doi: 10.1109/TASE.2013.2278252
![]() |
[20] |
B. W. Shen, N. B. Yu, J. T. Liu, Intelligent scheduling and path planning of warehouse mobile robots, CAAI Trans. Intell. Syst., 9 (2014), 659–664. https://doi.org/10.3969/j.issn.1673-4785.201312048 doi: 10.3969/j.issn.1673-4785.201312048
![]() |
[21] |
R. P. Yuan, H. L. Wang, L. R. Sun, J. T. Li, Research on the task scheduling of "goods to picker" order picking system based on logistics AGV, Oper. Res. Manage. Sci., 27 (2018), 133–138. https://doi.org/10.12005/orms.2018.0241 doi: 10.12005/orms.2018.0241
![]() |
[22] |
J. T. Zhang, F. X. Yang, X. Weng, A building-block-based genetic algorithm for solving the robots allocation problem in a robotic mobile fulfillment system, Math. Probl. Eng., (2019), 6153848. https://doi.org/10.1155/2019/6153848 doi: 10.1155/2019/6153848
![]() |
[23] |
M. Zolfpour-Arokhlo, A. Selamat, S. Z. M. Hashim, H. Afkhami, Modeling of route planning system based on Q value-based dynamic programming with multi-agent reinforcement learning algorithms, Eng. Appl. Artif. Intell., 29 (2014), 163–177. https://doi.org/10.1016/ j.engappai.2014.01.001 doi: 10.1016/j.engappai.2014.01.001
![]() |
[24] |
J. Dou, C. Chen, P. Yang, Genetic scheduling and reinforcement learning in multirobot systems for intelligent warehouses, Math. Probl. Eng., (2015), 597956. https://doi.org/10.1155/2015/597956 doi: 10.1155/2015/597956
![]() |
[25] |
M. Chen, A. Liu, W. Liu, K. Ota, M. Dong, N. N. Xiong, RDRL: A Recurrent deep reinforcement learning scheme for dynamic spectrum access in reconfigurable wireless networks, IEEE Trans. Network Sci. Eng., 9 (2022), 364–376. https://doi.org/10.1109/TNSE.2021.3117565 doi: 10.1109/TNSE.2021.3117565
![]() |
[26] |
M. Chen, W. Liu, N. Zhang, J. Li, Y. Ren, M. Yi, et al., GPDS: A multi-agent deep reinforcement learning game for anti-jamming secure computing in MEC network, Expert Syst. Appl., 210 (2022), 118394. https://doi.org/10.1016/j.eswa.2022.118394 doi: 10.1016/j.eswa.2022.118394
![]() |
[27] |
B. Park, C. Kang, J. Choi, Cooperative multi-robot task allocation with reinforcement learning, Appl. Sci., 12(2022), 272–291. https://doi.org/10.3390/app12010272 doi: 10.3390/app12010272
![]() |
[28] | Y. D. Wang, Research on Multi-Object Workflow Scheduling Method Based on Deep-Q-Network Multi-Agent Reinforcement Learning, MA.Sc thesis, Chongqing University, 2019. https://doi.org/10.27670/d.cnki.gcqdu.2019.000536 |
1. | Marjolein de Rijk, Daowei Yang, Bas Engel, Marcel Dicke, Erik H. Poelman, Feeding guild of non-host community members affects host-foraging efficiency of a parasitic wasp, 2016, 97, 00129658, 1388, 10.1890/15-1300.1 | |
2. | Chandan Maji, Debasis Mukherjee, 2019, 2161, 0094-243X, 030021, 10.1063/1.5127486 | |
3. | Debasis Mukherjee, The effect of refuge and immigration in a predator–prey system in the presence of a competitor for the prey, 2016, 31, 14681218, 277, 10.1016/j.nonrwa.2016.02.004 | |
4. | Debasis Mukherjee, The effect of interference time in a predator–prey–nonprey system, 2022, 13, 1793-9623, 10.1142/S1793962322500222 |