Large language models have shown strong capabilities in natural language planning tasks, largely due to the chain-of-thought method, which enhances their ability to solve complex tasks through explicit intermediate reasoning. However, they face challenges in acquiring new knowledge, executing calculations, and interacting with the environment. Although previous work has enabled large language models to use external tools to improve reasoning and environmental interaction, these approaches lacked a scalable and cohesive framework. In this paper, we present LLM-Collab, where Collab denotes the cooperative interaction between two AI agents, each built around a large language model as its reasoning core. We design the two agents to cooperate on planning tasks: one acts as an analyst responsible for tool selection and phase validation, and the other as an executor of the specific tasks. Our method provides a comprehensive list of external tools to facilitate their invocation and integration by the agents, ensuring a seamless collaboration process. By demonstrating how language communication and tool selection enable multi-agent collaboration, this paradigm establishes a unified framework for autonomous task-solving based on large language models.
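To make the analyst–executor design concrete, below is a minimal sketch of the collaboration loop in Python. It is not the authors' implementation: the names `call_llm`, `TOOLS`, and the role prompts are illustrative placeholders, and the chat-completion backend is assumed rather than specified.

```python
# Minimal sketch of the two-agent loop: an analyst selects a tool and
# validates each phase; an executor carries out the selected step.
# All names and prompts here are hypothetical stand-ins.

TOOLS = {
    # eval() is unsafe outside a demo; a real calculator tool would parse safely.
    "calculator": lambda expr: str(eval(expr)),
    "search": lambda query: f"<results for {query!r}>",  # stub search tool
}

def call_llm(role_prompt: str, message: str) -> str:
    """Placeholder for any chat-completion API; swap in a real client here."""
    raise NotImplementedError

def solve(task: str, max_steps: int = 8) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # Analyst: choose the next tool and its input, or declare the task done.
        plan = call_llm(
            f"You are an analyst. Pick one tool from {list(TOOLS)} and reply "
            "'<tool>: <input>', or reply 'DONE: <answer>' if the task is solved.",
            "\n".join(history),
        )
        if plan.startswith("DONE:"):
            return plan.removeprefix("DONE:").strip()
        tool_name, _, tool_input = plan.partition(":")
        # Executor: carry out the selected step with the chosen external tool.
        result = TOOLS[tool_name.strip()](tool_input.strip())
        # Analyst: validate the phase before it enters the shared history.
        verdict = call_llm(
            "You are an analyst. Reply OK or RETRY for this step result.",
            f"Step: {plan}\nResult: {result}",
        )
        if verdict.strip().upper().startswith("OK"):
            history.append(f"{plan} -> {result}")
    return "No answer within step budget."
```

The OK/RETRY check stands in for the phase-validation role described in the abstract: the analyst both selects the tool for each phase and verifies the executor's result before the next phase begins.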
Citation: Hong Cao, Rong Ma, Yanlong Zhai, Jun Shen. LLM-Collab: a framework for enhancing task planning via chain-of-thought and multi-agent collaboration[J]. Applied Computing and Intelligence, 2024, 4(2): 328-348. doi: 10.3934/aci.2024019
| [1] |
J. S. Park, J. O'Brien, C. J. Cai, M. R. Morris, P. Liang, M. S. Bernstein, Generative agents: interactive simulacra of human behavior, Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023, 1–22. https://doi.org/10.1145/3586183.3606763 doi: 10.1145/3586183.3606763
|
| [2] | H. Yang, S. Yue, Y. He, Auto-gpt for online decision making: benchmarks and additional opinions, arXiv: 2306.02224. https://doi.org/10.48550/arXiv.2306.02224 |
| [3] |
Q. Dong, L. Li, D. Dai, C. Zheng, J. Ma, R. Li, et al., A survey on in-context learning, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024, 1107–1128. https://doi.org/10.18653/v1/2024.emnlp-main.64 doi: 10.18653/v1/2024.emnlp-main.64
|
| [4] | Z. Wang, Z. Cheng, H. Zhu, D. Fried, G. Neubig, What are tools anyway? a survey from the language model perspective, arXiv: 2403.15452. https://doi.org/10.48550/arXiv.2403.15452 |
| [5] | C. Aeronautiques, A. Howe, C. Knoblock, D. McDermott, A. Ram, M. Veloso, et al., Pddl| the planning domain definition language, Tech. Report CVC TR-98-003/DCS TR-1165, 1998. |
| [6] | J. He, J. Chen, X. He, J. Gao, L. Li, L. Deng, et al., Deep reinforcement learning with a natural language action space, arXiv: 1511.04636. https://doi.org/10.48550/arXiv.1511.04636 |
| [7] | Z. Xi, W. Chen, X. Guo, W. He, Y. Ding, B. Hong, et al., The rise and potential of large language model based agents: a survey, Sci. China Inform. Sci., in press. https://doi.org/10.1007/s11432-024-4222-0 |
| [8] | B. Liu, Y. Jiang, X. Zhang, Q. Liu, S. Zhang, J. Biswas, et al., Llm+p: empowering large language models with optimal planning proficiency, arXiv: 2304.11477. https://doi.org/10.48550/arXiv.2304.11477 |
| [9] | L. Gao, A. Madaan, S. Zhou, U. Alon, P. Liu, Y. Yang, et al., Pal: program-aided language models, Proceedings of the 40th International Conference on Machine Learning, 2023, 10764–10799. |
| [10] | S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, et al., React: synergizing reasoning and acting in language models, Proceedings of 11th International Conference on Learning Representations, ICLR, 2023, 1–33. |
| [11] | J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, et al., Chain-of-thought prompting elicits reasoning in large language models, Proceedings of the 36th International Conference on Neural Information Processing Systems, 2024, 24824–24837. |
| [12] |
R. Prabhakar, R. Sivaramakrishnan, D. Gandhi, Y. Du, M. Wang, X. Song, et al., Sambanova sn40l: scaling the ai memory wall with dataflow and composition of experts, Proceedings of 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2024, 1353–1366. https://doi.org/10.1109/MICRO61859.2024.00100 doi: 10.1109/MICRO61859.2024.00100
|
| [13] |
A. Talmor, J. Herzig, N. Lourie, J. Berant, CommonsenseQA: a question answering challenge targeting commonsense knowledge, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, 4149–4158. https://doi.org/10.18653/v1/N19-1421 doi: 10.18653/v1/N19-1421
|
| [14] |
M. Suzgun, N. Scales, N. Schärli, S. Gehrmann, Y. Tay, H. W. Chung, et al., Challenging BIG-bench tasks and whether chain-of-thought can solve them, Proceedings of Findings of the Association for Computational Linguistics: ACL 2023, 2023, 13003–13051. https://doi.org/10.18653/v1/2023.findings-acl.824 doi: 10.18653/v1/2023.findings-acl.824
|
| [15] | N. Crispino, K. Montgomery, F. Zeng, D. Song, C. Wang, Agent instructs large language models to be general zero-shot reasoners, Proceedings of the 41st International Conference on Machine Learning, 2024, 9458–9549. |
| [16] | X. Huang, W. Liu, X. Chen, X. Wang, H. Wang, D. Lian, et al., Understanding the planning of LLM agents: a survey, arXiv: 2402.02716. https://doi.org/10.48550/arXiv.2402.02716 |
| [17] | F. F. Xu, Y. Song, B. Li, Y. Tang, K. Jain, M. Bao, et al., Theagentcompany: benchmarking llm agents on consequential real world tasks, arXiv: 2412.14161. https://doi.org/10.48550/arXiv.2412.14161 |
| [18] | Y. Chen, W. Wang, S. Lobry, C. Kurtz, An llm agent for automatic geospatial data analysis, arXiv: 2410.18792. https://doi.org/10.48550/arXiv.2410.18792 |
| [19] |
C. Qian, W. Liu, H. Liu, N. Chen, Y. Dang, J. Li, et al., Chatdev: communicative agents for software development, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024, 15174–15186. https://doi.org/10.18653/v1/2024.acl-long.810 doi: 10.18653/v1/2024.acl-long.810
|
| [20] | J. Zhao, C. Zu, H. Xu, Y. Lu, W. He, Y. Ding, et al., Longagent: scaling language models to 128k context through multi-agent collaboration, arXiv: 2402.11550. https://doi.org/10.48550/arXiv.2402.11550 |
| [21] |
P. Gong, J. Li, J. Mao, Cosearchagent: a lightweight collaborative search agent with large language models, Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, 2729–2733. https://doi.org/10.1145/3626772.3657672 doi: 10.1145/3626772.3657672
|
| [22] | X. Feng, Z. Y. Chen, Y. Qin, Y. Lin, X. Chen, Z. Liu, et al., Large language model-based human-agent collaboration for complex task solving, arXiv: 2402.12914. https://doi.org/10.48550/arXiv.2402.12914 |
| [23] |
P. S. Dhillon, S. Molaei, J. Li, M. Golub, S. Zheng, L. P. Robert, Shaping human-ai collaboration: varied scaffolding levels in co-writing with language models, Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024, 1–18. https://doi.org/10.1145/3613904.3642134 doi: 10.1145/3613904.3642134
|
| [24] | J. Oswald, K. Srinivas, H. Kokel, J. Lee, M. Katz, S. Sohrabi, Large language models as planning domain generators, in Proceedings of the International Conference on Automated Planning and Scheduling, 34 (2024), 423–431. https://doi.org/10.1609/icaps.v34i1.31502 |
| [25] | C. H. Song, J. Wu, C. Washington, B. M. Sadler, W. L. Chao, Y. Su, Llm-planner: few-shot grounded planning for embodied agents with large language models, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, 2998–3009. |
| [26] |
Z. Jiao, Y. Niu, Z. Zhang, S. C. Zhu, Y. Zhu, H. Liu, Sequential manipulation planning on scene graph, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, 8203–8210. https://doi.org/10.1109/IROS47612.2022.9981735 doi: 10.1109/IROS47612.2022.9981735
|
| [27] |
Y. Jiang, S. Zhang, P. Khandelwal, P. Stone, Task planning in robotics: an empirical comparison of pddl-and asp-based systems, Frontiers Inf. Technol. Electronic Eng., 20 (2019), 363–373. https://doi.org/10.1631/FITEE.1800514 doi: 10.1631/FITEE.1800514
|
| [28] | B. Y. Lin, Y. Fu, K. Yang, F. Brahman, S. Huang, C. Bhagavatula, et al., Swiftsage: a generative agent with fast and slow thinking for complex interactive tasks, Proceedings of the 37th International Conference on Neural Information Processing Systems, 2024, 23813–23825. |
| [29] |
J. Xu, H. Wang, Z. Y. Niu, H. Wu, W. Che, Knowledge graph grounded goal planning for open-domain conversation generation, Proceedings of the AAAI Conference on Artificial Intelligence, 34 (2020), 9338–9345. https://doi.org/10.1609/aaai.v34i05.6474 doi: 10.1609/aaai.v34i05.6474
|
| [30] | S. Agashe, Y. Fan, A. Reyna, X. E. Wang, Llm-coordination: evaluating and analyzing multi-agent coordination abilities in large language models, 2024, arXiv: 2310.03903. https://doi.org/10.48550/arXiv.2310.03903 |
| [31] | P. Haslum, N. Lipovetzky, D. Magazzeni, C. Muise, An introduction to the planning domain definition language, Cham: Springer, 2019. https://doi.org/10.1007/978-3-031-01584-7 |
| [32] | K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, et al., Training verifiers to solve math word problems, arXiv: 2110.14168. https://doi.org/10.48550/arXiv.2110.14168 |
| [33] | S. Y. Miao, C. C. Liang, K. Y. Su, A diverse corpus for evaluating and developing english math word problem solvers, arXiv: 2106.15772. https://doi.org/10.48550/arXiv.2106.15772 |
| [34] |
A. Patel, S. Bhattamishra, N. Goyal, Are NLP models really able to solve simple math word problems? Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, 2080–2094. https://doi.org/10.18653/v1/2021.naacl-main.168 doi: 10.18653/v1/2021.naacl-main.168
|
| [35] | A. Srivastava, A. Rastogi, A. Rao, A. A. M. Shoeb, A. Abid, A. Fisch, et al., Beyond the imitation game: quantifying and extrapolating the capabilities of language models, arXiv: 2206.04615. https://doi.org/10.48550/arXiv.2206.04615 |
| [36] |
M. Helmert, The fast downward planning system, J. Artif. Intell. Res., 26 (2006), 191–246. https://doi.org/10.1613/jair.1705 doi: 10.1613/jair.1705
|
| [37] | T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, et al., Language models are few-shot learners, Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020, 1877–1901. |