An overlay multicast routing method based on network situational awareness and hierarchical multi-agent reinforcement learning

Miao Ye; Yanye Chen; Yong Wang; Cheng Zhu; Qiuxiang Jiang; Gai Huang; Feng Ding

doi:10.3934/era.2026154

Electronic Research Archive

2026, Volume 34, Issue 5: 3447-3480. doi: 10.3934/era.2026154


Research article

1. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541000, China
2. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541000, China
3. Information Center, Guilin Medical University, Guilin 541000, China
4. School of Optoelectronic Engineering, Guilin University of Electronic Technology, Guilin 541000, China

Received: 27 December 2025 Revised: 23 March 2026 Accepted: 02 April 2026 Published: 22 April 2026

Abstract

Compared with IP multicast, Overlay Multicast (OM) trees constructed at the application layer offer superior compatibility and more flexible deployment in heterogeneous, cross-domain networks. However, OM implementations under traditional network architectures adapt poorly to highly dynamic traffic because they lack awareness of the state of the underlying physical resources. Moreover, existing reinforcement learning-based approaches fail to decouple the tightly coupled multiple objectives of OM, which leads to high computational complexity, slow policy convergence, and insufficient stability. To address these challenges, we propose MA-DHRL-OM, an OM routing method. First, leveraging the centralized topological view provided by Software-Defined Networking (SDN), the method collects link-state information and constructs a traffic-aware feature model that provides multi-dimensional decision support for OM path planning. Second, MA-DHRL-OM solves for the optimal OM tree within a unified framework that integrates multi-agent reinforcement learning and hierarchical reinforcement learning. The hierarchical learning architecture decomposes OM tree construction into two stages of subtasks. By designing tailored decision logic and reward feedback mechanisms for the upper- and lower-layer agents, it hierarchically decouples the high-dimensional OM problem, which reduces the dimensionality of the action space and improves the stability of policy convergence. In addition, the multi-agent collaboration mechanism lets each agent make independent decisions based on its local observations, balancing the multiple optimization objectives while improving the algorithm's overall scalability and adaptability.
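The two-stage decomposition of OM tree construction described above can be illustrated with a deliberately simplified sketch. It is not the authors' MA-DHRL-OM: the learned upper- and lower-layer policies are replaced by deterministic stand-ins (a greedy choice of the next receiver to attach, and a Dijkstra path from the partial tree to that receiver), which together amount to the classic shortest-path heuristic for Steiner trees, and the multi-agent aspect is not modeled. The names `LinkState`, `link_cost`, `cheapest_attachment`, and `build_overlay_tree`, the three link features, and the cost weights are hypothetical assumptions standing in for the traffic-aware feature model.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkState:
    free_bw: float  # remaining bandwidth in Mbps (hypothetical feature)
    delay: float    # link delay in ms (hypothetical feature)
    loss: float     # packet loss rate in [0, 1) (hypothetical feature)


def link_cost(s: LinkState, w=(0.4, 0.4, 0.2)) -> float:
    """Hypothetical traffic-aware cost: congested, slow, or lossy links cost more."""
    return w[0] / max(s.free_bw, 1e-6) + w[1] * s.delay + w[2] * s.loss * 100


def cheapest_attachment(graph, tree_nodes, target):
    """Lower-layer stand-in: Dijkstra from every node already in the tree to `target`."""
    dist = {n: 0.0 for n in tree_nodes}
    prev = {}
    heap = [(0.0, n) for n in tree_nodes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == target:
            break
        for v, state in graph.get(u, {}).items():
            nd = d + link_cost(state)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] not in tree_nodes:  # walk back until the path touches the tree
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def build_overlay_tree(graph, source, destinations):
    """Two stages: the upper layer picks the next receiver, the lower layer routes to it."""
    tree_nodes, tree_edges = {source}, set()
    pending = set(destinations) - tree_nodes
    while pending:
        # Upper-layer stand-in: greedily attach the receiver that is cheapest to reach now.
        nxt = min(sorted(pending), key=lambda d: cheapest_attachment(graph, tree_nodes, d)[0])
        _, path = cheapest_attachment(graph, tree_nodes, nxt)
        if not path:
            raise ValueError(f"receiver {nxt!r} is unreachable")
        tree_edges.update(zip(path, path[1:]))
        tree_nodes.update(path)
        pending -= tree_nodes
    return tree_edges


def undirected(links):
    """Build a symmetric adjacency map from {(a, b): LinkState}."""
    g = {}
    for (a, b), state in links.items():
        g.setdefault(a, {})[b] = state
        g.setdefault(b, {})[a] = state
    return g


g = undirected({
    ("S", "A"): LinkState(100, 2, 0.0),
    ("A", "D1"): LinkState(100, 2, 0.0),
    ("A", "D2"): LinkState(100, 2, 0.0),
    ("S", "D2"): LinkState(5, 10, 0.05),  # congested, slow, lossy shortcut
})
print(sorted(build_overlay_tree(g, "S", ["D1", "D2"])))
# [('A', 'D1'), ('A', 'D2'), ('S', 'A')]
```

In this toy topology the congested direct link S-D2 is avoided because its traffic-aware cost exceeds that of the two-hop route through A. In a learned version, the receiver ordering and the path choices would come from trained upper- and lower-layer policies and their reward signals rather than from fixed costs.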
Extensive simulation experiments show that, compared with existing methods, MA-DHRL-OM achieves better performance on key metrics such as delay, bandwidth utilization, and packet loss rate, while exhibiting more stable convergence and greater flexibility in OM routing decisions.
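The evaluation metrics named above can be made concrete for a single multicast tree. The sketch below assumes definitions that this section does not state: tree delay is the largest source-to-receiver delay, each receiver's loss composes independent per-link losses as 1 - ∏(1 - p_i), and bandwidth utilization is the mean used-to-capacity ratio over the tree's links. The function names and sample values are hypothetical.

```python
from math import prod


def path_metrics(links):
    """links: (delay_ms, loss_rate) for each link on one source-to-receiver path."""
    delay = sum(d for d, _ in links)
    # Assumes independent per-link losses: a packet must survive every link.
    loss = 1.0 - prod(1.0 - p for _, p in links)
    return delay, loss


def tree_metrics(paths, link_utilization):
    """paths: receiver -> list of (delay_ms, loss_rate); link_utilization: used/capacity per tree link."""
    per_receiver = {rx: path_metrics(links) for rx, links in paths.items()}
    max_delay = max(d for d, _ in per_receiver.values())  # slowest receiver bounds the tree
    mean_loss = sum(loss for _, loss in per_receiver.values()) / len(per_receiver)
    mean_util = sum(link_utilization) / len(link_utilization)
    return max_delay, mean_loss, mean_util


paths = {
    "D1": [(2.0, 0.01), (3.0, 0.02)],  # shared S->A link, then A->D1
    "D2": [(2.0, 0.01), (5.0, 0.0)],   # shared S->A link, then A->D2
}
print(tree_metrics(paths, [0.5, 0.2, 0.3]))
```

Because the shared S->A link appears on both receivers' paths, its delay and loss count toward each receiver, which is why tree-level metrics are computed per receiver before aggregating.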
Keywords:
- overlay multicast
- software-defined networking
- hierarchical reinforcement learning
- multi-agent reinforcement learning
- multicast routing

Citation: Miao Ye, Yanye Chen, Yong Wang, Cheng Zhu, Qiuxiang Jiang, Gai Huang, Feng Ding. An overlay multicast routing method based on network situational awareness and hierarchical multi-agent reinforcement learning[J]. Electronic Research Archive, 2026, 34(5): 3447-3480. doi: 10.3934/era.2026154

© 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)