Research article Special Issues

An overlay multicast routing method based on network situational awareness and hierarchical multi-agent reinforcement learning

  • Published: 22 April 2026
  • Compared with IP multicast, Overlay Multicast (OM) trees constructed at the application layer offer better compatibility and more flexible deployment in heterogeneous, cross-domain networks. However, OM implementations under traditional network architectures adapt poorly to highly dynamic traffic because they lack awareness of the state of the underlying physical resources. Moreover, existing reinforcement learning-based approaches do not decouple the tightly coupled objectives of OM, which leads to high computational complexity, slow policy convergence, and insufficient stability. To address these challenges, we propose the MA-DHRL-OM routing method. First, leveraging the centralized topological view provided by Software-Defined Networking (SDN), the method collects link-state information and constructs a traffic-aware feature model to provide multi-dimensional decision support for OM path planning. Second, within a unified framework that integrates multi-agent reinforcement learning and hierarchical reinforcement learning, MA-DHRL-OM solves for the optimal OM tree as follows. The hierarchical learning architecture decomposes the construction of the OM tree into two stages of subtasks. With decision logic and reward feedback designed separately for the upper- and lower-layer agents, it decouples the high-dimensional OM problem hierarchically, which reduces the dimensionality of the action space and stabilizes policy convergence. In addition, the multi-agent collaboration mechanism lets each agent make independent decisions based on its local observations, balancing the multiple optimization objectives while improving the algorithm's overall scalability and adaptability.
Extensive simulation experiments show that, compared with existing methods, MA-DHRL-OM performs better on key metrics such as delay, bandwidth utilization, and packet loss rate, converges more stably, and makes OM routing decisions more flexibly.
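
The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' MA-DHRL-OM algorithm: the learned upper- and lower-layer policies are replaced here by greedy stand-ins (a nearest-destination rule and a shortest-path search), and every name below (`link_cost`, `cheapest_path`, `build_om_tree`, the feature keys `delay_ms`, `free_bw_mbps`, `loss_rate`, and the weights) is a hypothetical choice made for illustration only. The sketch only shows how choosing *which destination to attach next* can be separated from choosing *which path to use*.

```python
import heapq


def add_link(graph, u, v, delay_ms, free_bw_mbps, loss_rate):
    """Add an undirected link with hypothetical link-state features."""
    feats = {"delay_ms": delay_ms, "free_bw_mbps": free_bw_mbps,
             "loss_rate": loss_rate}
    graph.setdefault(u, {})[v] = feats
    graph.setdefault(v, {})[u] = feats


def link_cost(link, w_delay=1.0, w_bw=1.0, w_loss=1.0):
    """Scalar cost combining delay, free bandwidth, and loss (assumed weights)."""
    return (w_delay * link["delay_ms"]
            + w_bw / max(link["free_bw_mbps"], 1e-9)
            + w_loss * link["loss_rate"])


def cheapest_path(graph, tree_nodes, target):
    """Lower-level stand-in: multi-source Dijkstra from the current tree to target."""
    dist = {n: 0.0 for n in tree_nodes}
    prev = {}
    heap = [(0.0, n) for n in sorted(tree_nodes)]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            break
        for v, link in graph.get(u, {}).items():
            nd = d + link_cost(link)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        raise ValueError(f"{target!r} is unreachable from the current tree")
    path = [target]
    while path[-1] in prev:  # walk back until a node already in the tree
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def build_om_tree(graph, source, destinations):
    """Upper-level stand-in: repeatedly attach the cheapest pending destination."""
    tree_nodes = {source}
    tree_edges = set()
    pending = set(destinations) - tree_nodes
    while pending:
        # Upper-level decision: which destination becomes the next subgoal.
        _, path = min((cheapest_path(graph, tree_nodes, d) for d in sorted(pending)),
                      key=lambda cost_path: cost_path[0])
        # Lower-level result: the path that realizes the subgoal.
        tree_edges.update(zip(path, path[1:]))
        tree_nodes.update(path)
        pending -= tree_nodes
    return tree_edges
```

In a learned version, the `min(...)` selection and the Dijkstra search would each be replaced by an agent's policy, with separate reward signals per level; the greedy rules above only make the division of labor concrete.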

    Citation: Miao Ye, Yanye Chen, Yong Wang, Cheng Zhu, Qiuxiang Jiang, Gai Huang, Feng Ding. An overlay multicast routing method based on network situational awareness and hierarchical multi-agent reinforcement learning[J]. Electronic Research Archive, 2026, 34(5): 3447-3480. doi: 10.3934/era.2026154



  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)