
With the rapid development of wireless network technology, multicast communication is being applied increasingly widely in wireless networks, for purposes such as live video streaming, multimedia conferencing, real-time data transmission, and online gaming. In these applications, as the number of users and the level of user demand continue to increase, attempting to use unicast communication to send the necessary data would place enormous pressure on the information sources and the network bandwidth, leading to network congestion and an inability to meet user needs. In the broadcasting scenario, the transmitted information is also received by users who do not need it, which not only compromises the security of the information but also wastes considerable bandwidth. For such point-to-multipoint applications, multicast technology can better solve the above problems. In a multicast service, the source host needs to send only one multicast message, and the data are then replicated and distributed to multiple target nodes upon encountering forked nodes during transmission [1]. Therefore, multicasting can effectively save bandwidth, reduce the network load, and improve the security of information transmission [2].
Multicast routing requires the construction of an optimal multicast tree from the source node to all destination nodes [3]. Timely acquisition of global dynamic network link state information is one of the basic prerequisites for constructing such an optimal multicast tree. Traditional wireless networks typically utilize a distributed management approach [4], in which network resources and functionalities are dispersed across various wireless network devices (such as access points, routers, and switches) and each device independently executes control decisions. While this approach offers flexibility, it suffers from low management efficiency and presents difficulties in achieving timely optimization and coordination of the entire network. Additionally, as the network expands in scale, the traffic data forwarded by network devices become increasingly voluminous, making it challenging for traditional network devices, for which forwarding is tightly coupled with control, to obtain real-time information on the global network status. To address the aforementioned issues, the recently emerging technology of software-defined wireless networking (SDWN) [5] provides an excellent solution.
SDWN combines software-defined networking (SDN) [6] with wireless networks. By exploiting the centralized management advantages of SDN, namely centralized control logic and the decoupling of forwarding from control, SDWN overcomes the low management and control efficiency of traditional wireless network structures and their difficulty in achieving global optimization and coordination, thereby facilitating the global optimization and coordination of network resources. Through its logically centralized control, SDWN enables the controller of a wireless network to obtain the global static topology of the network, the global network state, and the utilization rates of resources [7]. In combination with the programmability of SDWN, these capabilities allow the network controller to achieve unified management, integration, and virtualization of network resources and to use a northbound interface to provide on-demand allocation of network resources and services for upper-layer applications.
The classic algorithms for constructing multicast trees in traditional multicast routing include the shortest path and minimum spanning tree algorithm of Kou, Markowsky and Berman (the KMB algorithm) [8], the minimum cost path heuristic (MPH) algorithm [9], and the average distance heuristic (ADH) algorithm [10]. These classic multicast tree construction algorithms have been successfully applied in many fields over the past decade. However, with the continuous expansion of the network scale and the exponential growth in network traffic, these traditional multicast tree construction methods cannot adapt to the dynamic changes of link information in wireless networks, making it difficult to meet the current requirements in terms of network service quality. Moreover, as the scale of SDWN networks continues to expand, this deficiency becomes even more apparent. Therefore, designing multicast trees that adapt to the dynamic changes in network link information to meet the high-performance requirements of multicast services is an important research topic in leveraging the advantages of SDN architecture.
In recent years, artificial intelligence technology has been increasingly studied and applied in the networking field due to its strong adaptability and flexibility. Deep reinforcement learning has significant advantages in high-dimensional and complex decision-making. By combining it with the SDN architecture, researchers can fully leverage its flexibility and ability to adapt to the dynamic changes of network link information, thereby improving network efficiency and performance. Currently, most research on the application of deep reinforcement learning to SDN unicast and multicast communication is limited to discussions of single-agent reinforcement learning methods [11,12,13,14]. However, compared to multi-agent reinforcement learning, the convergence speed of these methods is slow. Consequently, in the case of frequent and dynamic changes of network link information, the single-agent approach has difficulty responding quickly to the forwarding needs of data flows.
In consideration of the above issues, this paper proposes an intelligent multicast routing method based on multiagent deep reinforcement learning, named MADRL-MR, for use in SDWN. In MADRL-MR, an SDWN framework is designed to overcome the limitations of traditional wireless networking, in which the overall network cannot be directly controlled and maintained, and to enable more convenient configuration of the network devices while improving the network performance. This framework is used to manage a wireless network and obtain its global topology and link state information. It also makes use of the adaptability and flexibility of deep reinforcement learning to adapt to the dynamic changes of network link information. To address the slow convergence speed of the construction of multicast trees using a single intelligent agent as well as the difficulty of quickly responding to data forwarding demands, a multi-agent deep reinforcement learning algorithm is designed for multicast tree construction in MADRL-MR. In this algorithm, each intelligent agent can independently learn and adapt to changes in the network state and collaborate to achieve better routing strategies. To accelerate the training speed of the multiple intelligent agents, we design corresponding transfer learning mechanisms [15], in which an initial set of weights is pre-trained and loaded before each intelligent agent begins training to accelerate its convergence speed.
The main contributions of this article are as follows:
1) In contrast to the traditional approach for managing and maintaining the global network state in a wireless network, we design a network architecture based on SDWN. By virtue of the centralized control logic and programmability features of SDWN, we can monitor the global static topology and network status information of a wireless network and obtain real-time link status information, such as bandwidth, delay, and packet loss rate, to achieve more efficient global optimization and coordination of the network resources.
2) In contrast to the existing method of building multicast trees with a single intelligent agent, we design and implement an intelligent multicast routing method based on multiagent deep reinforcement learning. First, we divide the problem of multicast tree construction into multiple subproblems, which are solved through collaboration among multiple intelligent agents. Second, in the design of the state space for each intelligent agent, we comprehensively consider parameters such as bandwidth, delay, the packet loss rate of wireless links, the used bandwidth, the packet error rate, the packet drop rate, the distance between access points, and the multicast tree construction status. In addition, instead of the existing method of using the k-paths approach to design the action space for an intelligent agent, we design a novel action space using the next-hop node in the network as the action. Finally, we design corresponding reward functions for the four possible scenarios encountered in multicast tree construction, which can guide the intelligent agents to select efficient multicast routes.
3) To improve the convergence efficiency and collaboration stability of the multiple intelligent agents, we design a fully decentralized training (independent learning, IL) method for multiagent systems. In addition, to enhance the convergence speed of the multiagent system, we adopt transfer learning techniques. Specifically, we transfer knowledge acquired from experts or previous tasks to the current task at the beginning of the training process, thereby reducing the initial ineffective exploration of the intelligent agents and accelerating their convergence.
The rest of this article is organized as follows. Section 2 introduces the relevant work. Section 3 analyzes the problem and introduces the SDWN intelligent multicast routing structure. Section 4 provides a detailed introduction to the MADRL-MR algorithm. Section 5 introduces the experimental setup and performance evaluation results. Section 6 introduces the conclusion and future work.
In this section, we mainly discuss the related work on multicast routing in SDWN and analyze the advantages and disadvantages of traditional algorithms and intelligent algorithms applied in multicast routing.
Traditional algorithms: Kou et al. [8] proposed a Steiner tree construction method based on a shortest path and minimum spanning tree algorithm (the KMB algorithm). Takahashi et al. [9] proposed the minimum cost path heuristic (MPH) algorithm. Smith et al. [10] designed an algorithm based on an average distance heuristic (ADH). The above three classic algorithms were initially proposed to solve the problem of constructing multicast trees, and many subsequent improvements have been developed based on these algorithms. Yu et al. [16] proposed an improved algorithm based on key nodes (KBMPH) by prioritizing the paths for certain key nodes. Zhou et al. [17] designed a delay-constrained MPH algorithm (DCMPH). Zhao et al. [18] studied how to reduce the cost of constructing a Steiner tree and proposed a weighted node-based MPH algorithm (NWMPH). Farzinvash et al. [19] decomposed the problem of multicast tree construction in a wireless mesh network into two phases, with the first phase considering delay and the second phase considering bandwidth. By combining the two phases, these authors proposed an algorithm that comprehensively considers both delay and bandwidth for the construction of multicast trees. Przewoźniczek et al. [20] transformed k-shortest Steiner tree problems into binary dynamic problems and solved them using the integer linear programming (ILP) method. Walkowiak et al. [21] used a unicast path construction method to construct a multicast tree, but its computational cost was too high. Martins et al. [22] transformed the multicast tree construction problem into an ILP problem and designed a heuristic algorithm with delay constraints. Zhang et al. [23] proposed a delay-optimized multicast routing scheme for use in the SDN context, which utilizes SDN to obtain network state information. Hu et al. [24] also proposed a multicast routing method based on SDN. However, the traditional algorithms mentioned above can use only a single network resource to construct a multicast tree; thus, they have poor perception of the dynamic changes of network link information and significant limitations in constructing efficient multicast routes.
Intelligent Algorithms: Annapurna et al. [25] proposed a Steiner tree construction method based on ant colony optimization (ACO), which optimizes the Steiner tree using bandwidth, delay, and path cost. Zhang et al. [26] proposed a multicast routing method based on a hybrid ant colony algorithm. This method combines the solution generation process of the ACO algorithm with the cloud model (CM) to obtain a minimum-cost multicast tree that satisfies bandwidth, delay, and delay jitter constraints. Zhang et al. [27] proposed a Steiner tree construction method based on particle swarm optimization (PSO), which uses the Steiner tree length as the constraint condition. Nath et al. [28] used gradient descent based on general PSO to accelerate the convergence speed of PSO and designed a gradient-based PSO algorithm for building a Steiner tree. Zhang et al. [29] proposed a multicast routing method based on a genetic algorithm (GA), in which a new crossover mechanism called leaf crossing (LC) is introduced into the GA to solve multicast quality of service (QoS) models. The above algorithms are all designed for application in traditional network structures and can use only limited network resources to construct multicast trees. Moreover, these algorithms have high computational complexity and consume a significant amount of time; thus, they have difficulty reaching convergence.
Reinforcement learning algorithms: Heo et al. [30] proposed a multicast tree construction technique based on reinforcement learning for use in an SDN environment. This technique abstracts the process of constructing a multicast tree as a Markov decision process (MDP), uses SDN technology to obtain global network information and applies reinforcement learning for multicast tree construction. However, this method considers only the number of hops and does not consider other network link state information. Araqi et al. [31] proposed a Q-learning-based multicast routing method for wireless mesh networks, which considers only channel selection and rate and does not optimize the construction of multicast trees. Tran et al. [32] proposed a deep Q-network (DQN)-based multicast routing method. In this method, broadcasting is first used to find the destination node, and the destination node then uses unicast communication to send data packets to the source node to complete the construction of the multicast tree. This method only considers delay and does not consider parameters such as bandwidth and packet loss rate. Chae et al. [33] proposed a multicast tree construction algorithm based on meta-reinforcement learning for use in the SDN context. This algorithm sets the link cost to a fixed value of 1 and does not consider changes in the link state. Zhao et al. [34] designed a deep reinforcement learning method for intelligent multicast routing in SDN based on a DQN, which considers only the bandwidth, delay, and packet loss rate of each link; this method has the problem of slow convergence of the intelligent agent.
Multi-agent reinforcement learning algorithms: At present, there is still little research in the literature on the application of multiagent deep reinforcement learning methods to multicast problems in wireless networks. Instead, we can refer only to other relevant literature on multiagent deep reinforcement learning algorithms. Yang et al. [35] proposed a software-defined urban traffic control algorithm based on multiagent deep reinforcement learning for use in a software-defined Internet of Things (SD-IoT) cooperative traffic light environment. Suzuki et al. [36] proposed a dynamic virtual network (VN) allocation method based on collaborative multiagent deep reinforcement learning (Coop-MADRL) to maximize the utilization of limited network resources in dynamic VNs. Wu et al. [37] designed a flow control and multichannel reallocation (TCCA-MADDPG) algorithm based on a multiagent deep deterministic policy gradient (MADDPG) algorithm to optimize the multichannel reallocation framework of the core backbone network based on flow control in the SDN-IoT. Bhavanasi et al. [38] proposed a graph convolutional network routing and deep reinforcement learning algorithm for agents, which regards the routing problem as a reinforcement learning problem with two new modifications. Duke et al. [39] designed a multiagent reinforcement learning framework for transient load detection and prevention in the SDN-IoT. This framework establishes one agent for multipath routing optimization and another agent for malicious DDoS traffic detection and prevention in the network, with the two agents collaborating in the same environment. Typically, similar multiagent algorithms have certain instability issues, which can result in unstable training and difficulty in convergence during the training phase. Therefore, some researchers have applied transfer learning in combination with multiagent deep reinforcement learning.
Transfer reinforcement learning: Torrey et al. [40] incorporated transfer learning into multiagent reinforcement learning by proposing a teacher–student framework for reinforcement learning. First, an agent is trained as a teacher agent. Then, when training a second student agent for the same task, the fixed policy of the teacher agent can provide suggestions to speed up the learning process. Parisotto et al. [41] defined a method of multitasking and transfer learning in deep multitasking and reinforcement learning, which guides agents to take actions in different tasks through expert experience and thus accelerates the learning speed of the agents. Silva et al. [42] proposed a multiagent recommendation framework in which multiple agents can advise each other while learning in a shared environment.
Considering the limitations of classical heuristic algorithms for multicast routing in wireless networks, the computational complexity of intelligent algorithms, and the slow convergence speed of reinforcement learning, we draw inspiration from a previous study of multicast routing in wired networks [34]. To adapt to dynamic changes in the wireless network traffic while meeting QoS requirements, this paper proposes the adoption of SDWN technology to perceive global network information and designs a multi-agent based deep reinforcement learning algorithm for the construction of multicast trees. This algorithm can overcome the shortcomings of traditional wireless networks in regard to the inability to directly control and maintain the global network and solves the problem of slow convergence of single-agent multicast tree construction methods.
Multicast communication, also known as multipoint delivery or group communication, allows information to be transmitted simultaneously to a group of specified destination addresses. A multicast datagram is transmitted only once over any given link and is duplicated only at branching nodes. The data flow of multicast network communication is shown in Figure 1. The data flows follow a tree-shaped structure called a multicast tree (or Steiner tree), in which the source node src is the root and the multicast destination nodes dst are the leaves. The optimization objective of multicast routing is to find a multicast tree that achieves the optimal performance.
The optimal multicast tree corresponds to the solution of the mathematically defined Steiner tree problem, which is a classic nondeterministic polynomial time (NP)-complete problem [9]. Consider a weighted undirected connected graph $G(V, E, w)$, where $V$ is the set of nodes, $E$ is the set of edges, and $w$ specifies the weights of the edges; the edge $e_{ij} \in E$ between node $i$ and node $j$ has weight $w(e_{ij})$. Let $M \subseteq V$ be a subset of nodes containing the source node $src$ and the set of multicast destination nodes $DST = \{dst_1, dst_2, \cdots, dst_n\}$, that is, $M = DST \cup \{src\}$. Let $G'$ be a subgraph of $G$ that includes the vertex set $M$; $G'$ may also contain nodes that are not in $M$, which are referred to as Steiner nodes. The objective of the optimal Steiner tree problem is to find a minimum-weight spanning tree $T = (V_T, E_T)$ in the graph $G'$ that contains all of the nodes in $M$, as shown in Eq (3.1).
$$\min_{T \subseteq G,\ M \subseteq V_T} \sum_{e_{ij} \in E_T} w(e_{ij}) \quad (3.1)$$
where $V_T$ denotes all the nodes in tree $T$ and $E_T$ denotes all the edges of tree $T$.
Strictly speaking, obtaining an exact optimal solution for this NP-complete multicast tree problem is extremely difficult. Existing works have discussed how to obtain an approximately optimal solution. Accordingly, an approximate treatment can be applied by decomposing the problem into a set of distinct routes from the source node to the multiple destination nodes, as shown in Eq (3.2).
$$T = T(p_1, \cdots, p_k, \cdots, p_n) \quad (3.2)$$
where $p_k = (V_k, E_k)$ is the path from the source node $src$ to $dst_k$ in the multicast tree $T$, $V_k$ represents all nodes in the path $p_k$, $E_k$ represents all edges in $p_k$, $k = 1, 2, \cdots, n$, $dst_k$ belongs to the destination node set $DST$ of the multicast tree, and $n$ is the number of destination nodes.
If each $p_k \in T$ has the minimum cost, then the multicast tree $T$ is an end-to-end minimum-cost tree. Such a multicast tree can be built by constructing each $p_k(src, dst_k)$ as a unicast path and then combining these paths while removing redundant links. When the unicast paths from the source node to all destination nodes are combined to build the multicast tree, the links of each path are added one by one; before a link is added, it is checked whether the link already exists in the current multicast tree. If it does, the link is treated as redundant and is not added again; if it does not, it is added to the current multicast tree. By exploiting the ability of SDN technology to monitor the global network resources, this paper calculates the minimum cost $f(p_k)$ for each path using the following parameters:
$bw_k$ is the residual bandwidth of $p_k$, which is the minimum residual bandwidth from the source node $src$ to the destination node $dst_k$. Its definition is given in Eq (3.3).
$$bw_k = \min_{e_{ij} \in p_k} (bw_{ij}) \quad (3.3)$$
where $bw_{ij}$ is the remaining bandwidth of the link $e_{ij}$ between node $i$ and node $j$.
$delay_k$ is the total delay on $p_k$, which is expressed as the sum of the delays on all links in $p_k$. Its definition is given in Eq (3.4).
$$delay_k = \sum_{e_{ij} \in p_k} delay_{ij} \quad (3.4)$$
where $delay_{ij}$ is the delay on the link $e_{ij}$ between node $i$ and node $j$.
$loss_k$ is the packet loss rate on $p_k$. Since the packet loss rate on some links may be 0, it is calculated as shown in Eq (3.5).
$$loss_k = 1 - \prod_{e_{ij} \in p_k} (1 - loss_{ij}) \quad (3.5)$$
where $loss_{ij}$ is the packet loss rate on the link $e_{ij}$ between node $i$ and node $j$.
$used\_bw_k$ is the bandwidth used on $p_k$, which is expressed as the maximum bandwidth used from the source node $src$ to the destination node $dst_k$. It is defined as shown in Eq (3.6).
$$used\_bw_k = \max_{e_{ij} \in p_k} (used\_bw_{ij}) \quad (3.6)$$
where $used\_bw_{ij}$ is the bandwidth used on the link $e_{ij}$ between node $i$ and node $j$.
$errors_k$ is the packet error rate on $p_k$, which is calculated via Eq (3.7).
$$errors_k = 1 - \prod_{e_{ij} \in p_k} (1 - errors_{ij}) \quad (3.7)$$
where $errors_{ij}$ is the packet error rate on the link $e_{ij}$ between node $i$ and node $j$.
$drops_k$ is the packet drop rate on $p_k$, which is calculated via Eq (3.8).
$$drops_k = 1 - \prod_{e_{ij} \in p_k} (1 - drops_{ij}) \quad (3.8)$$
where $drops_{ij}$ is the packet drop rate on the link $e_{ij}$ between node $i$ and node $j$.
$distance_k$ is the average distance of the links in $p_k$. In a wireless network, the distance between access points (APs) affects data forwarding; the average distance can be used to measure the average energy consumed by each AP node to send data. It is defined in Eq (3.9) below.
$$distance_k = \operatorname{average}\Big(\sum_{e_{ij} \in p_k} distance_{ij}\Big) \quad (3.9)$$
where $distance_{ij}$ is the distance of the link $e_{ij}$ between node $i$ and node $j$.
The objective function $f(p_k)$ is formulated to maximize the residual bandwidth $bw_k$ and to minimize the delay $delay_k$, the packet loss rate $loss_k$, the used bandwidth $used\_bw_k$, the packet error rate $errors_k$, the packet drop rate $drops_k$, and the average distance $distance_k$ between wireless APs, as shown in Eq (3.10).
$$f(p_k) = \beta_1 bw_k + \beta_2(1 - delay_k) + \beta_3(1 - loss_k) + \beta_4(1 - used\_bw_k) + \beta_5(1 - errors_k) + \beta_6(1 - drops_k) + \beta_7(1 - distance_k) \quad (3.10)$$
where $\beta_l$ represents the weight of the $l$-th parameter, for $l = 1, 2, \cdots, 7$. The specific design of the $\beta_l$ is described in Section 4.1, which discusses the reward function design.
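To make the aggregation rules above concrete, the sketch below computes the seven path-level metrics of Eqs (3.3)–(3.9) and the objective of Eq (3.10) from per-link values. It is a minimal illustration rather than the authors' implementation; the `links` dictionary layout, the helper name `path_cost`, and the assumption that all metrics are already normalized to [0, 1] via Eq (3.18) are ours.

```python
import math

def path_cost(path, links, beta):
    """Aggregate per-link metrics along a path and score it with Eq (3.10).

    path  : list of node ids, e.g. [src, n1, n2, dst_k]
    links : dict mapping (i, j) -> dict of link metrics, assumed pre-normalized to [0, 1]
    beta  : list of 7 weights beta_1..beta_7
    """
    edges = list(zip(path[:-1], path[1:]))

    bw       = min(links[e]["bw"] for e in edges)                    # Eq (3.3)
    delay    = sum(links[e]["delay"] for e in edges)                 # Eq (3.4)
    loss     = 1 - math.prod(1 - links[e]["loss"] for e in edges)    # Eq (3.5)
    used_bw  = max(links[e]["used_bw"] for e in edges)               # Eq (3.6)
    errors   = 1 - math.prod(1 - links[e]["errors"] for e in edges)  # Eq (3.7)
    drops    = 1 - math.prod(1 - links[e]["drops"] for e in edges)   # Eq (3.8)
    distance = sum(links[e]["distance"] for e in edges) / len(edges) # Eq (3.9)

    # Eq (3.10): reward residual bandwidth, penalize all other metrics.
    return (beta[0] * bw
            + beta[1] * (1 - delay)
            + beta[2] * (1 - loss)
            + beta[3] * (1 - used_bw)
            + beta[4] * (1 - errors)
            + beta[5] * (1 - drops)
            + beta[6] * (1 - distance))
```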
The optimization objective value on each path is represented by $f(p_k)$, and the process of constructing the multicast tree consists of finding such a path for each destination node. These tasks are independent of each other, so the problem of multicast tree construction can be mathematically expressed as the multi-objective optimization problem shown in Eq (3.11).
$$\max F(T) = [f(p_1), \cdots, f(p_k), \cdots, f(p_n)] \quad (3.11)$$
where $T = (V_T, E_T)$ is the multicast tree that implements the communication paths of the multicast network, $p_k = (V_k, E_k)$ is the optimal path for each destination node with $p_k \in T$, that is, $V_T = V_1 \cup V_2 \cup \cdots \cup V_n$ and $E_T = E_1 \cup E_2 \cup \cdots \cup E_n$, and $n$ is the number of destination nodes.
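As a concrete illustration of the decomposition in Eqs (3.2) and (3.11), the sketch below merges per-destination unicast paths into one multicast tree, skipping links that are already present so that no redundant links are added. The function name and data layout are illustrative choices, not taken from the paper.

```python
def build_multicast_tree(paths):
    """Combine unicast paths p_1..p_n into one multicast tree T = (V_T, E_T).

    paths : iterable of node lists, each of the form [src, ..., dst_k]
    Returns the tree as a set of undirected edges.
    """
    tree_edges = set()
    for path in paths:
        for i, j in zip(path[:-1], path[1:]):
            edge = frozenset((i, j))      # undirected link e_ij
            if edge not in tree_edges:    # skip redundant links already in the tree
                tree_edges.add(edge)
    return tree_edges

# Example: two unicast paths share the link (src, n2); the shared link is added only once.
tree = build_multicast_tree([["src", "n2", "dst1"], ["src", "n2", "n5", "dst2"]])
```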
The SDWN-based intelligent multicast routing strategy combines SDN with wireless networking, using multiagent reinforcement learning to achieve multicast routing. By perceiving the network link state information of the wireless network, we obtain information such as the bandwidth, delay, packet loss rate, used bandwidth, packet error rate, packet drop rate, and distance between wireless access nodes in the wireless network. We use multiagent collaboration to construct multicast paths from the source node to all destination nodes and use the southbound interface of the centralized controller to issue flow tables to the switches on the paths to achieve multicast routing. With its ability to monitor the global network link state information, SDWN enables the agents to intelligently adjust these multicast routes based on dynamic changes in the network link state information.
The overall structure of the SDWN-based intelligent multicast routing strategy is shown in Figure 2, and it is explained in further detail below.
① The control plane periodically retrieves network status information from the data plane.
② The application plane collects raw data on the network status and processes these data into corresponding traffic matrices.
③ The knowledge plane uses the processed traffic matrices from the application plane as training data for the intelligent agents.
④ Each intelligent agent is assigned a subtask of determining the best multicast routing from the source node to one or more of the destination nodes based on the link state information.
⑤ The knowledge plane stores the multicast routes.
⑥ Before the next network traffic arrives, the control plane distributes flow tables to wireless access nodes in the data plane. Finally, the data plane completes traffic forwarding.
The data plane is composed of wireless access nodes (APs) and stations (STAs), which perform a set of basic tasks, such as AP-to-controller mapping, packet routing, and station migration, based on instructions issued by the controller. The APs form a multi-hop wireless mesh network, and a STA is connected to each AP. Each AP in the data plane operates without knowledge of the other APs in the wireless network, relying entirely on the control plane, application plane, and knowledge plane to perform the related operations; it periodically interacts with the controller and transmits wireless network status information to the control plane. Since we study the route construction problem at the control level and do not address the design of the underlying rules of the data plane, we do not consider APs joining or leaving the network or the mobility of STAs in this paper.
The control plane contains a centralized controller, which controls and manages the data plane through its southbound interface and constructs a global view of the network from the network flow and state information reported by the wireless APs, so as to further realize the scheduling of the network resources. The controller also has a northbound interface through which it interacts with the knowledge plane, facilitating the distribution and deployment of knowledge plane policies. The control plane includes three modules: a network topology discovery module, a link information detection module, and a flow table installation module.
● Network topology discovery module: Topology discovery is performed through the OpenFlow Discovery Protocol (OFDP), in which the controller periodically sends Link Layer Discovery Protocol (LLDP) request packets to the data plane to obtain the current network topology and collect information about the connections between network devices. Specifically, the controller sends a Features-Request message to a wireless AP to request its configuration information. Upon receiving the message, the AP encapsulates its port information, MAC address information, and datapath ID information into a Features-Reply packet, which is sent to the controller. The controller then parses this packet to establish a connection with the AP. Based on the collected information, the network topology discovery module establishes associations among the network devices and infers the network topology. It also stores the collected network device status and configuration information for future use.
● Link information detection module: This module periodically sends status request packets to the devices in the data plane. When a device receives a status request message, it encapsulates its current status information (such as the sizes of sent and received data streams, the number of dropped packets, the congestion status, and the distances to APs) into a data packet and sends it to the controller. The link information detection module then receives the reply messages and parses out the original data containing the network status information from the message packets. The parsed data are also provided to the application plane for processing.
● Flow table installation module: First, the controller receives the optimal multicast routes selected by the knowledge plane through its northbound interface. Then, before the next data streams arrive, the controller uses its southbound interface to install the flow table entries and send them to the wireless APs. Finally, the data plane forwards the traffic based on the installed flow table entries.
The application plane primarily handles the data processing logic between the control plane and the knowledge plane. It mainly processes the raw network status data collected from the data plane by the control plane into the network traffic matrix that the knowledge plane requires.
The raw network status data include the numbers of transmitted packets txp and received packets rxp for each port, the numbers of transmitted bytes txb and received bytes rxb, the numbers of dropped packets txdrop and rxdrop, the numbers of erroneous packets txerr and rxerr, and the duration of time tdur for which the port sends data. Using the collected port status data, the application plane calculates the residual bandwidth $bw_{ij}$, used bandwidth $used\_bw_{ij}$, packet loss rate $loss_{ij}$, packet error rate $errors_{ij}$, and packet drop rate $drops_{ij}$ between node $i$ and node $j$. The residual bandwidth $bw_{ij}$ can be calculated by subtracting the used bandwidth $used\_bw_{ij}$ from the maximum bandwidth $bw_{max}$ of the link, where the used bandwidth can be derived from txb, rxb, and tdur. The calculation is shown in Eqs (3.12) and (3.13).
$$used\_bw_{ij} = \frac{|(txb_i + rxb_i) - (txb_j + rxb_j)|}{tdur_j - tdur_i} \quad (3.12)$$
$$bw_{ij} = bw_{max} - used\_bw_{ij} \quad (3.13)$$
where $txb_i$ and $txb_j$ represent the numbers of bytes transmitted by node $i$ and node $j$, respectively; $rxb_i$ and $rxb_j$ represent the numbers of bytes received by node $i$ and node $j$, respectively; and $tdur_i$ and $tdur_j$ represent the durations of data transmission by the ports of node $i$ and node $j$, respectively.
The packet loss rate $loss_{ij}$ of the link is then calculated from the number of sent packets txp and the number of received packets rxp, as shown in Eq (3.14).
$$loss_{ij} = \frac{txp_i - rxp_j}{txp_i} \quad (3.14)$$
where $txp_i$ is the number of packets sent by node $i$ and $rxp_j$ is the number of packets received by node $j$.
The drop rate $drops_{ij}$ and error rate $errors_{ij}$ are calculated from the numbers of packets dropped when sending (txdrop) and receiving (rxdrop) and the numbers of packets with errors when sending (txerr) and receiving (rxerr), respectively, as shown in Eqs (3.15) and (3.16).
$$drops_{ij} = \frac{txdrop_i + rxdrop_j}{txp_i + rxp_j} \cdot 100\% \quad (3.15)$$
$$errors_{ij} = \frac{txerr_i + rxerr_j}{txp_i + rxp_j} \cdot 100\% \quad (3.16)$$
where $txdrop_i$ and $txerr_i$ represent the numbers of dropped and erroneous packets sent by node $i$, and $rxdrop_j$ and $rxerr_j$ represent the numbers of dropped and erroneous packets received by node $j$.
The raw network status data also include the round-trip delays $RTT_{rs}$ and $RTT_{rd}$ between the SDN controller and the source and destination switches, respectively, which the controller obtains by means of the LLDP protocol and echo requests with timestamps [43]. They further include the forward transmission delay $T_{fwd}$ and the reply transmission delay $T_{reply}$: $T_{fwd}$ is the total transmission delay from the controller to the source switch, from the source switch to the destination switch, and then from the destination switch back to the controller, and $T_{reply}$ is the corresponding delay in the reverse direction. Using the above data, the delay $delay_{ij}$ between the two switches can be calculated as shown in Eq (3.17).
$$delay_{ij} = \frac{T_{fwd} + T_{reply} - RTT_{rs} - RTT_{rd}}{2} \quad (3.17)$$
In addition, the distance distanceij between two wireless APs can be calculated based on their deployment coordinates. Since these parameters have different units of measurement, to avoid one parameter having a disproportionate impact on the others, the max-min normalization method [44] is used to normalize these parameters, as shown in Eq (3.18). In this way, all parameters are scaled to within the range of [0, 1].
$$m_{ij} = \frac{m_{ij} - \min(TM)}{\max(TM) - \min(TM)} \quad (3.18)$$
where $m_{ij}$ is the normalized value of the element of the parameter matrix $TM$ corresponding to node $i$ and node $j$, and $\max(TM)$ and $\min(TM)$ are the maximum and minimum values in the parameter matrix, respectively.
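The application-plane calculations of Eqs (3.12)–(3.18) can be sketched as follows. The counter field names (`txb`, `rxp`, etc., stored per node in a dictionary) and the NumPy-based normalization helper are assumptions for illustration; the code mirrors the formulas above rather than the authors' implementation.

```python
import numpy as np

def link_metrics(pi, pj, bw_max):
    """Per-link statistics from the port counters of nodes i and j (Eqs 3.12-3.16)."""
    used_bw = abs((pi["txb"] + pi["rxb"]) - (pj["txb"] + pj["rxb"])) / (pj["tdur"] - pi["tdur"])
    bw      = bw_max - used_bw                                        # Eq (3.13)
    loss    = (pi["txp"] - pj["rxp"]) / pi["txp"]                     # Eq (3.14)
    drops   = (pi["txdrop"] + pj["rxdrop"]) / (pi["txp"] + pj["rxp"]) # Eq (3.15)
    errors  = (pi["txerr"] + pj["rxerr"]) / (pi["txp"] + pj["rxp"])   # Eq (3.16)
    return used_bw, bw, loss, drops, errors

def link_delay(t_fwd, t_reply, rtt_rs, rtt_rd):
    """Eq (3.17): link delay from the controller-measured LLDP/echo delays."""
    return (t_fwd + t_reply - rtt_rs - rtt_rd) / 2

def normalize(tm):
    """Eq (3.18): max-min normalization of a parameter matrix to [0, 1]."""
    tm = np.asarray(tm, dtype=float)
    return (tm - tm.min()) / (tm.max() - tm.min())
```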
After the calculation and normalization of these parameters, the traffic matrices required for designing the state spaces of the intelligent agents in the knowledge plane are obtained. This allows the intelligent agents to use more comprehensive network status information for learning.
The knowledge plane is a core module added to the SDWN architecture, and the multicast routing algorithm proposed in this paper runs on this plane. In the knowledge plane, multicast path calculation is performed through multi-agent cooperation. The knowledge plane obtains the processed traffic matrices from the application plane and converts them into training data for the agents. After training, the reward values obtained by the agents converge, that is, each agent uses these traffic matrices to seek its optimal execution strategy. The construction of the multicast paths is completed through the cooperation of multiple agents, and the multicast paths are then sent to the control plane via the northbound interface. The scheme for constructing a multicast tree through multi-agent cooperation in the knowledge plane is based on the formal description of the multicast problem given in Section 3.1. The problem of constructing a multicast tree with the minimum end-to-end costs is decomposed into the construction of multiple unicast paths from the source node to individual destination nodes. The final multicast tree is simply a collection of such unicast paths. To construct such a multicast tree, we abstract the construction process as an MDP [45]. The state space, action space, and reward function of each intelligent agent are designed using the global network topology and link state information. This is illustrated in Figure 3.
First, the multicast routes from the source node src to the multicast destination node set DST, DST={dst1,⋯,dst6}, are decomposed into 6 unicast routes {(src,dst1),⋯,(src,dst6)}. Then, these routes are randomly partitioned into three subtasks {(src,dst2),(src,dst3)}, {(src,dst1),(src,dst5)}, and {(src,dst4),(src,dst6)}, which are randomly assigned to three different agents, i.e., agent1, agent2, and agent3. These agents complete their subtasks to obtain the unicast paths {path1,path2,⋯,path6} for the corresponding destination nodes. Finally, redundant links are removed to obtain a multicast tree constructed jointly by the multiple agents, as sketched below.
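A minimal sketch of the random, balanced assignment of destination nodes to agents described above; the function name and the use of Python's `random` module are illustrative assumptions.

```python
import random

def assign_destinations(dst_nodes, n_agents):
    """Randomly split the destination set DST into n_agents roughly equal subtasks."""
    dsts = list(dst_nodes)
    random.shuffle(dsts)
    return [dsts[i::n_agents] for i in range(n_agents)]

# Example matching Figure 3: 6 destinations split among 3 agents, e.g.
# assign_destinations(["dst1", "dst2", "dst3", "dst4", "dst5", "dst6"], 3)
# -> [["dst2", "dst3"], ["dst1", "dst5"], ["dst4", "dst6"]]
```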
The flowchart of the MADRL-MR algorithm is shown in Figure 4. First, SDWN technology is used to obtain the topology and link state information of the wireless network, and an environment with which the intelligent agents can interact is formed based on the generated network topology and traffic matrices. A set of pretrained unicast routing agent weights for all nodes is also generated. Each agent in MADRL-MR loads these pretrained weights and interacts with the established environment to obtain the current state. If the state is not a terminal state, then the agent generates an action and interacts with the environment again to obtain the next state and reward. This process is repeated until all agents reach the goal state, that is, the optimal multicast routing paths are generated to construct the multicast tree from the source to all destination nodes.
Each agent in MADRL-MR uses the Advantage Actor–Critic (A2C) algorithm [46] as its core architecture, as shown in Figure 5. The actor is the policy-based neural network, and the critic is the value-based neural network. A2C is a reinforcement learning method that combines policy gradients and temporal difference learning. It uses an on-policy learning approach to interactively learn from the environment. The learning process involves a series of actions taken to proceed from the source node to the destination node. The selection of each action (i.e., from the current state to the next state) generates a probability distribution for the selection of all possible actions, yielding a policy π. The initial actor interacts with the environment to collect data, and based on these data, the value function is estimated using the temporal difference (TD) method. The critic judges the goodness of the selected action in the current state and then updates the policy π based on the value function. Finally, a policy will be trained to select the action with the highest reward value in each state.
The following description of the proposed algorithm starts with the design of the state space, action space, and reward function for each agent and a detailed analysis of the policy gradient update process of A2C. Finally, the designed multiagent training method is introduced, and how transfer learning is used to accelerate the convergence of the designed multiagent algorithm is described.
The MADRL-MR algorithm represents an extension from single-agent to multi-agent reinforcement learning. The design of each reinforcement learning agent includes its state space, action space, reward function, and internal structure, which are identical for all agents in the multi-agent system. Here, we introduce the design of the reinforcement learning agents.
The state space is a description of the environment and the agent, and the agent can obtain the current state by observing the environment. In the reinforcement learning problem of interest here, the environment consists of the data plane, and the agent's state space is composed of the link state information and the constructed paths from the current source node to all destination nodes in the data plane. We transform this information into a multi-channel matrix GT, which consists of eight matrices corresponding to different channels, as shown in Figure 6.
As illustrated in Figure 6, we transform each of the seven types of data collected from the data plane, namely, bw, delay, loss, used_bw, errors, drops, and distance, into an adjacency matrix of a weighted undirected graph, resulting in seven traffic matrices with different weights. Here, n is the number of nodes in the network topology. We also express the constructed multicast tree as a symmetric matrix Mtree, where src is the source node and {dst1,dst2,dst3} are the destination nodes of the tree. If the matrix element corresponding to the edge between node i and node j is set to 1, this indicates that the edge is present in the multicast tree.
The set of all possible changes to GT is the state space S. Each state corresponds to a multi-channel matrix. A state transition corresponds to adding a new link to a path, i.e., adding a new link between two nodes. Based on the changes in GT, the current state st is transformed into the next state st+1. When paths to all destination nodes have been found (i.e., when the multicast tree has been constructed), st+1 is set to the terminal state, i.e., st+1=None.
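The following sketch shows one way to assemble the multi-channel state GT described above, by stacking the seven normalized traffic matrices and the multicast-tree matrix Mtree into an n x n x 8 tensor. The dictionary keys and the NumPy representation are assumptions for illustration.

```python
import numpy as np

def build_state(traffic, tree_edges, n):
    """Stack the 7 link-metric matrices and the tree matrix into an (n, n, 8) state.

    traffic    : dict of 7 normalized (n, n) NumPy adjacency matrices, keyed by
                 "bw", "delay", "loss", "used_bw", "errors", "drops", "distance"
    tree_edges : set of (i, j) links already added to the multicast tree
    """
    m_tree = np.zeros((n, n))
    for i, j in tree_edges:            # symmetric matrix: 1 marks a link in the tree
        m_tree[i, j] = m_tree[j, i] = 1.0

    channels = [traffic[k] for k in
                ("bw", "delay", "loss", "used_bw", "errors", "drops", "distance")]
    channels.append(m_tree)
    return np.stack(channels, axis=-1)  # multi-channel state GT, shape (n, n, 8)
```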
The action space is the set of actions that an agent can take based on its observation of the current state. In this article, the wireless AP nodes (i.e., the possible next hops) in the data plane are regarded as actions, i.e., A={a1,a2,⋯,ai,⋯,an}={N1,N2,⋯,Ni,⋯,Nn}, where Ni represents AP node i and ai corresponds to Ni, for i=1,2,⋯,n. Taking an action means adding the corresponding wireless AP node to the path from the source node to the destination node. For each state si∈S in the state space, the agent can take any action a∈A, and the execution of action a results in a change in the state. In theory, all nodes can be selected as actions, but not all actions can actually be executed: to meet the fixed-size tensor requirements of the neural network, the action space includes all nodes, so some invalid actions may be generated when selecting actions. Suppose that the agent selects as its action at a node that is not adjacent to any node in the current state st; the resulting state st+1 generated by interacting with the environment does not advance the construction of the path, and this node cannot be added to the multicast tree. Therefore, the agent's valid actions in a given state correspond to the set of nodes adjacent to the partially built path, whose number is determined by the node degrees dr(Ni), as sketched below.
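Because only neighbors of the partially built path that are not already in the multicast tree are valid choices, a simple mask over the fixed-size action space can be derived from the adjacency matrix, as in the following illustrative sketch (the function name and array layout are assumptions).

```python
import numpy as np

def valid_action_mask(adj, path_nodes, tree_nodes):
    """Mark the AP nodes that can legally extend the current path.

    adj        : (n, n) 0/1 adjacency matrix of the wireless topology
    path_nodes : nodes already on the path under construction
    tree_nodes : nodes already in the multicast tree
    """
    n = adj.shape[0]
    mask = np.zeros(n, dtype=bool)
    for i in path_nodes:
        mask |= adj[i].astype(bool)              # neighbors of the partial path are candidates
    for v in set(path_nodes) | set(tree_nodes):  # reusing these nodes would be an invalid action
        mask[v] = False
    return mask
```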
The reward function guides the agent to choose the action with the maximum reward in order to obtain the optimal policy. It measures the value of a certain action taken by the agent in a certain state, thus helping the agent evolve toward an optimal policy. The optimization objective of maximizing the remaining bandwidth and minimizing the delay, packet loss rate, used bandwidth, packet error rate, packet drop rate, and distance between APs is communicated to the agent through the reward function. At each time step, the agent selects an action at in the current state st based on its policy π, and the environment responds to this action, resulting in a state transition to st+1 and the agent receiving a reward value rt+1. When an agent interacts with the environment, it may select either valid or invalid actions; a valid action can lead to a process state (an ordinary state change), a loop, or a terminal state. Thus, there are four possible outcomes that can arise from the interaction between an agent and the environment: a process state (PART), an invalid action (HELL), a loop (LOOP), and a terminal state (END).
● Process state PART: When the agent executes a valid action and adds a new node to the path, the state transitions to a non-terminal process state, and the agent updates its policy to continue exploring and learning. The reward value is Rpart (Eq (4.1)). To adapt to the dynamic change of network link information and enable the agent to select the optimal combination of actions, we calculate the reward value based on the remaining bandwidth bwij, the delay delayij, the packet loss rate lossij, the used bandwidth used_bwij, the packet error rate errorsij, the packet drop rate dropsij, and the distance distanceij between node i and node j on the network link. The weighting factors of these parameters are denoted by βl∈[0,1], for l=1,2,⋯,7. These parameters are all normalized to [0, 1] using the max-min method, for which the specific calculation is shown in (3.18).
$$R_{part} = \beta_1 bw_{ij} + \beta_2(1 - delay_{ij}) + \beta_3(1 - loss_{ij}) + \beta_4(1 - used\_bw_{ij}) + \beta_5(1 - errors_{ij}) + \beta_6(1 - drops_{ij}) + \beta_7(1 - distance_{ij}) \quad (4.1)$$
● Invalid action HELL: When the agent selects an invalid action, that is, selects a non-neighbor node or a node that is already in the multicast tree, the action will not be executed, and the state will remain unchanged. To discourage the agent from selecting invalid actions, a fixed penalty value of Rhell=C1 is given in this case.
● Loop state LOOP: In addition to process states, there is also a certain probability that executing a valid action will cause the path to form a loop. In this case, although the chosen action is a neighbor node of the current state, once the action is executed, the agent will be trapped in a loop and unable to explore further to find the optimal path. Therefore, the state is rolled back to st, and a fixed penalty value of Rloop=C2 is given.
● Terminal state END: When an action is executed and the paths from the source node to all destination nodes have been found, that is, the multicast tree has been constructed, the state is set to a terminal state, that is, st+1=None. In this state, the reward function of each agent calculates the reward value using the network link state of its own unicast path, as shown in Eq (4.2).
$$R_{end} = \beta_1 bw_k + \beta_2(1 - delay_k) + \beta_3(1 - loss_k) + \beta_4(1 - used\_bw_k) + \beta_5(1 - errors_k) + \beta_6(1 - drops_k) + \beta_7(1 - distance_k) \quad (4.2)$$
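The four-way reward logic (PART, HELL, LOOP, END) can be summarized in the following sketch. The penalty constants `c1` and `c2` stand in for the fixed values C1 and C2, whose concrete settings are not restated here, and the dictionary-based metric arguments are illustrative.

```python
def reward(outcome, link=None, path_metrics=None, beta=None, c1=-1.0, c2=-0.5):
    """Reward for one agent-environment interaction (Section 4.1).

    outcome      : one of "PART", "HELL", "LOOP", "END"
    link         : normalized metrics of the newly added link e_ij (used for PART)
    path_metrics : normalized metrics of the finished unicast path p_k (used for END)
    beta         : weights beta_1..beta_7
    """
    if outcome == "HELL":   # invalid action: non-neighbor node or node already in the tree
        return c1
    if outcome == "LOOP":   # valid neighbor, but adding it would close a loop
        return c2

    m = link if outcome == "PART" else path_metrics
    # Eq (4.1) / Eq (4.2): reward bandwidth, penalize the remaining metrics.
    return (beta[0] * m["bw"]
            + beta[1] * (1 - m["delay"])
            + beta[2] * (1 - m["loss"])
            + beta[3] * (1 - m["used_bw"])
            + beta[4] * (1 - m["errors"])
            + beta[5] * (1 - m["drops"])
            + beta[6] * (1 - m["distance"]))
```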
The actor network serves as a policy function πθ(a|s), where the policy is parameterized as a neural network with θ representing its parameters. Given the current state, the network outputs the next action to be taken. The training objective of the network is to maximize the expected cumulative reward. The policy gradient for this network is given by Eq (4.3).
$$\nabla J(\theta) = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T_n} \left( Q^{\pi_\theta}(s_t^n, a_t^n) - V^{\pi_\theta}(s_t^n) \right) \nabla \log \pi_\theta(a_t^n \mid s_t^n) \quad (4.3)$$
where $T_n$ is the maximum number of steps of interaction with the environment, $Q^{\pi_\theta}(s_t^n, a_t^n)$ is the expected cumulative return, and $V^{\pi_\theta}(s_t^n)$ is the expectation of $Q^{\pi_\theta}(s_t^n, a_t^n)$ over all actions in state $s_t^n$.
In actor–critic (AC) algorithms, high variance can occur because not all actions with positive rewards in a single action trajectory may necessarily be optimal; they could instead be suboptimal. To address this issue, A2C introduces a baseline, represented by Vπ(s), which is subtracted from the original reward value to calculate the advantage function, as shown in Eq (4.4).
$$A^{\theta}(s_t^n, a_t^n) = Q^{\pi_\theta}(s_t^n, a_t^n) - V^{\pi_\theta}(s_t^n) \quad (4.4)$$
From (4.4), it can be seen that two estimates are needed: the action-value function $Q^{\pi_\theta}(s_t^n, a_t^n)$ and the state-value function $V^{\pi_\theta}(s_t^n)$. The expected Q-value is given by $Q^{\pi_\theta}(s_t^n, a_t^n) = \mathbb{E}[r_t + \gamma V(s_{t+1})]$, where $\gamma \in [0, 1]$ is a discount factor. Since the next state $s_{t+1}$ is reached in the next time step after an action is taken and the reward $r_t$ is obtained in the current time step, the Q-value is calculated as the expected value of the reward plus the discounted value of the next state by introducing the TD error method, as shown in Eq (4.5).
$$\begin{aligned} Q^{\pi_\theta}(s_t^n, a_t^n) &= r_t^n + \gamma V^{\pi_\theta}(s_{t+1}^n) \\ TD_{error} &= r + \gamma V(s_{t+1}) - V(s_t) \\ A^{\theta}(s_t^n, a_t^n) &= r_t^n + \gamma V^{\pi_\theta}(s_{t+1}^n) - V^{\pi_\theta}(s_t^n) \end{aligned} \quad (4.5)$$
The critic network calculates the value function Vπ(s), which represents the future payoff that the agent can expect from state s, and estimates the value function of the current policy from this expected payoff, that is, it evaluates the goodness of the actor network. With the help of the value function, an AC algorithm can perform a single-step parameter update without waiting until the end of the round. The value function is calculated as shown in Eq (4.6).
$$V^{\pi}(s) = \mathbb{E}_{\pi}\{ r_t \mid s_t = s \} \quad (4.6)$$
The critic network parameters ω are updated using the mean squared error loss function through backward propagation of the gradient, where the mean squared error loss function is shown in Eq (4.7).
$$MSE = \sum \left( r + \gamma V^{\pi}(s_{t+1}) - V^{\pi}(s_t, \omega) \right)^2 \quad (4.7)$$
According to the policy gradient formula analyzed above, the update rule for the parameters of the actor network's policy function, Eq (4.8), is obtained by combining the advantage function of A2C with the TD method, where $\alpha_1$ is the learning rate.
$$\theta = \theta + \alpha_1 \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T_n} \left( Q^{\pi_\theta}(s_t^n, a_t^n) - V^{\pi_\theta}(s_t^n) \right) \nabla \log \pi_\theta(a_t^n \mid s_t^n) \quad (4.8)$$
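A compact sketch of one A2C update following Eqs (4.5), (4.7) and (4.8) is given below. PyTorch is an assumed framework (the paper does not prescribe one), the network interfaces are illustrative, and terminal-state masking is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def a2c_update(actor, critic, batch, actor_opt, critic_opt, gamma=0.9):
    """One A2C update over an on-policy batch (gamma is a placeholder discount factor).

    batch : tensors (states, actions, rewards, next_states); actor(states) returns
            action probabilities, critic(states) returns state values.
    """
    states, actions, rewards, next_states = batch

    values      = critic(states).squeeze(-1)                # V(s_t)
    next_values = critic(next_states).squeeze(-1).detach()  # V(s_{t+1})

    td_target = rewards + gamma * next_values               # r + gamma * V(s_{t+1})
    advantage = (td_target - values).detach()               # TD error used as A(s_t, a_t), Eq (4.5)

    # Critic: mean squared TD error, cf. Eq (4.7).
    critic_loss = F.mse_loss(values, td_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: policy-gradient step weighted by the advantage, cf. Eq (4.8).
    log_probs = torch.log(actor(states).gather(1, actions.unsqueeze(1)).squeeze(1))
    actor_loss = -(advantage * log_probs).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```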
Four main challenges are encountered in multi-agent reinforcement learning: non-stationarity of training, scalability, partial observability, and privacy and security. Based on the above challenges, multi-agent training methods can be divided into three main types: fully decentralized (IL) training, fully centralized training, and centralized training and decentralized execution (CTDE) [47]. Although fully centralized training alleviates the issues of partial observability and non-stationarity, it is not feasible for large-scale and real-time systems. Moreover, since the CTDE method relies on a centralized control unit that collects training information from each agent, it also has difficulty scaling to environments with large numbers of agents. With an increasing number of agents, a centralized critic network will suffer from increasingly high variance, and the value function will have difficulty converging. Therefore, a fully decentralized training method is adopted in this paper, as shown in Figure 7.
This method is a direct extension from the single-agent scenario to the multiple-agent scenario, in which each agent independently optimizes its policy without considering non-stationarity issues. To address the convergence challenges of this method for the agents, we adopt the strategy of transfer reinforcement learning. In practice, the IL method has achieved satisfactory results for several resource allocation and control problems in wireless communication networks [48,49,50].
To accelerate the training speed of the multi-agent system and address the issues of instability and difficulty in convergence during the training phase, this paper applies transfer learning in combination with reinforcement learning. Transfer learning (TL) allows knowledge acquired from experts or other processes to be transferred to the current task, which accelerates learning. The applications of transfer learning in reinforcement learning can be divided into the following three main categories depending on the transfer setting [51]: 1) fixed-domain transfer from a single source task to a target task, 2) fixed-domain transfer across multiple source tasks to a target task, and 3) transfer between source and target tasks in different domains.
This paper adopts the second approach, which involves fixed-domain transfer across multiple source tasks to a target task in the same task domain. Specifically, a pretraining process is conducted to obtain the initial weights of an intelligent agent for unicast routing that covers all source nodes and destination nodes, with the same state space, action space, and reward function. In MADRL-MR, each intelligent agent loads these initial weights before learning, in a process called knowledge transfer, to reduce ineffective exploration at the beginning of training. Then, during the training process, the algorithm parameters are adjusted based on the different tasks of the multiple intelligent agents to accelerate their convergence. This approach aims to enable stable coordination among multiple intelligent agents and make them more adaptable to dynamic changes in the network link information.
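In practice, this knowledge transfer amounts to initializing each agent from the pre-trained unicast-routing weights before its own training begins, as in the following sketch (PyTorch and the checkpoint file names are assumptions).

```python
import torch

def init_agent_from_pretrained(actor, critic,
                               actor_ckpt="pretrained_actor.pt",
                               critic_ckpt="pretrained_critic.pt"):
    """Load pre-trained unicast-routing weights into a new agent's actor and critic."""
    actor.load_state_dict(torch.load(actor_ckpt))
    critic.load_state_dict(torch.load(critic_ckpt))
```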
In the MADRL-MR algorithm, the paths between the input source node src and the destination nodes DST are first divided among several subtasks such that each agent is assigned a different task of establishing paths from src to several destination nodes dst∈DST. Second, based on the current network topology graph (the environment), each agent learns the optimal unicast paths from src to its assigned dst nodes. Finally, once all agents have completed their tasks, the learned paths from src to all dst nodes are combined to obtain the optimal multicast tree from src to DST in the graph. The detailed implementation of the MADRL-MR algorithm for multicast routing with multiagent deep reinforcement learning is shown in Algorithm 1.
Algorithm 1 MADRL-MR |
Require: network topology G(V,E), traffic matrix TM, source and multicast destination nodes (src, DST), weight factors βl, l = 1, 2, ..., 7, actor learning rate α1, critic learning rate α2, reward discount factor γ, batch size k, update frequency updatetime, number of agents n, training episodes episodes, pre-trained actor and critic weights θ̂ and ω̂
Ensure: optimal multicast tree tree(src, DST)
1: Initialize actor network θ |
2: Initialize critic network ω |
3: Initialize buffer capacity B |
4: Load pre-trained weights θ = θ̂, ω = ω̂
5: Assign destination nodes to each agent randomly and equally |
6: for episode←1 to episodes do |
7: for TM in Network Information Storage do |
8: Reset environment with (src,DST) |
9: The agent obtains the initial state st |
10: while True do |
11: Choose an action at from st by sampling the output action probability |
12: Execute action at and observe reward rt and next state st+1 |
13: Store (st,at,rt,st+1) in B |
14: if len(B)≥k then |
15: for i←1 to updatetime do |
16: Sample batch k data |
17: Enter data(st) and data(st+1) in the critic network to get Vπ(st) and Vπ(st+1) |
18: Calculate the TD error: TDerror ← r + γVπ(st+1) − Vπ(st)
19: Calculate the MSE loss for the gradient update of the critic network parameters ω: MSE ← ∑(r + γVπ(st+1) − Vπ(st,ω))^2
20: Update actor network parameters θ according to Eq (4.8) |
21: Empty buffer B |
22: end for |
23: end if |
24: if done then // The paths to all destination nodes have been found
25: Build a multicast tree |
26: Break |
27: end if |
28: st←st+1 |
29: end while |
30: end for |
31: end for |
The algorithm takes as input a network topology G(V,E), a traffic matrix TM, the source node and destination nodes (src, DST) for multicasting, and the hyperparameters of the reinforcement learning algorithm. Its output is an optimal multicast tree from the source node src to the set of destination nodes DST. Lines 1–3 initialize the actor network parameters, the critic network parameters, and the experience buffer, respectively; the buffer contents are discarded after each update. Line 4 uses transfer learning to load the pretrained agent weights, obtained before the start of training for routing from all source nodes to all destination nodes, i.e., it performs knowledge transfer. On line 5, the destination nodes in the multicast group are equally and randomly split among several subtasks according to the number of input agents, and one subtask is assigned to each agent. Lines 8 and 9 initialize the environment to obtain the initial state st. Lines 11–13 input the state st into the actor network and select an action at by sampling from the output action probability distribution. The selected action is then performed to interact with the environment, yielding the reward value rt and the next state st+1, and the experience (st,at,rt,st+1) is stored in the experience buffer. Lines 14–18 learn from the experiences stored in the experience buffer by inputting st and st+1 into the critic network to obtain Vπ(st) and Vπ(st+1); the TD error, TDerror, is then calculated using (4.5). Line 19 calculates the mean squared error loss function (4.7) to be used for the gradient update of the critic network parameters ω. Line 20 updates the actor network parameters θ according to (4.8) based on Vπ(st), Vπ(st+1) and TDerror. Line 21 clears the experience buffer. Lines 24–27 judge whether all agents have found the desired paths from the source node to the destination nodes and, if so, obtain the optimal multicast tree by removing redundant links from these paths. Finally, line 28 updates the state st to proceed to the next step.
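To make the update in lines 17–20 concrete, the sketch below (ours, under the same PyTorch assumption as above) applies a generic advantage actor-critic update rather than reproducing Eqs (4.5)–(4.8) verbatim: the critic is fitted to the one-step TD target, and the TD error serves as the advantage for the actor's policy-gradient step.

```python
import torch


def a2c_update(actor, critic, actor_opt, critic_opt, batch, gamma=0.9):
    """One update from a sampled batch of (s_t, a_t, r_t, s_{t+1}) transitions.
    s, s_next: float tensors [B, state_dim]; a: long tensor [B]; r: float tensor [B]."""
    s, a, r, s_next = batch

    v = critic(s).squeeze(-1)                    # V^pi(s_t)
    with torch.no_grad():
        v_next = critic(s_next).squeeze(-1)      # V^pi(s_{t+1})
    td_error = r + gamma * v_next - v            # TD error, used as the advantage

    # Critic (line 19): minimize the mean squared TD error w.r.t. omega.
    critic_loss = td_error.pow(2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor (line 20): policy-gradient step weighted by the (detached) TD error.
    log_prob = torch.log(actor(s).gather(1, a.unsqueeze(1)).squeeze(1) + 1e-8)
    actor_loss = -(log_prob * td_error.detach()).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```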
This section describes the experimental settings used in this study and the corresponding performance evaluation. First, the experimental environment is introduced. Second, the performance metrics for algorithm evaluation are defined. Then, the tuning and setting of the algorithm hyperparameters during the experimental process are described. Finally, comparison experiments against a double dueling deep Q-network (DQN) with prioritized experience replay and against the classical KMB Steiner tree construction algorithm are presented, and their results are discussed.
For the experimental environment in this study, we used Mininet-WIFI 2.3.1b as the simulation platform for the SDWN network. Mininet-WIFI [52] is a branch of the Mininet SDN network emulator that extends the functionality of Mininet by adding virtualized wireless APs based on standard Linux wireless drivers and the 80211_hwsim wireless simulation driver. The SDWN controller used in the experiment is Ryu 4.3.4 [53]. The experiment was conducted on a server with hardware consisting of a 64-core processor and a GeForce RTX 3090 graphics card and with Ubuntu 18.04.6 as the software environment. The Iperf [54] tool was used to send User Datagram Protocol (UDP) packets.
The design of the wireless network topologies is inspired by the literature [11]. We designed three network topologies consisting of 10, 14 and 21 wireless nodes, denoted Node10Net, Node14Net and Node21Net, respectively, to test the performance of MADRL-MR, as shown in Figure 8. The parameters of the network links were randomly generated following a uniform distribution. The ranges of the random link bandwidth and delay values were 5–40 Mbps and 1–10 ms, respectively, while the distances between wireless APs were set within the range of 30–120 m.
To more accurately simulate a real environment, we used the Iperf traffic generator to simulate network traffic over a 24-hour period, as shown in Figure 9. The horizontal axis represents time, and the vertical axis represents the average traffic sent by each node in Mbit/s. The traffic distribution conforms to a typical network traffic distribution at different times of day.
As performance indicators, we use the convergence status of the intelligent agents' reward values as well as commonly used performance metrics for routing, such as the instantaneous throughput, delay, and packet loss rate. In addition, we use the remaining bandwidth, tree length, and average distance between wireless APs in the multicast tree as evaluation metrics for our algorithm.
1) The calculation of the reward value is described in Section 4.1; specifically, the reader is referred to the formula for the reward function.
2) For the three commonly used evaluation metrics of instantaneous throughput, delay, and packet loss rate, based on the simulated 24-hour network traffic, we use their average values in different time periods to represent the network performance, as shown in Eq (5.1).
$$\overline{throughput} = \frac{\sum_i \sum_j throughput_{ij}}{\Delta t}, \qquad \overline{delay} = \frac{\sum_i \sum_j delay_{ij}}{\Delta t}, \qquad \overline{loss} = \frac{\sum_i \sum_j loss_{ij}}{\Delta t} \tag{5.1}$$
where $\overline{throughput}$, $\overline{delay}$, and $\overline{loss}$ represent the average throughput, average delay and average packet loss rate over a time duration Δt, respectively, and $throughput_{ij}$ is the throughput from node i to node j, with $delay_{ij}$ and $loss_{ij}$ defined analogously.
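As an illustration (ours), the averaging in Eq (5.1) can be computed as follows, where measurements is a hypothetical nested mapping holding the per-node-pair samples collected over the window Δt:

```python
def average_metrics(measurements, delta_t):
    """Average throughput, delay and packet loss over a window of length delta_t,
    summing the per-pair samples throughput_ij, delay_ij and loss_ij as in Eq (5.1)."""
    totals = {"throughput": 0.0, "delay": 0.0, "loss": 0.0}
    for i, row in measurements.items():          # source node i
        for j, sample in row.items():            # destination node j
            for key in totals:
                totals[key] += sample[key]
    return {key: value / delta_t for key, value in totals.items()}
```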
3) For the remaining bandwidth, tree length, and average distance between wireless APs in the multicast tree, we use multiple measurements and obtain the average value as the corresponding evaluation metric, as shown in Eq (5.2).
$$\overline{bw}_{tree} = \frac{\sum_n \sum_{ij \in tree} bw_{ij}}{n \cdot E}, \qquad \overline{len}_{tree} = \frac{\sum_n len_{tree}}{n}, \qquad \overline{dist}_{tree} = \frac{\sum_n \sum_{ij \in tree} distance_{ij}}{n \cdot E} \tag{5.2}$$
where $\overline{bw}_{tree}$ and $\overline{dist}_{tree}$ represent the average remaining bandwidth per link in the multicast tree and the average distance between wireless APs in the multicast tree, respectively. $\overline{len}_{tree}$ is the average length of the multicast tree after multiple measurements. $bw_{ij}$ and $distance_{ij}$ are the remaining bandwidth and the distance, respectively, from node i to node j in the multicast tree. n is the number of measurements performed at a given time. E is the number of edges in the multicast tree.
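Analogously, Eq (5.2) can be evaluated as in the sketch below (ours), where tree_samples is a hypothetical list of n repeated measurements of the multicast tree, each recording the residual bandwidth and inter-AP distance of its E edges together with the tree length:

```python
def average_tree_metrics(tree_samples):
    """tree_samples: list of n measurements, each a dict with
    'edges' -> list of (bw_ij, distance_ij) tuples (one per tree edge)
    and 'length' -> the multicast tree length."""
    n = len(tree_samples)
    bw_sum = dist_sum = len_sum = 0.0
    edge_count = 0                                # accumulates n * E edge samples
    for tree in tree_samples:
        for bw_ij, dist_ij in tree["edges"]:
            bw_sum += bw_ij
            dist_sum += dist_ij
        edge_count += len(tree["edges"])
        len_sum += tree["length"]
    return {
        "avg_bw": bw_sum / edge_count,            # mean residual bandwidth per link
        "avg_len": len_sum / n,                   # mean multicast tree length
        "avg_dist": dist_sum / edge_count,        # mean distance between wireless APs
    }
```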
First, the impact of transfer learning on the convergence of the multi-agent SDWN-based intelligent multicast routing algorithm is analyzed, as shown in Figure 10. The convergence of the reward values is significantly faster with transfer learning than without, and the reward values are also higher with transfer learning. When transfer learning is used in reinforcement learning, a set of pretrained initial weights for connecting all source nodes to all destination nodes in different environments is loaded before each agent starts learning. This is also known as knowledge transfer and endows the agents with some decision-making ability at the beginning of training. In this way, the agents can reach convergence faster than they would if each agent needed to learn from scratch, and it also solves the problem of slower convergence with an increasing number of agents. Therefore, applying transfer learning in multiagent reinforcement learning endows the MADRL-MR intelligent multicast routing algorithm with a stronger learning ability, enables the agents to learn efficient behaviors more quickly, and accelerates the convergence of the reward values. This confirms that transfer learning can improve the performance of the MADRL-MR algorithm.
The most important prerequisite for using deep reinforcement learning to select the optimal multicast routes is the setting of the hyperparameters. We use a multicast group with a complex set of possible paths as a representative example, with node 3 as the source node and multicast destinations of {6,7,8,9,11,13}. The more complex the set of possible paths to each destination node is, the more choices the agent can explore, and since the multicast tree constructed from the source node to all multicast destination nodes is not unique, testing the effectiveness of the algorithm becomes more challenging with more complex path situations.
The hyperparameter settings for the reward values Rpart, Rhell, Rloop, and Rend will affect the convergence speed of the agents. If these hyperparameters are not set appropriately, the agents' reward values may fail to converge, i.e., the agents may be unable to find the optimal strategy.
The first step is to set the penalty values Rhell and Rloop. Since the selected actions may be invalid or create loops, this will greatly affect the construction of the multicast tree. Too many invalid actions and loops will affect the convergence speed of the agents and may even lead to nonconvergence. Therefore, the setting of the penalty values is crucial. To reduce the influence of the other two reward values on the setting of the penalty values, we set the weighting factors in the calculation formulas of Rpart and Rend to 1. Additionally, since all reward calculation parameters are normalized to [0, 1], we initially set both penalty values to -1. We then evaluated the performance achieved under various settings of these two penalty values based on the approximate round when convergence began and the total reward value and adjusted the penalty values multiple times accordingly. The results are shown in Table 1.
| Rhell | Rloop | Episode | Reward |
|-------|-------|---------|--------|
| -1 | -1 | 450 | -23 |
| -1 | -0.7 | 670 | -30 |
| -1 | -0.5 | 390 | 12 |
| -1 | -0.1 | 810 | -15 |
| -0.7 | -0.5 | 250 | 22 |
| -0.5 | -0.5 | 380 | -9 |
| -0.1 | -0.5 | 460 | -5 |
In multiple rounds of adjustment, we first fixed the value of Rhell at -1 and adjusted the value of Rloop. The results showed that with Rloop=−0.5, convergence started at approximately the 390th episode, with a converged value of approximately 12. Second, we fixed Rloop at -0.5 and adjusted Rhell. It was found that Rhell=−0.7 gave the best result, with the agents starting to converge at approximately the 250th episode and achieving a convergence value of 22, which was the best result among the tested parameter settings. Therefore, we set the values of Rhell and Rloop to -0.7 and -0.5, respectively.
Next, Rpart and Rend need to be set because the reward values for these two cases are calculated based on the traffic matrix of the network links, as shown in (4.1) and (4.2). The design of these two reward values mainly involves setting the weight ratios of the seven network link parameters. A different weighting factor should be set for each network link parameter to represent the influence of that parameter on the construction of the multicast tree. We set the weights for the seven parameters, namely, the remaining bandwidth, delay, packet loss rate, used bandwidth, packet error rate, packet drop rate, and distance between APs, to [0.7, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1]. In detail, since our goal is to construct a multicast tree whose main influencing factors are the remaining bandwidth, delay, and packet loss rate, we set the weights of the first three parameters to 0.7, 0.3, and 0.1, respectively. The remaining parameters, namely, the used bandwidth, packet error rate, packet drop rate, and distance between APs, are also important in building an optimal multicast tree, but compared to the first three parameters, we consider them supplementary factors; therefore, we set their weights all to 0.1. Thus, the initial weights of all parameters are [0.7, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1].
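To illustrate how such a weight vector enters the reward, the sketch below (ours; it does not reproduce the exact forms of Eqs (4.1) and (4.2)) combines the seven normalized link parameters into a single link score, rewarding a large remaining bandwidth and penalizing the other six quantities:

```python
# Weights for: remaining bandwidth, delay, packet loss rate, used bandwidth,
# packet error rate, packet drop rate, distance between APs.
WEIGHTS = [0.7, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1]


def link_score(params, weights=WEIGHTS):
    """params: the seven link parameters, each normalized to [0, 1].
    The remaining bandwidth contributes positively; the other six parameters,
    which should all be as small as possible, contribute negatively."""
    remaining_bw, rest = params[0], params[1:]
    return weights[0] * remaining_bw - sum(w * p for w, p in zip(weights[1:], rest))
```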
To evaluate the efficacy of these parameter settings, we first set all parameter weights to 1 and conducted comparative experiments. As shown in Figure 11(a), weights of [0.7, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1] achieve better convergence and yield higher reward values compared to setting all parameter weights to 1.
Then, to test the influence of the last four parameters on multicast tree construction, we set the weights to [0.1, 0.1, 0.1, 0.7, 0.7, 0.6, 0.1] and compared the results with those obtained using our initial weight ratios. As shown in Figure 11(b), the initial weight ratios show more stable convergence, which supports assigning smaller weights to the last four parameters.
Next, we tested changing the weight values of the first three parameters to [0.3, 0.3, 0.6, 0.1, 0.1, 0.1, 0.1]. As shown in Figure 11(c), although the convergence is relatively stable in both cases, the initial weight ratios we set yielded higher reward values and faster convergence.
Finally, to further verify the influence of the three main factors (remaining bandwidth, delay, and packet loss rate), we increased the weight of the delay parameter, setting the weights to [0.3, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1]. As shown in Figure 11(d), the obtained reward value decreased slightly.
Similarly, we increased the weight of the packet loss rate and decreased the weight of the remaining bandwidth, setting the weights to [0.1, 0.3, 0.7, 0.1, 0.1, 0.1, 0.1]. As shown in Figure 11(e), the converged reward value achieved under this setting was closer to that achieved with the initial weight values, but the initial weight values we set are still better.
To further compare the importance of the delay and packet loss rate, we then swapped their weights, setting them to [0.7, 0.1, 0.3, 0.1, 0.1, 0.1, 0.1]. As shown in Figure 11(f), the initial weight values still yielded a higher reward value.
Through multiple adjustments of the parameter weights, the results consistently showed that the initial weights [0.7, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1] offer the best convergence behavior and the highest reward values. These findings validate the reasonableness and accuracy of our initial weight setting.
The learning rate is a hyperparameter that controls the speed at which a neural network adjusts its weights based on the loss gradient, directly impacting how quickly an agent can converge to the optimal value. Generally, a higher learning rate leads to faster learning of the neural network, while a lower learning rate may cause the model to become trapped in a local optimum. However, if the learning rate is too high, this can cause oscillation in the loss function during the parameter optimization process, leading to failure to converge. Therefore, setting a proper learning rate is crucial. The algorithm used in this paper is A2C, which involves two neural networks, the actor network and the critic network. To optimize the learning rates of the actor network (α1) and the critic network (α2), we fixed one learning rate and adjusted the other. First, we set α2=3e−3 and adjusted α1. The results are shown in Figure 12.
Based on the results of adjusting α1, it was found that when α1 was set to 1e−5 or 1e−6, the learning rate was too low for the network to converge easily. When α1 was set to 1e−3 or 1e−4, the reward value converged, but the convergence behavior and reward value obtained with α1=1e−3 were the best. Then, α1 was fixed while α2 was adjusted, as shown in Figure 13. The reward value converged under all tested values of α2, but when α2 was set to 3e−3 or 3e−4, the convergence speed was faster.
Based on the results of the above two comparative experiments, we set α1=1e−3 and α2=3e−3.
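Under the same PyTorch assumption as in the earlier sketches, the tuned learning rates simply parameterize two separate optimizers, one per network (the use of Adam here is our assumption; the text only specifies the learning rates):

```python
import torch.optim as optim


def make_optimizers(actor, critic, alpha_1=1e-3, alpha_2=3e-3):
    """Create separate optimizers for the actor and critic networks
    with the learning rates selected above."""
    actor_opt = optim.Adam(actor.parameters(), lr=alpha_1)
    critic_opt = optim.Adam(critic.parameters(), lr=alpha_2)
    return actor_opt, critic_opt
```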
Based on the characteristics of the Markov process, we set a reward discount factor that discounts rewards obtained in the future, with a greater discount applied to rewards further ahead. This is because we wish to prioritize the current reward and prevent the cumulative reward from growing without bound. By adjusting the discount factor (another hyperparameter), we can obtain intelligent agents with different performance, as shown in Figure 14.
According to the results shown in Figure 14, setting the discount factor to 0.9 was found to yield the best performance.
The purpose of the batch size hyperparameter is to control the number of samples selected by the model during each training iteration, which can affect the degree and speed of model optimization. From Figure 15, we can see that the convergence situation is similar with batch sizes of 16, 32, and 64, whereas the performance obtained with a batch size of 128 is the worst.
Experiments on setting the update frequency of the neural networks of the intelligent agents, as shown in Figure 16, were conducted by adjusting the update_time parameter to 1, 10, 100, and 1000. When this parameter was set to 1000, the intelligent agents were unable to reach the terminal state, so an additional parameter adjustment experiment with update_time set to 5 was added. The results in this figure show that the best convergence effect and the highest reward value were obtained when update_time was set to 10.
Multi-agent reinforcement learning is used in this paper to find the optimal policy; therefore, another very important hyperparameter is the number of agents. The convergence speed and the final converged reward value are different with different numbers of agents. It is not generally true that more agents are better. In fact, the larger the number of agents, the harder it is for the multi-agent system to converge. In Figure 17, the experimental results show that when the number of agents is set to 3, the convergence of the reward value is the best.
To evaluate the performance of the MADRL-MR algorithm, we compared it with a value-based deep reinforcement learning approach, namely a double dueling deep Q-network (DQN) with prioritized experience replay (PER), in the 10-node, 14-node and 21-node wireless network topologies. As shown in Figure 18(a)–(c), the MADRL-MR algorithm achieves higher reward values and faster convergence than the double dueling DQN in all three topologies, exhibiting better performance throughout the entire training process.
We also compared the results of our algorithm with the experimental results of constructing Steiner trees using the classic KMB algorithm in three wireless network topologies: Node10Net, Node14Net and Node21Net. To demonstrate the influence of the network link parameters on multicast tree construction, we implemented three versions of the KMB algorithm using the residual bandwidth, delay, and packet loss rate as weights. We used the average throughput, delay, packet loss rate, residual bandwidth, tree length, and average distance between wireless APs in the multicast tree as performance evaluation indicators. The results are shown in Figures 19–21.
Figures 19(a), 20(a) and 21(a) present the network throughput experiments comparing MADRL-MR with the KMB algorithm using the bandwidth, delay, and packet loss rate as weights. It can be observed that as time progresses and the simulated network traffic grows, the network throughput under the proposed intelligent multicast routing algorithm is significantly higher than those under KMBbw, KMBdelay, and KMBloss. For example, in Node14Net, the average throughput is 58.71% higher than that under KMBbw and 31.8% higher than that under KMBdelay.
Figures 19(b), 20(b) and 21(b) compare the average link delay of the multicast trees constructed by MADRL-MR, KMBbw, KMBdelay, and KMBloss. The results show that in all network topologies, the average link delay of MADRL-MR is smaller than those of KMBbw and KMBloss and very close to that of KMBdelay. For example, in Node14Net, as the network traffic increases, the average link delay of MADRL-MR is 53.52% and 48.53% lower than those of KMBbw and KMBloss, respectively, and close to the value for KMBdelay. This indicates that MADRL-MR achieves good performance in terms of the average link delay.
Figures 19(c), 20(c) and 21(c) compare the average packet loss rates on the links of the multicast trees constructed by MADRL-MR, KMBbw, KMBdelay, and KMBloss. The results show that the average link packet loss rate of MADRL-MR is small in all network topologies. For example, in Node14Net, MADRL-MR performs slightly worse than KMBloss in terms of the average link packet loss rate, although the values of the two are very close. However, MADRL-MR outperforms KMBbw and KMBdelay by 50.32% and 37.3% on average, respectively, indicating that MADRL-MR generally has a lower packet loss rate.
Figures 19(d), 20(d) and 21(d) compare the average link bandwidths of the multicast trees constructed by MADRL-MR, KMBbw, KMBdelay, and KMBloss. The results show that MADRL-MR is significantly better than KMBdelay and KMBloss in the average link bandwidth of multicast trees in all network topologies, and slightly better than KMBbw. For example, in Node14Net, MADRL-MR performs significantly better than KMBdelay and KMBloss in terms of the average link bandwidth and exhibits an average improvement of 16.96% compared to KMBbw.
Figures 19(e), 20(e) and 21(e) compare the average lengths of the multicast trees constructed by MADRL-MR, KMBbw, KMBdelay, and KMBloss. The results show that in Node10Net and Node14Net, the multicast tree constructed by MADRL-MR is longer on average than those constructed by the other three algorithms, reflecting the fact that our algorithm considers more parameters when constructing the multicast tree and considers more nodes when selecting nodes to join the multicast paths. In Node21Net, however, the tree constructed by MADRL-MR is shorter than that of KMBloss; although more nodes are considered, the proposed algorithm makes a compromise between tree length and performance.
Figures 19(f), 20(f) and 21(f) compare the average distances between wireless AP nodes in the multicast trees constructed by MADRL-MR, KMBbw, KMBdelay, and KMBloss. The results show that MADRL-MR achieves good results in terms of the average distance between AP nodes in all network topologies. For example, in Node14Net, despite the longer average multicast tree length of MADRL-MR shown in Figure 19(e), the distance between AP nodes does not show the same trend. As seen in Figure 19(f), the MADRL-MR algorithm constructs multicast trees with a shorter average distance between AP nodes than KMBbw, KMBdelay, and KMBloss, indicating that the proposed algorithm takes the distance between wireless AP nodes into account and achieves good results.
In this paper, we have introduced MADRL-MR, an intelligent multicast routing method based on multi-agent deep reinforcement learning in an SDWN environment. First, we addressed the issues of traditional wireless networks, such as the difficulty of controlling and maintaining nodes and the tight coupling of data forwarding and logic control in traditional network devices, making it difficult to achieve compatibility with other devices and software. To overcome these issues, we chose to utilize the decoupling of forwarding and control and the global perception capabilities in SDWN. Second, traditional multicast routing algorithms cannot effectively use the link information of the entire network to construct a multicast tree. Moreover, in deep reinforcement learning, multicast tree construction by a single agent has a slow convergence rate, leading to difficulty in responding quickly to the dynamic change of network link information. MADRL-MR effectively utilizes network link information and rapidly constructs the optimal multicast tree through mutual cooperation among multiple agents. Finally, to speed up multiagent training, the use of transfer learning techniques was proposed to accelerate the convergence rate of the agents.
In MADRL-MR, the design of the agents is based on the traffic matrix and the process of multicast tree construction. The state space is designed based on these factors. The design of the action space is different from that in other algorithms that use the k-paths approach because in k-paths, the paths are fixed, and the optimal path is chosen from among these fixed paths; however, the fixed path chosen in this way is not guaranteed to be the best. Therefore, a novel next-hop method is adopted instead to design the action space in this article. The agents explore and gradually construct the optimal multicast routes. For this purpose, a reward function is designed and calculated based on the traffic matrix of the network links.
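A possible realization of this next-hop action space is sketched below (ours, not the authors' exact mechanism): the agent samples a candidate next node from the actor's output distribution, and the environment returns the Rhell or Rloop penalty (using the values tuned in Section 5) when the choice is invalid or would close a loop.

```python
import torch


def step_next_hop(actor, state, current_node, adjacency, visited,
                  r_hell=-0.7, r_loop=-0.5):
    """Sample the next hop for the path under construction.
    adjacency: {node: set of neighboring nodes}; visited: nodes already on the path."""
    probs = actor(state)                                   # distribution over all nodes
    action = torch.distributions.Categorical(probs).sample().item()
    if action not in adjacency[current_node]:
        return action, r_hell      # invalid action: not a neighbor of the current node
    if action in visited:
        return action, r_loop      # valid hop, but it would create a loop
    return action, None            # valid hop; R_part / R_end come from the traffic matrix
```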
The results of a large number of comparative experiments show that the proposed MADRL-MR algorithm offers better performance than three versions of the classic KMB algorithm implemented using the residual bandwidth, delay, and packet loss rate as weights. Additionally, in the network where the link information changes in real time, MADRL-MR can quickly deploy multicast routing solutions.
Furthermore, with the development and promotion of SDWN technology, networks are becoming increasingly large, and a single controller will not be able to meet the needs of large-scale SDWN networks. Therefore, in the future, we will consider designing an intelligent multicast routing algorithm based on multi-agent deep reinforcement learning for multi-controller SDWN scenarios. In addition, the mobility of data-plane STAs and the joining and leaving of APs in SDWN will be considered, as will reducing the computation cost and improving the search efficiency.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This work was supported in part by National Natural Science Foundation of China (Nos. 62161006, 62172095), the subsidization of Innovation Project of Guangxi Graduate Education (No. YCSW2023310), Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (Guilin University of Electronic Technology) (No. CRKL220103), and Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing (Nos. GXKL06220110, GXKL06230102).
The authors declare there is no conflict of interest.
[198] | H. Wang, J. Li, Y. kuang, Mathematical modeling and qualitative analysis of insulin therapies, Math. Biosci., 210 (2007), 17–33. |
[199] | D. M. Thomas, A. Ciesla, J. A. Levine, J. G. Stevens, C. K. Martin, A mathematical model of weight change with adaptation, Math. Biosci. Eng., 6 (2009), 873–887. |
[200] | C. L. Chen, H. W. Tsai, Modeling the physiological glucose–insulin system on normal and diabetic subjects, Comput. Meth. Prog. Bio., 97 (2010), 130–140. |
[201] | C. C. Y. Noguchi, E. Furutani, S. Sumi, Enhanced mathematical model of postprandial glycemic excursion in diabetics using rapid-acting insulin, 2012 Proceedings of SICE Annual Conference (SICE), Akita, (2012), 566–571. |
[202] | H. Zheng, H. R. Berthoud, Eating for pleasure or calories, Curr. Opin. Pharmacol., 7 (2007), 607–612. |
[203] | O. B. Chaudhri, C. J. Small, S. R. Bloom, The gastrointestinal tract and the regulation of appetite, Drug Discov. Today, 2 (2005), 289–294. |
[204] | B. M. McGowan, S. R. Bloom, Gut hormones regulating appetite and metabolism, Drug Discov. Today, 4 (2007), 147–151. |
[205] | B. Meister, Neurotransmitters in key neurons of the hypothalamus that regulate feeding behavior and body weight, Physiol. Behav., 92 (2007), 263–271. |
[206] | S. Higgs, J. Thomas, Social influences on eating, Curr. Opin. Behav. Sci., 9 (2016), 1–6. |
[207] | S. Griffioen-Roose, G. Finlayson, M. Mars, J. E. Blundell, C. de Graaf, Measuring food reward and the transfer effect of sensory specific satiety, Appetite, 55 (2010), 648–655. |
[208] | K. C. Berridge, T. E. Robinson, J. W. Aldridge, Dissecting components of reward: 'liking', 'wanting', and learning, Curr. Opin. Pharmacol., 9 (2009), 65–73. |
[209] | K. C. Berridge, 'liking' and 'wanting' food rewards: brain substrates and roles in eating disorders, Physiol. Behav., 97 (2009), 537–550. |
[210] | K. C. Berridge, C. Y. Ho, J. M. Richard, A. G. DiFeliceantonio, The tempted brain eats: pleasure and desire circuits in obesity and eating disorders, Brain Res., 1350 (2010), 43–64. |
[211] | R. C. Havermans, "You say it's liking, i say it's wanting …". on the difficulty of disentangling food reward in man, Appetite, 57 (2011), 286–294. |
[212] | R. C. Havermans, How to tell where 'liking' ends and 'wanting' begins, Appetite, 58 (2012), 252–255. |
[213] | G. Finlayson, M. Dalton, Current progress in the assessment of 'liking' vs. 'wanting' food in human appetite. comment on "you say it's liking, i say it's wanting...". on the difficulty of disentangling food reward in man, Appetite, 58 (2012), 373–378; 252–255. |
[214] | P. W. J. Maljaars, H. P. F. Peters, D. J. Mela, A. a. M. Masclee, Ileal brake: a sensible food target for appetite control. a review, Physiol. Behav., 95 (2008), 271–281. |
[215] | H. S. Shin, J. R. Ingram, A. T. McGill, S. D. Poppitt, Lipids, CHOs, proteins: can all macronutrients put a 'brake' on eating?, Physiol. Behav., 120 (2013), 114–123. |
[216] | A. M. Wren, L. J. Seal, M. A. Cohen, A. E. Brynes, G. S. Frost, K. G. Murphy, et. al., Ghrelin enhances appetite and increases food intake in humans, J. Clin. Endocrinol. Metab., 86 (2001), 5992. |
[217] | K. A. Levin, Study design III: cross-sectional studies, Evid. Based Dent., 7 (2006), 24–25. |
[218] | J. Tack, K. J. Lee, Pathophysiology and treatment of functional dyspepsia, J. Clin. Gastroenterol., 39 (2005), 211–216. |
[219] | S. A. Murray, M. Kendall, K. Boyd, A. Sheikh, Illness trajectories and palliative care, BMJ, 330 (2005), 1007–1011. |
[220] | M. Binn, C. Albert, A. Gougeon, H. Maerki, B. Coulie, M. Lemoyne, et. al., Ghrelin gastrokinetic action in patients with neurogenic gastroparesis, Peptides, 27 (2006), 1603–1606. |
[221] | A. Abizaid, T. L. Horvath, Brain circuits regulating energy homeostasis, Regul. Peptides, 149 (2008), 3–10. |
[222] | M. Traebert, T. Riediger, S. Whitebread, E. Scharrer, H. A. Schmid, Ghrelin acts on leptin-responsive neurones in the rat arcuate nucleus, J. Neuroendocrinol., 14 (2002), 580–586. |
[223] | Y. C. L. Tung, A. K. Hewson, S. L. Dickson, Actions of leptin on growth hormone secretagogue-responsive neurones in the rat hypothalamic arcuate nucleus recorded in vitro, J. Neuroendocrinol., 13 (2001), 209–215. |
[224] | N. Sáinz, J. Barrenetxe, M. J. Moreno-Aliaga, J. A. Martínez, Leptin resistance and diet-induced obesity: central and peripheral actions of leptin, Metabolism, 64 (2015), 35–46. |
[225] | NHANES - participants - why I was selected, 2019. Available from: https://www.cdc.gov/nchs/nhanes/participant/participant-selected.htm. |
[226] | E. Archer, G. A. Hand, S. N. Blair, Validity of U.S. nutritional surveillance: national health and nutrition examination survey caloric energy intake data, 1971–2010, PLoS ONE, 8 (2013). |
[227] | E. Archer, G. Pavela, C. J. Lavie, The inadmissibility of what we eat in America and NHANES dietary data in nutrition and obesity research and the scientific formulation of national dietary guidelines, Mayo Clin. Proc., 90 (2015), 911–926. |
[228] | E. Archer, C. J. Lavie, J. O. Hill, The failure to measure dietary intake engendered a fictional discourse on diet-disease relations, Front. Nutr., 5 (2018), 105. |
| R_hell | R_loop | Episode | Reward |
|--------|--------|---------|--------|
| -1.0   | -1.0   | 450     | -23    |
| -1.0   | -0.7   | 670     | -30    |
| -1.0   | -0.5   | 390     | 12     |
| -1.0   | -0.1   | 810     | -15    |
| -0.7   | -0.5   | 250     | 22     |
| -0.5   | -0.5   | 380     | -9     |
| -0.1   | -0.5   | 460     | -5     |
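For readers who want to experiment with penalty settings such as those tabulated above, the sketch below shows one way the two shaping terms R_hell and R_loop could enter the reward function of a reinforcement-learning routing agent. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the environment layout, and the rule that R_hell is applied on an invalid next hop while R_loop is applied on revisiting a node are all assumed here for clarity.

```python
# Minimal sketch (assumed design): reward shaping for a DRL routing agent,
# where R_hell penalizes choosing an unreachable next hop and R_loop
# penalizes revisiting a node already on the current path. The constants
# and the environment structure are illustrative, not taken from the paper.

R_HELL = -1.0   # penalty when the chosen neighbor is not adjacent (assumed)
R_LOOP = -0.5   # penalty when the chosen neighbor closes a loop (assumed)
R_STEP = -0.1   # small per-hop cost to encourage short paths (assumed)
R_GOAL = 1.0    # reward for reaching a multicast destination (assumed)


def shaped_reward(current, action, path, neighbors, destinations):
    """Return the shaped reward for moving from `current` to `action`.

    current      -- node the agent currently occupies
    action       -- node the agent tries to move to
    path         -- list of nodes visited so far on this episode
    neighbors    -- dict mapping each node to its set of adjacent nodes
    destinations -- set of multicast destination nodes
    """
    if action not in neighbors.get(current, set()):
        return R_HELL   # invalid move: "hell" penalty
    if action in path:
        return R_LOOP   # revisiting a node: loop penalty
    if action in destinations:
        return R_GOAL   # reached a destination
    return R_STEP       # ordinary forwarding hop


if __name__ == "__main__":
    # Toy 4-node topology used only to exercise the function.
    nbrs = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
    print(shaped_reward(0, 3, [0], nbrs, {3}))      # -1.0: node 3 not adjacent to 0
    print(shaped_reward(1, 0, [0, 1], nbrs, {3}))   # -0.5: looping back to node 0
    print(shaped_reward(1, 3, [0, 1], nbrs, {3}))   #  1.0: destination reached
```

Under this assumed scheme, making R_hell or R_loop less negative weakens the corresponding penalty, which is the kind of trade-off between convergence episode and final reward that the table above explores.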