Review

HIV Research with Men who Have Sex with Men (MSM): Advantages and Challenges of Different Methods for Most Appropriately Targeting a Key Population

  • Received: 11 February 2017 Accepted: 16 May 2017 Published: 17 May 2017
  • The difficulty of accessing hard-to-reach populations such as men who have sex with men presents a dilemma for HIV surveillance, as their omission from surveillance systems leaves significant gaps in our understanding of HIV/AIDS epidemics. Several methods have emerged for recruiting difficult-to-access populations and collecting data on trends in HIV prevalence and behavioural factors for surveillance and research purposes. This paper critically reviews different sampling approaches, from chain-referral and venue-based to respondent-driven, time-location and internet sampling methods, focusing on their main advantages and challenges for conducting HIV research among key populations such as men who have sex with men. The benefits of using these approaches to recruit participants must be weighed against privacy concerns inherent in any social situation or health condition. Nevertheless, the methods discussed in this paper represent some of the best efforts to effectively reach most-at-risk subgroups of men who have sex with men, helping to obtain unbiased trends in HIV prevalence and HIV-related risk behaviours in this population group.

    Citation: Ana Gama, Maria O. Martins, Sónia Dias. HIV Research with Men who Have Sex with Men (MSM): Advantages and Challenges of Different Methods for Most Appropriately Targeting a Key Population[J]. AIMS Public Health, 2017, 4(3): 221-239. doi: 10.3934/publichealth.2017.3.221



    One of the most advanced artificial intelligence (AI) technologies of the last decade is deep learning, which uses neural networks to process and understand complex data. Over the last few decades, deep learning has advanced from a largely theoretical idea to a widely used practical tool. Our research is centered on developing technology that applies deep learning to fields such as epidemiology by building on the so-called disease-informed neural networks (DINNs) designed by [1].

    The concept of artificial neural networks originated in the 1940s, when neuroscientists Warren McCulloch and Walter Pitts noticed that neurons in the human brain work similarly to logical operators that produce results based on thresholds [2]. In 1943, they envisioned introducing intelligence to the world [3]. Their approach mimicked the brain's functionality by using neurons as basic units, inspired by the way biological neurons transmit signals to make decisions, and thereby pioneered the artificial neural network [4]. According to [2], the model they developed generated interest in the AI community despite its limitations at the time. A breakthrough came in the 1970s with Seppo Linnainmaa's discovery of backpropagation, which greatly improved network performance. However, it was not until 2012, when students made advances in the ImageNet competition, that neural networks gained wide recognition and popularity [2]. This achievement propelled architectures such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to the forefront of AI research. These advances have allowed deep learning systems to tackle pattern recognition problems across many fields, marking a step forward for AI [5].

    Epidemiological models, such as the susceptible-infected-recovered (SIR) model, have played an important role in understanding how diseases spread [6,7]. However, they can struggle to capture the details of real-world data [8]. They face two key limitations in epidemic prediction: limited accuracy when fitting complex disease patterns, and difficulty in handling incomplete data [6,9]. Specifically, their rigid parameter estimation approach often struggles to capture the nonlinear dynamics seen in avian influenza transmission between bird and human populations [10,11]. Today, the challenge and urgency of understanding the dynamics of infectious diseases, as exemplified by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, have led to the use of advanced AI techniques. The development of DINNs, which combine AI techniques with these models, represents an innovative integration of epidemiological data into the neural network framework and can improve prediction accuracy through advanced data analysis methods, improved parameter estimation, and the modeling of complex relationships between variables [12,13]. DINNs directly address the limitations mentioned above in two ways. First, they achieve higher prediction accuracy by simultaneously optimizing model parameters while enforcing epidemiological constraints through a unified loss function [1,14]. This dual optimization allows our DINN to better capture the complex transmission patterns of avian influenza. Second, DINNs handle modeling complexities more effectively through their neural network architecture [15,16], which can automatically adapt to nonlinear incidence rates and varying transmission patterns without requiring mathematical reformulation of the underlying equations.

    While deep learning models have shown promise, there are still obstacles to overcome, such as issues related to data quality and quantity, ensuring model transparency, and effectively incorporating real-time data [9,13]. Tackling these challenges presents opportunities for advancing modeling and creating resilient and understandable models [12].

    The solid mathematical foundation underlying deep learning is crucial, as it enhances the ability of neural networks to process visual information and enables them to solve complex mathematical puzzles, thus expanding what computers can do beyond previous limitations. This collaboration between deep learning and mathematics is essential, as highlighted by noted researchers [4,17]. Despite its capabilities, deep learning as a field is still maturing, with ongoing research focused on uncovering its theoretical foundations to ensure these systems are both effective and equitable [17].

    Building on these ideas, the DINNs are designed to utilize the detailed compartmentalization from classical epidemiology and the learning capabilities of neural networks to offer more precise predictions and effective management strategies for infectious diseases. The ongoing research on DINNs goes beyond modifying existing network structures; it involves establishing a reliable methodology that can aid real-world decision-making in public health [15].

    This work will explore the mathematics behind neural networks, discuss their development over time, and explain how they are being used to address the challenges presented by infectious diseases using DINNs. By combining theoretical analysis with real-world applications in predicting diseases, this study intends to add to academic understanding and effective approaches in tackling worldwide health emergencies. It is important to note that for the remainder of this work, whenever we refer to "neuron", we are actually referring to an "artificial neuron".

    Research questions

    1) How can traditional epidemiological models be enhanced using deep neural networks?

    2) What mathematical foundations are needed to integrate epidemiological parameters into neural network architectures?

    3) How can the integration of classical epidemiological models with modern neural network architectures improve public health decision-making?

    Here, we state some important definitions and concepts that are useful for this research and that have been adapted from [17,18,19].

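    Definition 1.1. An artificial neuron with weights $w_1,\dots,w_n\in\mathbb{R}$, bias $b\in\mathbb{R}$, and activation function $\rho:\mathbb{R}\to\mathbb{R}$ is defined as the function $f:\mathbb{R}^n\to\mathbb{R}$ given by

    $$f(x_1,\dots,x_n)=\rho\left(\sum_{i=1}^{n}x_iw_i+b\right)=\rho(\langle x,w\rangle+b), \tag{1.1}$$

    where $w=(w_1,\dots,w_n)$ and $x=(x_1,\dots,x_n)$.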

    Some common activation functions include sigmoid, Tanh, and the rectified linear unit (ReLU) [20]. Every activation function comes with its own benefits. For example, ReLU helps to mitigate the vanishing-gradient problem often encountered in deep networks [18], and it is inspired by the thresholding behavior of biological neurons, in which only positive signals are transmitted. Further activation functions and their benefits are described in [18] and other sources, which are worth consulting before a particular activation function is selected. For this research, we used ReLU and Tanh.

    Figure 1.  Graphical representation of an artificial neuron $f:\mathbb{R}^n\to\mathbb{R}$.
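As an illustration of Definition 1.1, the following minimal Python sketch evaluates a single artificial neuron with the two activation functions used in this work. The function names (`relu`, `neuron`) and the numeric inputs, weights, and bias are hypothetical and chosen only for demonstration.

```python
import math

def relu(z):
    """Rectified linear unit: passes positive inputs and zeroes out the rest."""
    return max(0.0, z)

def neuron(x, w, b, rho):
    """Artificial neuron f(x) = rho(<x, w> + b), as in Definition 1.1."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(xi * wi for xi, wi in zip(x, w)) + b  # inner product plus bias
    return rho(z)

# Hypothetical inputs, weights, and bias (illustration only).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.1

out_relu = neuron(x, w, b, relu)       # the pre-activation is negative, so ReLU returns 0
out_tanh = neuron(x, w, b, math.tanh)  # Tanh keeps the sign of the pre-activation
```

With these values the pre-activation is negative, so the two activations behave differently: ReLU suppresses the signal entirely, whereas Tanh passes a bounded negative value.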

    Layer: In a neural network, a layer is a collection of nodes or neurons that operate in parallel, receiving either the input data or the output of the preceding layer. Neural networks can consist of input, hidden, and output layers.

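    Definition 1.2. (Artificial neural network) Let $d\in\mathbb{N}$ be the dimension of the input layer, let $L$ be the number of layers, let $N_0:=d$ and $N_\ell$, $\ell=1,\dots,L$, be the dimensions of the hidden and last layer, let $\rho:\mathbb{R}\to\mathbb{R}$ be a nonlinear activation function, and, for $\ell=1,\dots,L$, let $T_\ell$ be the affine-linear functions

    $$T_\ell:\mathbb{R}^{N_{\ell-1}}\to\mathbb{R}^{N_\ell},\quad T_\ell x=W^{(\ell)}x+b^{(\ell)},$$

    with $W^{(\ell)}\in\mathbb{R}^{N_\ell\times N_{\ell-1}}$ being the weight matrices and $b^{(\ell)}\in\mathbb{R}^{N_\ell}$ the bias vectors of the $\ell$th layer. Then $\Phi:\mathbb{R}^d\to\mathbb{R}^{N_L}$ given by

    $$\Phi(x)=T_L\rho(T_{L-1}\rho(\cdots\rho(T_1(x)))),\quad x\in\mathbb{R}^d,$$

    is called a (deep) neural network. This composition of affine-linear functions with a nonlinear activation function is referred to as a neural network of depth $L$.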

    Figure 2.  Deep neural network $\Phi:\mathbb{R}^{10}\to\mathbb{R}^{2}$ of depth 4. We plotted this using the software in [21].

    Definition 1.3. (Feed forward neural networks (FFNNs) process): Let $a^{(0)}=x\in\mathbb{R}^d$ be the input vector. For each $\ell$ from $1$ to $L$, the output of layer $\ell$ is the activation vector $a^{(\ell)}$, given by

    $$a^{(\ell)}=\rho\left(W^{(\ell)}a^{(\ell-1)}+b^{(\ell)}\right),$$

    where $W^{(\ell)}\in\mathbb{R}^{N_\ell\times N_{\ell-1}}$ is the weight matrix, $a^{(\ell-1)}\in\mathbb{R}^{N_{\ell-1}}$ is the activation vector from the previous layer, and $b^{(\ell)}\in\mathbb{R}^{N_\ell}$ is the bias vector.
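The layer-by-layer recursion above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in this work: the helper names (`affine`, `forward`) and the small 2-3-1 network with hand-picked weights are hypothetical. The sketch follows Definition 1.2 in leaving the last layer without an activation, whereas Definition 1.3 as written applies $\rho$ at every layer.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)

def affine(W, b, a):
    """Affine map T(a) = W a + b, with W given as a list of rows."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

def forward(x, layers, rho):
    """Evaluate Phi(x) = T_L rho(T_{L-1} rho(... rho(T_1 x))).

    `layers` is a list of (W, b) pairs; the activation is applied after
    every layer except the last one, as in Definition 1.2.
    """
    a = list(x)
    for index, (W, b) in enumerate(layers):
        z = affine(W, b, a)
        is_last = index == len(layers) - 1
        a = z if is_last else [rho(z_j) for z_j in z]
    return a

# Hypothetical 2 -> 3 -> 1 network (weights chosen for easy hand-checking).
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 1.0]),
    ([[1.0, 1.0, 1.0]], [-0.5]),
]
y = forward([2.0, 1.0], layers, relu)
```

For the input `[2.0, 1.0]` the hidden activations are `[1.0, 1.5, 1.0]`, so the output is their sum minus the bias offset of 0.5.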

    Optimization algorithms: Optimization algorithms are crucial for training neural networks. They iteratively adjust the network's weights to minimize the loss function, which measures the discrepancy between the predicted output and the actual target. Commonly used optimization algorithms include stochastic gradient descent (SGD), the adaptive gradient algorithm (AdaGrad), root mean square propagation (RMSprop), and adaptive moment estimation (Adam) [22].

    During optimization, we aim to minimize the loss function. Common loss functions include the mean squared error (MSE) and the cross-entropy loss. The overall loss is the average of a per-sample loss:

    $$\mathcal{L}=\frac{1}{n}\sum_{i=1}^{n}L(y_i,\hat{y}_i),$$

    where $L$ is the per-sample loss function, $y_i$ is the actual target, and $\hat{y}_i$ is the predicted output [18].

    For regression tasks, the MSE is often used, with per-sample loss defined as:

    $$L(y,\hat{y})=(y-\hat{y})^2.$$

    For classification tasks, the cross-entropy loss is commonly used and it is defined as:

    $$L(y,\hat{y})=-\sum_{c=1}^{C}y_c\log(\hat{y}_c),$$

    where $C$ is the number of classes, $y_c$ is a binary indicator (0 or 1) of whether class label $c$ is the correct classification for observation $y$, and $\hat{y}_c$ is the predicted probability of observation $y$ being in class $c$.
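The two loss functions above can be evaluated directly, as in the short sketch below. The function names and sample values are hypothetical, and the small `eps` guard against $\log(0)$ is an implementation convenience that is not part of the definitions.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: the average of (y - y_hat)^2 over all samples."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_onehot, p_pred, eps=1e-12):
    """Cross-entropy for one observation: -sum_c y_c * log(p_c)."""
    return -sum(y_c * math.log(max(p_c, eps))
                for y_c, p_c in zip(y_onehot, p_pred))

# Hypothetical regression targets and predictions.
mse_value = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])    # (0 + 0.25 + 1) / 3

# Hypothetical three-class example in which class 1 is the correct label.
ce_value = cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])  # equals -log(0.7)
```

Only the probability assigned to the correct class contributes to the cross-entropy, which is why the second value reduces to $-\log(0.7)$.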

    During training, we use the backpropagation method to compute the gradients needed to fine-tune the network parameters. Its primary objective is to minimize the loss function, which gauges the discrepancy between the network output and the desired target [19].

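    Mathematically, the gradient of the loss $\mathcal{L}$ with respect to the weights $w_{ji}^{(l)}$ is computed by the chain rule as:

    $$\frac{\partial\mathcal{L}}{\partial w_{ji}^{(l)}}=\frac{\partial\mathcal{L}}{\partial a_j^{(l)}}\,\frac{\partial a_j^{(l)}}{\partial z_j^{(l)}}\,\frac{\partial z_j^{(l)}}{\partial w_{ji}^{(l)}},\quad i,j=1,\dots,n,$$

    where $z_j^{(l)}$ denotes the pre-activation (weighted input plus bias) of neuron $j$ in layer $l$ and $a_j^{(l)}=\rho(z_j^{(l)})$ is its activation.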
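To make the chain rule concrete, the sketch below applies it to the simplest possible case: a single Tanh neuron with one weight and a squared-error loss. It then compares the analytic gradient with a central finite-difference estimate. All names and numeric values are hypothetical, and the example is a sanity check of the formula rather than a full backpropagation routine.

```python
import math

def loss(w, x, b, y):
    """Squared error of a single Tanh neuron: (tanh(w * x + b) - y)^2."""
    return (math.tanh(w * x + b) - y) ** 2

def grad_w(w, x, b, y):
    """Chain rule: dL/dw = (dL/da) * (da/dz) * (dz/dw)."""
    z = w * x + b
    a = math.tanh(z)
    dL_da = 2.0 * (a - y)   # derivative of the squared error
    da_dz = 1.0 - a ** 2    # derivative of tanh
    dz_dw = x               # derivative of the affine map
    return dL_da * da_dz * dz_dw

# Hypothetical weight, input, bias, and target.
w, x, b, y = 0.3, 1.5, -0.2, 0.8

h = 1e-6
numeric = (loss(w + h, x, b, y) - loss(w - h, x, b, y)) / (2.0 * h)
analytic = grad_w(w, x, b, y)
```

The two estimates agree to within the truncation and rounding error of the finite difference, which is the usual way to check a hand-derived gradient.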

    For our work, we used the Adam optimizer as an optimization algorithm that combines the advantages of two other extensions of SGD, that is to say, AdaGrad and RMSprop. From AdaGrad, Adam adapts the learning rates for each parameter based on the magnitude of past squared gradients, ensuring infrequent parameters receive larger updates, which is especially useful for sparse data. In addition, from RMSprop, Adam improves on AdaGrad by using an exponentially decaying average of squared gradients to prevent the learning rate from shrinking too much, which ensures that the learning process remains efficient even in the later stages of training.

    Adam maintains two moving averages of the gradients, which are used to adapt the learning rate per parameter.

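    The update rules for Adam (used in our code) are as follows:

    1) Compute the moving averages of the gradient ($m_t$) and the squared gradient ($v_t$) as

    $$m_t=\beta_1m_{t-1}+(1-\beta_1)\nabla\mathcal{L},\qquad v_t=\beta_2v_{t-1}+(1-\beta_2)(\nabla\mathcal{L})^2.$$

    2) Correct the bias in the estimates as

    $$\hat{m}_t=\frac{m_t}{1-\beta_1^t},\qquad \hat{v}_t=\frac{v_t}{1-\beta_2^t}.$$

    3) Update the weights as

    $$w_{ji}^{(l)}=w_{ji}^{(l-1)}-\eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}.$$

    Here, $m_t$ and $v_t$ are the first and second moment estimates, respectively, $\eta$ is the learning rate, and $\beta_1$, $\beta_2$, and $\epsilon$ are hyperparameters. The default values recommended by the authors are $\beta_1=0.9$, $\beta_2=0.999$, and $\epsilon=10^{-8}$ [23].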
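The three update steps can be written out for a single scalar parameter as in the sketch below. This is a hand-rolled illustration, not the PyTorch optimizer used in our code; the function name `adam_step`, the toy objective $(w-3)^2$, and the learning rate of 0.05 are assumptions made for the example.

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad        # step 1: first-moment average
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # step 1: second-moment average
    m_hat = m / (1.0 - beta1 ** t)              # step 2: bias correction
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # step 3: parameter update
    return w, m, v

# Toy objective (hypothetical): L(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    gradient = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, gradient, m, v, t, lr=0.05)
```

Because of the bias correction, the very first update has magnitude close to the learning rate regardless of the gradient scale; on this toy problem the iterate then settles near the minimizer $w=3$.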

    Expressivity of DNNs: The study of expressivity aims to understand how aspects of a neural network's architecture affect its performance in deep learning. Methods from applied harmonic analysis and approximation theory are typically employed to study this aspect [17]. The universal approximation theorem states that a neural network with at least one hidden layer and nonlinear activation functions can approximate any continuous function to arbitrary precision, given sufficient neurons [24]. This theorem underscores the power and flexibility of neural networks in function approximation, asserting that a feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function [25,26].

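    Theorem 1. (Universal approximation theorem) [17] Let $d\in\mathbb{N}$, $K\subset\mathbb{R}^d$ be compact, $f:K\to\mathbb{R}$ be continuous, and $\rho:\mathbb{R}\to\mathbb{R}$ be continuous and not a polynomial. Then, for each $\epsilon>0$, there exist $N\in\mathbb{N}$, $a_k,b_k\in\mathbb{R}$, and $w_k\in\mathbb{R}^d$, $1\le k\le N$, such that

    $$\left\|f-\sum_{k=1}^{N}a_k\rho(\langle w_k,\cdot\rangle-b_k)\right\|_\infty\le\epsilon.$$

    In short, it states that each continuous function on a compact domain can be approximated to an arbitrary accuracy by a shallow neural network.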

    In this study, we examine an SI-SIR type of compartmental model focusing on avian influenza disease using a DINN. We begin by presenting the mathematical model formulation we will consider.

    1) Avian influenza introduction and current context

    Avian influenza, commonly known as bird flu, is a disease that mainly impacts birds but can also affect humans and other animals. This disease is caused by influenza A viruses from the Orthomyxoviridae family, known for their ability to change and create strains with varying levels of severity and ease of transmission [27].

    Avian influenza viruses are categorized into two groups based on how harmful they are to poultry, that is to say, low pathogenic avian influenza (LPAI) and highly pathogenic avian influenza (HPAI). HPAI strains like H5N1 and H7N9 have led to outbreaks in birds and have occasionally crossed over to humans, causing respiratory problems and high mortality rates [28,29].

    The transmission of avian influenza viruses to humans can happen through direct contact with infected birds, contaminated surroundings, or, less frequently, through human-to-human contact. The risk of these viruses causing pandemics is a public health concern, since they can undergo genetic changes that make them more easily transmissible among people [30]. Therefore, it is vital to understand how avian influenza spreads and to develop strategies to control it in order to prevent and manage outbreaks. Recent occurrences have emphasized the significance of bird flu as a public health issue. For example, on March 25, 2024, the Centers for Disease Control and Prevention (CDC) in the United States announced instances of H5N1 outbreaks with cases detected in animals like dairy cows. This marked the discovery of these avian flu strains in cattle [31].

    There are worries about the virus's ability to change and spread easily among people, leading to increased efforts in monitoring and controlling outbreaks [32].

    In Canada, authorities like the Canadian Food Inspection Agency (CFIA) and the Public Health Agency of Canada (PHAC) are taking steps to address the threat of avian influenza by strengthening surveillance and biosecurity measures. Following the detection of H5N1 in U.S. dairy cattle, they have increased testing of imported dairy cattle, begun checking milk for traces of the virus, and started conducting voluntary testing on asymptomatic cows to ensure safety [33]. These actions are aimed at preventing the virus from entering Canadian livestock and safeguarding the food supply. The rapid global spread of the H5N1 clade 2.3.4.4b virus has caused concern, with detections in wild bird species and domestic poultry leading to mass culling efforts to contain it in North America, including Canada, where it has significantly impacted commercial and non-commercial bird populations [34].

    Motivated by these developments, we chose avian influenza as the disease for our implementation.

    2) Model equations for avian influenza

    In our study, we use a simplified version of the SI-SIR model to examine how avian influenza transmission occurs. This model is based on the work of Yu and Ma [11], which considers nonlinear incidence and recovery rates in both deterministic and stochastic environments. Unlike their model, in which the recovery rate varies, our simplified version assumes a constant recovery rate for humans. This simplification allows us to focus on key transmission dynamics without dealing with the complexities of changing recovery rates. The SI-SIR model's differential equations are as follows:

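    $$\frac{dS_a}{dt}=r_aS_a\left(1-\frac{S_a}{K_a}\right)-\frac{\beta_aI_aS_a}{1+bI_a} \tag{2.1}$$
    $$\frac{dI_a}{dt}=\frac{\beta_aI_aS_a}{1+bI_a}-(\mu_a+\delta_a)I_a \tag{2.2}$$
    $$\frac{dS_h}{dt}=\Pi_h-\frac{\beta_hI_aS_h}{1+cI_h^2}-\mu_hS_h \tag{2.3}$$
    $$\frac{dI_h}{dt}=\frac{\beta_hI_aS_h}{1+cI_h^2}-(\mu_h+\delta_h+\gamma)I_h \tag{2.4}$$
    $$\frac{dR_h}{dt}=\gamma I_h-\mu_hR_h, \tag{2.5}$$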

    where the model variables and parameters are summarized in the tables below.

    It is important to note that the avian-to-avian transmission rate ($\beta_a$) was varied in the study conducted by [11] to examine different scenarios.

    When $\beta_a=0.8\times10^{-8}$, the basic reproduction number is $R_0=0.54<1$, leading to disease extinction.

    When $\beta_a=2.6\times10^{-8}$, $R_0=1.75>1$, resulting in disease persistence. According to [11], the value of $\beta_a$ determines whether the disease persists or is eliminated and affects the resulting number of infected individuals. It is also important to note that in [11], a comprehensive stability analysis of this model was performed, proving that the system exhibits a forward bifurcation. Their analysis demonstrated that when $R_0<1$, the disease-free equilibrium is globally asymptotically stable, and when $R_0>1$, the endemic equilibrium is globally asymptotically stable. This mathematical property ensures that the model has a clear threshold behavior, with no possibility of backward bifurcation or multiple endemic equilibria. These stability properties support our use of this model framework for examining disease dynamics through DINNs.
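The two reported values of the basic reproduction number can be checked with a few lines of Python. The closed form used here, $R_0=\beta_aK_a/(\mu_a+\delta_a)$, is an assumption on our part: it is what one obtains by linearizing the infected-avian equation at the disease-free state $S_a=K_a$, $I_a=0$, and it is not quoted from [11]. The parameter values are those listed in the parameter table, with the rates read as negative powers of ten.

```python
def basic_reproduction_number(beta_a, K_a, mu_a, delta_a):
    """Assumed threshold quantity R0 = beta_a * K_a / (mu_a + delta_a).

    Obtained by linearizing dI_a/dt at the disease-free state
    (S_a = K_a, I_a = 0); treat this closed form as an assumption.
    """
    return beta_a * K_a / (mu_a + delta_a)

K_A = 5e4          # carrying capacity of the avian population
MU_A = 3.4246e-4   # natural death rate of the avian population
DELTA_A = 4e-4     # disease-induced death rate of the avian population

r0_low = basic_reproduction_number(0.8e-8, K_A, MU_A, DELTA_A)
r0_high = basic_reproduction_number(2.6e-8, K_A, MU_A, DELTA_A)

# Transmission rate at which the assumed R0 equals one.
beta_threshold = (MU_A + DELTA_A) / K_A
```

Under this assumption the two scenarios reproduce the reported values of 0.54 and 1.75 after rounding, and the threshold $R_0=1$ falls at a transmission rate of roughly $1.5\times10^{-8}$.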

    This model helps in understanding how avian influenza spreads and in identifying strategies for controlling it. It describes how avian and human populations interact, taking into account nonlinear transmission patterns and a constant recovery rate. While the original model by [11] offers additional insight by incorporating variable recovery rates, it also adds complexity. With our simplified approach, we aim to explain the fundamental mechanisms behind the spread of avian influenza and to provide insights for shaping public health policies aimed at managing outbreaks.

    To illustrate the dynamics of our avian influenza model under different scenarios, we vary the avian-to-avian transmission rate ($\beta_a$) to examine its impact on disease dynamics, as this parameter plays an important role in determining whether the disease persists or becomes extinct. We then conduct numerical simulations similar to one of those conducted by [11] using the same parameters as used in their study.

    Figure 3 shows the results of these simulations for different values of $\beta_a$. The left subplot illustrates scenarios where the basic reproduction number is $R_0<1$, leading to disease extinction, while the right subplot shows cases where $R_0>1$, resulting in disease persistence.

    Figure 3.  Avian influenza dynamics for different $\beta_a$ values.

    In our simulations, we used the initial conditions specified in Table 1 and the parameter values detailed in Table 2. The time span for the simulation was set from 0 to 100,000 days to better understand the long-term behavior of the epidemic. Additionally, it is worth noting the computational complexity underlying our implementation. The system is solved by numerical integration over 100,000 days sampled at 1000 evenly spaced points, with each time point tracking five coupled state variables. This yields a dataset of 5000 data points in total across the five compartments. The integration process handles multiple nonlinear terms simultaneously, including density-dependent growth and varying transmission rates. We examine six distinct transmission scenarios ($\beta_a$ ranging from $0.8\times10^{-8}$ to $2.6\times10^{-8}$), each requiring separate numerical integration and demonstrating the model's robustness in capturing both disease extinction and persistence regimes. This computational framework ensures accurate solutions across different parameter regimes while maintaining the biological constraints inherent in the epidemiological system.

    Table 1.  Model variables and initial values.

    | Variable | Description | Initial value | Reference |
    |---|---|---|---|
    | $S_a$ | Susceptible avian population | 9900 | [10] |
    | $I_a$ | Infectious avian population | 100 | [10] |
    | $S_h$ | Susceptible human population | 9999 | [10] |
    | $I_h$ | Infectious human population | 1 | [10] |
    | $R_h$ | Recovered human population | 0 | [10] |

    Table 2.  Model parameters.

    | Parameter | Description | Value | Reference |
    |---|---|---|---|
    | $r_a$ | Intrinsic growth rate of avian population | $5\times10^{-3}$ | [10] |
    | $K_a$ | Carrying capacity for avian population | $5\times10^{4}$ | [10] |
    | $\beta_a$ | Transmission rate from avian to avian | Varied: $0.8\times10^{-8}$ to $2.6\times10^{-8}$ | [11] |
    | $\beta_h$ | Transmission rate from avian to human | $8\times10^{-7}$ | [10] |
    | $\mu_a$ | Natural death rate for avian population | $3.4246\times10^{-4}$ | [10] |
    | $b$ | Nonlinear incidence coefficient (avian) | 0.001 | [11] |
    | $\mu_h$ | Natural death rate for human population | $3.91\times10^{-5}$ | [10] |
    | $\delta_a$ | Disease-induced death rate for avian population | $4\times10^{-4}$ | [10] |
    | $c$ | Nonlinear incidence coefficient (human) | 0.01 | [11] |
    | $\gamma$ | Constant recovery rate of human population | 0.1 | [10] |
    | $\delta_h$ | Disease-induced death rate for human population | 0.077 | [10] |
    | $\Pi_h$ | Recruitment rate of human population | 30 | [10] |

    As depicted in Figure 3, when $\beta_a\le1.3\times10^{-8}$, the number of infected humans initially increases but eventually declines to zero, indicating disease extinction. Conversely, for $\beta_a\ge1.6\times10^{-8}$, the number of infected humans stabilizes at a nonzero value, signifying disease persistence.

    These simulations confirm the findings of [11], that is to say, the avian-to-avian transmission rate βa is a critical parameter in determining the long-term behavior of the epidemic.

    Higher values of βa lead to a larger number of infected humans at equilibrium, underscoring the importance of controlling avian-to-avian transmission in managing avian influenza outbreaks.

    The data generated from these simulations will be instrumental in training and validating our DINN model, enabling us to assess its effectiveness in capturing both disease extinction and persistence scenarios in the next section.
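The simulations described in this section can be sketched in plain Python as follows. The text does not state which numerical integrator was used, so a classical fixed-step fourth-order Runge-Kutta scheme is substituted here as an assumption. The names `PARAMS`, `INITIAL_STATE`, `si_sir_rhs`, `rk4_step`, and `simulate` are hypothetical, and the horizon is shortened to 5000 days so that the example runs quickly.

```python
# Parameter values from Table 2 (rates read as negative powers of ten).
PARAMS = {
    "r_a": 5e-3, "K_a": 5e4, "beta_h": 8e-7, "mu_a": 3.4246e-4,
    "delta_a": 4e-4, "b": 0.001, "c": 0.01, "mu_h": 3.91e-5,
    "delta_h": 0.077, "gamma": 0.1, "Pi_h": 30.0,
}
# Initial values from Table 1, ordered as S_a, I_a, S_h, I_h, R_h.
INITIAL_STATE = [9900.0, 100.0, 9999.0, 1.0, 0.0]

def si_sir_rhs(state, p):
    """Right-hand sides of the five SI-SIR model equations."""
    S_a, I_a, S_h, I_h, R_h = state
    avian_inc = p["beta_a"] * I_a * S_a / (1.0 + p["b"] * I_a)
    human_inc = p["beta_h"] * I_a * S_h / (1.0 + p["c"] * I_h ** 2)
    return [
        p["r_a"] * S_a * (1.0 - S_a / p["K_a"]) - avian_inc,
        avian_inc - (p["mu_a"] + p["delta_a"]) * I_a,
        p["Pi_h"] - human_inc - p["mu_h"] * S_h,
        human_inc - (p["mu_h"] + p["delta_h"] + p["gamma"]) * I_h,
        p["gamma"] * I_h - p["mu_h"] * R_h,
    ]

def rk4_step(state, p, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, scale):
        return [s + scale * k_i for s, k_i in zip(state, k)]
    k1 = si_sir_rhs(state, p)
    k2 = si_sir_rhs(shifted(k1, dt / 2.0), p)
    k3 = si_sir_rhs(shifted(k2, dt / 2.0), p)
    k4 = si_sir_rhs(shifted(k3, dt), p)
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def simulate(beta_a, t_end, dt=1.0):
    """Integrate from t = 0 to t_end (days) and return the final state."""
    p = dict(PARAMS, beta_a=beta_a)
    state = list(INITIAL_STATE)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, p, dt)
    return state

# Shortened horizon (5000 days) for the two extreme scenarios.
low = simulate(beta_a=0.8e-8, t_end=5000.0)    # extinction regime
high = simulate(beta_a=2.6e-8, t_end=5000.0)   # persistence regime
```

A step of one day is small compared with the fastest rate in the system (about 0.18 per day), which keeps the explicit scheme stable. Passing `t_end=100000.0` covers the full horizon used in the text at the cost of a longer run.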

    The combination of machine learning methods with epidemiological modeling has become a useful tool for understanding and forecasting disease patterns [16]. DINNs, as introduced by [1], are a technique that merges the adaptability of neural networks with the expert knowledge inherent in epidemiological models. DINNs are a type of neural network that incorporates knowledge of disease mechanisms and biology into their architecture and learning process. By leveraging insights from disease pathology, DINNs aim to improve the diagnosis, prognosis, and treatment of diseases.

    DINNs are a specialized type of physics-informed neural networks (PINNs) designed for epidemiological applications [14]. This approach effectively uses the universal approximation capabilities of neural networks while integrating the foundational structure of disease transmission models. This combination enables DINNs to capture the dynamics while upholding the core principles of epidemiology [35].

    A key strength of the DINNs is their capacity to learn from both the data and the governing equations of disease models simultaneously. This dual learning approach improves the prediction accuracy in scenarios with noisy data, which is common during the early stages of disease outbreaks [35].

    Important work on DINNs has been done by [1], with related work by [16] and [36]. These studies inspired us to further optimize the DINN model and apply it to a nonlinear avian influenza model. Some of the choices made in our research, such as the number of neurons in each layer of the network, the learning rate, the search range, and the number of epochs, are based on the findings of [1].

    For our avian influenza model, we apply the DINN framework illustrated in Figure 4. Our DINN consists of a neural network, defined in our code as net_si_sir, with 12 hidden layers, each containing 64 neurons and using a ReLU activation function. The network takes time $t$ as input and outputs the normalized compartment values $\hat{S}_a(t),\hat{I}_a(t),\hat{S}_h(t),\hat{I}_h(t),\hat{R}_h(t)$.

    Figure 4.  A DINN architecture for avian influenza model.

    We incorporate 12 learnable parameters corresponding to the avian influenza model parameters ($r_a$, $K_a$, $\mu_a$, $\delta_a$, $\Pi_h$, $\beta_h$, $\beta_a$, $\mu_h$, $\delta_h$, $b$, $c$, and $\gamma$). These parameters are initialized randomly and constrained to biologically feasible ranges using a tanh transformation. For example, the avian-to-avian transmission rate $\beta_a$ is constrained to the range $[0.8\times10^{-8},2.6\times10^{-8}]$, which matches the range of values used in our previous simulations in Section 2.1.
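One common way to realize such a tanh constraint is sketched below. The exact transformation used in our code is not spelled out in the text, so the affine rescaling of $\tanh$ shown here, as well as the names `constrain` and `BETA_A_RANGE`, should be read as assumptions.

```python
import math

def constrain(raw, low, high):
    """Map an unconstrained real number into the interval [low, high].

    tanh squashes `raw` into (-1, 1); the result is then shifted and
    rescaled so that raw = 0 lands on the midpoint of the interval.
    """
    return low + 0.5 * (high - low) * (math.tanh(raw) + 1.0)

# Range quoted in the text for the avian-to-avian transmission rate.
BETA_A_RANGE = (0.8e-8, 2.6e-8)

beta_mid = constrain(0.0, *BETA_A_RANGE)     # midpoint of the range
beta_small = constrain(-5.0, *BETA_A_RANGE)  # close to the lower bound
beta_large = constrain(5.0, *BETA_A_RANGE)   # close to the upper bound
```

The optimizer can then update the unconstrained value freely, while the parameter that enters the model equations always stays inside the feasible range.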

    All computational experiments were performed on a high-performance virtual machine provided by the Laboratory of Mathematical Parallel Systems (LAMPS) together with one health modeling network for infectious diseases (OMNI) at York University. The system specifications are as follows:

    Device name: omni.hpc1

    Operating system: Microsoft Windows Server 2022 Standard, Version 21H2 (OS Build 20348.2700)

    Processor (CPU): AMD EPYC 7513 32-Core Processor at 2.60 GHz (Max Clock Speed: 2.95 GHz)

    Cores and logical processors: 16 cores, 16 logical processors

    Memory (RAM): 80.0 GB of installed physical memory; 75.2 GB available

    Storage: 699 GB total disk size with 643 GB available

    Virtualization platform: VMware 20.1 (System Manufacturer: VMware, Inc.)

    This high-performance computing environment facilitated the efficient training and evaluation of our DINN model. The substantial processing power and memory were essential for handling the computational demands of our DINN training, especially when integrating physics-informed components.

    In our implementation, we combine both data loss and physics loss in the loss function of our DINN to ensure that the model not only fits the data but also adheres to the underlying dynamics of the SI-SIR model for avian influenza.

    The loss function is formulated as:

    Total Loss=Data Loss+Physics Loss. (3.1)

    Data Loss: This measures the discrepancy between the network's predictions and the available synthetic data using the MSE:

    Data Loss=1NNi=1((ˆSa,iSa,i)2+(ˆIa,iIa,i)2+(ˆSh,iSh,i)2+(ˆIh,iIh,i)2+(ˆRh,iRh,i)2) (3.2)

    Here, ˆSa,i,ˆIa,i,ˆSh,i,ˆIh,i,ˆRh,i are the network's predictions at time ti, and Sa,i,Ia,i,Sh,i,Ih,i,Rh,i are the true values.

    Physics loss: This quantifies how well the network's predictions satisfy the differential equations of our SI-SIR model. Using automatic differentiation, we compute the time derivatives of the network outputs and compare them with the righthand sides of the SI-SIR model equations:

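    $$\text{Physics Loss} = \frac{1}{N}\sum_{i=1}^{N}\left(f_1^2(t_i)+f_2^2(t_i)+f_3^2(t_i)+f_4^2(t_i)+f_5^2(t_i)\right), \tag{3.3}$$

    where the residuals $f_j(t_i)$ are defined as:

    $$f_1(t_i) = \frac{d\hat{S}_a}{dt} - \left(r_a\hat{S}_a\left(1-\frac{\hat{S}_a}{K_a}\right) - \frac{\beta_a\hat{I}_a\hat{S}_a}{1+b\hat{I}_a}\right) \tag{3.4}$$
    $$f_2(t_i) = \frac{d\hat{I}_a}{dt} - \left(\frac{\beta_a\hat{I}_a\hat{S}_a}{1+b\hat{I}_a} - (\mu_a+\delta_a)\hat{I}_a\right) \tag{3.5}$$
    $$f_3(t_i) = \frac{d\hat{S}_h}{dt} - \left(\Pi_h - \frac{\beta_h\hat{I}_a\hat{S}_h}{1+c\hat{I}_h^2} - \mu_h\hat{S}_h\right) \tag{3.6}$$
    $$f_4(t_i) = \frac{d\hat{I}_h}{dt} - \left(\frac{\beta_h\hat{I}_a\hat{S}_h}{1+c\hat{I}_h^2} - (\mu_h+\delta_h+\gamma)\hat{I}_h\right) \tag{3.7}$$
    $$f_5(t_i) = \frac{d\hat{R}_h}{dt} - \left(\gamma\hat{I}_h - \mu_h\hat{R}_h\right). \tag{3.8}$$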
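The residuals and the physics loss of Eq (3.3) can be illustrated without any deep learning library. The sketch below is an assumption-laden illustration rather than the code used in this work: the function names and the compartment values in `state` are hypothetical, while the parameter values are the target values listed in Table 3.

```python
def si_sir_rhs(state, p):
    """Right-hand sides of the SI-SIR model (the bracketed terms of Eqs (3.4)-(3.8))."""
    S_a, I_a, S_h, I_h, R_h = state
    avian_inf = p["beta_a"] * I_a * S_a / (1.0 + p["b"] * I_a)
    human_inf = p["beta_h"] * I_a * S_h / (1.0 + p["c"] * I_h ** 2)
    return (
        p["r_a"] * S_a * (1.0 - S_a / p["K_a"]) - avian_inf,
        avian_inf - (p["mu_a"] + p["delta_a"]) * I_a,
        p["Pi_h"] - human_inf - p["mu_h"] * S_h,
        human_inf - (p["mu_h"] + p["delta_h"] + p["gamma"]) * I_h,
        p["gamma"] * I_h - p["mu_h"] * R_h,
    )


def residuals(state, dstate_dt, p):
    """f_j = (time derivative of compartment j) - (model right-hand side j)."""
    return tuple(d - r for d, r in zip(dstate_dt, si_sir_rhs(state, p)))


def physics_loss(states, derivs, p):
    """Eq (3.3): mean over time points of the sum of squared residuals."""
    total = sum(sum(f ** 2 for f in residuals(s, d, p)) for s, d in zip(states, derivs))
    return total / len(states)


# Target parameter values from Table 3.
params = {"r_a": 0.005, "K_a": 50000.0, "mu_a": 0.00034246, "delta_a": 0.0004,
          "Pi_h": 30.0, "beta_h": 8e-7, "beta_a": 2.6e-8, "mu_h": 0.0000391,
          "delta_h": 0.077, "b": 0.001, "c": 0.01, "gamma": 0.1}

state = (40000.0, 100.0, 700000.0, 5.0, 1.0)   # hypothetical compartment values
exact = si_sir_rhs(state, params)              # derivatives that satisfy the ODEs
print(physics_loss([state], [exact], params))  # residuals vanish for an exact solution
```

When the supplied derivatives deviate from the model right-hand sides, the residuals become non-zero and the physics loss grows with the square of the deviation, which is what penalizes predictions that violate the SI-SIR dynamics.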

    By minimizing the Total Loss in Eq (3.1), the DINN learns to produce predictions that not only fit the data but also satisfy the SI-SIR model dynamics.

    By combining PyTorch's automatic differentiation with a loss function that includes both data fidelity and adherence to the model equations, our DINN effectively learns the dynamics of the avian influenza SI-SIR model. The combination of data loss and physics loss guides the neural network to produce accurate predictions that are consistent with both the observed data and the underlying epidemiological model. In our code, we use PyTorch's automatic differentiation to compute the derivatives of the network outputs with respect to time $t$. The implementation details can be found in the Appendix.

    We use PyTorch for the training process, employing the Adam optimizer and a learning rate scheduler that decreases the rate by 10% every 1000 epochs. The network undergoes training for 50,000 epochs to ensure it reaches convergence.
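To make the schedule concrete, the sketch below reads "decreases the rate by 10% every 1000 epochs" as a step decay in which the rate is multiplied by 0.9 after each block of 1000 epochs. This reading is an assumption, as is the initial rate of 1e-4 (the value reported with Table 3); the helper `stepped_lr` is hypothetical. In PyTorch, a schedule of this form is commonly expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.9)`, although the text does not state which scheduler class was used.

```python
def stepped_lr(initial_lr, epoch, step_size=1000, decay=0.9):
    """Learning rate in effect at `epoch` under step decay (a 10% drop per step)."""
    return initial_lr * decay ** (epoch // step_size)


for epoch in (0, 999, 1000, 10_000, 50_000):
    print(epoch, stepped_lr(1e-4, epoch))
```

Under these assumptions the rate is reduced 50 times over 50,000 epochs, ending at roughly 5e-7, about half a percent of its initial value.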

    Throughout the training phase, the model not only learns to fit the given data but also adheres to the underlying equations of the SI-SIR model. This dual learning approach enables DINN to capture both the dynamics and the essential epidemiological principles governing avian influenza spread.

    Once trained, the DINN can forecast how avian influenza outbreaks evolve, potentially revealing insights that may not be clearly visible from the original differential equation model. Additionally, the trained network can offer parameter estimates that can be compared with values obtained through traditional estimation methods.

    By applying this DINN architecture to our avian influenza model, we aim to enhance predictions of disease spread in situations where data may be scarce or uncertain. This strategy combines modeling and machine learning strengths, potentially leading to precise forecasts and informed public health decisions [37].

    In this section, we provide an account of how we implemented our DINN approach for modeling avian influenza. We describe the process of training, analyze the outcomes of parameter estimation, and assess how well the model predicts disease patterns in scenarios. Our focus is on how the model captures both disease disappearance and persistence while evaluating its performance across varying transmission rates.

    In this section, we describe the training setup of our DINN model discussing the selected optimization method and the hyperparameters used. We then proceed to evaluate the accuracy of estimating parameters of the model by comparing the DINN's estimations with the target values and then provide a brief discussion regarding the conclusion and implication of the study.

    In the table below, we compare the final parameters estimated by our DINN model with their target values from the original model (see Eqs (2.1)–(2.5)). It is important to note that these results were obtained with specific hyperparameters: a learning rate of $1\times10^{-4}$, 1000 time points, and a parameter search range of approximately 1000% for the same architecture as illustrated in Figure 4.

    In the Figure below, we show the evolution of the loss function over the course of training.

    We train our DINN model for 50,000 epochs using the Adam optimizer with a learning rate scheduler. The training process lasts around 28 minutes in total, and Figure 5 illustrates how the loss function evolved during training.

    Figure 5.  Loss vs. Epochs during DINN Training.

    In this section, we review the outcomes from Section 4.1 of our DINN training process and assess how accurately parameters are estimated. We discuss the impact of hyperparameters on model performance, analyze the variation in parameter estimations across runs, and interpret how the loss function behaves during training. Furthermore, we investigate how the model reacts to parameters and discuss what our discoveries mean for applications of DINN in epidemiological modeling.

    It's important to stress that the values shown in Table 3 heavily rely on the chosen hyperparameters. The learning rate, number of time points, and parameter search range all have impacts on the DINN performance and the resulting parameter estimations. Thoughtful consideration of these hyperparameters is crucial for implementing the DINN model. For example, we noticed that when using a 0% search range (where the parameter was limited to its target value), DINN consistently replicated the target values regardless of epochs. This underscores how crucial the search range is to both the model's behavior and outcomes.

    Table 3.  Comparison of the DINN's estimated parameters with target values (learning rate: $1\times10^{-4}$, time points: 1000).

    | Parameter | DINN Estimate | Target Value | Relative Error | Search Range |
    |---|---|---|---|---|
    | $r_a$ | 0.004777 | 0.005000 | 4.46% | [0.0045, 0.0055] |
    | $K_a$ | 50,005 | 50,000 | 0.01% | [0, 100,000] |
    | $\mu_a$ | 0.00029792 | 0.00034246 | 13.01% | [0.0002, 0.0004] |
    | $\delta_a$ | 0.0004085 | 0.0004000 | 2.13% | [0.0003, 0.0005] |
    | $\Pi_h$ | 31 | 30 | 3.33% | [70, 130] |
    | $\beta_h$ | 0.0000009723 | 0.0000008000 | 21.54% | [0.0000006, 0.000001] |
    | $\beta_a$ | 0.00000002402 | 0.000000026 | 7.62% | [$8\times10^{-9}$, $2.6\times10^{-8}$] |
    | $\mu_h$ | 0.00003732 | 0.00003910 | 4.55% | [0.000029, 0.000049] |
    | $\delta_h$ | 0.070335 | 0.077000 | 8.66% | [0.05, 0.09] |
    | $b$ | 0.000930 | 0.001000 | 7.00% | [0.001, 0.003] |
    | $c$ | 0.01111 | 0.01000 | 11.10% | [0.01, 0.03] |
    | $\gamma$ | 0.12409 | 0.10000 | 24.09% | [0.15, 0.45] |


    The search ranges for each parameter specified in our code are displayed in the rightmost column of Table 3. These ranges were established using the "tanh" function to keep the parameters within biologically reasonable limits while allowing for exploration around the desired values. For instance, the search range for $r_a$ was defined as [0.0045, 0.0055], centered around its target value of 0.005. The broad search ranges (sometimes covering orders of magnitude) were selected to assess the DINN's capability to converge on accurate parameter values from an initial distribution. However, this decision also introduces variability in the outcomes, as the model needs to navigate a large parameter space.

    The DINN model performed well in estimating most parameters, with particularly high accuracy for $K_a$, $r_a$, $\delta_a$, and $\Pi_h$. The carrying capacity $K_a$ was estimated most precisely, with a relative error of just 0.01%. Parameters such as $\mu_a$, $\beta_h$, and $\gamma$ showed larger errors (13.01%, 21.54%, and 24.09%, respectively), which may indicate areas where the model is less sensitive or where interactions between parameters make precise estimation more challenging. It is important to note that for parameters with higher relative errors, the DINN estimates fall within the defined search ranges and biologically feasible limits. This suggests that the DINN approach can offer insights into parameter values even when exact estimation proves challenging.

    During our project, we ran several simulations with varying hyperparameters. Consistent with the findings of [1] for other infectious diseases, we noticed that lower learning rates and more time points resulted in longer training durations. For instance, with 100,000 time points and a learning rate of $1\times10^{-6}$, the training process took around 5 hours and 39 minutes. When we reduced the time points to 1000 and increased the learning rate to $1\times10^{-4}$, the training time fell to 28 minutes for 50,000 epochs. Keeping the time points at 1000 and lowering the learning rate to $1\times10^{-6}$ led to a training duration of 29 minutes for 50,000 epochs, which suggests that the number of time points accounts for most of the difference.

    In our DINN model's normalization and loss calculation stages, we introduced an epsilon value (self.epsilon = $1\times10^{-8}$). This tiny constant helped to prevent division by zero and enhanced numerical stability, especially when working with parameters that might have approached very small values.

    During our simulations, we noticed variations in how parameters were estimated across runs even when using the same code. While some parameters consistently matched the target values closely, others showed differences. This variability stayed consistent across runs without any changes to the code, indicating that factors like the nature of optimization, the complexity of parameter space, and wide search ranges all play a role in this behavior.

    These results underscore the importance of performing multiple runs and analyzing the variation between them when implementing DINN. The variation in outcomes emphasizes the need for robust validation methods, possibly using ensemble techniques to make parameter estimation more reliable. It also highlights the importance of tuning hyperparameters and defining appropriate search ranges for each parameter to achieve accurate and consistent results with DINN models.

    In Figure 5 we see that, initially, there is a decrease in loss, indicating that DINN quickly grasps the dynamics of the avian influenza model. Following this drop, the decline in loss becomes more gradual, suggesting adjustments are being made to fine-tune model parameters. The continuous decrease in loss signifies that our DINN is effectively learning to balance both data-driven and physics-informed aspects within its loss function. We think that the smoothness of the curve, with no fluctuations, indicates a stable training process. It's interesting to note that the loss does not level off completely after 50,000 epochs. This could mean that additional training could lead to improvements or it might indicate that the model has struck a balance between fitting the data and satisfying the physical constraints set by the differential equations.

    The final loss value reached is 0.000006, which reflects both the error in fitting the data and ensuring that the model's differential equations are satisfied. We believe that this low loss value indicates that our DINN has effectively learned to replicate the dynamics of the avian influenza model while adhering to physical principles.

    To determine the optimal depth for our DINN, we experimented with neural networks of varying depths: 8, 10, and 12 layers. We trained each network using the same hyperparameters and compared their performance in terms of parameter estimation accuracy and computational efficiency.

    From Table 4, the 12-layer network's estimates of critical parameters such as $r_a$, $K_a$, and $\Pi_h$ are significantly closer to the target values. Although the 12-layer network requires somewhat more training time than the 8-layer and 10-layer networks, it remains computationally feasible, with a training time of approximately 28 minutes. It also attains the lowest average relative error of the three networks, although the 10-layer network is slightly more accurate for some individual parameters (for example, $\mu_a$ and $\beta_a$). This trend suggests that increasing the depth beyond 12 layers may further reduce the larger relative errors, such as the 24.09% obtained for $\gamma$.

    Table 4.  Comparison of DINN performance with different network depths.

    | Parameter | Target Value | 8 Layers | 10 Layers | 12 Layers |
    |---|---|---|---|---|
    | $r_a$ | 0.005000 | 0.003044 | 0.003692 | 0.004777 |
    | Relative Error (%) | | 39.12% | 26.16% | 4.46% |
    | $K_a$ | 50000.0 | 66220.4 | 50055.8 | 50005.0 |
    | Relative Error (%) | | 32.44% | 0.11% | 0.01% |
    | $\mu_a$ | 0.00034246 | 0.00028369 | 0.00030127 | 0.00029792 |
    | Relative Error (%) | | 17.15% | 12.04% | 13.01% |
    | $\delta_a$ | 0.0004000 | 0.0003941 | 0.0004117 | 0.0004085 |
    | Relative Error (%) | | 1.48% | 2.93% | 2.13% |
    | $\Pi_h$ | 30.00 | 20.14 | 35.38 | 31.00 |
    | Relative Error (%) | | 32.87% | 17.93% | 3.33% |
    | $\beta_h$ | $8\times10^{-7}$ | $9.564\times10^{-7}$ | $9.734\times10^{-7}$ | $9.723\times10^{-7}$ |
    | Relative Error (%) | | 19.55% | 21.68% | 21.54% |
    | $\beta_a$ | $2.6\times10^{-8}$ | $1.951\times10^{-8}$ | $2.442\times10^{-8}$ | $2.402\times10^{-8}$ |
    | Relative Error (%) | | 24.96% | 6.08% | 7.62% |
    | $\mu_h$ | 0.00003910 | 0.00003797 | 0.00003753 | 0.00003732 |
    | Relative Error (%) | | 2.89% | 4.01% | 4.55% |
    | $\delta_h$ | 0.077000 | 0.071843 | 0.070740 | 0.070335 |
    | Relative Error (%) | | 6.70% | 8.13% | 8.66% |
    | $b$ | 0.001000 | 0.001635 | 0.000947 | 0.000930 |
    | Relative Error (%) | | 63.50% | 5.30% | 7.00% |
    | $c$ | 0.01000 | 0.01628 | 0.01075 | 0.01111 |
    | Relative Error (%) | | 62.80% | 7.50% | 11.10% |
    | $\gamma$ | 0.10000 | 0.14560 | 0.13007 | 0.12409 |
    | Relative Error (%) | | 45.60% | 30.07% | 24.09% |
    | Average Relative Error | | 26.84% | 12.81% | 9.72% |
    | Training Time | | 19 min 26 s | 23 min 40 s | 28 min 5 s |


    Calculation of Relative Errors: The relative error for each parameter is calculated using the formula:

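    $$\text{Relative Error (\%)} = \left(\frac{|\text{Estimated Value}-\text{Target Value}|}{\text{Target Value}}\right)\times 100\%$$

    For example, for $r_a$ in the 8-layer network:

    $$\text{Relative Error} = \left(\frac{|0.003044-0.005000|}{0.005000}\right)\times 100\% = 39.12\%$$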
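The same calculation can be scripted to check entries of Table 4. The helper name `relative_error_percent` is hypothetical; the input values are taken from the $r_a$ and $K_a$ rows of the table.

```python
def relative_error_percent(estimate, target):
    """Relative error in percent: |estimate - target| / |target| * 100.

    abs(target) in the denominator matches the formula above whenever the
    target is positive, which is the case for every parameter in Table 4.
    """
    if target == 0:
        raise ValueError("relative error is undefined for a zero target")
    return abs(estimate - target) / abs(target) * 100.0


# Estimates and targets taken from Table 4.
print(round(relative_error_percent(0.003044, 0.005000), 2))  # r_a, 8 layers
print(round(relative_error_percent(0.004777, 0.005000), 2))  # r_a, 12 layers
print(round(relative_error_percent(50005.0, 50000.0), 2))    # K_a, 12 layers
```

These three calls reproduce the tabulated values of 39.12%, 4.46%, and 0.01%.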

    Therefore, based on the comparative analysis in Table 4, a 12-layer DINN is the most suitable choice for our framework because it balances accuracy and computational efficiency. This decision is supported by the empirical evidence from our experiments, although we expect performance to improve further with additional layers, at the cost of greater computational demands. This expectation is consistent with the work of [1].

    In this section, we evaluate the DINN model's ability to predict disease dynamics under various scenarios, focusing on its performance in capturing both disease extinction and persistence.

    To generate the prediction plots in Figure 6, we use the same DINN methodology as described in Section 4.1.

    Figure 6.  DINN predictions vs. true values for disease extinction ($R_0<1$) and persistence ($R_0>1$), showing the model performance across different $\beta_a$ values.

    For each $\beta_a$ value ($8.0\times10^{-9}$, $1.0\times10^{-8}$, $1.3\times10^{-8}$, $1.6\times10^{-8}$, $2.0\times10^{-8}$, and $2.6\times10^{-8}$), we follow these steps:

    1) We load the corresponding synthetic data generated from the original SI-SIR model.

    2) We train a separate DINN model for each βa value, using the loaded data.

    3) After training, we use each DINN model to predict the number of infected humans (Ih) over the same time period as the training data.

    4) We plot both the true values from the original model (solid lines) and the DINN predictions (dashed lines) on the same graph.
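Step 1 relies on synthetic trajectories of the SI-SIR model for each transmission rate. The sketch below shows one way such data could be generated, using a fixed-step fourth-order Runge-Kutta integrator in plain Python. It is an illustration only: the text does not state which solver was used, and the initial state, step size, and number of steps here are hypothetical. The remaining parameters are the target values from Table 3.

```python
def rhs(state, p):
    """SI-SIR right-hand sides (the bracketed terms of Eqs (3.4)-(3.8))."""
    S_a, I_a, S_h, I_h, R_h = state
    avian_inf = p["beta_a"] * I_a * S_a / (1.0 + p["b"] * I_a)
    human_inf = p["beta_h"] * I_a * S_h / (1.0 + p["c"] * I_h ** 2)
    return (
        p["r_a"] * S_a * (1.0 - S_a / p["K_a"]) - avian_inf,
        avian_inf - (p["mu_a"] + p["delta_a"]) * I_a,
        p["Pi_h"] - human_inf - p["mu_h"] * S_h,
        human_inf - (p["mu_h"] + p["delta_h"] + p["gamma"]) * I_h,
        p["gamma"] * I_h - p["mu_h"] * R_h,
    )


def rk4_step(state, dt, p):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(s + h * k_j for s, k_j in zip(state, k))

    k1 = rhs(state, p)
    k2 = rhs(shifted(k1, dt / 2.0), p)
    k3 = rhs(shifted(k2, dt / 2.0), p)
    k4 = rhs(shifted(k3, dt), p)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state0, p, dt, n_steps):
    """Return the trajectory as a list of n_steps + 1 states."""
    trajectory = [tuple(state0)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(trajectory[-1], dt, p))
    return trajectory


# Target values from Table 3; beta_a is overridden for each scenario.
base = {"r_a": 0.005, "K_a": 50000.0, "mu_a": 0.00034246, "delta_a": 0.0004,
        "Pi_h": 30.0, "beta_h": 8e-7, "mu_h": 0.0000391, "delta_h": 0.077,
        "b": 0.001, "c": 0.01, "gamma": 0.1}
beta_a_values = (8.0e-9, 1.0e-8, 1.3e-8, 1.6e-8, 2.0e-8, 2.6e-8)
state0 = (40000.0, 10.0, 700000.0, 0.0, 0.0)  # hypothetical initial state

datasets = {
    beta_a: simulate(state0, {**base, "beta_a": beta_a}, dt=1.0, n_steps=100)
    for beta_a in beta_a_values
}
print({beta_a: traj[-1][3] for beta_a, traj in datasets.items()})  # final I_h
```

Each entry of `datasets` could then play the role of the training data for one DINN in step 2, with the same trajectory serving as the "true values" curve in step 4.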

    The results are displayed in two sections: the left side illustrates scenarios where the disease becomes extinct ($R_0<1$), while the right side shows scenarios where the disease persists ($R_0>1$). Each line on the graph is color-coded and marked with its value of $\beta_a$ for comparison.

    This method enabled us to compare the predictions made by the DINN model with the true values across various transmission rates, showcasing the model's capacity to capture both extinction and persistence dynamics.

    In this section, we present the forecasts generated by the DINN model for different avian-to-avian transmission rates ($\beta_a$) and compare them with the true data obtained from the SI-SIR model. Additionally, we evaluate the model's precision for the individual compartments of the system and analyze its performance under varying transmission rates.

    In the left panel of Figure 6, we can see how the number of infected individuals changes over time for the three $\beta_a$ values $8.0\times10^{-9}$, $1.0\times10^{-8}$, and $1.3\times10^{-8}$. Our findings are:

    ● Initially, there is a spike in infections that later decreases to zero. This trend indicates that the disease is dying out, as expected when $R_0<1$.

    ● Our DINN model (dashed lines) closely mirrors the true values (solid lines), capturing the initial peak and subsequent decline accurately.

    ● The model performs well across different $\beta_a$ values, suggesting that it is robust to changes in the transmission rate.

    The right panel of Figure 6 shows what happens with the higher $\beta_a$ values $1.6\times10^{-8}$, $2.0\times10^{-8}$, and $2.6\times10^{-8}$. The results differ from the $R_0<1$ case as follows:

    ● Instead of dying out, the number of infected individuals levels off at a steady state. This is what we would expect when $R_0>1$, where the disease persists in the population.

    ● Again, the DINN predictions (dashed lines) match the true values (solid lines) well, and this agreement holds as $\beta_a$ increases.

    ● The model captures the different steady-state levels for each $\beta_a$ value, showing that it can adapt to various epidemic scenarios.

    We also examine how effectively our model performs for different compartments at certain βa values. The comparison of RMSE losses is displayed in the figure below.

    By analyzing the graph in Figure 7, we can conclude the following:

    Figure 7.  DINN model performance for different βa values.

    ● The $I_a$ compartment (the infectious avian population) poses the greatest challenge for our model, as it exhibits the highest relative MSE loss, particularly at lower $\beta_a$ values. This difficulty may stem from the complexity of predicting the infectious avian population when transmission rates are reduced, although other factors not addressed in this research may also contribute.

    ● Despite some variation, the overall total loss stays low. This suggests that our DINN generally performs well across different $\beta_a$ values.

    ● As the avian-to-avian transmission rate ($\beta_a$) increases, the relative MSE losses of the individual compartments change. These shifts in the error profile reflect how the framework responds to changes in the disease dynamics: higher transmission rates substantially influence how quickly and how widely the disease spreads, first among birds and then to humans. The DINN captures these changes, as the losses in Figure 7 show, which indicates that the model adapts its predictions to the transmission rate.

    In recent studies, DINNs have been designed to utilize the detailed compartmentalization from classical epidemiology and the learning capabilities of neural networks to offer more precise predictions and effective management strategies for infectious diseases.

    The DINN designed in our work captured the dynamics of both disease extinction and persistence scenarios. Our DINN model is not perfect, but we believe it is a considerable step forward in combining neural networks with epidemiological modeling. While we are still improving and extending this approach, it has the potential to become a valuable tool for understanding and predicting the complex dynamics of diseases.

    For the lower values of $\beta_a$, the outbreak curve rose to a peak and then declined to zero. The DINN closely reproduced this curve, which is essential for forecasting the possible extinction of a disease. When $\beta_a$ was increased, by contrast, the disease persisted in the population at a steady level in the long run, and the DINN captured this behavior as well. This flexibility is significant, as it suggests that our proposed DINN could, in principle, handle a wide range of real-world scenarios.

    The rate at which avian influenza spreads within bird populations is heavily influenced by the transmission rate among birds ($\beta_a$) in the SI-SIR model. Changes in $\beta_a$ can affect the progression of the epidemic and ultimately determine whether the disease fades away or persists in the population. Therefore, analyzing $\beta_a$ provides essential insights into the effectiveness of intervention strategies aimed at reducing transmission among birds, which is a key factor in controlling the spread of avian influenza to human populations.

    Moreover, we observed that the DINN was sensitive in the $I_a$ compartment as the transmission rate fell. The model appears to have struggled to approximate the population of infected birds, especially at lower transmission rates. This warrants further investigation.

    It was informative to examine how the hyperparameters behave. Changing the learning rate and the number of time points affected the performance of the DINN. Because these choices are under our control, they give us levers to pull when we want to increase the model's efficiency.

    On the parameter constraints, we were satisfied with the way the tanh function performed. It kept the parameters biologically realistic while still giving the model room to explore. That said, it was notable how much the parameter estimates differed between runs, which is perhaps unsurprising given that a very complex function is being optimized.

    Last but not least, the loss that was observed during the course of the training was encouraging, too. It confirms that our DINN is developing as expected and manages to strike a balance in the model between learning the data and respecting the differential equations.

    In summary, our DINN implementation for avian influenza modeling shows great promise for producing model predictions that can inform the public health sector. We have created a tool that can simulate different disease scenarios, including varying transmission rates, with a high degree of accuracy.

    Our main reason for excitement is that the model was able to predict both extinction and persistence scenarios with high accuracy. Such flexibility in the model can prove to be highly beneficial in the public health sector, especially for planning, as it can anticipate a variety of possible outcomes.

    Our investigation into using DINNs for modeling avian influenza has revealed some avenues for future research. The difficulty we encountered with the $I_a$ compartment at low transmission rates is one area for future refinement. One could investigate this further by:

    1) Experimenting with various neural network architectures, including transformer-based models.

    2) Adjusting the loss function to give more weight to the $I_a$ component during training.
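As a rough sketch of the second suggestion, a per-compartment weight could be applied to each mean squared residual before summing. This is not something implemented in the present work: the function name, the weight of 5.0, and the residual values below are all hypothetical and serve only to show the mechanism.

```python
def weighted_physics_loss(residuals_by_compartment, weights):
    """Weighted sum of per-compartment mean squared residuals.

    Compartments missing from `weights` receive a weight of 1.0, so an empty
    dictionary yields a uniformly weighted physics loss.
    """
    total = 0.0
    for name, values in residuals_by_compartment.items():
        mse = sum(v ** 2 for v in values) / len(values)
        total += weights.get(name, 1.0) * mse
    return total


# Hypothetical residuals at two time points for each compartment.
example_residuals = {"S_a": [0.1, -0.1], "I_a": [0.2, -0.2], "S_h": [0.0, 0.0],
                     "I_h": [0.1, 0.1], "R_h": [0.0, 0.0]}

uniform = weighted_physics_loss(example_residuals, {})
emphasised = weighted_physics_loss(example_residuals, {"I_a": 5.0})
print(uniform, emphasised)
```

With a larger weight on `I_a`, errors in that compartment contribute more to the total, so an optimizer would be pushed to reduce them first. The appropriate weight would have to be tuned empirically.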

    Moreover, although our synthetic data has served well in demonstrating the concept, there may be a desire to validate our model in real-world scenarios. In future work, we plan to compare our DINN approach against other learning frameworks such as sparse identification of nonlinear dynamics (SINDy) and neural ordinary differential equations (ODEs). This will help us better understand the relative advantages, computational costs, and robustness to noisy data of each method, which will, in the end, help us make more informed choices for epidemiological modeling and prediction. A study could be conducted to answer the questions below.

    1) How does our DINN handle noisy or incomplete data, which is a challenge in the field of epidemiology?

    2) How could one explore the methods for filling in missing data or reducing noise? This could be beneficial as part of the data preparation process.

    We intend to incorporate advanced uncertainty quantification techniques, such as Bayesian methods or bootstrapping, to derive confidence intervals for the model parameters. This will offer a statistically rigorous measure of reliability for our DINN-based estimates, complementing the multiple-run approach and biologically constrained parameter ranges presented in this work. We also plan to use LSTMs, as a substitute for the FFNNs employed in this research project. LSTMs are adept at handling time series information, and they have the capability to capture sequential patterns by retaining knowledge across extended sequences. Enhancing the model's capacity to grasp patterns in epidemiological data by employing an LSTM-based DINN could result in more accurate forecasts of disease trends.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    W.A.W acknowledges financial support from the NSERC Discovery Grant (Appl No.: RGPIN-2023-05100) used to support the student. W.A.W also acknowledges support from the Canadian Institute for Health Research (CIHR) under the Mpox and Other Zoonotic Threats Team Grant (FRN. 187246). HZ acknowledges financial support from the Natural Sciences and Engineering Research Council of Canada and the Public Health Agency of Canada (NSERC-PHAC). Nickson Golooba would like to express his gratitude for the financial support received from Woldegebriel Assefa Woldegerima and Huaiping Zhu, and for their invaluable guidance and mentorship throughout this research. Appreciation is also extended to the Canadian "One Health Modelling Network for Emerging Infections" (OMNI) for their support and provision of computational resources.

    The authors declare that there is no conflict of interest.

    PyTorch algorithm for automatic differentiation

    In our code, we utilize PyTorch's automatic differentiation capabilities to compute the time derivatives of the network outputs with respect to time t. The key steps in our implementation are as follows:

    Computing Time Derivatives:

    To compute the time derivatives $\frac{d\hat{S}_a}{dt}, \frac{d\hat{I}_a}{dt}, \frac{d\hat{S}_h}{dt}, \frac{d\hat{I}_h}{dt}, \frac{d\hat{R}_h}{dt}$, we use the following procedure:

    ● Define masks self.m1, self.m2, self.m3, self.m4, self.m5, which are zero tensors except for a one in the position corresponding to the variable of interest.

    ● Perform a backward pass with respect to the input time t using these masks to compute the gradients.

    Code Snippet: Here is the relevant portion of our code:

    # Compute gradients for each compartment with respect to time:

    # S_a_t
    si_sir_hat.backward(self.m1, retain_graph=True)
    S_a_hat_t = self.t.grad.clone()
    self.t.grad.zero_()

    # I_a_t
    si_sir_hat.backward(self.m2, retain_graph=True)
    I_a_hat_t = self.t.grad.clone()
    self.t.grad.zero_()

    # S_h_t
    si_sir_hat.backward(self.m3, retain_graph=True)
    S_h_hat_t = self.t.grad.clone()
    self.t.grad.zero_()

    # I_h_t
    si_sir_hat.backward(self.m4, retain_graph=True)
    I_h_hat_t = self.t.grad.clone()
    self.t.grad.zero_()

    # R_h_t
    si_sir_hat.backward(self.m5, retain_graph=True)
    R_h_hat_t = self.t.grad.clone()
    self.t.grad.zero_()

    In this code:

    ● si_sir_hat is the output of the neural network, which is a tensor containing $\hat{S}_a, \hat{I}_a, \hat{S}_h, \hat{I}_h, \hat{R}_h$ for all time points.

    ● The backward function computes the gradient of the network outputs with respect to the input $t$.

    ● The masks self.m1, …, self.m5 ensure that we compute the gradient for only one output at a time.

    ● We clone the gradients and reset them after each computation to avoid accumulation.

    Calculating the Residuals:

    Once we have the time derivatives, we calculate the residuals $f_j(t_i)$ by substituting the network's predictions and the computed derivatives into the SI-SIR model equations.

    In our code, this is implemented as:

    # Unnormalize the predictions
    S_a = self.S_a_min + (self.S_a_max - self.S_a_min) * S_a_hat
    I_a = self.I_a_min + (self.I_a_max - self.I_a_min) * I_a_hat
    S_h = self.S_h_min + (self.S_h_max - self.S_h_min) * S_h_hat
    I_h = self.I_h_min + (self.I_h_max - self.I_h_min) * I_h_hat
    R_h = self.R_h_min + (self.R_h_max - self.R_h_min) * R_h_hat

    # Compute the residuals (normalized): each model right-hand side is
    # divided by the compartment's data range before it is subtracted
    # from the derivative of the normalized output.
    f1_hat = S_a_hat_t - (
        self.r_a * S_a * (1 - S_a / (self.K_a + self.epsilon))
        - (self.beta_a * I_a * S_a) / (1 + self.b * I_a + self.epsilon)
    ) / (self.S_a_max - self.S_a_min + self.epsilon)

    f2_hat = I_a_hat_t - (
        (self.beta_a * I_a * S_a) / (1 + self.b * I_a + self.epsilon)
        - (self.mu_a + self.delta_a) * I_a
    ) / (self.I_a_max - self.I_a_min + self.epsilon)

    f3_hat = S_h_hat_t - (
        self.Pi_h
        - (self.beta_h * I_a * S_h) / (1 + self.c * I_h**2 + self.epsilon)
        - self.mu_h * S_h
    ) / (self.S_h_max - self.S_h_min + self.epsilon)

    f4_hat = I_h_hat_t - (
        (self.beta_h * I_a * S_h) / (1 + self.c * I_h**2 + self.epsilon)
        - (self.mu_h + self.delta_h + self.gamma) * I_h
    ) / (self.I_h_max - self.I_h_min + self.epsilon)

    f5_hat = R_h_hat_t - (
        self.gamma * I_h - self.mu_h * R_h
    ) / (self.R_h_max - self.R_h_min + self.epsilon)

    Here:

    ● S_a_hat_t, I_a_hat_t, … are the time derivatives of the normalized compartment values.

    ● We un-normalize the predictions before substituting them into the SI-SIR equations.

    ● The residuals $f_j$ are computed in their normalized form to maintain consistency with the normalized predictions and derivatives.

    ● $\epsilon$ is a small constant added to prevent division by zero.

    Total Loss Calculation:

    Finally, we compute the total loss by summing the data loss and the physics loss:

    loss = (torch.mean(torch.square(self.S_a_hat - S_a_pred)) +
        torch.mean(torch.square(self.I_a_hat - I_a_pred)) +
        torch.mean(torch.square(self.S_h_hat - S_h_pred)) +
        torch.mean(torch.square(self.I_h_hat - I_h_pred)) +
        torch.mean(torch.square(self.R_h_hat - R_h_pred)) +
        torch.mean(torch.square(f1_hat)) +
        torch.mean(torch.square(f2_hat)) +
        torch.mean(torch.square(f3_hat)) +
        torch.mean(torch.square(f4_hat)) +
        torch.mean(torch.square(f5_hat))
        )

    In this expression:

    ● The first five terms correspond to the data loss (MSE between the normalized true values and predictions).

    ● The last five terms correspond to the physics loss (MSE of the residuals).

    Optimization: During training, we minimize the total loss using an optimizer (for example, Adam optimizer) and update the network parameters accordingly.

    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
    self.scheduler.step()

    By incorporating both data and physics losses, the DINN is guided to produce solutions that are consistent with both the observed data and the underlying physical laws described by the differential equations.

    Normalization and Stability Considerations: Normalization is crucial in our implementation to ensure numerical stability and effective training:

    ● We normalize the compartment values to the range [0, 1] using the maximum and minimum values from the data.

    ● During the computation of residuals, we un-normalize the predictions to substitute them into the SI-SIR equations.

    ● We use a small epsilon ($\epsilon = 1\times10^{-8}$) to prevent division by zero.
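The normalization steps listed above can be sketched as follows. The function names and the data range are hypothetical, and placing epsilon in the denominator of `normalize` is an assumption, since the text shows only the un-normalization formula. The third helper illustrates why the residual code divides each model right-hand side by the data range: for $\hat{x} = (x - x_{\min})/(x_{\max} - x_{\min})$, the chain rule gives $d\hat{x}/dt = (dx/dt)/(x_{\max} - x_{\min})$.

```python
EPSILON = 1e-8  # same small constant as in the text


def normalize(x, x_min, x_max):
    """Min-max scale x to approximately [0, 1]."""
    return (x - x_min) / (x_max - x_min + EPSILON)


def unnormalize(x_hat, x_min, x_max):
    """Inverse mapping, matching the un-normalization used before the residuals."""
    return x_min + (x_max - x_min) * x_hat


def normalized_rate(dx_dt, x_min, x_max):
    """Time derivative of the normalized variable, by the chain rule."""
    return dx_dt / (x_max - x_min + EPSILON)


x_min, x_max = 100.0, 50000.0  # hypothetical data range
x = 25050.0                    # midpoint of that range
x_hat = normalize(x, x_min, x_max)
print(x_hat, unnormalize(x_hat, x_min, x_max))
```

The round trip recovers the original value up to an error on the order of epsilon, which is negligible compared with the compartment sizes.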

    Parameter Constraints: We enforce biologically feasible ranges for the model parameters using the hyperbolic tangent (tanh) function:

    $$\text{Parameter} = \tanh(\text{Parameter}_{\text{tilda}}) \times \text{Scale} + \text{Shift} \tag{6.1}$$

    For example:

    @property
    def r_a(self):
        return torch.tanh(self.r_a_tilda) * 0.01 + 0.005
        # Range: [0.0045, 0.0055]

    This approach ensures that during training, the parameters stay within specified bounds reflecting realistic biological values.
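Because tanh takes values strictly between −1 and 1, Eq (6.1) confines a parameter to the interval (Shift − Scale, Shift + Scale). The sketch below derives Scale and Shift from a desired lower and upper bound; the helper `constrained` is hypothetical and is only one way to set these constants, not necessarily how they are chosen in our code. Under this mapping, the range [0.0045, 0.0055] listed for $r_a$ in Table 3 corresponds to Scale = 0.0005 and Shift = 0.005.

```python
import math


def constrained(raw, lower, upper):
    """Map an unconstrained value into (lower, upper) via tanh, as in Eq (6.1)."""
    scale = (upper - lower) / 2.0   # half-width of the search range
    shift = (upper + lower) / 2.0   # midpoint of the search range
    return math.tanh(raw) * scale + shift


lower, upper = 0.0045, 0.0055  # search range listed for r_a in Table 3
for raw in (-10.0, 0.0, 10.0):
    print(raw, constrained(raw, lower, upper))
```

An unconstrained value of zero maps to the midpoint of the range, and large positive or negative values approach the bounds without crossing them, so gradient-based updates to the raw value cannot push the parameter outside the range.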

    [1] ECDC/WHO Regional Office for Europe (2016) HIV/AIDS Surveillance in Europe 2015. European Centre for Disease Prevention and Control.
    [2] Faugier J, Sargeant M (1997) Sampling hard to reach populations. J Adv Nurs 26: 790-797. doi: 10.1046/j.1365-2648.1997.00371.x
    [3] UNAIDS/WHO (2007) AIDS epidemic update: December 2007. The Joint United Nations Programme on HIV/AIDS.
    [4] WHO (2014) Consolidated Guidelines on HIV Prevention, Diagnosis, Treatment and Care for Key Populations. World Health Organization.
    [5] Baral S, Sifakis F, Cleghorn F, et al. (2007) Elevated risk for HIV infection among men who have sex with men in low- and middle-income countries 2000–2006: a systematic review. PLoS Med 4: e339. doi: 10.1371/journal.pmed.0040339
    [6] EMIS Network (2013) EMIS 2010: The European Men-Who-Have-Sex-With-Men Internet Survey - Findings from 38 countries. European Centre for Disease Prevention and Control.
    [7] Cáceres CF, Konda K, Segura ER, et al. (2008) Epidemiology of male same-sex behaviour and associated sexual health indicators in low- and middle-income countries: 2003–2007 estimates. Sex Transm Infect 84: i49-56. doi: 10.1136/sti.2008.030569
    [8] Ekouevi DK, Dagnra CY, Goilibe KB, et al. (2014) HIV seroprevalence and associated factors among men who have sex with men in Togo. Rev Epidemiol Sante Publique 62: 127-134. doi: 10.1016/j.respe.2013.11.074
    [9] Grulich AE, Kaldor JM (2008) Trends in HIV incidence in homosexual men in developed countries. Sex Health 5: 113-118. doi: 10.1071/SH07075
    [10] Jaffe HW, Valdiserri RO, De Cock KM (2007) The reemerging HIV/AIDS epidemic in men who have sex with men. JAMA 298: 2412-2414. doi: 10.1001/jama.298.20.2412
    [11] Likatavicius G, Klavs I, Devaux I, et al. (2008) An increase in newly diagnosed HIV cases reported among men who have sex with men in Europe, 2000–6: implications for a European public health strategy. Sex Transm Infect 84: 499-505. doi: 10.1136/sti.2008.031419
    [12] ECDC/WHO (2010) HIV surveillance in Europe 2009: European Centre for Disease Prevention and Control.
    [13] UNAIDS (2009) UNAIDS Action Framework: Universal Access for Men who have Sex with Men and Transgender People. The Joint United Nations Programme on HIV/AIDS.
    [14] Chen SY, Gibson S, Katz MH, et al. (2002) Continuing increases in sexual risk behavior and sexually transmitted diseases among men who have sex with men: San Francisco, Calif, 1999–2001, USA. Am J Public Health 92: 1387-1388. doi: 10.2105/AJPH.92.9.1387
    [15] Koblin BA, Husnik MJ, Colfax G, et al. (2006) Risk factors for HIV infection among men who have sex with men. AIDS 20: 731-739. doi: 10.1097/01.aids.0000216374.61442.55
    [16] Magnani R, Sabin K, Saidel T, et al. (2005) Review of sampling hard-to-reach and hidden populations for HIV surveillance. AIDS 19: S67-72.
    [17] Centers for Disease Control and Prevention (2001) Updated guidelines for evaluating public health surveillance systems: Recommendations from the guidelines working group. Morb Mortal Wkly Rep 50: 1-35.
    [18] Brookmeyer R (2010) Measuring the HIV/AIDS epidemic: approaches and challenges. Epidemiol Rev 32: 26-37. doi: 10.1093/epirev/mxq002
    [19] Lohr SL (2010) Sampling: Design and Analysis. Brooks/Cole, Cengage Learning.
    [20] Johnston L, Sabin K, Prybylski D (2010) Update for sampling most-at-risk and hidden populations for HIV biological and behavioral surveillance. J HIV AIDS Surveill Epidemiol 2: 1-12.
    [21] Mills S, Saidel T, Magnani R, et al. (2004) Surveillance and modelling of HIV, STI, and risk behaviours in concentrated HIV epidemics. Sex Transm Infect 80: ii57-62. doi: 10.1136/sti.2003.008128
    [22] Pisani E, Lazzari S, Walker N, et al. (2003) HIV surveillance: a global perspective. J Acquir Immune Defic Syndr 32: S3-11. doi: 10.1097/00126334-200302011-00002
    [23] WHO/UNAIDS (2002) Second generation surveillance for HIV. World Health Organization.
    [24] Zaba B, Slaymaker E, Urassa M, et al. (2005) The role of behavioral data in HIV surveillance. AIDS 19: S39-52. doi: 10.1097/01.aids.0000172876.74886.86
    [25] Mills S, Saidel T, Bennett A, et al. (1998) HIV risk behavioral surveillance: a methodology for monitoring behavioral trends. AIDS 12: S37-46. doi: 10.1097/00002030-199806000-00001
    [26] Sadler GR, Lee H-C, Lim RS-H, et al. (2010) Recruitment of hard-to-reach population subgroups via adaptations of the snowball sampling strategy. Nurs Health Sci 12: 369-374. doi: 10.1111/j.1442-2018.2010.00541.x
    [27] Gorbach PM, Murphy R, Weiss RE, et al. (2009) Bridging sexual boundaries: men who have sex with men and women in a street-based sample in Los Angeles. J Urban Health 86: 63-76. doi: 10.1007/s11524-009-9370-7
    [28] Heckathorn DD (1997) Respondent-driven sampling: A new approach to the study of hidden populations. Soc Probl 44: 174-199. doi: 10.2307/3096941
    [29] MacKellar D, Valleroy L, Karon J, et al. (1996) The Young Men's Survey: methods for estimating HIV seroprevalence and risk factors among young men who have sex with men. Public Health Rep 111: 138-144.
    [30] Platt L, Wall M, Rhodes T, et al. (2006) Methods to recruit hard-to-reach groups: comparing two chain referral sampling methods of recruiting injecting drug users across nine studies in Russia and Estonia. J Urban Health 83: i39-53. doi: 10.1007/s11524-006-9101-2
    [31] Semaan S, Lauby J, Liebman J (2002) Street and network sampling in evaluation studies of HIV risk-reduction interventions. AIDS Rev 4: 213-223.
    [32] Yeka W, Maibani-Michie G, Prybylski D, et al. (2006) Application of respondent driven sampling to collect baseline data on FSWs and MSM for HIV risk reduction interventions in two urban centres in Papua New Guinea. J Urban Health 83: i60-72. doi: 10.1007/s11524-006-9103-0
    [33] WHO (2011) Guidelines on surveillance among populations most at risk for HIV. World Health Organization.
    [34] Atkinson R, Flint J (2001) Accessing hidden and hard-to-reach populations: Snowball research strategies. Soc Res Updat 33: 1-4.
    [35] Kendall C, Kerr LRFS, Gondim RC, et al. (2008) An empirical comparison of respondent-driven sampling, time location sampling, and snowball sampling for behavioral surveillance in men who have sex with men, Fortaleza, Brazil. AIDS Behav 12: S97-104. doi: 10.1007/s10461-008-9390-4
    [36] Guo Y, Li X, Fang X, et al. (2011) A comparison of four sampling methods among men having sex with men in China: implications for HIV/STD surveillance and prevention. AIDS Care 23: 1400-1409. doi: 10.1080/09540121.2011.565029
    [37] Muhib FB, Lin LS, Stueve A, et al. (2001) A venue-based method for sampling hard-to-reach populations. Public Health Rep 116: 216-222. doi: 10.1093/phr/116.S1.216
    [38] Gama A, Abecasis A, Pingarilho M, et al. (2016) Cruising venues as a context for HIV risky behavior among men who have sex with men. Arch Sex Behav [Epub ahead of print].
    [39] Reisen CA, Iracheta MA, Zea MC, et al. (2010) Sex in public and private settings among Latino MSM. AIDS Care 22: 697-704. doi: 10.1080/09540120903325433
    [40] Reisner SL, Mimiaga MJ, Skeer M, et al. (2009) Differential HIV risk behavior among men who have sex with men seeking health-related mobile van services at diverse gay-specific venues. AIDS Behav 13: 822-831. doi: 10.1007/s10461-008-9430-0
    [41] Grov C, Golub SA, Parsons JT (2010) HIV status differences in venues where highly sexually active gay and bisexual men meet sex partners: results from a pilot study. AIDS Educ Prev 22: 496-508. doi: 10.1521/aeap.2010.22.6.496
    [42] Young SD, Szekeres G, Coates T (2013) Sexual risk and HIV prevention behaviours among African-American and Latino MSM social networking users. Int J STD AIDS 24: 643-649. doi: 10.1177/0956462413478875
    [43] Gile K, Handcock M (2010) Respondent‐driven sampling: an assessment of current methodology. Sociol Methodol 40: 285-327. doi: 10.1111/j.1467-9531.2010.01223.x
    [44] Fernández MI, Warren JC, Varga LM, et al. (2007) Cruising in cyber space: comparing Internet chat room versus community venues for recruiting Hispanic men who have sex with men to participate in prevention studies. J Ethn Subst Abuse 6: 143-162. doi: 10.1300/J233v06n02_09
    [45] Maman S, Lane T, Ntogwisangu J, et al. (2009) Using participatory mapping to inform a community-randomized trial of HIV counseling and testing. Field methods 21: 368-387. doi: 10.1177/1525822X09341718
    [46] Dias S, Gama A (2014) Community-based participatory research in public health: potentials and challenges. Rev Panam Salud Pública 35: 150-154.
    [47] Abramovitz D, Volz EM, Strathdee SA, et al. (2009) Using respondent driven sampling in hidden populations at risk of HIV infection: who do HIV-positive recruiters recruit? Sex Transm Dis 36: 750-756. doi: 10.1097/OLQ.0b013e3181b0f311
    [48] Goel S, Salganik MJ (2010) Assessing respondent-driven sampling. Proc Natl Acad Sci 107: 6743-6747. doi: 10.1073/pnas.1000261107
    [49] Malekinejad M, Johnston LG, Kendall C, et al. (2008) Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: a systematic review. AIDS Behav 12: S105-130. doi: 10.1007/s10461-008-9421-1
    [50] Johnston LG, Khanam R, Reza M, et al. (2008) The effectiveness of respondent driven sampling for recruiting males who have sex with males in Dhaka, Bangladesh. AIDS Behav 12: 294-304. doi: 10.1007/s10461-007-9300-1
    [51] Barros AB, Dias SF, Martins MRO (2015) Hard-to-reach populations of men who have sex with men and sex workers: a systematic review on sampling methods. Syst Rev 4: 141. doi: 10.1186/s13643-015-0129-9
    [52] McCreesh N, Frost S, Seeley J, et al. (2012) Evaluation of respondent-driven sampling. Epidemiology 23: 138-147. doi: 10.1097/EDE.0b013e31823ac17c
    [53] Mouw T, Verdery AM (2012) Network sampling with memory: a proposal for more efficient sampling from social networks. Sociol Methodol 42: 206-256. doi: 10.1177/0081175012461248
    [54] Iguchi MY, Ober AJ, Berry SH, et al. (2009) Simultaneous recruitment of drug users and men who have sex with men in the united states and Russia using respondent-driven sampling: sampling methods and implications. J Urban Health 86: 5-31.
    [55] Salganik M, Heckathorn D (2004) Sampling and estimation in hidden populations using respondent‐driven sampling. Sociol Methodol 34: 193-239. doi: 10.1111/j.0081-1750.2004.00152.x
    [56] Ramirez-Valles J, Heckathorn DD, Vázquez R, et al. (2005) From networks to populations: the development and application of respondent-driven sampling among IDUs and Latino gay men. AIDS Behav 9: 387-402. doi: 10.1007/s10461-005-9012-3
    [57] Heckathorn DD (2011) Snowball versus respondent-driven sampling. Sociol Methodol 41: 355-366. doi: 10.1111/j.1467-9531.2011.01244.x
    [58] Salganik MJ (2006) Variance estimation, design effects, and sample size calculations for respondent-driven sampling. J Urban Health 83: i98-112. doi: 10.1007/s11524-006-9106-x
    [59] Watters JK, Biernacki P (1989) Targeted sampling: Options for the study of hidden populations. Sociol Probl 36: 416-430.
    [60] Stueve A, O'Donnell LN, Duran R, et al. (2001) Time-space sampling in minority communities: results with young Latino men who have sex with men. Am J Public Health 91: 922-926. doi: 10.2105/AJPH.91.6.922
    [61] MacKellar DA, Gallagher KM, Finlayson T, et al. (2007) Surveillance of HIV risk and prevention behaviors of men who have sex with men-a national application of venue-based, time-space sampling. Public Health Rep 122: 39-47. doi: 10.1177/00333549071220S107
    [62] Mirandola M, Folch C, Krampac I, et al. (2009) HIV bio-behavioural survey among men who have sex with men in Barcelona, Bratislava, Bucharest, Ljubljana, Prague and Verona, 2008–2009. Euro Surveill 14: pii19427.
    [63] Williamson LM, Hart GJ (2007) HIV prevalence and undiagnosed infection among a community sample of gay men in Scotland. J Acquir Immune Defic Syndr 45: 224-230. doi: 10.1097/QAI.0b013e318058a01e
    [64] Gios L, Mirandola M, Toskin I, et al. (2016) Bio-behavioural HIV and STI surveillance among men who have sex with men in Europe: the Sialon II protocols. BMC Public Health 16: 212. doi: 10.1186/s12889-016-2783-9
    [65] SIALON II (2015) Capacity building in combining targeted prevention with meaningful HIV surveillance among men who have sex with men. SIALON II.
    [66] Bowen A (2005) Internet sexuality research with rural men who have sex with men: can we recruit and retain them? J Sex Res 42: 317-323. doi: 10.1080/00224490509552287
    [67] Pachankis JE, Hatzenbuehler ML, Hickson F, et al. (2015) Hidden from health. AIDS 29: 1239-1246. doi: 10.1097/QAD.0000000000000724
    [68] Chiasson MA, Parsons JT, Tesoriero JM, et al. (2006) HIV behavioral research online. J Urban Health 83: 73-85.
    [69] Pequegnat W, Rosser BRS, Bowen AM, et al. (2006) Conducting internet-based HIV/STD prevention survey research: considerations in design and evaluation. AIDS Behav 11: 505-521.
    [70] Zhang D, Bi P, Hiller JE, et al. (2008) Web-based HIV/AIDS behavioral surveillance among men who have sex with men: potential and challenges. Int J Infect Dis 12: 126-131. doi: 10.1016/j.ijid.2007.06.007
    [71] UNAIDS (2015) UNAIDS Terminology Guidelines 2015. Joint United Nations Programme on HIV/AIDS.
  • © 2017 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)