
The effect of alcohol consumption on human immunodeficiency virus (HIV) disease prognosis has been examined in several studies, with inconsistent findings. We sought to determine the effect of alcohol consumption on HIV disease prognosis by examining the CD4+ T cell count/µL (CD4+ count) and the HIV RNA concentration [HIV viral load (VL)] independent of anti-retroviral therapy (ART).
A secondary analysis was performed on cross-sectional survey data from 1120 participants collected between 2018 and 2020. Questionnaires were used to obtain the participants' history of alcohol consumption. Blood samples were assayed for CD4+ T cell count/µL (CD4+ count) and HIV RNA concentration (HIV viral load). The history of alcohol consumption was categorized into non-alcohol consumers, non-heavy alcohol consumers, and heavy alcohol consumers. Age, cigarette smoking, gender, and ART use were considered potential confounders. Participants were categorized into two cohorts for the analysis, and multivariate logistic regression was used to establish relationships among virally unsuppressed participants who were ART-experienced and ART-naïve.
A total of 1120 participants were considered for analysis. The majority were females (65.9%) aged 15–39 years (72.4%). The majority were non-smokers and non-alcohol consumers (88% and 79%, respectively). ART-experienced females had an increased risk of having a higher VL (VL > 1000 copies/mL). This finding was statistically significant [RR 0.425, 95% CI (0.192–0.944), p-value 0.036]. However, ART-experienced participants aged above 64 years had an increased risk of having a lower VL (VL < 1000 copies/mL) and a lower risk of having a higher VL (VL > 1000 copies/mL). ART-naïve participants aged 40–64 years had a significantly lower risk of having a higher CD4+ count (CD4+ > 500 cells/µL) and an increased risk of having a lower CD4+ count [OR 0.566, 95% CI (0.386–0.829), p-value 0.004]. History of alcohol consumption did not have a significant effect on CD4+ cell count or VL in either the ART-experienced or the ART-naïve cohort.
Middle-aged female people living with HIV (PLWH) are more likely to have a poorer HIV disease state, independent of alcohol consumption. Alcohol consumption may not have a direct effect on CD4+ cell count and VL in either ART-naïve or ART-experienced patients.
Citation: Manasseh B. Wireko, Jacobus Hendricks, Kweku Bedu-Addo, Marlise Van Staden, Emmanuel A. Ntim, Samuel F. Odoom, Isaac K. Owusu. Alcohol consumption and HIV disease prognosis among virally unsuppressed in Rural KwaZulu Natal, South Africa[J]. AIMS Medical Science, 2023, 10(3): 223-236. doi: 10.3934/medsci.2023018
In mathematics, computer science, and economics, as well as in other disciplines like geophysics, solving an optimization problem consists of finding the best of all possible solutions in a given model space [1]. This target can be achieved by minimizing (or maximizing) some type of objective function that includes, in many practical cases, the difference between observed and predicted quantities. For instance, in geophysics, a typical optimization problem is finding an Earth model, i.e., a spatial distribution of seismic velocities, that minimizes the differences between observed and predicted seismic travel times [2].
Optimization techniques can be divided into approaches that explore the model space locally and approaches that perform a global or quasi-global search for the solution. In the first case, we generally run into the problem of convergence towards local minima (or local maxima) of the cost function. In fact, the final solution will depend strongly on the initial model and on the exploration path in the parameter space. In general, when we apply local optimization techniques, we search for a solution in a limited portion of the model space, converging towards solutions that may not correspond to the best one for our specific problem. To address this issue, global optimization techniques aim to find the global minimum (or the global maximum) of the objective function over the given set. Unfortunately, finding the global minimum (or maximum) of a function commonly represents a difficult task. Analytical methods are frequently not applicable, and numerical solution strategies are often not sufficient [3]. Typical techniques based on a global or quasi-global search in the model space [4] include stochastic methods such as direct Monte Carlo sampling approaches. Other methods are based on heuristic approaches that explore the model space in a more or less intelligent way. These include, for instance, Ant Colony Optimization (ACO), simulated annealing, evolutionary algorithms (e.g., genetic algorithms and evolution strategies), and so forth. Despite their many advantages, these types of global optimization methods are generally difficult to put into practice in many situations, especially in three dimensions, due to the high computational cost when dealing with large parameter spaces.
To address the intrinsic problems of both local and global optimization methods, in this paper we propose reformulating optimization problems in terms of Reinforcement Learning (RL). Our approach aims to teach an "artificial agent" to search for the global minimum of the cost function in the model space using the advantages offered by a large suite of Reinforcement Learning algorithms. These are aimed at mapping situations to actions through the maximization of a "numerical reward signal" [5,6,7,8,9,10,11,12,13]. In every particular state, an artificial agent learns progressively by continuous interaction with its environment. This can be a true physical environment, as happens, for instance, when we want to teach an agent to move through a real physical space. More generally, the environment can consist of a virtual space with which one or more artificial agents interact. The effects of every agent's action will be returned by the modified environment in terms of a reward (or a punishment) and a new state. The reward depends on the "quality" of the agent's actions. High rewards correspond to actions with a positive impact on the agent's target, and vice versa. For instance, if the objective of the artificial agent is to find the exit from a maze in the shortest possible time (or through the shortest path), the agent will receive a positive reward every time it moves properly towards the exit.
The final objective of such a learning strategy is to maximize the total reward accumulated over all iterations (the cumulative reward), and not just the immediate reward. In the example of the maze, this means that the agent's objective is to find a global strategy to escape from the maze, rather than just selecting a single local step forward that could lead it into a dead end. This is a crucial point, because the goal of Reinforcement Learning methods is optimizing the agent's actions over a long-term horizon. Such an intrinsically forward-looking approach of RL algorithms can be profitably used to find global solution(s) in many optimization/inversion problems in geophysics (as well as in other fields). In fact, it is easy to grasp the analogies and possible points of connection between geophysical inversion problems and Reinforcement Learning. In the first case, the goal is to find an Earth model that corresponds to a minimum value of a certain cost function. In the second case, the goal is to find an optimal policy through which an agent can maximize its total reward. Both are examples of optimization problems.
In the next, methodological section, we will see how the geophysical inverse problem can be reformulated as a Reinforcement Learning strategy. For that purpose, we will use a combination of the Q-Learning, Temporal Difference, and Epsilon-Greedy algorithms. We will see that these methods fit the purpose of optimizing the exploration of the parameter space in inversion problems. Finally, we will test our approach on synthetic geo-electric data, plus a seismic data set available in the public domain.
Reinforcement Learning includes a suite of algorithms and techniques through which an "artificial agent" learns an optimal "behavior" by interacting with a dynamic "environment" and by maximizing a "reward metric" for the task, without being explicitly programmed for that task and without human intervention. The artificial agent selects the actions that increase the cumulative reward, r ∈ R, achievable from a given state, s ∈ S (Figure 1).
A "discount factor", γ, is applied to the long term rewards with the scope of giving progressively lower weights to rewards received far in the future. The agent's goal is to learn, by trials and errors, a "policy" for maximizing such cumulative long-term reward. The policy is often denoted by the symbol π. It consists of a function of the current environment state, s, belonging to the set S of all possible states, and returns an action, a, belonging to the set A of all possible actions.
$$\pi : S \to A. \tag{1}$$
There are many different Reinforcement Learning techniques. Among the various methods, the Q-Learning method [14] is a suitable approach for solving optimization/inverse problems. The name derives from the Q-function that provides a measure of the Quality (in terms of effectiveness for a given task) of an action that the agent takes starting from a certain state. It is defined as follows:
$$Q : S \times A \to \mathbb{R}. \tag{2}$$
The Bellman equation below provides an operative definition of the maximum cumulative reward. This is given by the reward r that the agent receives for taking action a in the current state s, plus the discounted maximum future reward for the next state s′, over all the possible actions a′ available from that state:
$$Q(s,a) = r + \gamma \max_{a'} Q(s',a'). \tag{3}$$
In formula (3), the symbol γ indicates the "discount factor". It is introduced to balance the contribution of future rewards with respect to the immediate reward. The value of Q(s, a) can be found recursively: the algorithm starts by using random values (or any guess values) for the Q-function. Then, as the agent proceeds to explore its environment, the initial Q values progressively converge towards the optimal ones, based on the positive and/or negative feedback that the agent receives from the environment. The "Temporal Difference" (TD) method (formula 4 below) provides a practical way to update the Q values:
$$Q^{\mathrm{new}}(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha \cdot \left[\, r_t + \gamma \cdot \max_{a} Q(s_{t+1},a) - Q(s_t,a_t) \,\right] \tag{4}$$
We can see that the new value of Q for state $s_t$ and action $a_t$ is obtained by adding to the previous Q value a new term (in the square brackets) called the temporal difference. This, in turn, is multiplied by a factor α that represents the learning rate and is commonly determined empirically by the user. The temporal difference consists of the immediate reward, $r_t$, plus the maximum Q value over all the actions that the agent can take from the state $s_{t+1}$, minus the old value of Q. The $\max_a Q(s_{t+1},a)$ term is multiplied by the above-mentioned discount factor, γ.
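As an illustration, the following is a minimal sketch of how the tabular update in Eq. (4) can be coded. It is not the authors' implementation: the array name `q_table`, the integer state/action indices, and the default values of `alpha` and `gamma` are assumptions made only for this example.

```python
import numpy as np

def td_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular Q-Learning / Temporal Difference update, Eq. (4).

    q_table[s, a] holds the current estimate of Q(s, a); states and actions
    are assumed to be discretized into integer indices (an assumption made
    here only for illustration).
    """
    td_target = reward + gamma * np.max(q_table[next_state, :])
    td_error = td_target - q_table[state, action]   # the "temporal difference"
    q_table[state, action] += alpha * td_error      # scaled by the learning rate
    return q_table
```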
Now, we must explain how we define the Q values in the frame of our integrated Inversion-Reinforcement Learning (RL-Inv, for short) approach. In other words, we must clarify how we assign a reward to the artificial agent (the optimization algorithm) while it explores the model space. In our method, we set the Q-function inversely proportional to the cost function (which, in turn, depends on the difference between observed and predicted responses) after a certain number N of iterations. The user determines the value of N empirically. Indeed, we assume that a good convergence path towards a final low misfit represents a reasonable long-term reward for our Reinforcement Learning agent. In that case, low misfit values (i.e., low values of the cost function) correspond to high rewards and high Q values.
For instance, let us suppose that we apply a Least Squares optimization algorithm to solve our inverse problem; that algorithm plays the role of our agent. In that case, we can define the cost function Φ(m) as follows:
$$\Phi(\mathbf{m}) = (\mathbf{d}_{obs} - g(\mathbf{m}))^{T}\, \mathbf{W}_{d}\, (\mathbf{d}_{obs} - g(\mathbf{m})) + \eta \cdot \mathbf{m}^{T} \mathbf{R}\, \mathbf{m}. \tag{5}$$
In formula (5), $\mathbf{m}$ represents the vector of model parameters, or model vector; $\mathbf{d}_{obs}$ represents the data vector (observations); $g(\mathbf{m})$ is the forward operator by which we calculate the predicted response for the model vector $\mathbf{m}$; the superscript $T$ indicates the transpose; $\mathbf{W}_{d}$ is the data covariance matrix, included to take data uncertainties into account; $\mathbf{R}$ is a smoothing operator applied to the model vector $\mathbf{m}$ as a regularization term; η is a factor regulating the weight of the smoothing term in the cost function.
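The cost function of Eq. (5) can be transcribed almost literally into code. The sketch below is only illustrative: the names `forward`, `W_d`, `R`, and `eta` stand for the forward operator $g(\mathbf{m})$, the data covariance (weighting) matrix, the smoothing operator, and the regularization weight, and are assumptions of this example rather than the authors' code.

```python
import numpy as np

def cost_function(m, d_obs, forward, W_d, R, eta):
    """Regularized least-squares cost Phi(m), Eq. (5) -- illustrative sketch.

    forward(m) returns the predicted response g(m); W_d weights the data
    residuals; R is a smoothing (regularization) operator; eta balances
    data misfit against model smoothness.
    """
    residual = d_obs - forward(m)
    data_term = residual.T @ W_d @ residual   # (d_obs - g(m))^T W_d (d_obs - g(m))
    model_term = eta * (m.T @ R @ m)          # eta * m^T R m
    return float(data_term + model_term)
```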
In our procedure, we calculate Φ(m) at each iteration and store its value. In this way, we can calculate and store the corresponding Q value as follows:
$$Q(s_t, a_t) \approx 1 / \Phi(\mathbf{m}). \tag{6}$$
Next, let us clarify how the Q-Learning formulas contribute to the inversion. In the frame of the Q-Learning approach, we need to estimate a cumulative reward by taking into account both the immediate and the long-term reward. In our approach, the immediate reward is given by the inverse of the cost function after just one or two iterations, as in formula (6). The long-term reward, instead, is given by the inverse of the cost function estimated after a "significant number" of iterations (this number depends on the inverse problem and is decided by the user, case by case). In this way, we intend to set a policy that minimizes the cost function through a balanced combination of both short-term and long-term views. This concept will be further expanded in the next two sections.
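One possible way of turning the stored cost values into the two rewards described above is sketched here. It assumes that `cost_history` is the list of Φ(m) values saved at each iteration; the small constant guarding against division by zero and the choice of the second iteration for the immediate reward are illustrative assumptions, not prescriptions from the paper.

```python
def rewards_from_cost_history(cost_history, n_short=2):
    """Immediate and long-term rewards as inverse cost values, Eq. (6).

    cost_history : list of Phi(m) values, one per iteration (illustrative).
    The immediate reward uses the cost after the first couple of iterations;
    the long-term reward uses the cost at the end of the run (N iterations).
    """
    eps = 1e-12                                     # avoid division by zero
    idx_short = min(n_short, len(cost_history)) - 1
    immediate_reward = 1.0 / (cost_history[idx_short] + eps)
    long_term_reward = 1.0 / (cost_history[-1] + eps)
    return immediate_reward, long_term_reward
```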
The Bellman equation (3) and the Temporal Difference iterative method (4) allow us to estimate and progressively update the values of the Q-function during the optimization (inversion) process. These values depend on the starting models and on the exploration paths in the model space. The goal of our approach is to find an optimal policy for our optimization agent. Such a policy will coincide with the "optimal" exploration/exploitation path in the model space, i.e., the one that maximizes the Q-function. Hence, a crucial point is how the model space (which represents the environment of our Reinforcement Learning approach) is explored.
In the frame of geophysical inversion (as well as of other optimization problems), the environment of the Reinforcement Learning problem is represented by the space of model parameters, or model space (Figure 2). As we said earlier, the agent corresponds to the optimization algorithm through which we try to minimize the cost function. At each iteration, the algorithm performs an action: it explores the environment in order to update the current geophysical model, with the goal of reducing the misfit between observed and predicted responses. In our approach, we perform this exploration using the Epsilon-Greedy algorithm. This provides an effective strategy for addressing the well-known "exploration vs. exploitation" question. Let us explain the basics of this strategy and the reason why we included it in our approach.
Exploration allows an agent to improve its current state with each action, leading to a long-term benefit. In the frame of geophysical inversion, this corresponds to retrieving a distribution of model parameters that lowers the cost function (or the misfit) and, consequently, improves the Earth model. On the other hand, exploitation means choosing the greedy action to get the largest short-term reward by exploiting the agent's current action-value estimates. For instance, in the case of gradient-based optimization methods, this action corresponds to taking repeated steps in the direction opposite to the gradient of the cost function. The crucial point is that being greedy with respect to immediate action-reward estimates may not actually lead towards the maximum long-term reward, causing sub-optimal behaviour. In other words, trying to minimize the cost function at each step may not represent the optimal inversion policy.
Epsilon-Greedy is an effective approach aimed at balancing exploration and exploitation by choosing randomly between these two possibilities. The term "epsilon" refers to the probability of choosing to explore, which is commonly lower than the probability of exploiting. In other words, the optimization/inversion algorithm exploits most of the time, with a small chance of exploring. This means that it usually updates the model parameters under the condition of reducing the cost function at each iteration (exploitation). However, with a lower probability (epsilon ≪ 1), it also explores the model parameters in different directions, even if that choice may imply a temporary increase of the cost function. Figure 3 shows a scheme of this approach and its pseudo-code.
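The selection rule itself takes only a few lines of code. The following is a minimal sketch, assuming that `q_values[i]` stores the cumulative reward currently associated with candidate model `i`; the random-number generator and its seed are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed

def epsilon_greedy_select(q_values, epsilon=0.1):
    """Epsilon-greedy choice among candidate models (illustrative sketch).

    With probability epsilon a random model is chosen (exploration);
    otherwise the model with the highest cumulative reward is chosen
    (exploitation).
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # exploration
    return int(np.argmax(q_values))               # exploitation
```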
At the same time, by applying the Bellman equation and the Temporal Difference method, we aim at a long-term reward, that is, minimizing the cost function after a significant number N of iterations (and not just the cost function at each individual iteration). This strategy allows us to sample large portions of the model space that would otherwise be excluded by a traditional greedy optimization strategy. Finally, we obtain the optimal inversion policy: the one that uses the best exploitation/exploration strategy, produces the lowest final value of the cost function, and yields the best inverted model.
The block diagram of Figure 4 summarizes the entire procedure, showing the sequence of steps through which we update the model parameters by maximizing the Q-function through a combination of the Epsilon-Greedy exploration strategy and the Bellman/Temporal Difference equations.
With reference to Figure 4, and in order to better clarify how and where the Q-Learning formulas contribute to the inversion process, we schematize the entire workflow through the following key steps (a schematic code sketch of this loop is given after the list):
1) Create m starting models (process initialization).
2) Choose n (number of iterations).
3) Run n iterations for each model.
4) Update each model after n iterations.
5) Calculate the inverse of the cost function (Eq. 6) after 1 or 2 iterations (short-term reward for each model).
6) Calculate the inverse of the cost function (Eq. 6) after n iterations (long-term reward for each model).
7) Calculate (or update) the cumulative reward (Q values) using the Bellman and TD formulas (Eqs. 3 and 4).
8) Store the Q values and update the Q-Table.
9) Choose epsilon (for the Epsilon-Greedy method), as shown in Figure 3.
10) Select the model with the highest total reward with probability = 1 − epsilon (exploitation).
11) Alternatively, select a random model with probability = epsilon (exploration).
12) Use the selected model, perturb it and create another m initial models.
13) Iterate from step 3.
14) Exit from the loop when the cost function and the cumulative reward Q are stationary.
15) Finally, select the model with the highest Q-value (lowest cost function).
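To make the workflow above concrete, the sketch below strings the previous snippets into a single loop. It is a schematic illustration under several assumptions, not the authors' implementation: `local_update` stands for one iteration of the underlying optimizer (e.g., a damped least-squares step), `cost` is the cost function of Eq. (5), and all numerical settings (learning rate, discount factor, perturbation size, epsilon decay) are placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)   # illustrative seed

def rl_inversion(initial_models, local_update, cost, n_iter=10,
                 n_outer=20, alpha=0.5, gamma=0.9, epsilon=0.3):
    """Schematic RL-Inv loop following steps 1-15 above (illustrative only).

    initial_models : list of starting model vectors (step 1)
    local_update   : one iteration of the underlying optimizer, m -> m'
    cost           : cost function Phi(m), Eq. (5)
    """
    models = [np.asarray(m, dtype=float) for m in initial_models]
    q_values = np.zeros(len(models))               # Q-Table: one entry per candidate

    for _ in range(n_outer):
        for i, m in enumerate(models):
            history = []
            for _ in range(n_iter):                # steps 3-4: run n iterations
                m = local_update(m)
                history.append(cost(m))
            models[i] = m
            r_short = 1.0 / (history[min(1, len(history) - 1)] + 1e-12)  # step 5
            r_long = 1.0 / (history[-1] + 1e-12)                         # step 6
            # steps 7-8: TD-style update of the cumulative reward (Eqs. 3-4)
            q_values[i] += alpha * (r_short + gamma * r_long - q_values[i])

        # steps 9-11: epsilon-greedy selection of the next reference model
        if rng.random() < epsilon:
            best = int(rng.integers(len(models)))  # exploration
        else:
            best = int(np.argmax(q_values))        # exploitation

        # step 12: perturb the selected model to create new candidate models;
        # previous Q values are kept as prior estimates for the new candidates
        models = [models[best] + 0.05 * rng.standard_normal(models[best].shape)
                  for _ in models]
        epsilon *= 0.95                            # decay exploration over trials

    return models[int(np.argmax(q_values))]        # step 15: best final model
```

In practice, the stationarity test of step 14 would replace the fixed number of outer iterations used in this sketch.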
In this section, we discuss two tests where we apply the RL-Inv method to two types of data set. In the first case, we use synthetic data obtained through a simulated resistivity survey. In the second case, we use refraction seismic data available in the public domain. For each test, we compare the final models obtained through a "standard" inversion/optimization approach and the RL-Inv methodology.
In this test, we simulated the acquisition of DC (direct current) geo-electric data along a 550 m long line, with electrodes deployed at a regular spacing of 10 m. The upper panel of Figure 5 shows the "true" resistivity scenario in which we simulated the resistivity survey. The model consists of two stacked resistive layers embedded in a conductive uniform background. The lower panel of the same figure shows the data (apparent resistivity section) of the simulated DC response. After adding 5% Gaussian noise to the simulated response, our goal was to invert the synthetic data in order to retrieve the correct resistivity model. We started from a half-space initial guess, assuming no a priori information.
Despite its apparent simplicity, the resistivity model shown in Figure 5 is not easy to retrieve by data inversion without using any prior information. Many equivalent geophysical models can honour the data equally well if we do not use any constraint. The inversion algorithm that we used in this case is a "standard" Damped Least Squares optimization algorithm that iteratively minimizes a cost function like the one expressed by Eq. (5). The regularization operator consists of a smoothing functional that favours smooth model solutions. The effect is that the two resistive layers cannot be adequately distinguished and, after the inversion process, they appear "mixed" into a single layer. This is clearly shown in Figure 6.
Next, we performed the inversion of the same synthetic data again, but this time through our Reinforcement Learning approach (RL-Inv), in order to verify whether it was possible to find an inverse solution more consistent with the original resistivity model. Figure 7 shows the inverted resistivity model (upper panel). In this case, the RL-Inv solution shows the two resistive layers properly separated. Furthermore, they were retrieved with almost correct resistivity values, although the resistivity of the upper layer is slightly overestimated.
Figure 8 shows the cross-plot of predicted vs. observed apparent resistivity for both inversion results. This type of graph is useful because it provides a synoptic view of the misfit between observed and predicted geo-electrical responses. In the case of a perfect fit, the points should lie on a 45-degree line (the green line in the figure). The scatter of the points around the ideal best-fit line is a measure of the misfit and of the noise in the data. Both cross-plots show some level of scatter and of resistivity overestimation; however, the misfit of the second inversion result (from RL-Inv) is lower than the one obtained through the traditional Damped Least Squares approach. Furthermore, the second cross-plot shows two clusters of scattered points that are related to the two separate resistive layers.
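A cross-plot of this kind is straightforward to reproduce; the sketch below is a generic illustration only (the variable names and the RMS summary statistic are choices of this example, not taken from the paper).

```python
import numpy as np
import matplotlib.pyplot as plt

def crossplot_misfit(rho_obs, rho_pred, label):
    """Predicted vs. observed apparent resistivity cross-plot (illustrative).

    Points on the 45-degree line indicate a perfect fit; the RMS of the
    residuals summarizes the scatter around that line.
    """
    rms = float(np.sqrt(np.mean((rho_pred - rho_obs) ** 2)))
    plt.scatter(rho_obs, rho_pred, s=10, label=f"{label} (RMS = {rms:.1f})")
    lims = [min(rho_obs.min(), rho_pred.min()), max(rho_obs.max(), rho_pred.max())]
    plt.plot(lims, lims, color="green")            # ideal best-fit line
    plt.xlabel("Observed apparent resistivity (ohm·m)")
    plt.ylabel("Predicted apparent resistivity (ohm·m)")
    plt.legend()
    return rms
```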
In summary, the RL-Inv approach produced results that are more consistent with the original resistivity scenario used for the simulation.
In this second example, we applied the RL-Inv method to a classical refraction seismic data set with a heterogeneous overburden and a high-velocity bedrock. This data set is included in the examples provided in the public-domain repository prepared for testing the open-source pyGIMLi software library [15]. Figure 9 shows the data set in terms of travel times vs. offsets. The complex trends of the travel-time curves vs. offset suggest significant variability in the velocity field. We can observe frequent variations in the slope of the curves that indicate lateral as well as vertical velocity changes. Such complexity in the data space corresponds to a similar complexity in the model space. In scenarios like this, our RL-Inv approach can be useful for finding a global solution to the refraction tomography problem, limiting the risk of falling into local minima of the cost function during the inversion process. We followed the scheme of Figure 3 by exploring the model space through the Epsilon-Greedy approach. First, we created an initial Q-Table based on the cost function values (here expressed in terms of χ² values) for a set of different starting models (Table 1). Next, the optimization agent started exploring the model space (in this case, the unknown model parameter is the P-wave velocity, Vp) through the Epsilon-Greedy approach.
Table 1. Initial Q-Table based on the cost-function (χ²) values of the starting models.
Figure 10 shows an example of a "model selection histogram" obtained by exploring the model space with the Epsilon-Greedy method. The bars of each histogram are proportional to the probability of selecting one model among many possible starting models. In this example, we considered just 20 candidate models, for illustrative purposes. For each model, we calculated the cumulative reward using the Bellman formulas, as explained earlier in the methodological section. We can see that for low values of the epsilon parameter, the method selects almost exclusively the model(s) with a high cumulative reward (some examples are indicated by the arrows in Figure 10). This corresponds to adopting a greedy strategy, with a prevalence of exploitation of the model(s) with a high reward. On the other hand, by choosing high values of epsilon, model selection tends to be random, allowing the model space to be explored in directions that would otherwise have been ignored. In other words, an appropriate setting of the epsilon parameter allows a balanced policy between exploration and exploitation in the model space during the inversion process. In this specific case, we performed many tests by setting the epsilon parameter in the range between 0.0 and 1.0. There is no absolute rule for finding the optimal value of epsilon. However, a good strategy is to make epsilon variable: as trials increase, epsilon should decrease. Indeed, as trials increase, we need less exploration and benefit more from exploitation, in order to get the maximum benefit from our policy.
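One simple way to implement such a decreasing exploration probability is a geometric decay schedule, sketched below; the starting value, floor, and decay rate are illustrative choices, not values used in the paper.

```python
def decaying_epsilon(trial, eps_start=1.0, eps_min=0.05, decay=0.98):
    """Exploration probability that shrinks as trials increase (illustrative).

    Starts near eps_start (mostly exploration) and decays geometrically
    towards eps_min (mostly exploitation).
    """
    return max(eps_min, eps_start * decay ** trial)

# Example: epsilon after 0, 50 and 200 trials
print([round(decaying_epsilon(t), 3) for t in (0, 50, 200)])
```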
During the inversion process, the Q-Table was progressively updated. As explained earlier, the rule for updating the Q-Table is given by the Bellman equation and the iterative Temporal Difference method. In summary, the agent (the minimization algorithm) explores the model space and selects the optimal path, which corresponds to the direction in the parameter space with the highest cumulative reward. At the same time, it does not neglect to explore alternative directions in the model space, although with lower probability. After many iterations, the agent learns to move in the model space following the most convenient policy: the one that allows finding the global minimum of the cost function. Our inversion test seems to confirm the effectiveness of this strategy, as in the previous test. Figure 11 shows some examples of velocity models obtained by travel-time tomography, with the corresponding ray tracing. Each individual model corresponds to a certain point of the cost function in the model space. For each path explored in the model space, we have a corresponding suite of values of the cost function. Finally, the best model (left panel of Figure 12) is the one retrieved through the RL-Inv approach. It shows the Vp parameter distribution that corresponds to the highest cumulative reward. For comparison, the right panel of the same figure shows the Vp model obtained without the support of the RL approach, using a "standard" optimization approach. Compared with the RL-Inv solution, the "standard" solution tends to overestimate the bedrock velocity and is not able to properly resolve the heterogeneities in the overburden.
We introduced a new optimization/inversion approach fully integrated with the Q-Learning, Temporal Difference, and Epsilon-Greedy methods. These allow us to expand the exploration of the model space, minimize the misfit, and limit the problem of falling into local minima during the inversion. The advantages of our approach are clearly highlighted by the comparative test results on multidisciplinary data (electrical and seismic). Finally, we remark that we expect the greatest benefits from our method in those applications where an extended exploration of the model space is difficult or prohibitive, due to the size of the data/model space and the complexity of the inversion problem. Interesting cases include, for instance, full-waveform seismic inversion and the simultaneous joint inversion of multi-physics data.
The author declares no conflict of interest.
pyGIMLi examples data repository: https://github.com/gimli-org/example-data/blob/master/traveltime/koenigsee.sgt.