Research article · Special Issues

Real-time responses to epidemics: A Reinforcement-Learning approach

  • Published: 06 February 2026
  • Abstract: Open-loop optimal control applied to epidemic outbreaks is a valuable tool for developing control principles and informing future preparedness guidelines. A drawback of this approach is its assumption of complete knowledge of both transmission dynamics and the effects of policy measures. As a result, such methods lack responsiveness to real-time conditions, since they do not integrate feedback from the evolving epidemic state. Overcoming this requires a closed-loop approach. We propose a novel closed-loop method for real-time social distancing responses using a general Reinforcement Learning (RL)-based decision-support framework. It enables adaptive management of social distancing policies during an epidemic, balancing direct health costs (e.g., hospitalizations, deaths) against the indirect (economic, social, psychological) costs of prolonged interventions. The framework builds on, and is compared with, a COVID-19 model previously used for open-loop assessments, capturing key disease characteristics such as asymptomatic transmission, healthcare saturation, and quarantine. We test the framework by evaluating optimal real-time responses to a severe outbreak under varying prioritizations of indirect costs by public authorities. Depending on the cost prioritization, the closed-loop adaptability yields the full spectrum of policy strategies: elimination, suppression, and mitigation. The framework supports timely, informed decisions by governments and health authorities during current or future pandemics.
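The closed loop described in the abstract (observe the epidemic state, choose a distancing level, incur direct health costs plus weighted indirect costs) can be illustrated with a toy example. The sketch below is purely illustrative and is not the paper's COVID-19 model: it uses a simple discrete-time SIR model and tabular Q-learning, and every name and parameter (`BETA`, `GAMMA`, `ACTIONS`, `W_INDIRECT`, the bin counts) is a made-up assumption. The paper's model additionally captures asymptomatic transmission, healthcare saturation, and quarantine, and uses deep RL rather than a tabular method.

```python
import random

# Illustrative sketch only: a closed-loop controller on a toy discrete-time
# SIR model.  A tabular Q-learning agent picks a social-distancing level each
# period, trading off new infections (direct cost) against the intervention's
# indirect cost.  All parameters below are hypothetical, not from the paper.

BETA, GAMMA, DT = 0.5, 0.2, 1.0
ACTIONS = [0.0, 0.5, 0.9]   # distancing intensity: none / moderate / strict
W_INDIRECT = 2.0            # authority's weight on indirect (economic) costs


def step(s, i, action):
    """Advance the SIR state one period under a distancing level.

    Returns (next_s, next_i, reward), where reward penalizes both the new
    infections and the (weighted) indirect cost of the intervention itself.
    """
    beta_eff = BETA * (1.0 - action)          # distancing reduces transmission
    new_inf = beta_eff * s * i * DT
    recovered = GAMMA * i * DT
    s2, i2 = s - new_inf, i + new_inf - recovered
    reward = -(new_inf + W_INDIRECT * action * DT)
    return s2, i2, reward


def discretize(s, i, bins=10):
    """Map the continuous (S, I) state onto a coarse grid for tabular Q-learning."""
    return (min(int(s * bins), bins - 1), min(int(i * bins), bins - 1))


def train(episodes=2000, horizon=52, eps=0.1, alpha=0.2, gamma_q=0.95):
    """Epsilon-greedy tabular Q-learning over repeated simulated outbreaks."""
    Q = {}
    rng = random.Random(0)
    for _ in range(episodes):
        s, i = 0.99, 0.01                      # initial outbreak seed
        for _ in range(horizon):
            st = discretize(s, i)
            q = Q.setdefault(st, [0.0] * len(ACTIONS))
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else q.index(max(q))
            s, i, r = step(s, i, ACTIONS[a])
            q_next = Q.setdefault(discretize(s, i), [0.0] * len(ACTIONS))
            q[a] += alpha * (r + gamma_q * max(q_next) - q[a])  # Q-learning update
    return Q
```

In this toy setting, raising `W_INDIRECT` pushes the learned policy toward lighter distancing (mitigation-like behavior), while lowering it favors strict early intervention (suppression/elimination-like behavior), loosely mirroring the cost-prioritization trade-off the abstract describes.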

    Citation: Gabriele Gemignani, Alberto d'Onofrio, Alberto Landi, Giulio Pisaneschi, Piero Manfredi. Real-time responses to epidemics: A Reinforcement-Learning approach[J]. Mathematical Biosciences and Engineering, 2026, 23(3): 753-775. doi: 10.3934/mbe.2026029


  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)


Figures and Tables

Figures(6)  /  Tables(2)
