Directed acyclic graphs as conceptual and analytical tools in applied and theoretical epidemiology: advances, setbacks and future possibilities

George TH Ellison; Hanan Rhoma

doi:10.3934/mbe.2025048

Mathematical Biosciences and Engineering

2025, Volume 22, Issue 6: 1280-1306.




George TH Ellison (1, 2); Hanan Rhoma (2, 3)

1. Centre for Data Innovation, JB Firth, University of Central Lancashire, Preston PR1 2HE, UK
2. Leeds Institute for Data Analytics and University of Leeds Medical School, Leeds LS2 9JT, UK
3. Department of Statistics, Faculty of Sciences, University of Tripoli, Tripoli, Libya

Received: 30 August 2024 Revised: 22 February 2025 Accepted: 28 February 2025 Published: 22 April 2025

Abstract

In this review, we explore the advances, setbacks, and future possibilities of directed acyclic graphs (DAGs) as conceptual and analytical tools in applied and theoretical epidemiology. DAGs are diagrammatic representations, whether literal, theoretical, or speculative, of known, uncertain, or unknown data-generating mechanisms (and dataset-generating processes) in which the causal relationships between variables are determined on the basis of two over-riding principles: "directionality" and "acyclicity". Among the many strengths of DAGs are their transparency, simplicity, flexibility, methodological utility, and epistemological credibility. Together, these strengths can help applied epidemiological studies better mitigate (and acknowledge) the impact of avoidable (and unavoidable) biases in causal inference analyses based on observational (non-experimental) data. They can also strengthen the credibility and utility of theoretical studies that use DAGs to identify and explore hitherto hidden sources of analytical and inferential bias. Nonetheless, despite their apparent simplicity, the application of DAGs has suffered a number of setbacks due to weaknesses in understanding, practice, and reporting. These include a failure to include all possible (conceivable and inconceivable) unmeasured covariates when developing and specifying DAGs, and weaknesses in reporting DAGs that contain more than a handful of variables and paths, together with their intended application(s) and the rationale(s) involved, all of which are necessary for appreciating, evaluating, and exploiting any causal insights they might offer.
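The two principles of directionality and acyclicity can be illustrated with a short check. The Python sketch below is not drawn from the review or from any DAG software; it assumes each arrow is stored as an ordered (cause, effect) pair, so that directionality holds by construction, and it tests acyclicity with Kahn's topological-sort algorithm. The function name `topological_order` and the three-variable example (confounder C, exposure X, outcome Y) are hypothetical.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return a topological order of `nodes`, or None if `edges` contain a cycle.

    `edges` is an iterable of (cause, effect) pairs, so every arrow is
    directed by construction ("directionality"). Kahn's algorithm then
    checks "acyclicity" by repeatedly removing nodes with no incoming arrows.
    """
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    # Start from nodes that have no causes within the diagram.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Nodes never freed of incoming arrows lie on (or downstream of) a cycle.
    return order if len(order) == len(nodes) else None


# Hypothetical example: confounder C affects both exposure X and outcome Y.
nodes = ["C", "X", "Y"]
dag = [("C", "X"), ("C", "Y"), ("X", "Y")]
print(topological_order(nodes, dag))                 # ['C', 'X', 'Y']

# Adding a feedback arrow Y -> C breaks acyclicity.
print(topological_order(nodes, dag + [("Y", "C")]))  # None
```

A returned ordering also gives one admissible temporal sequence for the variables, since every arrow points from an earlier to a later position in it.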
We propose two additional principles to address these weaknesses and identify a number of opportunities where DAGs might lead to further advancements: the critical appraisal and synthesis of observational studies; the external validity and portability of causality-informed prediction; the identification of novel sources of bias; and the application of DAG-dataset consistency assessment to resolve pervasive uncertainty in the temporal positioning of time-variant and time-invariant exposures, outcomes, and covariates.
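The abstract does not spell out a procedure for DAG-dataset consistency assessment. One common form of such a check tests the conditional independencies a DAG implies against the data, something tools such as the R package dagitty support. The Python sketch below is an illustration only, under stated assumptions: a hypothetical chain X -> M -> Y, linear relationships with Gaussian noise, and a partial correlation as the test statistic. It is not the authors' method and does not address temporal positioning directly; the helper names `correlation` and `partial_correlation` are hypothetical.

```python
import random
import statistics


def correlation(a, b):
    """Pearson correlation of two equal-length lists of numbers."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def partial_correlation(x, y, z):
    """Correlation between x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = correlation(x, y), correlation(x, z), correlation(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz**2) * (1 - ryz**2)) ** 0.5


# Simulate a hypothetical dataset from the chain X -> M -> Y
# (linear effects of 0.8, standard normal noise; all values are assumptions).
rng = random.Random(2025)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.8 * mi + rng.gauss(0, 1) for mi in m]

# The chain DAG implies X and Y are dependent marginally (population
# correlation of about 0.45) but independent given M (partial correlation 0).
print(f"corr(X, Y)     = {correlation(x, y):.3f}")
print(f"corr(X, Y | M) = {partial_correlation(x, y, m):.3f}")
```

If the data had instead been generated with an additional direct X -> Y arrow, the partial correlation would stay away from zero, flagging an inconsistency between the chain DAG and the dataset. A formal assessment would replace the informal reading of the statistic with a test or confidence interval.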

Keywords:
- directed acyclic graph
- DAG
- causal inference
- prediction
- bias
- epistemology
Citation: George TH Ellison, Hanan Rhoma. Directed acyclic graphs as conceptual and analytical tools in applied and theoretical epidemiology: advances, setbacks and future possibilities. Mathematical Biosciences and Engineering, 2025, 22(6): 1280-1306. doi: 10.3934/mbe.2025048



© 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)