Statistical modeling on human microbiome sequencing data

1 Department of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
2 Department of Biostatistics, Princess Margaret Cancer Centre, Toronto, Canada

Research studies have shown that the human microbiome is associated with many diseases through links between bacterial taxa and environmental and genetic factors. Typical human microbiome sequencing data, obtained by next-generation sequencing of the 16S rRNA gene, are high dimensional and sparse because most taxa are not shared among samples. As a result, the data are often over-dispersed and contain excess zeros. These features raise statistical challenges for compositional data analysis. We review recent developments in statistical methodology for this setting. In particular, we summarize some currently popular parametric probability models, including models for settings with repeated measurements of the microbiome. We introduce multivariate analysis methods that use distance measures to test for differences between microbial communities. We highlight statistical models developed to assess the association between genetic variants on the X chromosome and microbial components. We discuss some applications to the analysis of associations among the host genome, microbial composition, and human disease. Although sophisticated approaches to the statistical analysis of taxa count data exist, we suggest some future research directions on how to classify and predict clinical outcomes from microbial compositions.
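As a rough illustration of the data features named in the abstract (sparsity, over-dispersion, and distance-based comparison of communities), the following minimal Python sketch computes three simple summaries. It is not taken from the article: the toy counts, the function names, and the choice of Bray-Curtis as the distance measure are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def zero_fraction(counts):
    """Share of samples in which the taxon was not observed."""
    return sum(1 for c in counts if c == 0) / len(counts)


def dispersion_index(counts):
    """Variance-to-mean ratio of the counts.

    A Poisson model implies a ratio near 1; values well above 1
    suggest over-dispersion.
    """
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else float("nan")


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two samples' taxon count vectors."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / total


# Hypothetical counts of a single taxon across eight samples.
taxon_counts = [0, 0, 0, 12, 0, 3, 0, 45]

# Hypothetical counts of four taxa in two samples.
sample_a = [10, 0, 5, 0]
sample_b = [0, 8, 5, 2]

if __name__ == "__main__":
    print("zero fraction:", zero_fraction(taxon_counts))
    print("dispersion index:", dispersion_index(taxon_counts))
    print("Bray-Curtis:", bray_curtis(sample_a, sample_b))
```

A high zero fraction together with a dispersion index far above 1 is the pattern that motivates zero-inflated and over-dispersed count models, while pairwise dissimilarities of this kind are the usual input to distance-based multivariate tests.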

Keywords: big data; statistical modeling; human microbiome; sequencing data; genomics; data mining

Citation: Dongyang Yang, Wei Xu. Statistical modeling on human microbiome sequencing data. Big Data and Information Analytics, 2019, 4(1): 1-12. doi: 10.3934/bdia.2019001





© 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0).
