Fully Bayesian analysis of allele-specific RNA-seq data

Ignacio Alvarez-Castro; Jarad Niemi

doi:10.3934/mbe.2019389

Mathematical Biosciences and Engineering

2019, Volume 16, Issue 6: 7751-7770. doi: 10.3934/mbe.2019389

Research article

Ignacio Alvarez-Castro ¹, Jarad Niemi ²

1. Instituto de Estadística, Universidad de la República, Montevideo, Uruguay
2. Department of Statistics, Iowa State University, Iowa, IA 50010, USA

Received: 27 January 2019 Accepted: 12 August 2019 Published: 23 August 2019

Abstract

Diploid organisms have two copies of each gene, called alleles, that can be separately transcribed. The RNA abundance associated with a particular allele is known as allele-specific expression (ASE). When the two alleles have polymorphisms in transcribed regions, ASE can be studied using RNA-seq read count data. ASE differs from regular RNA-seq expression in three ways: it cannot be assessed for every gene; measures of ASE can be biased towards one of the alleles (the reference allele); and ASE provides two measures of expression for a single gene in each biological sample, which leads to additional complications for single-gene models. We present statistical methods for modeling ASE and detecting genes with differential allelic expression. We propose a hierarchical, overdispersed count regression model for ASE counts. The model accommodates gene-specific overdispersion, has an internal measure of the reference allele bias, and uses random effects to model the gene-specific regression parameters. Fully Bayesian inference is obtained using the fbseq package, which implements a parallel strategy to keep computation times reasonable. Simulation studies and real data analysis suggest the proposed model is a practical and powerful tool for the study of differential ASE.

Keywords: hierarchical model; shrinkage priors; allele-specific expression; RNA-seq; Markov chain Monte Carlo; GPU

Citation: Ignacio Alvarez-Castro, Jarad Niemi. Fully Bayesian analysis of allele-specific RNA-seq data. Mathematical Biosciences and Engineering, 2019, 16(6): 7751-7770. doi: 10.3934/mbe.2019389
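
As a rough illustration of the kind of data the abstract describes, the following Python sketch simulates reference and alternative allele counts with gene-specific overdispersion, a shared reference allele bias, and gene-level effects drawn from a common distribution. It is only a sketch under stated assumptions: the Poisson log-normal form, the specific distributions, and every name used here (`simulate_ase_counts`, `ref_bias`, `allelic_effect`, `od_sd`) are hypothetical choices made for illustration and are not taken from the paper or from the fbseq package.

```python
import numpy as np


def simulate_ase_counts(n_genes=200, n_samples=4, ref_bias=0.1, seed=0):
    """Simulate (reference, alternative) allele counts per gene and sample.

    All parameter names and distributions are illustrative assumptions,
    not the specification used by the paper or by the fbseq package.
    """
    rng = np.random.default_rng(seed)

    # Gene-specific mean log expression and allelic log fold change,
    # drawn from common distributions (the random-effects idea).
    log_mean = rng.normal(3.0, 1.0, size=n_genes)
    allelic_effect = rng.normal(0.0, 0.3, size=n_genes)

    # Gene-specific overdispersion: std. dev. of a log-scale noise term.
    od_sd = np.sqrt(rng.gamma(shape=2.0, scale=0.05, size=n_genes))

    # Log-scale half-difference between alleles: a bias shared by all
    # genes (favoring the reference allele) plus the gene-level effect.
    half_diff = (ref_bias + allelic_effect) / 2.0
    log_mu_ref = log_mean + half_diff
    log_mu_alt = log_mean - half_diff

    def draw(log_mu):
        # Poisson counts whose log mean is perturbed by normal noise,
        # giving more variance than a plain Poisson (overdispersion).
        noise = rng.normal(0.0, od_sd[:, None], size=(n_genes, n_samples))
        return rng.poisson(np.exp(log_mu[:, None] + noise))

    return draw(log_mu_ref), draw(log_mu_alt), allelic_effect


ref, alt, effect = simulate_ase_counts()

# Naive per-gene log ratio of pooled counts; 0.5 is added to avoid log(0).
naive_log_ratio = np.log((ref.sum(axis=1) + 0.5) / (alt.sum(axis=1) + 0.5))
print(ref.shape, alt.shape, naive_log_ratio.shape)
```

In this toy setup the naive log ratio mixes the shared bias with the gene-level effect, which is one way to see why a model with an explicit bias term can be useful when looking for genes with differential allelic expression.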

© 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)