Convergence of maximum likelihood supertree reconstruction

Vu Dinh; Lam Si Tung Ho

doi:10.3934/math.2021513

AIMS Mathematics

2021, Volume 6, Issue 8: 8854-8867. doi: 10.3934/math.2021513

Research article

Vu Dinh¹, Lam Si Tung Ho²

1. Department of Mathematical Sciences, University of Delaware, Newark, Delaware, USA
2. Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada

Received: 29 February 2020; Accepted: 16 May 2021; Published: 11 June 2021
MSC: Primary 05C05, 62F12; secondary 92B10, 92D15

Abstract

Supertree methods are tree reconstruction techniques that combine several smaller gene trees (possibly on different sets of species) into a larger species tree. The question of interest is whether the reconstructed supertree converges to the true species tree as the number of gene trees increases, that is, whether supertree methods are consistent. In this paper, we are particularly interested in the convergence rate of the maximum likelihood supertree. Previous studies of the maximum likelihood supertree approach often formulate this question as a discrete problem and focus on recovering the correct topology of the species tree. Aiming to reconstruct both the topology and the branch lengths of the species tree, we propose an analytic approach to analyzing the convergence of the maximum likelihood supertree method. Specifically, we treat each tree as a point in a metric space and prove that, under some mild conditions, the distance between the maximum likelihood supertree and the species tree converges to zero at a polynomial rate. We further verify these conditions for the popular exponential error model of gene trees.

Keywords:
- supertree
- maximum likelihood estimator
- species tree reconstruction
- convergence rate
- exponential model
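
The abstract's idea of measuring convergence as a distance between trees in a metric space can be illustrated with a minimal toy sketch. This is not the authors' method or code; it rests on simplifying assumptions. Trees sharing one fixed topology are represented by branch-length vectors in R^k with Euclidean distance, standing in for a genuine tree metric. Gene trees are assumed to follow an exponential error model with density proportional to exp(-β d(t, T)) around the species tree T. Because this density is translation invariant on R^k, its normalizing constant does not depend on T, so maximizing the likelihood of n independent gene trees is equivalent to minimizing Σ_i d(t_i, T), the geometric median, computed here with Weiszfeld's algorithm. The function names `sample_gene_tree`, `ml_supertree`, and `mean_error`, and the example branch lengths, are hypothetical.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance; a stand-in for a tree metric in this toy."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_gene_tree(species, beta, rng):
    """Draw t in R^k with density proportional to exp(-beta * ||t - species||)."""
    k = len(species)
    # Uniformly random direction on the unit sphere of R^k.
    direction = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Radial density is proportional to r^(k-1) * exp(-beta * r): Gamma(shape=k, scale=1/beta).
    radius = rng.gammavariate(k, 1.0 / beta)
    return [s + radius * d / norm for s, d in zip(species, direction)]


def ml_supertree(gene_trees, max_iter=200, tol=1e-6):
    """Toy ML estimate: minimize sum_i ||t_i - T|| (geometric median) via Weiszfeld iterations."""
    k = len(gene_trees[0])
    n = len(gene_trees)
    x = [sum(t[j] for t in gene_trees) / n for j in range(k)]  # start at the mean
    for _ in range(max_iter):
        num = [0.0] * k
        den = 0.0
        for t in gene_trees:
            d = euclid(x, t)
            if d < 1e-15:
                continue  # simple safeguard: skip a gene tree that coincides with the iterate
            num = [a + tj / d for a, tj in zip(num, t)]
            den += 1.0 / d
        if den == 0.0:
            break
        new_x = [a / den for a in num]
        if euclid(new_x, x) < tol:
            return new_x
        x = new_x
    return x


def mean_error(n, species, beta, reps, seed):
    """Average distance between the toy ML supertree and the true species point."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        genes = [sample_gene_tree(species, beta, rng) for _ in range(n)]
        total += euclid(ml_supertree(genes), species)
    return total / reps


if __name__ == "__main__":
    species = [1.0, 0.5, 2.0]  # hypothetical branch lengths of a fixed-topology species tree
    for n in (10, 100, 1000):
        print(n, round(mean_error(n, species, beta=2.0, reps=10, seed=0), 4))
```

Running the script prints the average error for n = 10, 100 and 1000 gene trees; in this toy the error should shrink as n grows, roughly like n^(-1/2) for a geometric median. The sketch only illustrates convergence of an estimator in a metric space; it does not reproduce the paper's tree space, its conditions, or its polynomial rate.
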
Citation: Vu Dinh, Lam Si Tung Ho. Convergence of maximum likelihood supertree reconstruction. AIMS Mathematics, 2021, 6(8): 8854-8867. doi: 10.3934/math.2021513

© 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)