Research article

Convergence of maximum likelihood supertree reconstruction

  • Received: 29 February 2020 Accepted: 16 May 2021 Published: 11 June 2021
  • MSC: Primary 05C05, 62F12; secondary 92B10, 92D15

  • Supertree methods are tree reconstruction techniques that combine several smaller gene trees (possibly on different sets of species) to build a larger species tree. The question of interest is whether the reconstructed supertree converges to the true species tree as the number of gene trees increases (that is, the consistency of supertree methods). In this paper, we are particularly interested in the convergence rate of the maximum likelihood supertree. Previous studies on the maximum likelihood supertree approach often formulate the question of interest as a discrete problem and focus on reconstructing the correct topology of the species tree. Aiming to reconstruct both the topology and the branch lengths of the species tree, we propose an analytic approach for analyzing the convergence of the maximum likelihood supertree method. Specifically, we consider each tree as one point of a metric space and prove that the distance between the maximum likelihood supertree and the species tree converges to zero at a polynomial rate under some mild conditions. We further verify these conditions for the popular exponential error model of gene trees.

    Citation: Vu Dinh, Lam Si Tung Ho. Convergence of maximum likelihood supertree reconstruction[J]. AIMS Mathematics, 2021, 6(8): 8854-8867. doi: 10.3934/math.2021513
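The convergence statement in the abstract can be illustrated with a deliberately simplified sketch. This is a one-dimensional toy analogue, not the construction used in the paper: each "tree" is replaced by a single real number, the metric is the absolute difference, and observations are assumed to follow a density proportional to exp(-beta * |x - true_value|), loosely mirroring an exponential error model. Under that assumption the maximum likelihood estimate is a sample median. The names `sample_noisy_value`, `ml_estimate`, and `mean_error`, and the values of `beta` and `true_value`, are hypothetical choices made only for this illustration.

```python
import random
import statistics


def sample_noisy_value(true_value, beta, rng):
    """Draw x with density proportional to exp(-beta * |x - true_value|)."""
    magnitude = rng.expovariate(beta)  # |x - true_value| is Exponential(beta)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return true_value + sign * magnitude


def ml_estimate(samples):
    """Maximizing sum(-beta * |x_i - t|) over t means minimizing sum |x_i - t|,
    which a sample median achieves."""
    return statistics.median(samples)


def mean_error(n, true_value=1.0, beta=2.0, replicates=100, seed=0):
    """Average distance between the estimate and the truth over several replicates."""
    rng = random.Random(seed)
    errors = []
    for _ in range(replicates):
        samples = [sample_noisy_value(true_value, beta, rng) for _ in range(n)]
        errors.append(abs(ml_estimate(samples) - true_value))
    return statistics.mean(errors)


if __name__ == "__main__":
    # The printed error should shrink as the number of observations grows.
    for n in (10, 100, 1000, 10000):
        print(n, mean_error(n))
```

In this toy setting the average error shrinks roughly like one over the square root of the sample size, which is one example of a polynomial rate. The paper's result concerns trees in a metric space, with both topology and branch lengths, and its actual rate and conditions should be taken from the article itself.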

  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)