Citation: Xuhua Xia. Starless bias and parameter-estimation bias in the likelihood-based phylogenetic method[J]. AIMS Genetics, 2018, 5(4): 212-223. doi: 10.3934/genet.2018.4.212
[1] | Jeffrey M. Marcus . Our love-hate relationship with DNA barcodes, the Y2K problem, and the search for next generation barcodes. AIMS Genetics, 2018, 5(1): 1-23. doi: 10.3934/genet.2018.1.1 |
[2] | Chih-Sheng Chen, Ching-Tsan Huang, Ruey-Shyang Hseu . Evidence for two types of nrDNA existing in Chinese medicinal fungus Ophiocordyceps sinensis. AIMS Genetics, 2017, 4(3): 192-201. doi: 10.3934/genet.2017.3.192 |
[3] | Ruthie Su, Margaret P. Adam, Linda Ramsdell, Patricia Y. Fechner, Margarett Shnorhavorian . Can the external masculinization score predict the success of genetic testing in 46,XY DSD?. AIMS Genetics, 2015, 2(2): 163-172. doi: 10.3934/genet.2015.2.163 |
[4] | Joanna L. Richens, Jonathan P. Bramble, Hannah L. Spencer, Fiona Cantlay, Molly Butler, Paul O’Shea . Towards defining the Mechanisms of Alzheimer’s disease based on a contextual analysis of molecular pathways. AIMS Genetics, 2016, 3(1): 25-48. doi: 10.3934/genet.2016.1.25 |
[5] | Cecilia Rodríguez, Marcelo H. Cassini, Gabriela del V. Delgado, Argentinian Integron Study Group, María S. Ramírez, D. Centrón . Analysis of class 2 integrons as a marker for multidrug resistance among Gram negative bacilli. AIMS Genetics, 2016, 3(4): 196-204. doi: 10.3934/genet.2016.4.196 |
[6] | Xiaojuan Wang, Jianghong Wu, Zhongren Yang, Fenglan Zhang, Hailian Sun, Xiao Qiu, Fengyan Yi, Ding Yang, Fengling Shi . Physiological responses and transcriptome analysis of the Kochia prostrata (L.) Schrad. to seedling drought stress. AIMS Genetics, 2019, 6(2): 17-35. doi: 10.3934/genet.2019.2.17 |
[7] | Vladlena Tiasto, Valeriia Mikhailova, Valeriia Gulaia, Valeriia Vikhareva, Boris Zorin, Alexandra Kalitnik, Alexander Kagansky . Esophageal cancer research today and tomorrow: Lessons from algae and other perspectives. AIMS Genetics, 2018, 5(1): 75-90. doi: 10.3934/genet.2018.1.75 |
[8] | Valeriia Mikhailova, Valeriia Gulaia, Vladlena Tiasto, Stanislav Rybtsov, Margarita Yatsunskaya, Alexander Kagansky . Towards an advanced cell-based in vitro glioma model system. AIMS Genetics, 2018, 5(2): 91-112. doi: 10.3934/genet.2018.2.91 |
[9] | M. Reza Jabalameli, Ignacio Briceno, Julio Martinez, Ignacio Briceno, Reuben J. Pengelly, Sarah Ennis, Andrew Collins . Aarskog-Scott syndrome: phenotypic and genetic heterogeneity. AIMS Genetics, 2016, 3(1): 49-59. doi: 10.3934/genet.2016.1.49 |
[10] | Carina Ladeira, Lenka Smajdova . The use of genotoxicity biomarkers in molecular epidemiology: applications in environmental, occupational and dietary studies. AIMS Genetics, 2017, 4(3): 166-191. doi: 10.3934/genet.2017.3.166 |
I explore two phylogenetic issues here. The first is the starless bias. If a set of aligned sequences are equidistant from each other, i.e., the number of various types of substitutions between any two sequences is exactly the same, then we intuitively would expect a star tree. Distance-based methods such as neighbor-joining [1] or FastME [2] will indeed give us a star tree whenever pairwise distances are all equal. The starless bias refers to the inability of a phylogenetic method to generate a star tree with equidistant sequences. It was first alluded to in a study of potential bias in maximum likelihood method involving missing data and rate heterogeneity over sites [3], but its occurrence is more general that that. I will illustrate this bias here with four sequences, identify the source of the bias, and discuss alternative approaches relevant to the problem.
The second issue, related to the first, is the confounding effect of tree topology on phylogenetic parameter estimation, in particular the shape parameter α of gamma distribution used to model rate heterogeneity over sites. It may seem obvious that α depend on topology, but the issue needs to be studied for two reasons. First, how such dependence occurs is not well dissected. Second, one frequently encounters interpretation of α as if it is a sequence-specific feature, with a small alpha interpreted as indicating some sites strongly constrained by purifying selection and other sites not. It is therefore relevant to caution against such interpretation with real examples. For simplicity, I will work on 4-OTU trees only so that all site patterns can be conveniently considered.
I provide two self-contained R script files
With four sequences, there are 256 possible site patterns. For sequences to evolve according to the JC69 substitution model [4], the 256 site patterns would become equally frequently after sequences experienced full substitution saturation. Different combinations of these 256 site patterns support different substitution models and different topologies. We will focus on the JC69 substitution model and classify these 256 possible site patterns into 15 classes (Figure 1A) so that we only need to calculate likelihood for these 15 site pattern classes. The transition probabilities needed for likelihood calculation for the JC69 model, together with other frequently used Markovian nucleotide substitution models such as F84 (used in DNAML since 1984) [5,6], {XE “substitution model: HKY85”} [7], TN93 {XE “substitution model: TN93”} [8], and GTR {XE “substitution model: GTR”} [9,10] have been numerically illustrated and re-derived with three different approaches [11,12]. These illustrations are sufficiently detailed for one to extend the JC69 model in the two R script files to other models.
The 15 site pattern classes (Figure 1A) can be further lumped into five groups (G1 to G5, Figure 1A). Sites in G1 have all four OTUs (S1 to S4) with different nucleotides, with 24 unique site patterns represented as a single G1 site in Figure 1A (because JC69 sees these 24 site patterns as identical). Sites in G2 each have three different nucleotides, with a total of 144 unique site patterns. Only six sites are used to represent them in Figure 1A because JC69 sees these 144 unique sites in this group to be identical to one of the six representative sites. Sites in G3 feature two nucleotides, with three OTUs having the same nucleotide, and a total of 48 unique sites represented by four sites in Figure 1A. Sites in G4 also feature two nucleotides, with two OTUs sharing one nucleotide and the other two OTUs sharing the other (i.e., they are the traditional informative sites in Fitch parsimony). There are 36 unique G4 sites represented by three sites in Figure 1A. G5 sites are monomorphic, with four unique site patterns represented by site 15 in Figure 1A.
Given a JC69 model, the five groups of sites (G1 to G5 in Figure 1A) support the three unrooted topologies equally (i.e., they do not preferentially support any of the three). This is obvious for G1 and G5 sites. The six sites in G2, shown in Figure 1A, jointly also support the three topologies equally, so do the four sites in G3 and three sites in G4. Note that, with a star tree and JC69, we cannot have G1, G2 and G4 sites without having G3 sites first. With low sequence divergence, almost all site patterns from a star tree should be G3 sites.
Subscripts n1 to n5 in Figure 1A mean multiples of the enclosed sites, e.g., an n2 of 10 means that the six G2 sites in Figure 1A is repeated 10 times (for a total of 60 sites). A combination of (n1, n2, n3, n4, n5), where ni corresponds to those in Figure 1A, means a set of aligned sequences containing n1 G1 sites, n2 G2 sites (i.e., n2*6 sites), n3 G3 sites (for a total of n3*4 sites), n4 G4 sites (for a total of n4*3 sites), and n5 G5 sites (Figure 1A). A set of four sequences of length 256 containing all 256 possible site patterns is specified as (24, 24, 12, 12, 4). Such a set of sequences will naturally have equal nucleotide frequencies and twice as many transversions as transitions, i.e., the equilibrium ratio of substitution saturation. A set of sequences with any combinations of (n1, n2, n3, n4, n5) are equidistant from each other from a JC69 perspective, and we would desire to have a star tree (Figure 1B) instead of one of the three resolved trees (Figure 1C). Hereafter I specify a set of four aligned sequences equidistant form each other simply by (n1, n2, n3, n4, n5) which guarantee that the four sequences are equidistant from each other.
Not all (n1, n2, n3, n4, n5) combinations are equally likely under the JC69 model with a star tree. For example, the site pattern combination (24, 24, 12, 96, 32) has a near-zero chance to occur with a star tree and a strict JC69 model, because, given a star tree with x5 = 0, G4 sites can only emerge through independent substitutions along each of the four branches (i.e., all G4 sites result from convergent substitutions). This means that we cannot have G4 sites in a star tree without first having G3 sites, so G4 sites should not be more frequent than G3 sites given a star tree and JC69. The ratio of G3/G4 sites will decrease from ∞ (when the first substitution occurs) towards 4/3 (i.e., 48 possible G4 site patterns and 36 possible site patterns) when sequences have gradually experienced full substitution saturation. The site pattern combination above with a ratio of G3/G4 equal to (12*4)/(96*3) cannot happen with a star tree because the ratio is far too small.
However, when different genes evolving under JC69 models with different rates (and potentially with different evolutionary histories and conflicting phylogenetic signals) are concatenated, strange site pattern combinations such as (24, 24, 12, 96, 32) may occur and cannot be dismissed as unreal. In fact, this site pattern combination results from a concatenation of site patterns from simulations with four different trees, i.e., the star tree and the three resolved trees (all with JC69 with no rate heterogeneity over sites), with slight modifications to ensure 1) that the nucleotide frequencies are exactly equal to 0.25, 2) that the number of transversions is exactly trice as many as the number of transitions, and 3) that the number of transitional and transversional differences between each pairwise comparison is exactly the same.
The site pattern combination (24, 24, 12, 96, 32) mentioned above represents a special deviation from any of the four topologies (the star tree plus the three resolved trees). While the four sequences will remain equidistant from each other with the site pattern combination of (24, 24, 12, 96, 32), the likelihood method will not favour a star tree in spite of equidistance among sequences. In general, as illustrated in Figure 2, we expect that increasing number of G4 sites relative to G3 sites would increase the likelihood of rejecting a star tree. As I mentioned before, G3 sites are the first site pattern to appear in sequence evolution along a star tree. In contrast, a G4 site requires a minimum of two substitutions when x5 = 0 (Figure 2C), but only a minimum of one with a resolved tree from a parsimony perspective (Figure 2D). Tree lnL would be greater with x5 > 0 (so that a single substitution can occur along the internal branch to be shared by two descendants) than with x5 = 0 (which would force two independent substitutions). Thus, G4 sites favor a resolved tree (x5 > 0). In short, increasing number of G4 sites relative to G3 sites increases the likelihood of rejecting the star tree. This is true even with G4 sites support all three resolved topologies equally because 1/3 of the G4 sites will support a resolved topology.
I will use G4 sites to illustrate the effect of topology on rate heterogeneity over sites. With a star tree in Figure 2C, all G4 sites require exactly the same number of changes (a minimum of two substitutions per site from a parsimony perspective). In other words, there is no rate heterogeneity among G4 sites with a star tree. However, for a resolved tree in Figure 2D, 1/3 of the G4 sites (with identical nucleotides between sister groups as in Figure 2D) would require only one substitution from the parsimony perspective. The other 2/3 of the G4 sites (with different nucleotides between sister groups) would require two substitutions from the parsimony perspective. Thus, from a parsimony perspective, 1/3 of the G4 sites evolve at a rate half as fast as the other 2/3 of the sites. Likewise with the likelihood perspective when multiple substitutions are corrected, 2/3 of the G4 sites will have a substitution rate at least twice as large as 1/3 of the G4 sites, giving rise to rate heterogeneity not present with a star tree. Thus, the relative numbers of G4 and G3 sites (denoted n4 and n3, respectively, Figure 1A) affect not only the tendency to reject the star tree, but also the characterization of the rate heterogeneity over sites. In what follows, I assess the effect of (n4-n3) on the tendency to reject the star tree and the confounding effect of topology on estimating the shape parameter α.
I provide five additional sets of sequences in fasta format as representative of various site pattern combinations.
Supplementary sequence files
Supplementary file
For anyone to recreate results in this paper, I have implemented the likelihood method in R for the JC69 model and four OTUs to evaluate a star tree and a rooted tree, either with a constant rate over sites (
Because the continuous gamma requires integration over the rate, I used the R function ‘integral’ in the ‘pracma’ package which therefore needs to be installed before running the R scripts. The tree evaluation with a continuous gamma may take three minutes to complete on a PC with an i7-4770 CPU. I also used PhyML [15] to obtain lnL for the resolved tree. I set the tree improvement option ‘-s’ to ‘BEST’ (best of NNI and SPR search), and the ‘-o’ option to ‘tlr’ which optimizes the topology, the branch lengths and rate parameters. For JC69+Γ model, four categories of rates were used to estimate the shape parameter. PhyML does not generate lnL for a star tree, hence the need for the R scripts.
This part, while trivial, is needed for methodological validation. I have simulated sequence evolution of four OTUs under JC69 model and a star tree, by using the Evolver program in the PAML package [13], with different tree lengths and sequence lengths from 500 to 3000, each with 100 sets of sequences, with a constant rate over site. The sequences are then processed in DAMBE [16] so that the six site patterns in G2 (Figure 1A) occur exactly equally, so do the four site patterns in G3 and three site patterns in G4 (Figure 1A). This ensures that the resulting sequences do not support any one of the three alternative topologies. I analyzed these site patterns by both PhyML 3.1 [15] and the likelihood methods implemented in R in the
Table 1 includes results for two such simulated sets of sequences without rate heterogeneity (A and B in the column headed by ‘Data’, Table 1). These two sets of sequences are in
Data(1) | Method | b1 | b2 | b3 | b4 | b5(2) | lnL |
A | 0.426784 | 0.426784 | 0.426784 | 0.426784 | 0.000000 | −1537.68 | |
0.426784 | 0.426784 | 0.426784 | 0.426784 | N/A | −1537.68 | ||
PhyML | 0.426785 | 0.426783 | 0.426793 | 0.426769 | 0.000001 | −1537.68 | |
B | 0.400581 | 0.400575 | 0.400584 | 0.400578 | 0.000000 | −1416.09 | |
0.400584 | 0.400584 | 0.400584 | 0.400584 | N/A | −1416.09 | ||
PhyML | 0.400586 | 0.400578 | 0.400570 | 0.400584 | 0.000001 | −1416.09 | |
C | 5.655316 | 2.585728 | 10 | 10 | 5.63553 | −1419.57 | |
10 | 10 | 9.999938 | 10 | N/A | −1419.57 | ||
PhyML | 10.000000 | 6.303193 | 10.000000 | 6.180340 | 6.303193 | −1419.57 | |
*Note: (1) A to C corresponding to Supplementary file |
Data set C (Table 1) is in
The likelihood method rejects a star tree for sequences in
Method(1) | b1 | b2 | b3 | b4 | b5(2) | α(3) | lnL |
0.849814 | 0.849814 | 1.693866 | 0 | 0.850 | N/A | −2943.148 | |
1.020276 | 1.020276 | 1.020276 | 1.020276 | N/A | N/A | −2947.793 | |
PhyML | 0.849942 | 0.849764 | 1.694148 | 0.000015 | 0.850 | N/A | −2943.148 |
NewΓ3 | 2.783640 | 1.133961 | 1.575337 | 2.343977 | 10 | 1 | −2930.116 |
3.359848 | 1.413517 | 1.886961 | 2.886549 | 30 | 0.843 | −2921.743 | |
3.767304 | 1.850750 | 2.302868 | 3.315211 | 50 | 0.745 | −2918.782 | |
1.020429 | 1.020429 | 1.020429 | 1.020429 | N/A | 10000 | −2947.797 | |
PhyML | 2.213152 | 1.976554 | 2.206997 | 1.968304 | 10 | 0.814 | −2927.806 |
*Note: (1) Results from running Supplementary R scripts |
While a site pattern of (24, 24, 12, 96, 32) might be considered too extreme, one may take a milder site pattern combination such as (120, 192, 120, 204, 164), with a sequence length of 2528. The resulting lnL based on JC69+Γ is −13880.12 for a resolved tree, but −13889.4900 for a star tree, so 2ΔlnL = 18.74. The star tree is rejected with p = 0.000015.
One might argue that there is nothing wrong with the likelihood method itself because the data set is pathological. As mentioned in the Materials and Methods section, this data set is from a concatenation of simulated sequences from four topologies (the star tree and the three resolved trees) with modifications so that nucleotide frequencies are equal and all pairwise comparisons lead to the same number of transition and transversional differences. Thus, although the sequences are equidistant from each other, the star tree is not an appropriate model for the concatenated sequences (neither is any of the three resolved topologies). However, the issue at hand is not on the validity of the likelihood approach, but on its robustness in phylogenetic reconstruction with conflicting phylogenetic signals. With sequences equidistant from each other, we do desire a star tree instead of having it conclusively rejected. Furthermore, we have the problem of inconsistent parameter estimation that I highlight below.
The PAML package [13] has a basemlg program which implements a continuous gamma-distributed rate. However, it does not produce correct results. Take for example the sequences in
The estimated parameter values in Table 2 are disconcerting in two ways. First, when the star tree is imposed, there is no rate heterogeneity over sites, with the shape parameter α in the order of 10000 (Table 2). However, the estimated α becomes 0.745 when b5 (x5 in Figure 1) is allowed to be greater than 0, indicating strong rate heterogeneity over sites. I have given reasons in Figure 2, illustrated with G4 sites, that an excess of G4 sites will result in rate heterogeneity when a resolved tree is imposed but no rate heterogeneity when a star tree is imposed. That is, given a set of G4 sites supporting the three resolved topologies equally, 1/3 of the G4 sites will share a low rate of substitution whereas the other 2/3 will share a high rate of substitution when a resolved topology is imposed. In contrast, all G4 sites will have the same rate of substitution when a star tree is imposed. This highlights the point that we do need a star tree instead of having three equally supported resolved trees because we need the star tree to perform proper parameter estimation in this case. As I mentioned before, the sequence data is a concatenation of sequence simulation on four different topologies (star and three resolved trees), all with JC69 with a constant rate. However, the maximum likelihood criterion does not allow us to choose the star tree.
Second, the substantial change of lnL with b5 under JC69+Γ, associated with a change in α, is also unpleasant. For all practical consideration, the range of values for b5 from 10 to 50 all just indicates a branch experiencing substitution saturation, and one would expect lnL to change little with b5 in this range of values. However, lnL is −2930.116 with b5 = 10 (and α = 1, Table 2), but −0.2918.782 with b5 = 50 (and α = 0.745, Table 2). Given that the sequences are equidistant from each other, we do desire a tree with b5 = 0, not a method that says that a tree with an unrealistically long b5 is the best tree. Note that PhyML (version 3.1) appears to have set the maximum branch length to 10, so its generated lnL (= −2927.806, Table 2) has not yet reached its maximum.
While α is known to depend on topology, phylogenetic researchers often interpret α as if it is a sequence-specific property. A small α is often interpreted to mean strong purifying selection constraining certain sites but not others. I hope that the illustration above will caution against such interpretation.
I used sequences in
Figure 3 indicates that the relationship between 2ΔlnL and (n4-n3) is weaker when rate heterogeneity over sites is modelled by a gamma distribution. However, it is not always so, and one can easily find a counter example in which a constant-rate model leads to rejection of the star tree and a gamma-distributed rate model does not. For example, for a site pattern combination of (400,0,0,0,400) we will have lnL = −3984.64 for a resolved tree with JC69, lnL = −3987.998 for a star tree. This leads to 2ΔlnL = 6.716 and a rejection of the star tree (DF = 1, p = 0.0096). With JC69+Γ, we will have lnL = −3546.648 for a resolved tree, lnL = −3547.94 for a star tree. This leads to 2ΔlnL = 2.584 and the star tree is not rejected (DF = 1, p = 0.1079).
I should finally mention that the starless bias and parameter-estimation bias illustrated by the site pattern combination of (24, 24, 12, 96, 32) in
Two approaches are relevant to this problem of different topologies contributing conflicting phylogenetic signals caused by concatenating sequences of potentially different evolutionary history. The first is to use mixture model with different tree models each contributing a subset of sites. This is different from mixture models with the same topology but different nucleotide or amino acid substitution models. The second approach is as follows. For each phylogeny (T) obtained from a set of sequence data (S) under a substitution model (M), we should obtain the empirical site patterns from S and compare these empirical site patterns against their expectation from sequence simulation based on T and M. If the empirical site patterns deviate significantly from the expectation, then we conclude that the tree model is not appropriate. Applying this approach to sequences in
This study is supported by Discovery Grant from Natural Science and Engineering Research Council (NSERC, RGPIN/2018-03878) of Canada. I thank Guy Baele, Sudhir Kumar, Laura Kubatko and Arindam RoyChoudhury for discussion. Z. Yang and two anonymous reviewers provided comments that improved clarity of the manuscript.
The authors declare no conflict of interest.
[1] | Saitou N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406–425. |
[2] | Desper R, Gascuel O (2004) Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting. Mol Biol Evol 21: 587–598. |
[3] | Xia X (2014) Phylogenetic bias in the likelihood method caused by missing data coupled with Among-Site rate variation: An analytical approach. In: Basu M, Pan Y, Wang J, editors. Bioinformatics Research and Applications: Springer, 12–23. |
[4] | Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN, editor. Mammalian Protein Metabolism. New York: Academic Press, 21–123. |
[5] |
Hasegawa M, Kishino H (1989) Heterogeneity of tempo and mode of mitochondrial DNA evolution among mammalian orders. Jpn J Genet 64: 243–258. doi: 10.1266/jjg.64.243
![]() |
[6] |
Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29: 170–179. doi: 10.1007/BF02100115
![]() |
[7] |
Hasegawa M, Kishino H, Yano T (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22: 160–174. doi: 10.1007/BF02101694
![]() |
[8] | Tamura K, Nei M (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10: 512–526. |
[9] |
Lanave C, Preparata G, Saccone C, et al. (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20: 86–93. doi: 10.1007/BF02101990
![]() |
[10] | Tavaré S (1986) Some probabilistic and statistical problems in the analysis of DNA sequences. In: Miura RM, editor. Lectures on Mathematics in the Life Sciences. Providence, RI: Amer Math Soc: 57–86. |
[11] | Xia X (2017) Deriving transition probabilities and evolutionary distances from substitution rate matrix by probability reasoning. J Genet Genome Res 3: 031. |
[12] | Xia X (2018) Nucleotide substitution models and evolutionary distances. Bioinf Cell: 269–314. |
[13] |
Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 24: 1586–1591. doi: 10.1093/molbev/msm088
![]() |
[14] | Xia X (2018) Maximum likelihood in molecular phylogenetics. Bioinf Cell: 381–395. |
[15] |
Guindon S, Dufayard JF, Lefort V, et al. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst Biol 59: 307–321. doi: 10.1093/sysbio/syq010
![]() |
[16] |
Xia X (2018) DAMBE7: New and improved tools for data analysis in molecular biology and evolution. Mol Biol Evol 35: 1550–1552. doi: 10.1093/molbev/msy073
![]() |
1. | Katherine E Noah, Jiasheng Hao, Luyan Li, Xiaoyan Sun, Brian Foley, Qun Yang, Xuhua Xia, Major Revisions in Arthropod Phylogeny Through Improved Supermatrix, With Support for Two Possible Waves of Land Invasion by Chelicerates, 2020, 16, 1176-9343, 10.1177/1176934320903735 |
Data(1) | Method | b1 | b2 | b3 | b4 | b5(2) | lnL |
A | 0.426784 | 0.426784 | 0.426784 | 0.426784 | 0.000000 | −1537.68 | |
0.426784 | 0.426784 | 0.426784 | 0.426784 | N/A | −1537.68 | ||
PhyML | 0.426785 | 0.426783 | 0.426793 | 0.426769 | 0.000001 | −1537.68 | |
B | 0.400581 | 0.400575 | 0.400584 | 0.400578 | 0.000000 | −1416.09 | |
0.400584 | 0.400584 | 0.400584 | 0.400584 | N/A | −1416.09 | ||
PhyML | 0.400586 | 0.400578 | 0.400570 | 0.400584 | 0.000001 | −1416.09 | |
C | 5.655316 | 2.585728 | 10 | 10 | 5.63553 | −1419.57 | |
10 | 10 | 9.999938 | 10 | N/A | −1419.57 | ||
PhyML | 10.000000 | 6.303193 | 10.000000 | 6.180340 | 6.303193 | −1419.57 | |
*Note: (1) A to C corresponding to Supplementary file |
Method(1) | b1 | b2 | b3 | b4 | b5(2) | α(3) | lnL |
0.849814 | 0.849814 | 1.693866 | 0 | 0.850 | N/A | −2943.148 | |
1.020276 | 1.020276 | 1.020276 | 1.020276 | N/A | N/A | −2947.793 | |
PhyML | 0.849942 | 0.849764 | 1.694148 | 0.000015 | 0.850 | N/A | −2943.148 |
NewΓ3 | 2.783640 | 1.133961 | 1.575337 | 2.343977 | 10 | 1 | −2930.116 |
3.359848 | 1.413517 | 1.886961 | 2.886549 | 30 | 0.843 | −2921.743 | |
3.767304 | 1.850750 | 2.302868 | 3.315211 | 50 | 0.745 | −2918.782 | |
1.020429 | 1.020429 | 1.020429 | 1.020429 | N/A | 10000 | −2947.797 | |
PhyML | 2.213152 | 1.976554 | 2.206997 | 1.968304 | 10 | 0.814 | −2927.806 |
*Note: (1) Results from running Supplementary R scripts |
Data(1) | Method | b1 | b2 | b3 | b4 | b5(2) | lnL |
A | 0.426784 | 0.426784 | 0.426784 | 0.426784 | 0.000000 | −1537.68 | |
0.426784 | 0.426784 | 0.426784 | 0.426784 | N/A | −1537.68 | ||
PhyML | 0.426785 | 0.426783 | 0.426793 | 0.426769 | 0.000001 | −1537.68 | |
B | 0.400581 | 0.400575 | 0.400584 | 0.400578 | 0.000000 | −1416.09 | |
0.400584 | 0.400584 | 0.400584 | 0.400584 | N/A | −1416.09 | ||
PhyML | 0.400586 | 0.400578 | 0.400570 | 0.400584 | 0.000001 | −1416.09 | |
C | 5.655316 | 2.585728 | 10 | 10 | 5.63553 | −1419.57 | |
10 | 10 | 9.999938 | 10 | N/A | −1419.57 | ||
PhyML | 10.000000 | 6.303193 | 10.000000 | 6.180340 | 6.303193 | −1419.57 | |
*Note: (1) A to C corresponding to Supplementary file |
Method(1) | b1 | b2 | b3 | b4 | b5(2) | α(3) | lnL |
0.849814 | 0.849814 | 1.693866 | 0 | 0.850 | N/A | −2943.148 | |
1.020276 | 1.020276 | 1.020276 | 1.020276 | N/A | N/A | −2947.793 | |
PhyML | 0.849942 | 0.849764 | 1.694148 | 0.000015 | 0.850 | N/A | −2943.148 |
NewΓ3 | 2.783640 | 1.133961 | 1.575337 | 2.343977 | 10 | 1 | −2930.116 |
3.359848 | 1.413517 | 1.886961 | 2.886549 | 30 | 0.843 | −2921.743 | |
3.767304 | 1.850750 | 2.302868 | 3.315211 | 50 | 0.745 | −2918.782 | |
1.020429 | 1.020429 | 1.020429 | 1.020429 | N/A | 10000 | −2947.797 | |
PhyML | 2.213152 | 1.976554 | 2.206997 | 1.968304 | 10 | 0.814 | −2927.806 |
*Note: (1) Results from running Supplementary R scripts |