TreeShrink: Efficient Detection of Outlier Tree Leaves

Uyen Mai¹⁵ &
Siavash Mirarab¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 10562))

Included in the following conference series:

RECOMB International Workshop on Comparative Genomics

1172 Accesses
6 Citations

Abstract

Phylogenetic trees include errors for a variety of reasons. We argue that one way to detect errors is to build a phylogeny with all the data then detect taxa that artificially inflate the tree diameter. We formulate an optimization problem that seeks to find k leaves that can be removed to reduce the tree diameter maximally. We present a polynomial time solution to this “k-shrink” problem. Given this solution, we then use non-parametric statistics to find an outlier set of taxa that have an unexpectedly high impact on the tree diameter. We test our method, TreeShrink, on five biological datasets, and show that it is more conservative than rogue taxon removal using RogueNaRok. When the amount of filtering is controlled, TreeShrink outperforms RogueNaRok in three out of the five datasets, and they tie in another dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees

Article Open access 08 May 2018

Treemmer: a tool to reduce large phylogenetic datasets with minimal loss of diversity

Article Open access 02 May 2018

Scalable and Accurate Phylogenetic Placement Using pplacer-XR

References

Braun, M.J., Clements, J.E., Gonda, M.A.: The visna virus genome: evidence for a hypervariable site in the env gene and sequence homology among lentivirus envelope proteins. J. Virol. 61(12), 4046–4054 (1987)
Google Scholar
Hugenholtz, P., Huber, T.: Chimeric 16S rDNA sequences of diverse origin are accumulating in the public databases. Int. J. Syst. Evol. Microbio. 53(1), 289–293 (2003)
Article Google Scholar
Zwickl, D.J., Stein, J.C., Wing, R.A., Ware, D., Sanderson, M.J.: Disentangling methodological and biological sources of gene tree discordance on Oryza (Poaceae) chromosome 3. Syst. Biol. 63(5), 645–659 (2014)
Article Google Scholar
Leaché, A.D., Rannala, B.: The accuracy of species tree estimation under simulation: a comparison of methods. Syst. Biol. 60(2), 126–137 (2011)
Article Google Scholar
Mirarab, S., Bayzid, M.S., Boussau, B., Warnow, T.: Statistical binning enables an accurate coalescent-based estimation of the avian tree. Science 346(6215), 1250463 (2014)
Article Google Scholar
Gatesy, J., Springer, M.S.: PhyloGenet. Anal. at deep timescales: unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum. Mol. Phylogenet. Evol. 80, 231–266 (2014)
Article Google Scholar
Arvestad, L., Berglund, A.C., Lagergren, J., Sennblad, B.: Gene tree reconstruction and orthology analysis based on an integrated model for duplications and sequence evolution. In: RECOMB, pp. 326–335. ACM Press, New York (2004)
Google Scholar
Akerborg, O., Sennblad, B., Arvestad, L., Lagergren, J.: Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. PNAS 106(14), 5714–5719 (2009)
Article Google Scholar
Szöllősi, G.J., Tannier, E., Daubin, V., Boussau, B.: The inference of gene trees with species trees. Syst. Biol. 64(1), e42–e62 (2014)
Article Google Scholar
Stolzer, M., Lai, H., Xu, M., et al.: Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees. Bioinformatics 28(18), i409–i415 (2012)
Article Google Scholar
Chauve, C., El-Mabrouk, N., Guéguen, L., Semeria, M., Tannier, E.: Duplication, rearrangement and reconciliation: a follow-up 13 years later. In: Chauve, C., El-Mabrouk, N., Tannier, E. (eds.) Models and Algorithms for Genome Evolution. Computational Biology, vol. 19, pp. 47–62. Springer, London (2013). doi:10.1007/978-1-4471-5298-9_4
Chapter Google Scholar
Wu, Y.C., Rasmussen, M.D., Bansal, M.S., Kellis, M.: TreeFix: statistically informed gene tree error correction using species trees. Syst. Biol. 62(1), 110–120 (2013)
Article Google Scholar
Lafond, M., Chauve, C., Dondi, R., El-Mabrouk, N.: Polytomy refinement for the correction of dubious duplications in gene trees. Bioinformatics 30(17), i519–i526 (2014)
Article Google Scholar
Bansal, M.S., Wu, Y.C., Alm, E.J., Kellis, M.: Improved gene tree error correction in the presence of horizontal gene transfer. Bioinformatics 31(8), 1211–1218 (2015)
Article Google Scholar
Tan, G., Muffato, M., Ledergerber, C., et al.: Current methods for automated filtering of multiple sequence alignments frequently worsen single-gene phylogenetic inference. Syst. Biol. 64(5), 778–791 (2015)
Article Google Scholar
Castresana, J.: Selection of conserved blocks from multiple alignments for their use in PhyloGenet. Anal. Mol. Biol. Evol. 17(4), 540–552 (2000)
Article Google Scholar
Capella-Gutiérrez, S., Silla-Martínez, J.M., Gabaldón, T.: trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25(15), 1972–1973 (2009)
Article Google Scholar
Shen, X.X., Hittinger, C.T., Rokas, A.: Studies can be driven by a handful of genes. Nature 1(April), 1–10 (2017)
Google Scholar
Krüger, D., Gargas, A.: New measures of topological stability in phylogenetic trees - taking taxon composition into account. Bioinformation 1(8), 327–330 (2006)
Article Google Scholar
Westover, K.M., Rusinko, J.P., Hoin, J., Neal, M.: Rogue taxa phenomenon: a biological companion to simulation analysis. Mol. Phylogenet. Evol. 69(1), 1–3 (2013)
Article Google Scholar
Pattengale, N.D., Swenson, K.M., Moret, B.M.E.: Uncovering hidden phylogenetic consensus. In: Borodovsky, M., Gogarten, J.P., Przytycka, T.M., Rajasekaran, S. (eds.) ISBRA 2010. LNCS, vol. 6053, pp. 128–139. Springer, Heidelberg (2010). doi:10.1007/978-3-642-13078-6_16
Chapter Google Scholar
Aberer, A.J., Krompass, D., Stamatakis, A.: Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice. Syst. Biol. 62(1), 162–166 (2013)
Article Google Scholar
Goloboff, P.A., Szumik, C.A.: Identifying unstable taxa: efficient implementation of triplet-based measures of stability, and comparison with Phyutility and RogueNaRok. Mol. Phylogenet. Evol. 88, 93–104 (2015)
Article Google Scholar
Hosner, P.A., Braun, E.L., Kimball, R.T.: Land connectivity changes and global cooling shaped the colonization history and diversification of New World quail (Aves: Galliformes: Odontophoridae). J. Biogeogr. 42, 1883–1895 (2015)
Article Google Scholar
Streicher, J.W., Schulte, J.A., Wiens, J.J.: How should genes and taxa be sampled for phylogenomic analyses with missing data? An empirical study in iguanian lizards. Syst. Biol. 65(1), 128–145 (2016)
Article Google Scholar
Salichos, L., Rokas, A.: Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497(7449), 327–331 (2013)
Article Google Scholar
Wickett, N.J., Mirarab, S., Nguyen, N., et al.: Phylotranscriptomic analysis of the origin and early diversification of land plants. PNAS 111(45), 4859–4868 (2014)
Article Google Scholar
Bergsten, J.: A review of long-branch attraction. Cladistics 21(2), 163–193 (2005)
Article Google Scholar
Hampl, V., Hug, L., Leigh, J.W., et al.: Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic “supergroups”. PNAS 106(10), 3859–3864 (2009)
Article Google Scholar
Song, S., Liu, L., Edwards, S.V., Wu, S.: Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. PNAS 109(37), 14942–14947 (2012)
Article Google Scholar
Silverman, B.: Density estimation for statistics and data analysis. In: Monographs on Statistics and Applied Probability. Chapman & Hall (1986)
Google Scholar
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2016)
Google Scholar
Mirarab, S., Reaz, R., Bayzid, M.S., et al.: ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17), i541–i548 (2014)
Article Google Scholar
Misof, B., Liu, S., Meusemann, K., et al.: Phylogenomics resolves the timing and pattern of insect evolution. Science 346(6210), 763–767 (2014)
Article Google Scholar
Cannon, J.T., Vellutini, B.C., Smith, J., et al.: Xenacoelomorpha is the sister group to Nephrozoa. Nature 530(7588), 89–93 (2016)
Article Google Scholar
Rouse, G.W., Wilson, N.G., Carvajal, J.I., Vrijenhoek, R.C.: New deep-sea species of Xenoturbella and the position of Xenacoelomorpha. Nature 530(7588), 94–97 (2016)
Article Google Scholar
Philippe, H., Brinkmann, H., Copley, R.R., et al.: Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470(7333), 255–258 (2011)
Article Google Scholar
Mirarab, S., Warnow, T.: ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12), i44–i52 (2015)
Article Google Scholar
Springer, M.S., Gatesy, J.: The gene tree delusion. Mol. Phylogenet. Evol. 94(Part A), 1–33 (2016)
Article Google Scholar
Sukumaran, J., Holder, M.T.: DendroPy: a Python library for phylogenetic computing. Bioinformatics 26(12), 1569–1571 (2010)
Article Google Scholar
Bogdanowicz, D., Giaro, K.: Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Trans. Comput. Biol. Bioinform. 9(1), 150–160 (2012)
Article Google Scholar
Bogdanowicz, D., Giaro, K., Wróbel, B.: TreeCmp: comparison of trees in polynomial time. Evol. Bioinform. 2012(8), 475–487 (2012)
Google Scholar
DeSantis, T.Z., Hugenholtz, P., Larsen, N., et al.: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72(7), 5069–5072 (2006)
Article Google Scholar

Download references

Acknowledgments

This work was supported by the NSF grant IIS-1565862 to SM and UM. Computations were performed on the San Diego Supercomputer Center (SDSC) through XSEDE allocations, which is supported by the NSF grant ACI-1053575.

Author information

Authors and Affiliations

Computer Science and Engineering, University of California at San Diego, San Diego, USA
Uyen Mai
Electrical and Computer Engineering, University of California at San Diego, San Diego, USA
Siavash Mirarab

Authors

Uyen Mai
View author publications
You can also search for this author in PubMed Google Scholar
Siavash Mirarab
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Siavash Mirarab .

Editor information

Editors and Affiliations

University of Campinas, Campinas, São Paulo, Brazil
Joao Meidanis
Rice University, Houston, Texas, USA
Luay Nakhleh

Appendices

A Proofs

1.1 A.1 Proof of Proposition 1

We start with a Lemma:

Lemma 1

If (a, b) is a diameter pair of t, then for any $c,d \in L(t)-\{a\}$,

$$ max({\delta }({c},{b}), {\delta }({d},{b})) \ge {\delta }({c},{d}). $$

Proof

Consider the quartet formed by the 4 leaves a, b, c, d in t.

Case 1: (Fig. 6a) a and b are on the same side of the quartet:

$${\delta }({a},{b}) \ge {\delta }({a},{d}) \implies {\delta }({m},{b}) \ge {\delta }({m},{d}) \implies {\delta }({c},{m}) + {\delta }({m},{b}) \ge {\delta }({c},{m}) + {\delta }({m},{d})$$

$$ \implies {\delta }({c},{b}) \ge {\delta }({c},{d}) $$

Case 2: (Fig. 6b) Without loss of generality, we assume ${\delta }({n},{c}) \ge {\delta }({n},{d})$. We will prove that ${\delta }({b},{c}) \ge {\delta }({c},{d})$.

We have:

$$ {\delta }({a},{b}) \ge {\delta }({a},{c}) \implies {\delta }({b},{m}) \ge {\delta }({c},{m})$$

$$\implies {\delta }({b},{m}) + {\delta }({n},{c}) \ge {\delta }({c},{m}) + {\delta }({n},{c}) $$

$$\implies {\delta }({b},{m}) + {\delta }({n},{c}) \ge 2{\delta }({n},{c}) $$

$$\implies {\delta }({b},{c}) = {\delta }({b},{m}) + {\delta }({n},{c}) + {\delta }({m},{n}) \ge {\delta }({n},{c}) + {\delta }({n},{d}) = {\delta }({c},{d}) $$

$\square $

We now provide the proof of Proposition 1.

Proof

Consider an arbitrary leaf $b \in \mathcal {D}(t)-\{a\}$. We prove that $b \in \mathcal {D}(t\backslash _{a})$.

Case 1: $(a,b) \notin \mathcal {P}(t)$. Because $b \in \mathcal {D}(t)$, there exists $c \in \mathcal {D}(t)-\{a\}$ such that (c, b) is a diameter pair of $t\backslash _{a}$. Therefore, $b \in \mathcal {D}(t\backslash _{a})$.

Case 2: $(a,b) \in \mathcal {P}(t)$. Let (c, d) be a diameter-pair of $t\backslash _{a}$. According to Lemma 1, $max({\delta }({c},{b}), {\delta }({d},{b})) \ge {\delta }({c},{d})$. Therefore, either (c, b) or (d, b) is a diameter-pair of $t\backslash _{a}$. Thus, $b \in \mathcal {D}(t\backslash _{a})$. $\square $

1.2 A.2 Proof of Theorem 1

Proof

We need to prove that a $\mathcal {R}_k(t)$ is either a reasonable removing set or it is not an optimal removing set. We proceed by contradiction. Assume $\mathcal {R}_k(t)$ is optimal but not a reasonable removing set. Let $\mathcal {R}_m(t)$ be the largest reasonable removing set that is a subset of $\mathcal {R}_k(t)$ (note $0\le m\le k$). If $m = k$, then $\mathcal {R}_k(t)$ is a reasonable set, contradicting the assumption. For $m < k$, consider the tree and let $a_m,b_m$ be its diameter pair. if $a_m \in \mathcal {R}_k(t)$ or $b_m \in \mathcal {R}_k(t)$, adding them to $\mathcal {R}_m(t)$ would generate a reasonable chain of size $m+1$, contradicting our assumption. If $a_m \notin \mathcal {R}_k(t)$ and $b_m \notin \mathcal {R}_k(t)$, all removals after m in $\mathcal {R}_k(t)$ fail to reduce the diameter, but removing either $a_m$ or $b_m$ would reduce the diameter. Thus, $\mathcal {R}_k(t)$ cannot be optimal, contradicting our assumption. $\square $

1.3 A.3 Proof of Theorem 2

Proof

To remove k leaves from a singly paired tree t that has (a, b) as a diameter pair, at least one of a or b has to be removed (or else the diameter never decreases). Thus, three types of reasonable chains exist: those that contain only a, those that contain only b, and those that contain both a and b. Note that after removing a, by Proposition 1, removing b is a reasonable removal (and vice versa), and thus, removing both a and b is always reasonable (in either order).

Case 1: $a\in \mathcal {R}_{k-2}, b\notin \mathcal {R}_{k-2}$: If a reasonable chain has a but not b, by Proposition 1, b is in the diameter set at each step of the chain. Since b by definition is never removed and recalling that the tree is singly paired, at each step, there is only one reasonable removal (whatever taxon is on-diameter in addition to b). Therefore, only one reasonable chain does not include b.
Case 2: $a\notin \mathcal {R}_{k-2}, b\in \mathcal {R}_{k-2}$: Similar to Case 1, one such chain exists.
Case 3: $a\in \mathcal {R}_{k-2}, b\in \mathcal {R}_{k-2}$: In this case, the reasonable removing chain must start with a, b or b, a. In either ordering, we are left with the same induced tree, and need to remove $k-2$ more leaves. Therefore, the set of all reasonable removing sets in this case is: .

Combining the three cases together, we have:

Let $s_{k} = |\mathcal {S}_{k}(t)|$. We have the following recursion:

$$\begin{aligned} s_{k} = {\left\{ \begin{array}{ll} 1, &{} k = 0 \\ 2, &{} k = 1 \\ s_{k-2} + 2 &{} k \ge 2 \end{array}\right. } \end{aligned}$$

(1)

Thus, $s_{k} = k+1$ $\square $

1.4 A.4 Proof of Proposition 2

Proof

Recall that the record of each internal node keeps track of the most distant leaves below two children of the node. When we remove a, only those nodes on the path from a to the root can have a change in the their record. The first traversal of Algorithm 1 updates the records for those nodes, using simple recursive functions that can be computed in O(1) per node.

According to Proposition 1, $b \in \mathcal {D}(t\backslash _a)$. Therefore, one of the longest paths in $t\backslash _a$ must include b; let c be the other taxon. The record of the LCA of c and b, after the update in the first round, will have the value of this longest value. Thus, by checking the updated record for all nodes in the path from b to the root we will find the maximum value. Moreover, when updating the records in the first traversal from a to the root, we have already checked all the nodes from the LCA(a, b) to the root. In the second traversal, we check the nodes from b to LCA(a, b), completing the search. Each of the two traversals of Algorithm 1 visits at most h nodes and only need constant time operations in each visit. Therefore, the overall time complexity of Algorithm 1 is O(h). $\square $

1.5 A.5 Proof of Theorem 3

First, we prove the following lemmas:

Lemma 2

All the longest paths in any tree have the same midpoint.

Proof

If t has only one diameter pair, then Lemma 2 is trivially correct.

If t has more than one diameter pair, let (a, b) and (c, d) be two distinct diameter pairs of t and let m be the midpoint of the path between a and b. We prove that m is also the midpoint of the path between c and d, that is m lies on that path between c and d and ${\delta }({m},{c}) = {\delta }({m},{d})$. w.l.o.g, we suppose ${\delta }({m},{c}) \ge {\delta }({m},{d})$.

We prove that the path between c and d must pass m; that is, c and d belong to two different subsets in the partition defined by m on $\mathcal {L}$ (we call elements of the partition a “side”). We prove by contradiction, assuming c and d belong to the same side of m. Then ${\delta }({m},{c}) + {\delta }({m},{d}) > {\delta }({c},{d})$. Also, either a or b must be on a different side from c and d to m (by definition, a and b cannot be on the same side to m). Suppose a is in a different side from c and d to m. Then: ${\delta }({a},{b}) \ge {\delta }({a},{c}) \implies {\delta }({a},{m}) + {\delta }({m},{b}) \ge {\delta }({a},{m}) + {\delta }({m},{c}) \implies {\delta }({m},{b}) \ge {\delta }({m},{c}) \implies {\delta }({m},{a}) \ge {\delta }({m},{c})$. So we have, ${\delta }({a},{c}) = {\delta }({m},{a}) + {\delta }({m},{c}) \ge 2{\delta }({m},{c}) \ge {\delta }({m},{c}) + {\delta }({m},{d}) > {\delta }({c},{d})$; this leads to a contradiction because (c, d) is a diameter pair.
Prove that $mc = md$. Suppose ${\delta }({m},{c}) > {\delta }({m},{d})$.

We have: $2{\delta }({m},{a}) = 2{\delta }({m},{b}) = {\delta }({a},{b}) = {\delta }({c},{d}) \le {\delta }({m},{c}) + {\delta }({m},{d}) < 2{\delta }({m},{c})$. Therefore: ${\delta }({m},{a}) < {\delta }({m},{c})$ and ${\delta }({m},{b}) < {\delta }({m},{c})$.

Case 1: c belongs to a different side of a to m. Then, ${\delta }({m},{a}) + {\delta }({m},{c}) = {\delta }({a},{c}) \implies {\delta }({m},{a}) + {\delta }({m},{b})< {\delta }({a},{c}) \implies {\delta }({a},{b}) < {\delta }({a},{c})$. This is a contradiction because (a, b) is a diameter pair of t.

Case 2: c belongs to the same side of a to m. Then c belongs to a different side of b to m. Similar to case 1, in this case we can prove that ${\delta }({a},{b}) < {\delta }({b},{c})$ which also leads to a contradiction.

Thus, m is the midpoint of the path between c and d. $\square $

This lemma allows us to define some new concepts that are useful in the rest of the proof.

New Definitions: The single midpoint of any tree t partitions the diameter set into disjoint subsets; we call each of those subsets a diameter group of t (if the midpoint is in the middle of the branch, we have two diameter groups; a midpoint coinciding on an internal node would give three or more groups). We call any restriction of t with k leaves removed a k-optimal restricted tree if no other restriction removing k leaves has a lower diameter. We call a tree t k-shrinkable if there exists a k-removing set that strictly reduces its diameter. We call any induced tree on t that has a smaller diameter than t a shrunk tree of t. Note that unless all but one of the diameter groups of a tree t are removed, the tree cannot shrink in diameter. When all but one of the diameter groups of a tree t is removed, we refer to the resulting tree as a minimum shrunk tree of t.

It is easy to see the following lemma.

Lemma 3

For all a and b, $(a,b)\in \mathcal {P}(t)$ if and only if a and b belong to two distinct diameter groups. $\square $

Now we prove a less obvious Lemma.

Lemma 4

If tree t is k-shrinkable, any k-optimal restricted tree $t^*$ can be induced from one of the minimum shrunk trees of t.

Proof

Because t is k-shrinkable, the diameter of $t^*$ must be strictly smaller than the diameter of t. Suppose $t^*$ is not an induced tree of any minimum shrink tree of t; then, t* has at least two leaves from two different diameter groups of t. Based on Lemma 3, $t^*$ shares with t at least one diameter pair and therefore, has the same diameter as t, which is a contradiction. $\square $

We now turn to the proof of Theorem 3. Recall:

Theorem 3. For any k, any arbitrary pair-restricted k-removing space includes at least one optimal k-removing set.

Proof

If t is not k-shrinkable, any k-removing set is optimal and the result trivially follows. We now focus on a case where t is k-shrinkable.

Suppose t has m diameter groups:

$$D^1=\{d_{1}^{1},d_{2}^{1},\ldots ,d_{p_1}^{1}\},D^2= \{d_{1}^{2},d_{2}^{2},\ldots ,d_{p_2}^{2}\},\ldots ,D^m=\{d_{1}^{m},d_{2}^{m},\ldots ,d_{p_m}^{m}\}.$$

For $i=1\ldots m$, let $k^i = |\bigcup _{j\ne i} D_j|$ be the size of all groups except group i, and let $t^i$ denote the minimum shrunk tree of t that excludes all groups $D^j, j\ne i$. Let $k^{p} = \max _i(k^i)$. For the tree t to be k-shrinkable, we need that $k^i \le k$; thus, $k \ge k^p$.

To produce any minimum shrunk tree $t^i$ with $k^i \le k^p$, we can start from any removal (a, b) such that $a\in D^x$ and $b\in D^y$ (for $x\ne y$), and continue to produce $t^i$. To see this, note that if $x \ne y \ne i$, any chain that starts with either a or b and continues to select from any groups other than $D^i$ will produce the minimum shrunk tree $t^i$ after $k^i$ removals. Now, w.l.o.g, consider $x= i$ and $y \ne i$. Then, consider the chain that starts by removing y and continues by removals from any group other than $D^i$. This chain will also produce $t^i$ after $k^i$ removals. In other words, each pair-restricted k-removing space of t can produce all the minimum shrunk trees $t^i$ that have $k^i \le k$.

Based on Lemma 4, when t is k-shrinkable, at least one of the minimum shrunk trees (say $t^*_i$) can induce any k-optimal restricted tree $t^*$. We also just proved that any pair-restricted space can produce all minimum shrunk trees. Therefore, any arbitrary pair-restricted removing space will include a chain that induces $t^*_i$ from t and another chain that produces $t^*$ starting from $t^*_i$. Thus, the union of the removing sets corresponding to these two chains will produce $t^*$ and will be part of any arbitrary pair-restricted k-removing space.

B Supplementary Figures

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mai, U., Mirarab, S. (2017). TreeShrink: Efficient Detection of Outlier Tree Leaves. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science(), vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_7

Download citation

DOI: https://doi.org/10.1007/978-3-319-67979-2_7
Published: 15 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67978-5
Online ISBN: 978-3-319-67979-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

TreeShrink: Efficient Detection of Outlier Tree Leaves

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees

Treemmer: a tool to reduce large phylogenetic datasets with minimal loss of diversity

Scalable and Accurate Phylogenetic Placement Using pplacer-XR

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendices

A Proofs

1.1 A.1 Proof of Proposition 1

Lemma 1

Proof

Proof

1.2 A.2 Proof of Theorem 1

Proof

1.3 A.3 Proof of Theorem 2

Proof

1.4 A.4 Proof of Proposition 2

Proof

1.5 A.5 Proof of Theorem 3

Lemma 2

Proof

Lemma 3

Lemma 4

Proof

Proof

B Supplementary Figures

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation