Topological data analysis (TDA) is a scientific field lying at the crossroads of applied topology and data analysis: objects like functions and point clouds are typically studied by means of possibly multidimensional complexes of homology groups [29] obtained with different pipelines. Homology groups considered with coefficients in a field $K$ are vector spaces, whose dimension is determined by the different kinds of "holes" which can be found in a space and which, thus, provide a rich characterization of the shape of the space the analyst is considering. More precisely, each statistical unit induces a whole family of topological spaces indexed on $\mathbb{R}^n$, which then, via homology, produces a family of vector spaces indexed on the same set; more precisely, a functor [35] $(\mathbb{R}^n,\le)\to \mathrm{Vect}_K$. Such collections of vector spaces are called persistence modules [14], with the term multidimensional persistence modules referring to the cases in which $n>1$, and describe the shape of the data by considering interpretable topological information at different resolutions. A topological signature/summary or an invariant of a persistence module is a representation which usually maps persistence modules into a metric space or even a vector space, so that some kind of analysis can be carried out. The most used topological summaries for 1-D persistence (i.e., $n=1$) include persistence diagrams [21], persistence landscapes [12], persistence images [2], and persistence silhouettes [15]. When dealing with 0-dimensional homology and 1-D persistence, if the space $X$ is path connected, another topological summary called merge tree [9] can be employed. A merge tree is a tree-shaped topological summary which captures the evolution of the path connected components $\pi_0(X_t)$ of a filtration of topological spaces $\{X_t\}_{t\in\mathbb{R}}$, with $X_t \subseteq X_{t'}$ if $t \le t'$.

One of the central ideas in TDA is that of stability properties, i.e., the properties related to the continuity of the operator mapping data into topological signatures. One definition which has gained a lot of success is that of $\varepsilon$-interleavings between topological representations: when adding some kind of $\varepsilon$-noise to the data, the associated summary "moves" by at most $\varepsilon$. If this happens, then the operator mapping functions to topological representations is a non-expansive Lipschitz operator. Starting from the bottleneck distance for persistence diagrams [22], this idea has been applied to 1-D persistence modules [14], multidimensional persistence modules [34], merge trees [9], sheaves [17,38], and more general situations [10,19]. In this paper, we focus on the problem of studying interleavings for merge trees.

The main reason to pursue such a research direction is that merge trees and persistence diagrams are not equivalent as topological summaries; that is, they do not represent the same information, with merge trees being able to distinguish between scenarios which persistence diagrams and all the other equivalent summaries cannot [16,23,33,47]. As a consequence, in recent years, a lot of research on merge trees has been sparked, mainly driven by the need for a stable metric structure to compare such objects. Stable and unstable edit distances between Reeb graphs or merge trees have been proposed by [7,20,41,48], while other works focus on (unstable) Wasserstein distances [43], $\ell^p$ distances [13], and interleaving distances between merge trees [3,9,11,25,28] or Reeb graphs [8,17,18,38].

The problem of computing the interleaving distance between merge trees has already been approached by [3] and [25] in an attempt to approximate the Gromov-Hausdorff (GH) distance when both metric spaces are metric trees. In this situation, in fact, up to carefully choosing the roots of the metric trees, the two metrics are equivalent. Both problems, approximating the GH distance and computing the interleaving distance, have been shown to be NP-hard [3,11], and, thus, even obtaining feasible approximation algorithms for small trees is a daunting task.

In [3], the authors provide a polynomial time algorithm to approximate the interleaving distance between merge trees via binary search. They build a set which contains the solution of the optimization problem and then obtain a criterion to assess whether certain values of the aforementioned set can be excluded from the set of solutions. If the branches of both trees are long enough, one can carry out the binary search over all possible couples of vertices, with one vertex from the first tree and one from the second. If the trees have branches that are too short, the decision procedure is coupled with a trimming step to deal with these smaller edges. The algorithm returns an approximation of the interleaving distance, with the approximation factor depending on the ratio between the longest and the shortest edge in the tree and on the number of vertices.

The authors of [25], instead, propose the first algorithm for the exact computation of the interleaving distance. Starting from a novel definition of the interleaving distance, they develop a faster alternative to exclude values from the same set of solutions considered by [3]. By filtering a finite subset of points on the metric trees (in general, bigger than the set of vertices of the "combinatorial" trees) according to a) their heights and b) a candidate optimal value $\delta$, in a top-down fashion, points of the second tree are matched to subsets of points of the first tree. Such a matching means that all points of the first tree inside the chosen collection can potentially be mapped to the point of the second tree via some function viable for the interleaving distance. Each such matching $(S,w)$ is then used to establish similar couples (set, point) between the children of the points involved: the set of points of the first tree is to be chosen among the compatible subsets of the children of $S$, and the point in the second tree among the children of $w$. If no such couples are found, $\delta$ is discarded. The algorithm they propose to compute the interleaving distance has complexity $O(n^2\log^3(n)\,2^{2\tau}\tau^{\tau+2})$, where $n$ is the sum of the numbers of vertices of the two merge trees, and $\tau$ is a parameter depending on the input trees, which is defined by the authors. The key issue is that $\tau$ can be very big also for small-sized trees: in Figure 2 of [25], the authors showcase a tree with 8 leaves and $\tau=13$ (thus, the complexity of the algorithm is at least $O(n^2\log^3(n)\,2^{26}\,13^{15}) \approx O(n^2\log^3(n)\,10^{24})$).

Due to its computational complexity, the algorithm by [25] has not been implemented by other authors working with merge trees [16], who have instead exploited another formulation of the same metric, provided by [28]. This last formulation relies on a definition of the interleaving distance between merge trees which are endowed with a fixed set of labels on their vertices. The usual interleaving distance is then shown to be equivalent to choosing an appropriate set of labels (potentially upon adding some vertices to the trees) for the two given merge trees. This formulation, per se, does not provide computational advantages over the classical one, since evaluating all the possible labelings of a tree would still be unfeasible. However, [16] proposes a labeling strategy which should provide "good" labels. Despite being computationally accessible even for very big trees, this approach has the downside of not providing methods for assessing the error of the produced estimate.

    In this work, we pursue two main objectives. First, we aim to establish the theoretical foundations necessary to derive reliable polynomial upper and lower bounds for the interleaving distance. Second, we seek to develop practical techniques that complement those proposed in [3] and [25]. As demonstrated in Appendix B, our methods offer computational procedures that are more efficient than those in [25] and more reliable than those in [3].

To do so, we take a perspective which differs from all the aforementioned works: instead of obtaining progressively sharper bounds for the distance by locally looking at differences between trees, or looking for optimal labelings, we aim at jointly describing a global matching between two merge trees whose cost returns the exact interleaving distance between them. In this way, we obtain a novel combinatorial formulation of the interleaving distance between merge trees, which we also conjecture to be equivalent to yet another formulation, supported by the already established link between the interleaving and the GH distances between trees [25].

Expressing distances between trees by matching their vertices is a very common approach for edit distances. Notably, there are very strong relationships between the couplings we use in this paper and the mappings which appear ubiquitously in the edit distance literature. This allows us to leverage ideas and techniques developed in such research areas, advancing the theoretical and computational tools at our disposal. In particular, as summarized by Figure 14 in Appendix A, we specifically adopt two key ideas from edit distance frameworks:

1) Defining constrained variations of computationally expensive metrics, which maintain strong practical performance while improving computational efficiency. Most works on edit distances for merge trees [48,49,52,53] follow this exact path: they define "unconstrained" distances whose computation is NP-hard [30], and then exploit the polynomial algorithms in [32,46,54] to obtain some "constrained" upper bounds, which can also be regarded as metrics on their own;

    2) Developing recursive methods for computing both constrained and unconstrained formulations of distances, thereby reducing the computational burden and enabling recursive mixed/integer linear programming approaches, as in [31,41,52].

We exploit our novel formulation precisely to produce 1) a constrained version of the interleaving distance and 2) a recursive procedure to obtain upper and lower bounds for such distance via mixed linear programming. As previously stated, the bounds we derive in this way complement the procedures in [3] and [25], thereby extending the applicability of the interleaving distance, as discussed in Appendix B. It is important to note, however, that this comparison is rather technical and considers some particular scenarios, due to the profoundly different natures of the considered algorithms.

Given that our recursive procedure returns bounds whose computational costs are very close to the ones of [48,49,52,53], we believe that studying the combination of the recursive procedure with the constrained interleaving distance may similarly lead to polynomial-time bounds for the interleaving distance, mirroring what has been achieved for the edit distances in the cited works. See Proposition 6 and Section 9.2.9.

    In the last part of the paper, we test those bounds by comparing them with the method proposed by [16] to estimate the interleaving distance and by tackling a number of simulated and real-data case studies, showcasing also how to empirically assess the validity of our estimates.

    The paper is organized as follows: In Section 2, we introduce merge trees first in a combinatorial fashion and then as "continuous" posets and metric spaces, recalling the definition of interleaving distance. In Section 3, we introduce a partial matching between trees called coupling. Relationships between these matchings and maps between trees are then presented in Section 4. Section 5 and Section 6 prove the equivalence between the usual definition of interleaving distance and the problem of finding optimal couplings.

    In the second part of the manuscript, we leverage our novel formulation to introduce a constrained variant of the interleaving distance in Section 7, followed by the development of decomposition procedures for the interleaving distance in Section 8. We then present estimation methods for the interleaving distance in Section 9, and evaluate their performance through experiments described in Section 10. Finally, Section 11 concludes the main body of the paper with a brief discussion of our findings.

Appendix A contains two flowcharts, which can help the reader navigate the structure of the manuscript, and Table 3, which collects most of the main pieces of notation introduced in the manuscript and can help the reader navigate the most technical sections. Appendix B features a comparison of our algorithms with the ones of [3] and [25]. Appendix C presents a further simulation dealing with the computational runtimes of the current implementation of our methods. Appendix D contains some of the proofs of the results in the manuscript, while the other ones are reported right after the statement they refer to, as they can help the reader in following the discussion.

    We start by introducing merge trees as combinatorial objects along with their "continuous" counterpart.

    In accordance with other authors, we call a merge tree a rooted tree with a suitable height function defined on its vertices. We now state the formal definition, introducing some pieces of notation which we will use to work with those objects throughout the paper.

Definition 1. A tree structure $T$ is given by a set of vertices $V_T$ and a set of edges $E_T \subseteq V_T\times V_T$, which form a connected rooted acyclic graph. We indicate the root of the tree with $r_T$. We say that $T$ is finite if $V_T$ is finite. The degree of a vertex $v\in V_T$ is the number of edges which have that vertex as one of the extremes. Any vertex with an edge connecting it to the root is its child, and the root is its father: this is the first step of a recursion which defines father and children relationships for all vertices in $V_T$. Vertices with no children are called leaves or taxa and are collected in the set $L_T$. The relationship child $<$ father generates a partial order on $V_T$. The edges in $E_T$ are represented by ordered couples $(a,b)$ with $a<b$. A subtree of a vertex $v$, called $\mathrm{sub}_T(v)$, is the tree structure whose set of vertices is $\{x\in V_T \mid x \le v\}$.

In Definition 1, we introduce the term tree structure to differentiate such objects from the generic word "tree", which we will use to indicate either a tree structure, a merge tree, or a metric tree, when the context allows, to lighten the notation.

Given finite sets $A, B$, we indicate with $\#A$ the cardinality of a set and set $A - B := \{a\in A \mid a\notin B\}$. Moreover, given a finite poset $P$, we indicate with $\max P$ the set of all its maxima, and with $\min P$ the set of all its minima. Given $A\subseteq V_T$ and $v\in V_T$, we may also write $\max_{v'<v} A$ to indicate the maximal elements of the set $\{v'\in A \mid v' < v\}$, according to the poset structure inherited from $V_T$.

Identifying an edge $(v,v')$ with its lower vertex $v$ gives a bijection between $V_T - \{r_T\}$ and $E_T$; that is, $E_T \simeq V_T - \{r_T\}$ as sets. Given this bijection, we may use $E_T$ to indicate the vertices $v\in V_T - \{r_T\}$, to simplify the notation.

    Now, we give the notion of least common ancestor between vertices.

Definition 2. Given a set of vertices $A = \{a_1,\dots,a_n\}\subseteq V_T$, we define $\mathrm{LCA}(a_1,\dots,a_n) = \min\bigcap_{i=1}^n \{v\in V_T \mid v \ge a_i\}$.

    Now, to obtain a merge tree, we add an increasing height function to a tree structure.

Definition 3. A merge tree $(T,f)$ is a finite tree structure $T$ such that the root is of degree $>1$, coupled with a monotone increasing function $f : V_T \to \mathbb{R}$.

Remark 1. Definition 3 is slightly different from the definition of merge trees found in [28], [40], and other works. In particular, in the present work, we do not need to have a root at infinity, and thus, for us, the root is the highest vertex with finite height. Similarly, the function coupled with the tree structure is usually referred to in the literature as $h_T$, being a "height" function. To avoid overloading the notation, since we need to introduce many subscripts and superscripts, we denote these functions with more usual functional notations like $f$ or $g$.

Assumption 1. To avoid formal complications, we make the following standard genericity assumption for any merge tree $(T,f)$:

(G) $f : V_T \to \mathbb{R}$ is injective.

Example 1. Given a real valued Morse [37] function $f : X \to \mathbb{R}$ defined on a path connected compact space $X$, the sublevel set filtration is given by $X_t = f^{-1}((-\infty,t])$, together with the maps $X_{t<t'} = i : f^{-1}((-\infty,t]) \hookrightarrow f^{-1}((-\infty,t'])$. We can describe the family of sets and maps $\{\pi_0(X_t)\}_{t\in\mathbb{R}}$ via a merge tree. See, for instance, [16,40]. Similarly, we can consider a finite set $C\subseteq\mathbb{R}^n$ and take its Čech filtration: $X_t = \bigcup_{c\in C} B_t(c)$, with $B_t(c) = \{x\in\mathbb{R}^n \mid \lVert c-x\rVert < t\}$. As before, $X_{t<t'} = i : \bigcup_{c\in C} B_t(c) \hookrightarrow \bigcup_{c\in C} B_{t'}(c)$, and $\{\pi_0(X_t)\}_{t\in\mathbb{R}}$ can be represented via a merge tree.
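To make Example 1 concrete in the simplest setting, the following minimal sketch (our own illustration, not code from the paper) computes the merge tree of the sublevel set filtration of a function sampled on a line graph; the `parent`/`height` dictionary encoding and all names are hypothetical choices of ours.

```python
def merge_tree_1d(values):
    """Merge tree of the sublevel set filtration of a function sampled on a
    line graph: leaves record births of path connected components (local
    minima), internal vertices record their merges."""
    n = len(values)
    uf = list(range(n))                   # union-find over sample indices

    def find(i):
        while uf[i] != i:
            uf[i] = uf[uf[i]]             # path halving
            i = uf[i]
        return i

    parent, height, node_of = {}, {}, {}  # node_of: component rep -> tree node
    active = [False] * n
    next_id = 0
    for i in sorted(range(n), key=values.__getitem__):
        reps = {find(j) for j in (i - 1, i + 1) if 0 <= j < n and active[j]}
        active[i] = True
        if not reps:                      # local minimum: a new leaf is born
            parent[next_id], height[next_id] = None, values[i]
            node_of[i] = next_id
            next_id += 1
        elif len(reps) == 1:              # sample joins an existing component
            node_of[i] = node_of.pop(reps.pop())
        else:                             # two components merge at this height
            parent[next_id], height[next_id] = None, values[i]
            for r in reps:
                parent[node_of.pop(r)] = next_id
            node_of[i] = next_id
            next_id += 1
        for j in (i - 1, i + 1):          # absorb neighbors into component i
            if 0 <= j < n and active[j] and find(j) != i:
                uf[find(j)] = i

    return parent, height

parent, height = merge_tree_1d([0.0, 2.0, 1.0, 3.0, 0.5, 2.5])
# three leaves (heights 0.0, 1.0, 0.5) and merge vertices at heights 2.0, 3.0
```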

    Before proceeding, we need one last graph-related definition which we use to denote some particular kinds of paths on the tree structure of a merge tree.

Definition 4. Given a merge tree $T$, a sequence of edges is an ordered sequence of adjacent edges $\{e_1,\dots,e_n\}$. This means that we have $e_1 < \dots < e_n$, according to the order induced by the bijection $E_T \simeq V_T - \{r_T\}$, and that $e_i$ and $e_{i+1}$ share a vertex. We will use the notation $[v,v']$ to indicate a sequence of edges which starts in the vertex $v$ and ends before $v'$, with $v < v'$. Thus, the edge associated to $v$ (via $E_T \simeq V_T - \{r_T\}$) is included in the sequence, while the one associated to $v'$ is the first one excluded.

In the left column of Figure 1, the reader can find an example of a merge tree, which can be used to get familiar with the definitions just given. For instance, one can check that, using the notation of the figure, $e = \mathrm{LCA}(a,b,d)$ and $[a,c,r]$ is an ordered sequence of edges, while $[c,r,d]$ is not.

    Figure 1.  A merge tree (left) with its associated metric merge tree (right).

    Now we consider the continuous version of a merge tree, intuitively obtained by considering all the points in the edges of a merge tree as points of the tree itself. For a visual intuition of the following ideas, the reader may refer to Figure 1.

Definition 5. Given a merge tree $T$, we obtain the associated metric merge tree as follows:

$$T = \Big([f(r_T), +\infty) \sqcup \coprod_{(x,x')\in E_T} [f(x), f(x')]\Big)\Big/\sim,$$

where $\sim$ identifies, for every $v\in V_T$, all the copies of the point $f(v)$ appearing in the intervals associated to the edges which contain $v$ (and in $[f(r_T),+\infty)$ when $v = r_T$). We refer to the points in $T$ with the same notation we use for the vertices in $V_T$: $x\in T$; with $f(x)$, we indicate the height value of $x$.

For every vertex $x\in V_T$, we can identify $x$ with the point $f(x)\in[f(x), f(x')]$, with $(x,x')\in E_T$, or, equivalently, with $f(x)\in[f(x''), f(x)]$, if $(x'',x)\in E_T$. This induces a well-defined map $V_T \to T$. Thus, given $v\in V_T$, with an abuse of notation, we can consider $v\in T$. Note that the height function $f : T \to \mathbb{R}$ extends the function defined on the vertices of the merge tree. Every point in $T - V_T$ belongs to one, and only one, interval of the form $[f(x), f(x')]$. Thus, we can induce a partial order relationship on $T$ by ordering points first with the partial order on $E_T$ and then with the increasing internal order of $[f(x), f(x')]$. Thus, we can explicitly write down the shortest path metric on $T$: $d(x,x') = \big(f(\mathrm{LCA}(x,x')) - f(x)\big) + \big(f(\mathrm{LCA}(x,x')) - f(x')\big)$.
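The metric above is immediate to evaluate on vertices; here is a tiny sketch of ours (the `parent`/`height` encoding and the heights are hypothetical).

```python
def ancestors(v, parent):
    """v together with all of its ancestors, listed bottom-up."""
    out = [v]
    while parent[v] is not None:
        v = parent[v]
        out.append(v)
    return out

def lca(v, w, parent):
    """Least common ancestor of two vertices."""
    up = set(ancestors(v, parent))
    return next(u for u in ancestors(w, parent) if u in up)

def tree_distance(v, w, parent, height):
    """d(v, w) = (f(LCA(v, w)) - f(v)) + (f(LCA(v, w)) - f(w))."""
    a = lca(v, w, parent)
    return (height[a] - height[v]) + (height[a] - height[w])

# Leaves a, b merging in the root r: d(a, b) = (3 - 0) + (3 - 1) = 5.
parent = {"a": "r", "b": "r", "r": None}
height = {"a": 0.0, "b": 1.0, "r": 3.0}
assert tree_distance("a", "b", parent, height) == 5.0
```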

Remark 2. (Notation and previous works) If we start from a Morse function $f : D \to \mathbb{R}$ defined on a compact and path connected topological space $D$, what we call "metric merge tree" is essentially the "merge tree" of $f$, according to the topological definition given in [9], also called classical merge tree in [16]. Moreover, display posets of persistent sets, as defined in [16], are, up to some additional hypotheses, equivalent to metric merge trees. However, 1) we are interested in working on merge trees in a combinatorial fashion, so we build merge trees as graphs and then bridge to this continuous construction; and 2) in order to use display posets, we would need to give other technical definitions, which we do not need in this work; thus, we avoid working with such objects. To sum up, all these procedures/definitions are different ways to build an $\mathbb{R}$-space $(X, f_X : X \to \mathbb{R})$, with $f_X$ a continuous function [18], the underlying topological space $X$ being the CW-complex associated to a tree. In particular, the display poset highlights the poset structure of such a topological space, while the approach in [9] describes a geometric construction, which gives the display poset a natural topology.

For each metric tree $T$, we have a family of continuous maps, $s_k^T : T \to T$, with $k \ge 0$, called structural maps, defined as follows: $s_k^T(x) = x'$, with $x'$ being the only point in $T$ such that $x' \ge x$ and $f(x') = f(x) + k$.

    Now, we recall the main facts about the interleaving distance between merge trees, relying on [9].

Definition 6. [9] Two continuous maps $\alpha : T \to G$ and $\beta : G \to T$ between two metric merge trees, $(T,f)$ and $(G,g)$, are $\varepsilon$-compatible, with $\varepsilon \ge 0$, if the following hold:

(I1) $g(\alpha(x)) = f(x) + \varepsilon$ for all $x\in T$, and $f(\beta(y)) = g(y) + \varepsilon$ for all $y\in G$;

(I2) $\alpha\circ\beta = s_{2\varepsilon}^G$ and $\beta\circ\alpha = s_{2\varepsilon}^T$.

The interleaving distance between $T$ and $G$ is then:

$$d_I(T,G) = \inf\{\varepsilon \mid \text{there are } \alpha, \beta\ \varepsilon\text{-compatible maps}\}.$$

    For an example of continuous maps α and β satisfying (I1), see Figure 2. Such maps also satisfy (I2), as shown by Figure 3.

    Figure 2.  An example of maps α and β giving the ε-interleaving of two metric merge trees.
    Figure 3.  A representation of the images of the maps βα (left) and αβ (right), with α and β being the maps in Figure 2.
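As a toy sanity check of Definition 6 (an example of ours, not taken from the paper), let $G$ be a copy of $T$ with all heights shifted upward by a constant $c > 0$, so that, under the obvious identification of points, $g = f + c$. Then:

$$d_I(T,G) = c.$$

Indeed, sending each $x\in T$ to the corresponding point of $G$ (which has height $f(x)+c$) and each $y\in G$ to $s_{2c}^T$ applied to its corresponding point of $T$ gives a pair of $c$-compatible maps, so $d_I(T,G) \le c$. Conversely, for any $\varepsilon$-compatible $\alpha$, applying (I1) to the lowest point $m$ of $T$ gives $f(m) + \varepsilon = g(\alpha(m)) \ge \min_G g = f(m) + c$, hence $\varepsilon \ge c$.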

    The work of [25] shows that the existence of α and β is in fact equivalent to the existence of a single map α with some additional properties, which are stated in the next definition.

Definition 7. [25] Given two metric merge trees $(T,f)$ and $(G,g)$, an $\varepsilon$-good map $\alpha : T \to G$ is a continuous map such that:

(P1) $g(\alpha(x)) = f(x) + \varepsilon$ for all $x\in T$;

(P2) if $\alpha(x) < \alpha(x')$, then $s_{2\varepsilon}^T(x) < s_{2\varepsilon}^T(x')$;

(P3) if $y \in G - \alpha(T)$, then, given $w = \min\{y' > y \mid y' \in \alpha(T)\}$, we have $g(w) - g(y) \le 2\varepsilon$.

    As anticipated, [25] proves that two merge trees are ε-interleaved if, and only if, there is an ε-good map between them.

    Now, we start our combinatorial investigation. Across the following sections, we need to introduce several pieces of notation, which are also summarized in Table 3 in the appendix.

Given two merge trees $T$ and $G$, we would like to match their vertices and compute the interleaving distance relying only on such matches. To match the two graphs, we will use a set $C\subseteq V_T\times V_G$, which will tell us which vertex is coupled with which. For such $C$, we call $\pi_T : C \to V_T$ the restriction of the projection of the cartesian product. Since $V_T$ and $V_G$ are posets, we can introduce a partial order relationship on any $C\subseteq V_T\times V_G$, having $(a,b) < (c,d)$ if, and only if, $a < c$ and $b < d$. Thus, as for each of the sets $V_T$, $V_G$, and any subset of those sets, we can consider the set of the maximal elements and the set of the minimal elements of $C$ (and of its subsets), which we indicate with $\max C$, $\min C$, etc.

    Clearly, a coupling C must satisfy some constraints, which we now introduce.

We start by defining the "multivalued" function $\Lambda_T^C : V_T \to \mathcal{P}(V_T)$, with $\mathcal{P}(V_T)$ being the power set of $V_T$:

$$\Lambda_T^C(v) = \begin{cases} \max_{v'<v} \pi_T(C), & \text{if there exists } v'\in V_T \text{ such that } v' < v \text{ and } v'\in\pi_T(C);\\ \emptyset, & \text{otherwise.}\end{cases}$$

Definition 8. A coupling between two merge trees $(T,f)$ and $(G,g)$ is a set $C\subseteq V_T\times V_G$ such that:

(C1) $\#\max C = 1$;

(C2) the projection $\pi_T : C \to V_T$ is injective (the same for $G$);

(C3) given $(a,b)$ and $(c,d)$ in $C$, then $a < c$ if, and only if, $b < d$;

(C4) $a\in\pi_T(C)$ implies $\#\Lambda_T^C(a) \ne 1$ (the same for $G$).

The set of couplings between $T$ and $G$ is called $\mathcal{C}(T,G)$. Note that, thanks to (C3), (C1) can be replaced by $\#\max\pi_T(C) = \#\max\pi_G(C) = 1$.
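The conditions above are easy to check mechanically. The sketch below (our own encoding, not the authors' implementation) represents each tree structure as a `parent` dictionary and verifies (C1)-(C4) for a candidate set of couples; heights play no role in the check.

```python
def is_ancestor(u, v, parent):
    """True if u is a proper ancestor of v, i.e., v < u in the tree order."""
    v = parent[v]
    while v is not None:
        if v == u:
            return True
        v = parent[v]
    return False

def big_lambda(v, coupled, parent):
    """Λ(v): maximal elements of the coupled vertices strictly below v."""
    below = [u for u in coupled if is_ancestor(v, u, parent)]
    return [u for u in below
            if not any(is_ancestor(w, u, parent) for w in below if w != u)]

def is_coupling(C, parent_T, parent_G):
    pi_T, pi_G = [a for a, _ in C], [b for _, b in C]
    if len(set(pi_T)) < len(C) or len(set(pi_G)) < len(C):
        return False                      # (C2): projections are injective
    for a, b in C:                        # (C3): a < c iff b < d
        for c, d in C:
            if is_ancestor(c, a, parent_T) != is_ancestor(d, b, parent_G):
                return False
    maxima = [(a, b) for a, b in C        # (C1): a unique maximal couple
              if not any(is_ancestor(c, a, parent_T) for c, _ in C)]
    if len(maxima) != 1:
        return False
    for a, b in C:                        # (C4): #Λ != 1 on coupled vertices
        if len(big_lambda(a, pi_T, parent_T)) == 1:
            return False
        if len(big_lambda(b, pi_G, parent_G)) == 1:
            return False
    return True
```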

We invite the reader to follow the remainder of the section looking at Figure 4, which can help in understanding the comments we present on properties (C1)-(C4) and in visualizing the pieces of notation we need to establish.

    Figure 4.  Two examples of couplings, displaying all possible cases of unused vertices and deletions.

We collect the vertices $v\in E_T$ such that $\#\Lambda_T^C(v) = 1$ in a set, which we call $U_T^C$. Instead, the vertices such that $v\notin\pi_T(C)$ and $\#\Lambda_T^C(v)\ne 1$ are collected in a set $D_T^C$. Note that the sets $\pi_T(C)$, $U_T^C$, and $D_T^C$ are a partition of $V_T$.

A couple $(v,w)\in C$ means that we match a vertex $v\in V_T$ with a vertex $w\in V_G$. It is clear from the definition that there can be vertices left out from this matching. We regard these vertices as unnecessary for the coupling $C$: when we induce maps between metric merge trees starting from couplings, the position of these vertices inside the metric merge tree $G$ will be completely induced by other vertices in $V_T$. Among the vertices not appearing in the couples of $C$, we distinguish between two situations: having a vertex $v$ in $D_T^C$, informally, means that the edge $(v,\mathrm{father}(v))$, except in certain special cases, is not contained in the image of $\beta(\alpha(T))$, with $\alpha$ and $\beta$ as in Definition 6. For this reason, we say that the vertices in $D_T^C$ and $D_G^C$ are deleted by the coupling $C$. Instead, the vertices $v\in U_T^C$ are unused: they are ignored by the coupling and will be of no importance in the computation of the distance.

In this context, property (C1) is asking that there is a unique maximal couple in $C$, while (C2) is asking that each vertex is paired with at most one other vertex of the other merge tree. Note that the maps $\alpha : T \to G$ in Definition 6 are not forced to be injective but, as we will prove in later sections, for our purposes it is enough to couple points just one time. Condition (C3) is asking that the coupling respects the tree structures of $T$ and $G$; in particular, this implies that $(\max\pi_T(C), \max\pi_G(C))\in C$. Lastly, due to condition (C4), we avoid coupling vertices which have only one coupled element below them.

    Given a coupling, we want to associate to a vertex a cost which indicates how much the coupling moves that vertex. In order to do so, we define the following functions:

● Define $\varphi_T^C : V_T \to V_T$ so that $\varphi_T^C(x) = \min\{v\in V_T \mid v > x \text{ and } \#\Lambda(v)\ne 0\}$. Note that since the set $\{v\in V_T \mid v > x\}$ is totally ordered, $\varphi_T^C(x)$ is well-defined;

● Similarly, define $\delta_T^C : V_T \to V_T$ as $\delta_T^C(x) = \min\{v\in V_T \mid v\ge x \text{ and } v\in\pi_T(C)\}$;

● Define $\chi_T^C : V_T \to V_G$ as $\chi_T^C(x) = \mathrm{LCA}(\{\pi_G((v,w)) \mid (v,w)\in C,\ v\in\Lambda_T(x)\})$;

● Set $\gamma_T^C : V_T - D_T^C \to V_G$ to be:

$$\gamma_T^C(x) = \begin{cases}\operatorname{argmin}\{g(w) \mid (v,w)\in C,\ v < x\}, & \text{if } \#\{g(w)\mid (v,w)\in C,\ v<x\} > 0;\\ \emptyset, & \text{otherwise.}\end{cases}$$

Note that if $\#\{g(w) \mid (v,w)\in C,\ v<x\} > 0$, by (G), $\gamma_T^C(x)$ is uniquely defined, and so $\gamma_T^C$ is not a multivalued function. Moreover, $\gamma_T^C(\varphi_T^C(x))$ is well-defined for any $x\in V_T$.

Remark 3. When clear from the context, to lighten the notation, we might omit the subscripts and superscripts. For instance, if we fix $C\in\mathcal{C}(T,G)$ and $x\in V_T$, we refer to $\gamma_T^C(\varphi_T^C(x))$ as $\gamma(\varphi(x))$. Similarly, we use $x\in U$, $x\in D$, respectively, for $x\in U_T^C$ and $x\in D_T^C$.

Given $x\in V_T$, in what follows we will encounter the following objects (see Figure 4):

● $\eta(x) := \gamma(\varphi(x))$ is the vertex in $V_G$ obtained in this way: starting from $x$, we go toward $r_T$ until we meet the first point of $V_T$ which has at least one vertex in its subtree which is not deleted; we consider the subtree of such a vertex and take all the vertices of $V_G$ which are coupled with elements in this subtree (such a set is not empty); lastly, we take the vertex with the lowest height among this set;

● Given $x$ such that $x\in D$ and $\#\Lambda(x) > 1$, we often consider $\delta(x)$; note that for every $v$ such that $x\le v < \delta(x)$, we have $v\in D$ and $\#\Lambda(v) > 1$. Thus, $[x,\delta(x)]$ contains only deleted vertices;

● Similarly, for $x$ such that $x\in D$ and $\#\Lambda(x) > 1$, we take $\chi(x)$: that is, the lowest point in $G$ where $x$ can be sent compatibly with $C$ and the tree structures of $T$ and $G$. Thus, if $(x,y)\in C$, $y \ge \chi(x)$.

Remark 4. As anticipated in the introduction, the definition of couplings has strong connections with the definition of mappings given in [41], which is used to compute the edit distance therein defined. Such a definition generalizes more classical definitions of mappings, which are ubiquitous in the literature of edit distances for trees. In particular, one can show that, given a coupling $C\in\mathcal{C}(T,G)$, if we add to $C$:

● couples of the form $(a,\mathrm{D})$ for every $a\in V_T$ such that $a$ is deleted, i.e., $a\notin\pi_T(C)$ and $\#\Lambda_T^C(a)\ne 1$;

● couples of the form $(\mathrm{D},b)$ for every $b\in V_G$ such that $b$ is deleted, i.e., $b\notin\pi_G(C)$ and $\#\Lambda_G^C(b)\ne 1$;

● couples of the form $(a,\mathrm{G})$ for every $a\in V_T$ such that $a$ is unused, i.e., $a\notin\pi_T(C)$ and $\#\Lambda_T^C(a) = 1$;

● couples of the form $(\mathrm{G},b)$ for every $b\in V_G$ such that $b$ is unused, i.e., $b\notin\pi_G(C)$ and $\#\Lambda_G^C(b) = 1$,

we obtain a set $M_C$ which is a mapping in the sense of [41].

We now start introducing our novel formulation of the interleaving distance. We do so via a three-step procedure, which is summarized in Figure 14(a) in the appendix.

In this section, we establish some correspondences between couplings and maps $\alpha : T \to G$, giving also the definition of the cost of a coupling. As a first step, given a coupling $C$, we induce two maps $\alpha_C : V_T - U_T^C \to G$ and $\beta_C : V_G - U_G^C \to T$, following these rules:

1) If $(x,y)\in C$, then $\alpha_C(x) := y$.

2) If $x\in D_T^C$ with $\#\Lambda(x) = 0$, then $\alpha_C(x) := s_k^G(\eta(x))$, with $k$ being:

$k = f(x) + \frac{1}{2}\big(f(\varphi(x)) - f(x)\big) - g(\eta(x))$ if $g(\eta(x)) \le f(x) + \frac{1}{2}\big(f(\varphi(x)) - f(x)\big)$;

$k = 0$, otherwise.

The idea is that we want to send $x$ above some coupled vertex, changing its height "as little as possible". Bear in mind that the sequence $[x,\varphi(x)]$ should not appear in the image of $\beta\circ\alpha$.

3) If instead $x\in D_T^C$ with $\#\Lambda(x) > 1$, then $\alpha_C(x) := \chi(x)$.

    The map βC is obtained in the same way, exchanging the roles of (T,f) and (G,g). Figure 5(b) shows an example in which βC is obtained starting from a coupling.

Figure 5.  The red and black couples of adjacent letters indicate a coupling $C = \{(a,b),(b,a),(c,c),(r,r)\}\in\mathcal{C}(T,G)$. In Figure 5(a), the gray arrows represent the map $\beta_C : V_G \to V_T$, which is then naturally extended to the metric trees. The image of such map is then shifted upward using the structural map $s_k^T$, as in Figure 5(b), to obtain $\beta_C^\varepsilon$, with the shift being indicated by the upward red arrows. The $\varepsilon$-good map $\beta_C^\varepsilon$ is represented in Figure 5(b) via the orange arrows and the shaded orange portion of $T$.

    As a first result, we obtain that the maps we induce respect the tree structures of the respective merge trees.

Proposition 1. Consider $x, x'\in V_T - U_T$: if $x < x'$, then $\alpha_C(x) \le \alpha_C(x')$.

    Now, we need to extend αC to a continuous function between T and G. To formally extend the map, we need the following lemma.

Lemma 1. Any $x\in T$ such that $x\notin \pi_T(C)\cup D$ is contained in a sequence of edges $[l_C(x), u_C(x)]$ such that $l_C(x)\in\max\{v\le x \mid v\in\pi_T(C)\cup D\}$, and $u_C(x)$ is either $u_C(x) = +\infty$ or $u_C(x) = \min\{v\ge x \mid v\in\pi_T(C)\cup D\}$. Moreover, $l_C(x)$ is unique if $\{v\in U \mid l_C(x)\le v\le x\} = \emptyset$; meanwhile, if $\{v\in U \mid l_C(x)\le v\le x\} \ne \emptyset$, there exists a unique $l_C(x)$ such that $l_C(x)\in D$.

Now, we can obtain a continuous map $\alpha_C : T \to G$.

Proposition 2. We can extend $\alpha_C$ to a continuous map between $T$ and $G$. This map, with an abuse of notation, is still called $\alpha_C$.

Proof. We define $\alpha_C(x)$ for $x\notin \pi_T(C)\cup D_T^C$. By Lemma 1, we have $x\in[l(x), u(x)]$, with $\alpha_C$ being defined on $l(x)$ and $u(x)$. Whenever $l(x)$ is not unique, we take the unique $l(x)\in D$ given by Lemma 1. Clearly, we have $f(l(x)) < f(x) < f(u(x))$. Thus, if $u(x) < +\infty$, there is a unique $\lambda\in[0,1]$ such that $f(x) = \lambda f(l(x)) + (1-\lambda) f(u(x))$. Having fixed such $\lambda$, we define $\alpha_C(x)$ to be the unique point in $[\alpha_C(l(x)), \alpha_C(u(x))]$ (which is a sequence of edges thanks to Proposition 1) with height $\lambda g(\alpha_C(l(x))) + (1-\lambda) g(\alpha_C(u(x)))$. If $u(x) = +\infty$, then $x > v$ with $v = \max\pi_T(C)$. Then, we set $\alpha_C(x) = y$ such that $y \ge \alpha_C(v)$ and $g(y) = g(\alpha_C(v)) + f(x) - f(v)$. Note that, for $x, x'$ such that $u(x), u(x') = +\infty$, the map $\alpha_C$ always preserves distances, and we have $g(\alpha_C(x)) - f(x) = g(\alpha_C(v)) - f(v)$. Thus, at such points it is a continuous function.

Consider now a converging sequence $x_n \to x$ in $T$. We know that, definitively, $\{x_n\}_{n\in\mathbb{N}}$ is contained in one or more edges containing $x$. Thus, we can obtain a finite set of converging subsequences by intersecting $\{x_n\}$ with such edges. With an abuse of notation, from now on, we use $\{x_n\}_{n\in\mathbb{N}}$ to indicate any such sequence. Each of those edges is contained in a unique sequence of edges of the form $[l(x'), u(x')]$, for some $x'$ (up to, eventually, taking $l(x')\in D$). Thus, $\{x_n\}\subseteq[l(x'), u(x')]$ induces a unique sequence $\{\lambda_n\}\subseteq[0,1]$ such that $f(x_n) = \lambda_n f(l(x')) + (1-\lambda_n) f(u(x'))$ and $\lambda_n \to \lambda_x$, with $f(x) = \lambda_x f(l(x')) + (1-\lambda_x) f(u(x'))$. By Lemma 1, $\alpha_C(x)\in[\alpha_C(l(x')), \alpha_C(u(x'))]$. Moreover, by construction, $g(\alpha_C(x_n)) \to \lambda_x g(\alpha_C(l(x'))) + (1-\lambda_x) g(\alpha_C(u(x'))) = g(\alpha_C(x))$. Thus, $\alpha_C$ is continuous.

In order to start relating maps induced by couplings and $\varepsilon$-good maps, we define the cost of the coupling $C$, which is given in terms of how much the (continuous) $\alpha_C : T \to G$ moves the points of $T$. To highlight the domains of the maps involved in the following definition, we make use of the identity map $\mathrm{Id}_T : T \to T$, for a metric tree $T$.

Definition 9. Given $C\in\mathcal{C}(T,G)$ and $x\in T$, we define $\mathrm{cost}_C(x) = |g(\alpha_C(x)) - f(x)|$. Coherently, we define $\lVert C\rVert = \max\{\lVert g\circ\alpha_C - f\circ\mathrm{Id}_T\rVert_\infty,\ \lVert f\circ\beta_C - g\circ\mathrm{Id}_G\rVert_\infty\}$.

Note that, given $C\in\mathcal{C}(T,G)$ and $x\in T$, we have one of the following possibilities:

● If $(x,y)\in C$, $\mathrm{cost}_C(x) = |f(x) - g(y)|$;

● If $x\notin\pi_T(C)\cup D$, $\mathrm{cost}_C(x) \le \max\{\mathrm{cost}_C(l(x)), \mathrm{cost}_C(u(x))\}$; in fact,

$$|g(\alpha_C(x)) - f(x)| = |\lambda g(\alpha_C(l(x))) - \lambda f(l(x)) + (1-\lambda) g(\alpha_C(u(x))) - (1-\lambda) f(u(x))| \le \lambda\,\mathrm{cost}_C(l(x)) + (1-\lambda)\,\mathrm{cost}_C(u(x));$$

● We are left with the case $x\in D$. We have two different scenarios:

a) If $\#\Lambda(x) = 0$, then $\mathrm{cost}_C(x) = \max\{(f(\varphi(x)) - f(x))/2,\ g(\eta(x)) - f(x)\}$. We point out that $(f(\varphi(x)) - f(x))/2$ is the cost of deleting the path $[x,\varphi(x)]$ in two steps: when applying $\alpha$ we halve the distance between $x$ and $\varphi(x)$, and then with $\beta$ we obtain $f(\beta\circ\alpha(x)) = f(\varphi(x))$. This can happen if below $w$ (with $(\delta(x), w)\in C$) there is "room" to send $x$ at height $f(x) + (f(\varphi(x)) - f(x))/2$; if, instead, $\eta(x)$ is higher than $f(x) + (f(\varphi(x)) - f(x))/2$, we simply send $x$ to $\eta(x)$;

b) If $\#\Lambda(x) > 1$, we have $\mathrm{cost}_C(x) = |f(x) - g(\chi(x))|$.

As a consequence, we point out two facts:

1) For every $\alpha : T\to G$ such that, for every $x\in\pi_T(C)\cup D$, $\alpha(x) = \alpha_C(x)$, we have:

$$\lVert g\circ\alpha_C - f\circ\mathrm{Id}_T\rVert_\infty \le \lVert g\circ\alpha - f\circ\mathrm{Id}_T\rVert_\infty.$$

2) If we fix some total ordering of the elements in $V_T\cup V_G$, so that we can write $V_T\cup V_G = (a_1,\dots,a_n)$, then $\lVert C\rVert = \max(\mathrm{cost}(a_1),\dots,\mathrm{cost}(a_n))$.
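To fix ideas, here is a toy computation of ours (trees and heights are hypothetical). Let $T$ have leaves $a, b, c$ with $f(a) = 0$, $f(b) = 1$, $f(c) = 2$, where $a$ and $b$ merge in $e$ with $f(e) = 2.5$, and $e$ and $c$ merge in the root $r$ with $f(r) = 3$; let $G$ have leaves $a', c'$ with $g(a') = 0.5$, $g(c') = 1.5$, and root $r'$ with $g(r') = 3.2$. Take $C = \{(a,a'), (c,c'), (r,r')\}$, which satisfies (C1)-(C4) (in particular, $\Lambda(r) = \{a,c\}$, so $\#\Lambda(r) = 2 \ne 1$). The vertex $e$ is unused, since $\#\Lambda(e) = \#\{a\} = 1$, while $b$ is deleted with $\#\Lambda(b) = 0$: here $\varphi(b) = e$ and $\eta(b) = a'$, so

$$\mathrm{cost}_C(b) = \max\{(f(e) - f(b))/2,\ g(\eta(b)) - f(b)\} = \max\{0.75,\ -0.5\} = 0.75,$$

while $\mathrm{cost}_C(a) = 0.5$, $\mathrm{cost}_C(c) = 0.5$, and $\mathrm{cost}_C(r) = 0.2$. All vertices of $G$ are coupled, so $\lVert C\rVert = 0.75$.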

In this section, we prove the first fundamental interaction between couplings and the interleaving distance between merge trees, with our final goal being to replace the problem of finding optimal maps with the combinatorial problem of obtaining optimal couplings. Figure 5 summarizes all the steps of the procedure, which is formally addressed in this section. Throughout the section, we fix a coupling $C$ and some $\varepsilon \ge \lVert C\rVert$.

We build a map $\alpha_C^\varepsilon : T \to G$, defined as $\alpha_C^\varepsilon(x) = s_{k_x}^G(\alpha_C(x))$, with $k_x = f(x) + \varepsilon - g(\alpha_C(x))$ depending on $x$. Note that, since $g(\alpha_C(x)) - f(x) \le \varepsilon$ for every $x\in T$, we always have $k_x \ge 0$. Moreover, by construction, $g(\alpha_C^\varepsilon(x)) = g(\alpha_C(x)) + f(x) + \varepsilon - g(\alpha_C(x)) = f(x) + \varepsilon$.

We want to prove that $\alpha_C^\varepsilon$ is an $\varepsilon$-good map.

Remark 5. The map $\alpha_C^\varepsilon$ is such that, given $x, x'\in T$ with $x < x'$, we have $\alpha_C^\varepsilon(x) < \alpha_C^\varepsilon(x')$.

As a first step, we obtain that $\alpha_C^\varepsilon$ satisfies some necessary conditions in order to be $\varepsilon$-good: it is in fact continuous and moves points "upward" by $\varepsilon$.

Proposition 3. The map $\alpha_C^\varepsilon$ is a continuous map between $T$ and $G$ such that, for every $x\in T$, we have $g(\alpha_C^\varepsilon(x)) = f(x) + \varepsilon$.

    Before proving the main result of this section, we need a short lemma.

Lemma 2. Consider $(v,w), (v',w')\in C$, and take $x = \mathrm{LCA}(v,v')$, $y = \mathrm{LCA}(w,w')$. Then, $|f(x) - g(y)| \le \varepsilon$.

    At this point, we are ready to prove the remaining properties which make αεC an ε-good map.

Theorem 1. The map $\alpha_C^\varepsilon$ defined above is an $\varepsilon$-good map.

This time, we start from an $\varepsilon$-good map $\alpha : T \to G$, and our aim is to induce a coupling $C$ with $\lVert C\rVert \le \varepsilon$.

Given a metric merge tree $T$, we define the following map: $\lfloor\,\cdot\,\rfloor_T : T \to V_T$ given by $\lfloor x\rfloor_T = \max\{v\in V_T \mid v\le x\}$. Note that, if $x\in V_T$, then $\lfloor x\rfloor_T = x$; otherwise, $x$ is an internal point of one edge $(a,b)$ (possibly with $b = +\infty$) and $\lfloor x\rfloor_T = a$. With this notation, we introduce the following maps:

$$\phi : V_T \to V_G,\quad v\mapsto \lfloor\alpha(v)\rfloor_G, \qquad \psi : V_G \to V_T,\quad w\mapsto \lfloor\beta(w)\rfloor_T.$$

Note that we always have $g(\phi(v)) \le g(\alpha(v)) = f(v) + \varepsilon$. These maps will be pivotal in the proof of the upcoming theorem, since they will help us in closing the gap between the continuous formulation of the interleaving distance and the discrete matching of merge trees via couplings. The reader should refer to Figure 6 for a visual example of the above definitions.

Figure 6.  Given the two maps $\alpha, \beta$ of Figure 2, the shaded gray represents the image of those maps, the red arrows give the maps $\psi$ (left) and $\phi$ (right) (see Section 6), and the red letters next to the black ones indicate the coupling $\{(a,a),(d,d)\}$, satisfying Theorem 2. Such a coupling is compatible with the procedure outlined in the proof of Theorem 2 (upon perturbing $T$ to meet the genericity condition (G)).

    We prove a corollary which characterizes the maps we just defined.

Corollary 1. Let $v, v'\in V_T$; if $v < v'$, then $\phi(v) \le \phi(v')$.

Clearly, an analogous result holds for $\psi$, which implies that, in the setting of the corollary, $\psi(\phi(v)) \le \psi(\phi(v'))$.

    Now, we prove the main result of this section.

Theorem 2. Given an $\varepsilon$-good map $\alpha$ between $T$ and $G$, there is a coupling $C$ between $T$ and $G$ such that $\lVert C\rVert \le \varepsilon$.

Putting together Theorems 1 and 2, we see that two merge trees are $\varepsilon$-interleaved if, and only if, there is a coupling $C$ between the merge trees such that $\lVert C\rVert \le \varepsilon$. Thus, computing the interleaving distance amounts to finding a minimal-cost coupling between two merge trees. As a byproduct of this result, we obtain another proof that the interleaving distance between metric merge trees can be found as a minimum, and not just as an infimum as in Definition 6, for the set of available couplings is finite.
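Since the set of couplings is finite, the result suggests a (purely illustrative, exponential) strategy: enumerate the valid couplings, price every vertex with the case analysis recalled above, and keep the cheapest one. The sketch below (our own toy encoding; the trees are the hypothetical ones from the worked example above) prices a single given coupling, and its output is therefore an upper bound for $d_I$; minimizing over all valid couplings would return $d_I$ itself.

```python
def is_anc(u, v, parent):
    """True if u is a proper ancestor of v, i.e., u > v in the tree order."""
    v = parent[v]
    while v is not None:
        if v == u:
            return True
        v = parent[v]
    return False

def lam(v, coupled, parent):
    """Maximal elements of the coupled vertices strictly below v (Λ)."""
    below = [u for u in coupled if is_anc(v, u, parent)]
    return [u for u in below
            if not any(is_anc(w, u, parent) for w in below if w != u)]

def side_costs(C, parent_A, h_A, parent_B, h_B):
    """Cost of every coupled or deleted vertex of the first (A) tree."""
    cpl = dict(C)                          # coupled A-vertex -> B-vertex
    costs = {a: abs(h_A[a] - h_B[b]) for a, b in C}
    for x in parent_A:
        if x in cpl or len(lam(x, cpl, parent_A)) == 1:
            continue                       # coupled (priced) or unused (free)
        if not lam(x, cpl, parent_A):      # deleted with #Λ(x) = 0
            phi = parent_A[x]              # φ(x): first ancestor with Λ ≠ ∅
            while phi is not None and not lam(phi, cpl, parent_A):
                phi = parent_A[phi]
            assert phi is not None, "no coupled vertex above this deletion"
            eta = min((cpl[v] for v in cpl if is_anc(phi, v, parent_A)),
                      key=h_B.get)         # η(x): lowest coupled target
            costs[x] = max((h_A[phi] - h_A[x]) / 2, h_B[eta] - h_A[x])
        else:                              # deleted with #Λ(x) > 1
            chi, *rest = [cpl[v] for v in lam(x, cpl, parent_A)]
            for t in rest:                 # χ(x): LCA of the coupled targets
                while not (chi == t or is_anc(chi, t, parent_B)):
                    chi = parent_B[chi]
            costs[x] = abs(h_A[x] - h_B[chi])
    return costs

parent_T = {"a": "e", "b": "e", "e": "r", "c": "r", "r": None}
f_T = {"a": 0.0, "b": 1.0, "c": 2.0, "e": 2.5, "r": 3.0}
parent_G = {"a2": "r2", "c2": "r2", "r2": None}
g_G = {"a2": 0.5, "c2": 1.5, "r2": 3.2}
C = [("a", "a2"), ("c", "c2"), ("r", "r2")]

costs = {**side_costs(C, parent_T, f_T, parent_G, g_G),
         **side_costs([(y, x) for x, y in C], parent_G, g_G, parent_T, f_T)}
print(max(costs.values()))                 # 0.75: an upper bound for d_I
```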

We close this section with a conjecture, which we want to investigate in future works; it stems from one of the definitions of the GH distance [1] and the already established link between such distance and the interleaving distance in the case of metric merge trees [25].

Given any map $\alpha : T\to G$, we define $H(\alpha) := \max_{x\in T} |g(\alpha(x)) - f(x)|$. Similarly, for any map $\mu : T\to T$, which thus rearranges the points of $T$, we set $2D(\mu) = \max_{x\in T} d(x,\mu(x))$ (where $d$ is the shortest path metric on $T$).

Conjecture 1. We conjecture that:

$$d_I(T,G) = \min_{\alpha,\beta} \max\{H(\alpha),\ H(\beta),\ D(\beta\circ\alpha),\ D(\alpha\circ\beta)\},$$

where $\alpha : T\to G$ and $\beta : G\to T$ vary in the set of continuous monotone maps.

    In this section, exploiting couplings, we define a variation of the interleaving distance, which we call constrained interleaving distance, following the literature on tree edit distances. In particular, we borrow the constrained condition for couplings from [54].

Definition 10. A coupling $C$ is constrained if the following holds:

(CONS) for every $a,b,c\in E_T$ and $a',b',c'\in E_G$ such that $(a,a'), (b,b'), (c,c')\in C$, we have:

$$\mathrm{LCA}(\{a,b\}) \le c \quad\text{if and only if}\quad \mathrm{LCA}(\{a',b'\}) \le c'.$$

Restricting the optimization domain to constrained couplings, we obtain the constrained interleaving distance.

Definition 11. We define the constrained interleaving distance between two merge trees $T, T'$ as:

$$d_{cI}(T,T') := \min\{\lVert C\rVert \mid C\in\mathcal{C}(T,T') \text{ and } C \text{ is constrained}\}.$$
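Condition (CONS) is again mechanical to verify. A small sketch of ours (same `parent`-dictionary encoding as before; not the authors' implementation) follows.

```python
from itertools import product

def chain(v, parent):
    """v together with all of its ancestors, listed bottom-up."""
    out = [v]
    while parent[v] is not None:
        v = parent[v]
        out.append(v)
    return out

def lca(u, v, parent):
    up = set(chain(u, parent))
    return next(w for w in chain(v, parent) if w in up)

def leq(u, v, parent):
    """True if u <= v in the tree order."""
    return v in chain(u, parent)

def is_constrained(C, parent_T, parent_G):
    for (a, a2), (b, b2), (c, c2) in product(C, repeat=3):
        lhs = leq(lca(a, b, parent_T), c, parent_T)
        rhs = leq(lca(a2, b2, parent_G), c2, parent_G)
        if lhs != rhs:
            return False
    return True
```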

    As already mentioned, constrained definitions of mappings have been successfully employed to bring the computational cost of edit distances for merge trees into the realm of polynomial time algorithms [43,48,49,52,53]. Since our couplings are closely related to the mappings employed in [39,52] (see Remark 4), we intend to use this formulation as a pivotal building block to obtain practically accessible and reliable upper bounds also for the interleaving distance.

We point out that the definition of $d_{cI}$ depends heavily on the formulation of the interleaving distance via couplings, as condition (CONS) is much harder to characterize via the maps $\alpha_C$ or $\alpha_C^\varepsilon$. For instance, even if $C$ is a constrained coupling, we may have a vertex $c$ with two children $\{a,b\}$ such that $b$ is coupled, $a$ is deleted, and $\alpha_C(a) > \alpha_C(b)$. In this case, we would have:

$$\alpha_C(a) = \mathrm{LCA}(\{\alpha_C(a), \alpha_C(b)\}) < \alpha_C(c) \quad\text{but}\quad \mathrm{LCA}(\{a,b\}) = c.$$

The same scenario can be replicated also with $\alpha_C^\varepsilon$.

Due to the challenges and technicalities arising in characterizing the maps $\alpha_C$ coming from a constrained coupling $C$, we leave a proper investigation of this constrained definition to future works. In particular, we conjecture that $d_{cI}$ is a metric between merge trees. Moreover, together with the recursive procedure described in Section 8, we plan on using it to obtain a polynomial time algorithm viable for applications.

    Now, we exploit the formulation of the interleaving distance via couplings to obtain recursive decomposition properties inspired by the ones of edit distance between trees. In particular, we prove that, under some conditions, optimal couplings can be computed via the solution of smaller independent subproblems, which are then aggregated to solve the global one. If the aforementioned conditions do not apply, we still obtain lower and upper bounds for the interleaving distance.

The main result of the section involves couplings with some strong properties, which we call special and collect under the following notation:

$$\mathcal{C}_o(T,G) := \{C\in\mathcal{C}(T,G) \mid \operatorname{argmin}_{v\le\max\pi_T(C)} f(v) \in \pi_T(C) \text{ and } \operatorname{argmin}_{w\le\max\pi_G(C)} g(w) \in \pi_G(C)\}.$$

Before proceeding, we need a few pieces of notation used to lighten the dissertation and one last technical definition. Given $x\in V_T$ and $y\in V_G$, we define $T_x = \mathrm{sub}_T(x)$ and $G_y = \mathrm{sub}_G(y)$. Moreover, $\mathcal{C}_R(T,G) := \{C\in\mathcal{C}(T,G) \mid (r_T,r_G)\in C\}$. Similarly, we have $\mathcal{C}_{oR}(T,G) := \mathcal{C}_R(T,G)\cap\mathcal{C}_o(T,G)$.

Definition 12. Given a partially ordered set $(A,<)$, $A$ is an anti-chain if, for any $a,b\in A$ with $a\ne b$, neither $b > a$ nor $a > b$ holds. A subset of a poset is an anti-chain if it satisfies this condition with the restriction of the poset structure.

Now, we introduce the combinatorial objects we will use to decompose the following optimization problem: $\min_{C\in\mathcal{C}(T,G)} \lVert C\rVert$. The reader may look at Figure 7 to find examples involving the following definition.

Figure 7.  The letters in red show three examples of subsets of $V_T$: the two leftmost examples are anti-chains (see Definition 13), while the rightmost one is not. The red branch in the leftmost tree signifies that if we have some coupling $C$ such that the red vertices give the set $\pi_T(C)$, then $\#\Lambda^C(c) = 2$, and so $c$ is deleted.

Definition 13. We define $\mathcal{A}(T,G)$ as the set of all $C\subseteq V_T\times V_G$ such that:

(A0) $C\cup\{(\mathrm{LCA}(\pi_T(C)), \mathrm{LCA}(\pi_G(C)))\}\in\mathcal{C}(T,G)$;

(A1) $C$ is an anti-chain.

The intuition behind the definition of $\mathcal{A}(T,G)$ is that we are assuming that we already know $\operatorname{argmin}\{\lVert C'\rVert \mid C'\in\mathcal{C}_R(T_x,G_y)\}$ for every $x\in E_T$ and $y\in E_G$, and we use $C\in\mathcal{A}(T,G)$ to optimally aggregate these results to find the optimal global coupling between $T$ and $G$. We formalize such an idea in the following definition.

Definition 14. Let $C_{x,y}\in\mathcal{C}_R(T_x,G_y)$ for every $(x,y)\in V_T\times V_G$. Given $C\in\mathcal{A}(T,G)$, with $r = \mathrm{LCA}(\pi_T(C))$ and $r' = \mathrm{LCA}(\pi_G(C))$, we define the extension of $C$ by means of $\{C_{x,y}\}_{(x,y)\in C}$ as:

$$e(C) = \{(r,r')\}\cup\Big(\bigcup_{(x,y)\in C} C_{x,y}\Big).$$

The set of all possible extensions of $C$ is called $E(C)$. If $C_{x,y}\in\operatorname{argmin}\{\lVert C'\rVert \mid C'\in\mathcal{C}_R(T_x,G_y)\}$ for all $x,y$, then we call the extension minimal. We collect all minimal extensions of $C$ in the set $E_m(C)$, which is never empty. If all the $C_{x,y}$ and $e(C)$ are special couplings, then we call the extension special. We collect all special extensions of $C$ in the set $E_o(C)$, which is never empty. We name $E_{om}(C) = E_m(C)\cap E_o(C)$ the set of minimal special extensions of $C$; this set could be empty.

Since extensions are couplings, it is obvious that:

$$d_I(T,G) \le \min_{C\in\mathcal{A}(T,G)}\ \min_{C'\in E_m(C)} \lVert C'\rVert. \qquad (8.1)$$
$$d_I(T,G) \le \min_{C\in\mathcal{A}(T,G)}\ \min_{C'\in E_o(C)} \lVert C'\rVert. \qquad (8.2)$$

See also Figure 8(a). Moreover, it is also clear that any coupling is an extension of some $C\in\mathcal{A}(T,G)$ and, thus:

$$d_I(T,G) = \min_{C\in\mathcal{A}(T,G)}\ \min_{C'\in E(C)} \lVert C'\rVert.$$

The upcoming theorem states that there are strong relationships between $d_I$ and extensions obtained via a fixed family of $C_{x,y}\in\mathcal{C}_R(T_x,G_y)$.

    Figure 8.  Two figures related to decomposition and extensions of couplings.

Theorem 3. (Decomposition) Consider two merge trees $T$ and $G$ and take a collection $\{C_{x,y}\}_{(x,y)\in V_T\times V_G}$ with $C_{x,y}\in\operatorname{argmin}\{\lVert C'\rVert \mid C'\in\mathcal{C}_R(T_x,G_y)\}$. Given $C\in\mathcal{A}(T,G)$, we name $e(C)$ the extension obtained by means of $\{C_{x,y}\}$. We have:

$$\min_{C\in\mathcal{A}(T,G)} \max\{\mathrm{cost}_{e(C)}(v) \mid v\in\pi_T(e(C)) \text{ or } \#\Lambda_T^{e(C)}(v) > 0\} \le d_I(T,G). \qquad (8.3)$$

Moreover, if $C_{x,y}$ is also a special coupling for every $x,y$, we have:

$$d_I(T,G) = \min_{C\in\mathcal{A}(T,G)} \lVert e(C)\rVert. \qquad (8.4)$$

We remark that Theorem 3 is in some sense unexpected: while Eqs (8.1) and (8.2) are in some sense trivial, Eqs (8.3) and (8.4) are not. We highlight that the couplings $\{C_{x,y}\}$ are independently fixed at the beginning and not chosen with some joint optimization strategy. Moreover, the proof of Eq (8.4) depends strongly on the possibility of finding optimal couplings which are also special.

The aim of this section is to exploit the coupling formulation of the interleaving distance and its properties to find ways to get upper and lower bounds for $d_I$. In more detail: Section 9.1 resorts to pruning to reduce the complexity of the trees and finds upper bounds for $d_I$, while Section 9.2 takes a more complex approach: via a recursive decomposition of couplings, it obtains lower and upper bounds for $d_I$.

We briefly introduce the pruning operator $P_\varepsilon$. Given a merge tree $T$ and $\varepsilon > 0$, the merge tree $P_\varepsilon(T)$ is obtained through a recursive procedure: given a leaf $x$ and its father $x'$, if $f(x') - f(x) < \varepsilon$, we say that $x$ is a small-weight leaf; we want to remove all small-weight leaves and all degree 2 vertices from $T$. When two or more small-weight leaves are siblings, i.e., children of the same father, we want to remove all of them but the lowest one. To make this procedure well-defined and to make sure that, in the end, no small-weight leaves are left in the tree, we need to choose some ordering of the leaves and to resort to recursion.

(P) Take a leaf $x$ such that $f(\mathrm{father}(x)) - f(x)$ is minimal; if $f(\mathrm{father}(x)) - f(x) < \varepsilon$, remove $x$, and remove its father as well if it becomes a degree 2 vertex after removing $x$.

We take $T_0 = T$ and apply operation (P) to obtain $T_1$. On the result, we apply (P) again, obtaining $T_2$, and we go on until we reach a fixed point $T_n = T_{n+1} =: P_\varepsilon(T)$ of such sequence. To shed some light on this definition, we prove the following lemma; Figure 9 can be helpful in following the statements.

    Figure 9.  A pruning operator applied on a merge tree T. The red branches are removed from the tree, while the orange ones are kept and represent the metric merge tree Pε(T). We highlight that the root of the tree changes. Lastly, note that deleting a instead of b would give isomorphic merge trees.

Lemma 3. Given $T$, $\varepsilon > 0$, and $P_\varepsilon(T)$, we have a natural injective map $V_{P_\varepsilon(T)}\hookrightarrow V_T$ which identifies vertices in $P_\varepsilon(T)$ with vertices in $T$. Call $C_\varepsilon\subseteq V_{P_\varepsilon(T)}\times V_T$ the set of couples induced by $V_{P_\varepsilon(T)}\hookrightarrow V_T$. The following hold:

1) $C_\varepsilon$ is a coupling;

2) $L_{P_\varepsilon(T)}\subseteq L_T$, and for every $v$ and $v'$ such that $v < v'$ and $f(v') - f(v) \ge \varepsilon$, there is $x\in L_{P_\varepsilon(T)}$ such that $\mathrm{LCA}(x,v) < v'$; in particular, $\operatorname{argmin} f \in L_{P_\varepsilon(T)}$;

3) For every $v\in V_T - V_{P_\varepsilon(T)}$, we have $\#\Lambda_T^{C_\varepsilon}(v) \le 1$; in particular, if $v\in D_T^{C_\varepsilon}$, we have $\#\Lambda_T^{C_\varepsilon}(v) = 0$ and $f(\varphi^{C_\varepsilon}(v)) - f(v) < \varepsilon$;

4) The map $\eta^{C_\varepsilon} : D_T^{C_\varepsilon}\to V_{P_\varepsilon(T)}$ is such that $f(\eta^{C_\varepsilon}(x)) < f(x)$ for all $x\in D_T^{C_\varepsilon}$;

5) $\lVert C_\varepsilon\rVert \le \varepsilon/2$.

    Having characterized the pruned trees with the above lemma, we can obtain the following proposition.

Proposition 4. Given two merge trees $T$ and $G$, we have:

$$d_I(T,G) \le \max\{d_I(P_\varepsilon(T), P_\varepsilon(G)),\ \varepsilon/2\}.$$

As a result, if the number of leaves of $T$ and $G$ is too high, we know that we can prune them, reducing the computational complexity of $d_I$, and obtain an estimate from above of $d_I(T,G)$.
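For illustration, the following sketch (our own encoding; ties and the root bookkeeping are handled in the simplest way consistent with rule (P) and with the root change visible in Figure 9) implements the pruning operator on a merge tree stored as `parent`/`height` dictionaries.

```python
def prune(parent, height, eps):
    """Apply rule (P) until the fixed point P_eps(T) is reached."""
    parent, height = dict(parent), dict(height)
    while True:
        children = {v: [] for v in parent}
        for v, p in parent.items():
            if p is not None:
                children[p].append(v)
        cand = [(height[parent[v]] - height[v], v)
                for v in parent if not children[v] and parent[v] is not None]
        if not cand:
            break
        w, x = min(cand)                  # rule (P): shortest leaf edge first
        if w >= eps:
            break                         # no small-weight leaves are left
        p = parent.pop(x); height.pop(x)  # remove the small-weight leaf x
        children[p].remove(x)
        if len(children[p]) == 1:         # father became a degree-2 vertex
            (only,) = children[p]
            parent[only] = parent[p]      # splice p out (the root may change)
            parent.pop(p); height.pop(p)
    return parent, height

# Leaves b (1.9) and a (0.0) merging at r (2.0): the short branch b is pruned.
parent = {"a": "r", "b": "r", "r": None}
height = {"a": 0.0, "b": 1.9, "r": 2.0}
print(prune(parent, height, eps=0.5))     # ({'a': None}, {'a': 0.0})
```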

    We close this section with a conjecture we would like to investigate in the future.

Conjecture 2. The coupling $C_\varepsilon$ is a minimizing coupling.

In this section, we exploit Theorem 3 to obtain a dynamical approach to estimate, from above and below, an optimal coupling between two merge trees by recursively solving binary linear programming (BLP) problems.

As a first step, we separate the problem of finding $C\in\mathcal{A}(T,G)$ with a cost-minimizing extension into two "independent" problems: the first one is to find a minimal fixed-root coupling, and the second one is to compute the cost of deleting the vertices which are not in the subtrees whose roots are fixed by $C$. To make the upcoming lemma clearer, we establish the following notation. Suppose $r = \mathrm{LCA}(\pi_T(C))$ and $r' = \mathrm{LCA}(\pi_G(C))$. Then, for $v\in V_T$ with $v\nleq r$, we set:

$$v_r = \min\{v'\in V_T \mid v'\ge r \text{ and } v'\ge v\},$$

and $f_r = \min_{v\le r} f(v)$; the points $w_{r'}$ and the value $g_{r'}$ are defined analogously on $G$. Lastly, we define:

$$H_{r,r'} = \max\Big\{\max_{v\nleq r}\frac{f(v_r) - f(v)}{2},\ \max_{v\nleq r}\big(g_{r'} - f(v)\big),\ \max_{w\nleq r'}\frac{g(w_{r'}) - g(w)}{2},\ \max_{w\nleq r'}\big(f_r - g(w)\big)\Big\}. \qquad (9.1)$$

Note that $H_{r,r'}$ does not depend on the chosen extension of $C$; in fact, it depends only on $r, r'$. The upcoming lemma states that $H_{r,r'}$ accounts for the deletions of all the vertices which are not below $r$ or $r'$.

Lemma 4. Consider merge trees $T, G$ and $C\in\mathcal{A}(T,G)$, with $r = \mathrm{LCA}(\pi_T(C))$ and $r' = \mathrm{LCA}(\pi_G(C))$. Let $C^{r,r'}$ be any extension of $C$. With an abuse of notation, we consider $C^{r,r'}$ also as the coupling in $\mathcal{C}(T_r, G_{r'})$ consisting of the same couples as $C^{r,r'}$. Then,

$$\lVert C^{r,r'}\rVert_{T,G} \le \max\{\lVert C^{r,r'}\rVert_{T_r,G_{r'}},\ H_{r,r'}\}, \qquad (9.2)$$

where the subscripts indicate the pair of trees on which the cost is computed. If $C^{r,r'}$ is a special coupling, the inequality becomes an equality.

Since $H_{r,r'}$ can easily be computed in a greedy fashion, from now on, we focus on estimating $C\in\operatorname{argmin}\{\lVert C'\rVert \mid C'\in\mathcal{C}_R(T,G)\}$.

Now, we outline a procedure to estimate $C\in\operatorname{argmin}\{\lVert C'\rVert \mid C'\in\mathcal{C}_R(T,G)\}$. As in Theorem 3, we assume that we have already computed the couplings $C_{x,y}\in\mathcal{C}_R(T_x,G_y)$ that we want to use to extend any $C\in\mathcal{A}(T,G)$. So, for instance, if we want to work with special extensions, we assume that we have $C_{x,y}\in\operatorname{argmin}\{\lVert C'\rVert \mid C'\in\mathcal{C}_{oR}(T_x,G_y)\}$ for all $x\in E_T$ and $y\in E_G$, and we exploit them to obtain a special extension of some $C\in\mathcal{A}(T,G)$. If we want to work with minimal extensions, instead, we assume to have $C_{x,y}\in\operatorname{argmin}\{\lVert C'\rVert \mid C'\in\mathcal{C}_R(T_x,G_y)\}$. We anticipate that both kinds of extensions are important for our purposes: special extensions will be used to produce upper bounds for the interleaving distance, while minimal extensions will be used, exploiting Theorem 3, to obtain lower bounds.

As before, we set $f_x = \min_{v\le x} f(v)$ and $g_y = \min_{w\le y} g(w)$. Lastly, we fix a constant $K > 0$ such that $K > \max_{x\in V_T,\,y\in V_G} |f(x) - g(y)|$. In Section 9.2.5, we point out which steps of the upcoming procedure may produce errors.

We consider the following set of binary variables: $a_{x,y}$ for $x\in E_T$ and $y\in E_G$; $u_x$ for $x\in E_T$ and $u_y$ for $y\in E_G$. The vector of all variables (upon choosing some arbitrary ordering) will be referred to as $V$.

We also introduce the following auxiliary variables, which are linear functions of $a_{x,y}$, $u_x$, and $u_y$:

● $c_x = \sum_y a_{x,y}$ and $c_y = \sum_x a_{x,y}$;

● $\Lambda_x = \sum_{v\le x} c_v$ and $\Lambda_y = \sum_{w\le y} c_w$;

● $d_x = \sum_{v\le x} c_v + \sum_{v>x} c_v$ and $d_y = \sum_{w\le y} c_w + \sum_{w>y} c_w$.

Given such variables, we define the following linear constraints:

(1) For every $l\in L_T$: $\sum_{l\le x<r_T} c_x \le 1$, and for every $l\in L_G$: $\sum_{l\le y<r_G} c_y \le 1$;

(2) $u_x \le 0.5\,\Lambda_x$ and $u_y \le 0.5\,\Lambda_y$ for every $x$ and $y$;

(3) $u_x \ge m_x\Lambda_x + q_x$ and $u_y \ge m_y\Lambda_y + q_y$ for every $x$ and $y$, with $(m_x, q_x)$ being any pair $(m,q)$ such that the following are satisfied: $q < 0$, $m + q < 0$, $2m + q > 0$, $m\cdot\#L_T < 1$ (analogously for $(m_y, q_y)$); a numerical sanity check of this choice follows the list;

(4) (Only for special extensions) Let $\tilde{x} = \operatorname{argmin}_{v\in V_T} f(v)$ and $\tilde{y} = \operatorname{argmin}_{w\in V_G} g(w)$; then, we ask $\sum_{\tilde{x}\le x<r_T} c_x \ge 1$ and $\sum_{\tilde{y}\le y<r_G} c_y \ge 1$.
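As promised, here is a quick numerical sanity check of ours (solver-free) that constraints (2) and (3) force the binary $u$ to be the indicator of $\Lambda \ge 2$; the values $\#L_T = 5$ and $(m,q) = (0.19, -0.28)$ are an arbitrary admissible choice.

```python
# q < 0, m + q < 0, 2m + q > 0, and m * 5 < 1 all hold for this choice.
n_leaves, m, q = 5, 0.19, -0.28
for lam in range(n_leaves + 1):               # Λ ranges over 0 .. #L_T
    feasible = [u for u in (0, 1)
                if u <= 0.5 * lam and u >= m * lam + q]
    assert feasible == [1 if lam >= 2 else 0], (lam, feasible)
print("u = 1 exactly when Λ >= 2")
```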

The set of vectors of variables which satisfy these constraints is called $K_m(T,G)$ when constraints (4) are excluded and $K_o(T,G)$ when they are included. Note that $K_o(T,G)\subseteq K_m(T,G)$. To lighten the notation, we often avoid explicit reference to $T$ and $G$ when talking about the set of possible solutions, and, when we do not wish to distinguish between $K_m$ and $K_o$, we simply write $K$. Now, we briefly comment on the variables and constraints to try to give some intuition about their roles:

● The variables $a_{x,y}$ are used to match $x$ with $y$, i.e., to add the couple $(x,y)$ to the coupling. In particular, constraints (1) ensure that, if $V\in K$, the set $C_V := \{(x,y)\in E_T\times E_G \mid a_{x,y} = 1\}$ belongs to $\mathcal{A}(T,G)$;

● The variables $d_x$ and $u_x$ are used to determine if $x$ must be deleted. Start with $d_x$. The main idea, which our optimization procedure is based on, is that if $a_{v,w} = 1$ with $v > x$, it means that $x$ is taken care of by the coupling $C_{v,w}$ and, thus, we want to "ignore" such $x$. In other words, having $d_x = 0$ implies that $x$ is deleted with $\#\Lambda(x) = 0$;

● Now, we turn to $u_x$. Observe that, if $\Lambda_x\in\{0,1\}$, then $u_x = 0$, while if $\Lambda_x \ge 2$, $u_x = 1$. Note that $\Lambda_x \le \#L_T$ and $\Lambda_y \le \#L_G$. Constraints (2) and (3) are fundamental to fix the value of $u_x$, depending linearly on $\Lambda_x$: $u_x = 1$ if, and only if, $x \ge \mathrm{LCA}(v,v')$, with $a_{v,w} = 1$ and $a_{v',w'} = 1$, for some $w, w'\in V_G$. Thus, having $u_x = 1$ means that $x$ is deleted with $\#\Lambda(x) > 1$;

● Lastly, we comment on constraints (4). These constraints are asking that there is one coupled point above the lowest vertex in each tree; this in particular implies that, if we assume $C_{x,y}\in\mathcal{C}_o(T_x,G_y)$, then $\tilde{x}$ and $\tilde{y}$ (with $\tilde{x} = \operatorname{argmin}_{v\in V_T} f(v)$ and $\tilde{y} = \operatorname{argmin}_{w\in V_G} g(w)$) are never deleted. Thus, $\{(r_T,r_G)\}\cup(\bigcup_{(x,y)\in C_V} C_{x,y})$ is a special extension, i.e., a coupling in $\mathcal{C}_{oR}(T,G)$.

As a consequence of these observations, any $C\in\mathcal{A}(T,G)$ induces a unique vector of variables $V\in K$ such that $C_V = C$. Moreover, if $C_{x,y}\in\mathcal{C}_o(T_x,G_y)$ and $V\in K_o$, then $\{(r_T,r_G)\}\cup\bigcup_{(x,y)\in C_V} C_{x,y}$ is a special extension of $C_V$.

A key fact that we highlight is that, by property (A1), if $x = \mathrm{LCA}(v,v')$, with $a_{v,w} = 1$, $a_{v',w'} = 1$, and $x < r_T$, then $\delta(x) = r_T$ due to the anti-chain condition. So, we know that any vertex $x'$ such that $x\le x' < r_T$ is in $D_T^{C_V}$. Thus, given $x\in V_T$, and with $x_f$ being its father, the "local" objective function depends on the following quantities:

● $\sum_y a_{x,y}\,\lVert C_{x,y}\rVert$: it is the cost of matching $T_x$ with a subtree $G_y$, for some $y$, if $(x,y)$ is added to $C$, i.e., if $a_{x,y} = 1$. If $C_{x,y}$ is not a minimal coupling, this is an upper bound;

● $|g(r_G) - f(x)|\,u_x$: it is an upper bound to the cost of deleting $x$ with $\#\Lambda(x) > 1$;

● $A_x = 0.5\,(f(x_f) - f_x)(1 - d_x)$: in case $x$ is deleted with $\#\Lambda(x) = 0$, it gives a lower bound to the cost of deleting $T_x$, as it is equal to deleting the lowest point below $x$ up to the height of the father of $x$, in two steps. It is an exact value if $x_f = \varphi(x)$ and $g(\eta(x)) - f_x < A_x$;

● $B_x = \{\sum_y a_{v,y}(g_y - f_x) - K d_x\}_{v\in E_T,\ v < x_f}$: in case $x$ is deleted with $\#\Lambda(x) = 0$, it helps giving an upper bound to the cost of deleting $T_x$, depending on what happens below $x_f$. If there is $v < x_f$ such that $v$ is coupled with $y$ and $C_{v,y}$ is a special coupling, then we know that $g_y \ge g(\eta(x))$, and so $g_y - f_x \ge g(\eta(x)) - f_x$. Thus, if $C_{v,y}$ is a special coupling, $\max\{\max B_x,\ A_x\} \ge \mathrm{cost}(x)$.

For any $x\in E_T$, we define:

$$\Gamma_o(x) := \max\Big\{\sum_y a_{x,y}\,\lVert C_{x,y}\rVert,\ |g(r_G) - f(x)|\,u_x,\ A_x,\ \max B_x\Big\},$$

similarly, for any $y\in E_G$:

$$\Gamma_o(y) := \max\Big\{|f(r_T) - g(y)|\,u_y,\ A_y,\ \max B_y\Big\}.$$

In full analogy, we set:

$$\Gamma_m(x) := \max\Big\{\sum_y a_{x,y}\,\lVert C_{x,y}\rVert,\ A_x\Big\},$$

and

$$\Gamma_m(y) := A_y$$

for any $y\in E_G$.

Lastly, $\Gamma_o(r_T) := \Gamma_m(r_T) := |f(r_T) - g(r_G)|$.

With an abuse of notation, we write:

$$\Gamma_m(V) := \max_{x\in E_T\cup E_G} \Gamma_m(x), \qquad (9.3)$$
$$\Gamma_o(V) := \max_{x\in V_T\cup V_G} \Gamma_o(x). \qquad (9.4)$$

Note that all the functions we defined in this section depend on $T$, $G$, and on the family $\{C_{x,y}\}_{(x,y)\in V_T\times V_G}$, but we avoid explicit reference to those dependencies for notational convenience.

    We sum up the main properties of the definitions we have just stated with the following proposition.

Proposition 5. Given C∈C(T,G) and Cx,y∈CoR(Tx,Gy) for all (x,y)∈C, we call:

Co := {(rT,rG)}∪(⋃(x,y)∈C Cx,y).

If Co is a special extension, then:

∥Co∥ ≤ Γ(V) (9.5)

with V being the unique vector in Ko such that CV=Co.

Vice versa, given C∈C(T,G) and Cx,y∈CR(Tx,Gy) with minimal cost for all (x,y)∈C, we call

Cm := {(rT,rG)}∪(⋃(x,y)∈C Cx,y).

Then,

Γ′(V) ≤ max{ costCm(v) : v∈πT(Cm) or #ΛCm(v)>0 } (9.6)

with V being the unique vector in Km such that CV=Cm.

    Putting together Theorem 3 and Proposition 5, we obtain as a corollary:

Corollary 2. Consider T,G merge trees, and take for every (x,y)∈ET×EG:

1) Cx,y∈argminC∈CR(Tx,Gy)∥C∥;

2) C′x,y∈argminC∈CoR(Tx,Gy)∥C∥.

Then,

minV∈Km Γ′(V) ≤ minC∈CR(T,G)∥C∥ ≤ minV∈Ko Γ(V),

where Γ′ is computed with the collection {Cx,y} and Γ is computed with the collection {C′x,y}.

Now, we get rid of the fixed roots, obtaining an approximation of dI(T,G) by putting together Theorem 3, Lemma 4, and Proposition 5. To do so, we consider (x,y)∈ET×EG, take the subtrees Tx and Gy, and apply the framework introduced in Section 9.2.3 and Section 9.2.4, replacing T and G with, respectively, Tx and Gy. We write Km(x,y), Ko(x,y), Γ(x,y), and Γ′(x,y) to indicate that the constraints and the objective functions are defined using the subtrees Tx and Gy. To lighten the notation, we simply write V to identify the optimization variables. In other words, when we write Γ(x,y)(V), we are implying V∈Ko(x,y), etc.

Corollary 3. In the same setting as Corollary 2, for each (x,y)∈VT×VG, we have:

Wx,y := minV∈Km(x,y) Γ′(x,y)(V) ≤ minC∈CR(Tx,Gy)∥C∥ ≤ W′x,y := minV∈Ko(x,y) Γ(x,y)(V),

where Γ′(x,y) is computed with {Cx,y} and Γ(x,y) is computed with {C′x,y}.

Consequently,

min(x,y)∈VT×VG max{Hx,y, Wx,y} ≤ dI(T,G) ≤ min(x,y)∈VT×VG max{Hx,y, W′x,y}. (9.7)

We take this section to briefly isolate the situations in which the procedure outlined in Section 9.2.3 and Section 9.2.4 may produce errors w.r.t. the true interleaving distance.

1) (g(rG)−f(x))ux: clearly, rG (resp., rT) is an upper bound for χ(x) (resp., χ(y)), with x (resp., y) being deleted with #Λ(x)>1 (resp., #Λ(y)>1), and so (g(rG)−f(x))ux is an upper bound to the cost of the corresponding deletion.

2) Bx: for v∈DT with #Λ(v)=0, x=φ(v), and xf=father(x), we may have ∣f(v)−g(η(v))∣ ≤ max Bx.

We point out two facts. First, Γ′ does not involve any of the aforementioned situations. Second, in the definition of Γ, the biggest potential source of error is given by the terms (g(rG)−f(x))ux, which can overestimate the cost of deleting internal vertices in order to swap father-children relationships. We point out that metrics for merge trees like [43,48], even in their unconstrained formulations, do not even account for the possibility of deleting internal vertices in order to swap father-children relationships. Because of this, they are unstable, and their authors resort to preprocessing strategies, trying to make up for this source of instability. See also [41] for more details. Also, replacing (g(rG)−f(x))ux with the exact deletion cost would turn the linear optimization problem into a polynomial one.

In Section 9.2.3, we have introduced a set of linear constraints needed to optimize a nonlinear function (either Γ or Γ′) of the form minV∈K maxi Fi(V), for some real-valued functions Fi which are linear in V; see Section 9.2.4. We can turn this into a linear optimization problem by exploiting a standard trick, introducing auxiliary variables and additional constraints.

Suppose we need to find mins max(f(s),g(s)), with f,g linear functions of s∈Rn. We introduce an auxiliary variable u, with the constraints u≥f(s) and u≥g(s), and then solve the linear problem min{u : s∈Rn, u≥f(s), u≥g(s)}. We want to use this procedure to compute minV∈K Γ(V) and minV∈K Γ′(V). We write down the details only for minV∈K Γ(V); the case of Γ′(V) follows analogously. Before doing so, we illustrate the trick on a toy example.
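The following minimal sketch (our own illustration, not part of the formulation above) linearizes mins max(c1·s, c2·s) with SciPy; the cost vectors c1, c2 and the box bounds on s are arbitrary choices.

```python
import numpy as np
from scipy.optimize import linprog

# Minimize max(c1 @ s, c2 @ s) over s in [0, 1]^2 via the epigraph trick:
# add an auxiliary variable u with u >= c1 @ s and u >= c2 @ s, minimize u.
c1 = np.array([1.0, -2.0])
c2 = np.array([-1.0, 1.0])

obj = np.array([0.0, 0.0, 1.0])          # decision vector (s1, s2, u)
A_ub = np.array([[c1[0], c1[1], -1.0],   # c1 @ s - u <= 0
                 [c2[0], c2[1], -1.0]])  # c2 @ s - u <= 0
b_ub = np.zeros(2)
bounds = [(0, 1), (0, 1), (None, None)]

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[:2], res.fun)  # optimal s and the minimized maximum
```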

Given x∈ET, we define F1x=Σy ax,y∥Cx,y∥, F2x=(g(rG)−f(x))ux, and F3x=Ax. Given y∈EG, instead, we set F1y=(f(rT)−g(y))uy and F2y=Ay. Analogously, we have F1rT=∣f(rT)−g(rG)∣. Having fixed a total ordering on VT={a0,…,an} and VG={b0,…,bm}, respectively, we call F the vector containing all the components Fix and Fjy of all the points taken in the chosen order:

F := (F1a1, F2a1, F3a1, …, F1ai, F2ai, F3ai, …, F1b1, F2b1, …, F1bm, F2bm) = (Fi)i∈I1, (9.8)

    for some set I1 indexing the components of F.

Similarly, for every x, we order the elements of the set Bx, so that we can write Bx as a vector Bx=(B1x,…,Bhxx); then, we define the vector:

B := (B1a0,…,Bh0a0,…,B1ai,…,Bhiai,…,B1b0,…,Bt0b0,…,B1bj,…,Btjbj) = (Bi)i∈I2, (9.9)

    for some set I2 indexing the components of B.

Lastly, we introduce the real-valued auxiliary variable z and add the following linear constraints to the ones presented in Section 9.2.3:

(5) z ≥ Fi(V) for all i∈I1;

(6) z ≥ Bi(V) for every i∈I2,

and then solve minz,V z with all these constraints. We call Z(V) the set of admissible values defined by constraints (5) and (6), as a function of V.

We stress again that the Fi and Bi are linear in V, so the final problem is linear with mixed binary-valued variables. In the case of Γ′, we repeat the same operations, omitting the constraints in (6).

    In this section, the results obtained in Section 8 and the formulation established in the previous parts of Section 9 are used to obtain the algorithm implemented to estimate the metric dI between merge trees. We need to introduce some last pieces of notation in order to describe the "bottom-up" nature of the procedure.

Given x∈VT, define lenT(x) to be the cardinality of {v∈VT : x≤v≤rT} and len(T)=maxv∈VT lenT(v); similarly, lvlT(x)=len(T)−lenT(x), and so lvl−1T(n)={v∈VT : lvlT(v)=n}.
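As a small illustration of these definitions, the following sketch (assuming the merge tree is stored as a child-to-parent dictionary, which is our own convention here) computes lenT, len(T), and lvlT.

```python
# A toy merge tree stored as a child -> parent dictionary (root maps to None).
parent = {"a": "c", "b": "c", "c": "root", "d": "root", "root": None}

def len_T(x, parent):
    """Cardinality of the path {v : x <= v <= r_T} from x up to the root."""
    count = 1
    while parent[x] is not None:
        x = parent[x]
        count += 1
    return count

len_tree = max(len_T(v, parent) for v in parent)        # len(T)
lvl = {v: len_tree - len_T(v, parent) for v in parent}  # lvl_T(v)
# lvl^{-1}(n) = {v : lvl[v] == n}; vertices on the longest root-leaf
# paths get lvl 0, and the root gets lvl len(T) - 1.
```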

The key property for our bottom-up procedure is that lvlT(x)>lvlT(v) for any v∈subT(x). Thus, we will compute the quantity ˜W′x,y for x∈lvl−1T(n) and y∈lvl−1G(m), having already computed ˜W′v,w for all v,w in VT,VG such that lvlT(v)<n and lvlG(w)<m. We write down Algorithm 1, which refers to the computation of min Γ. Note that, thanks to constraints (4), this recursive procedure is always guaranteed to provide a special coupling. However, clearly, not all special couplings satisfy constraints (4).

For a couple (x,y)∈VT×VG, we write ˜Z(x,y)(V) to indicate the set of admissible values, as a function of V, described by constraints (5) and (6), where the functions Fi and Bi are given by the components of Γ(x,y), apart from F1v=Σw∈EGy av,w∥C′v,w∥, which is replaced by the quantity Σw∈EGy av,w˜W′v,w defined in the upcoming lines. In particular, ˜W′v,w, which will have been computed at iterations prior to the one involving (x,y), is an estimate of ∥C′v,w∥. See Algorithm 1.

Algorithm 1 Bottom-up algorithm for an upper bound to dI(T,G)
N ← len(T)
M ← len(G)
n, m ← 0
while n ≤ N or m ≤ M do
  for (x,y)∈VT×VG such that lvlT(x) ≤ n and lvlG(y) ≤ m do
   Hx,y ← solution of Equation (9.1)
   ˜W′x,y ← min z over z∈˜Z(x,y)(V), V∈Ko(x,y)       ▷ We know ˜W′v,w for all (v,w) needed
  end for
  n ← n+1
  m ← m+1
end while
return min(x,y)∈VT×VG max{Hx,y, ˜W′x,y}
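In Python-like terms, the bottom-up procedure can be sketched as follows; solve_H and solve_W are hypothetical placeholders for, respectively, the solution of Equation (9.1) and the mixed binary linear problem over Ko(x,y), which reuses the table of previously computed values.

```python
def bottom_up_upper_bound(VT, VG, lvl_T, lvl_G, solve_H, solve_W):
    """Sketch of Algorithm 1: VT, VG are vertex lists; lvl_T, lvl_G map
    vertices to levels; the two solvers are supplied by the caller."""
    N, M = max(lvl_T.values()), max(lvl_G.values())
    H, W = {}, {}
    n = m = 0
    while n <= N or m <= M:
        for x in VT:
            for y in VG:
                if lvl_T[x] <= n and lvl_G[y] <= m and (x, y) not in W:
                    H[x, y] = solve_H(x, y)
                    # All needed W[v, w] sit at strictly lower levels and
                    # have been filled in previous iterations.
                    W[x, y] = solve_W(x, y, W)
        n, m = n + 1, m + 1
    return min(max(H[key], W[key]) for key in W)
```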

The case of Γ′ follows the same algorithm but, instead of ˜W′x,y, we have:

˜Wx,y = min z over z∈˜Z′(x,y)(V), V∈Km(x,y),

where ˜Z′(x,y)(V) stands for the set of admissible values described by constraints (5), with the functions Fi given by the components of Γ′(x,y), apart from F1v=Σw∈EGy av,w∥Cv,w∥, which is replaced by Σw∈EGy av,w˜Wv,w for every v∈ETx. By Section 9.2.5, such a recursive procedure produces no error when replacing F1v with Σw∈EGy av,w˜Wv,w.

In other words, for any collection Cx,y∈argminC∈CR(Tx,Gy)∥C∥, with (x,y)∈ET×EG, we have:

min(x,y)∈VT×VG max{Hx,y, ˜Wx,y} = min(x,y)∈VT×VG max{Hx,y, Wx,y} ≤ dI(T,G),

giving a valid lower bound for dI. Similarly, since constraints (4) are not satisfied by all special couplings, given any collection C′x,y∈argminC∈CoR(Tx,Gy)∥C∥, with (x,y)∈ET×EG, we have:

dI(T,G) ≤ min(x,y)∈VT×VG max{Hx,y, W′x,y} ≤ min(x,y)∈VT×VG max{Hx,y, ˜W′x,y},

giving a valid upper bound for dI.

We make a brief observation regarding the interactions arising between the estimation errors described in Section 9.2.5 and the bottom-up procedure proposed in Section 9.2.7.

Consider the setting of Section 9.2.4 and suppose that, instead of having computed the optimal couplings Cx,y, we have some nonoptimal ˜Cx,y (as is the case for W′x,y and ˜W′x,y), with an error bound ex,y such that ∣∥Cx,y∥−∥˜Cx,y∥∣<ex,y. We immediately see that the errors ex,y affect only the components of the form Σy ax,y∥Cx,y∥, as these are the only parts of the optimization procedure which depend on the Cx,y. Moreover, the potential errors occurring in Γ and Γ′ appear at the level of Bx and ux (see Section 9.2.5), but ux and Bx do not depend on the Cx,y, nor on ux′ and Bx′ for some other x′. This implies that the only interaction between errors is of the form max{a,b}: they never aggregate in any other way in the objective functions. Thus, the final error in the algorithm presented in Section 9.2.7 is the maximum of all the "independent" errors occurring at every iteration.

Proposition 6. (Computational cost) Let T and T′ be two merge trees with full binary tree structures, with dim(T)=#ET=N and dim(T′)=#ET′=M.

    Then,

● to compute ˜W′x,y for every (x,y)∈ET×ET′ with Algorithm 1, we need to solve O(MN) mixed binary linear optimization problems, with O(MN) variables and O(Mlog2(M)+Nlog2(N)) linear constraints;

● to compute ˜Wx,y for every (x,y)∈ET×ET′ with Algorithm 1, we need to solve O(MN) mixed binary linear optimization problems, with O(MN) variables and O(log2(M)+log2(N)) linear constraints.

Proof. In a full binary tree structure, at each level a we have 2^a vertices. Let A=len(T) and A′=len(T′). We have that, for any vertex v∈VT at level a, the cardinality of the path from v to any of the leaves in subT(v) is A−a, and the number of leaves in subT(v) is 2^{A−a}. The number of vertices in subT(v) is instead 2^{A−a+1}−1, and the number of edges is 2^{A−a+1}−2.

Consider v∈VT at level a and w∈VT′ at level a′. According to Section 9.2.3, to compute Γ with Tv and T′w, we need (2^{A−a+1}−2)(2^{A′−a′+1}−2)+2^{A−a+1}−2+2^{A′−a′+1}−2 binary variables and 2^{A−a}+2^{A′−a′}+2(2^{A−a+1}−2)+2(2^{A′−a′+1}−2) linear constraints (if we consider constraints (1)–(3) in Section 9.2.3) in the nonlinear problem. Constraint (4) adds just 1 further constraint, so we can ignore it. To linearize the problem, we need to add 1 real-valued auxiliary variable and 3(2^{A−a+1}−2)+2(2^{A′−a′+1}−2) constraints to take into account the set F, and an additional

Σ_{k=0}^{a} 2^k (2^{A−a−k+1}−1) + Σ_{s=0}^{a′} 2^s (2^{A′−a′−s+1}−1)

constraints for the set B. Simplifying, we have (a+1)(2^{A−a+1}−1)+(a′+1)(2^{A′−a′+1}−1) ∈ O(2^A+2^{A′}) constraints for B.

Putting things together, we have

(2^{A−a+1}−2)(2^{A′−a′+1}−2)+2^{A−a+1}−2+2^{A′−a′+1}−2 (9.10)

binary variables and

2^{A−a}+2^{A′−a′}+5(2^{A−a+1}−2)+4(2^{A′−a′+1}−2)+a(2^{A−a+1}−1)+a′(2^{A′−a′+1}−1) (9.11)

linear constraints.

Thus, to minimize Γ, we need to solve (2^{A+1}−1)(2^{A′+1}−1) linear binary optimization problems, each with 2^A·2^{A′}+O(A+A′) variables and O(2^A+2^{A′}) constraints. Instead, for Γ′, we need to solve (2^{A+1}−1)(2^{A′+1}−1) linear binary optimization problems, each with 2^A·2^{A′}+O(A+A′) variables and O(A+A′) constraints. Substituting A=log2(N) and A′=log2(M) in Eqs (9.10) and (9.11) gives the result.

We highlight that the classical edit distance between unlabeled and unordered trees, obtained with BLP in [31], can be computed by solving O(NM) BLP problems with O(NM) variables and O(log2(M)+log2(N)) constraints (O(N+M) if we also count the constraints restricting the integer variables to {0,1}, as the authors of [31] do). Thus, the approximating procedures we developed are not far from the computational complexity of the renowned tree edit distance: min˜W′x,y differs by some linear factors, while min˜Wx,y has the same complexity.

    As already mentioned, most of the edit distances for merge trees, in their unconstrained formulation, have equal or higher computational complexity w.r.t. the classical edit distance between unlabeled and unordered trees. Thus, the fact that the authors of those metrics obtain polynomial-time algorithms under constrained formulations supports our conjecture that a similarly constrained formulation of the interleaving distance could yield practically useful polynomial bounds.

    In Appendix B, we provide a comparison between the computational complexity of our methods and those presented in [3] and [25].

In this section, we extensively examine the practical behavior of the upper and lower bounds of dI that we obtained in Section 9.2. We name du(T,G) the upper bound obtained with Algorithm 1, and dl(T,G) the lower bound obtained with the analogous algorithm. Generally speaking, we aim to showcase that we can use du in many practical scenarios, and that we have a number of tools at our disposal to check the reliability of the obtained results. More in detail:

● We use the upper bound du as a reference estimate of dI: we compare it with the upper bound developed in [16], use it to analyze datasets, etc.;

● We use the lower bound dl and the bottleneck distance dB between persistence diagrams to find the potential error range of the computed upper bound, exploiting the fact that dB≤dI [9]. More precisely, we check the relative discrepancy:

Δ := (du−max{dl,dB})/du. (10.1)

With this quantity, we can obtain an upper bound on the error between dI and du via du−dI ≤ Δ·du;

● We use the sup norms between the considered functions to check whether du shows instances of unstable behavior, i.e., du(Tf,Tg)>∥f−g∥∞.

We perform three kinds of simulations: in Section 10.1, we compare our estimates of dI with the method presented in [16], which, to the best of our knowledge, is the only publicly available code to estimate dI; in Section 10.2, we look at the empirical stability of du by checking how adding pointwise noise to a smooth function is reflected on the associated merge trees in terms of du; in Section 10.3, we tackle some benchmark classification case studies involving functional datasets.

The method proposed in [16] turns the unlabeled interleaving-distance problem between merge trees into a labeled interleaving problem by proposing a suitable set of labels. The optimal labeling would give the exact value of the interleaving distance but, in general, this procedure just returns an upper bound. We call dlab the upper bound obtained with the labeled method proposed in [16].

For any fixed i, we generate a couple of point clouds Cki={(xkj,ykj) : j=1,…,ni}⊆R2, with k=1,2, according to the following process:

xkj ∼iid N(5,σx,k), ykj ∼iid N(5,σy,k), j=1,…,ni, with σx,k∼N(3,1), σy,k∼N(3,1),

where N(μ,σ) indicates a Gaussian distribution with mean μ and standard deviation σ.

Note that ni regulates the number of leaves in the trees (and we fix it before sampling Cki). From Cki we obtain the single-linkage hierarchical clustering dendrogram TCki (that is, the merge tree representing the Vietoris–Rips filtration of Cki), and then compute du(TC1i,TC2i), dl(TC1i,TC2i), and dlab(TC1i,TC2i). The distance dlab is computed with the code available at https://github.com/trneedham/Decorated-Merge-Trees, while du is computed via the procedure described in Section 9.2.7. For each ni∈{3,…,15}, we sample 100 pairs of point clouds and then, similarly to Eq (10.1), compute the relative difference between the two estimates: (dlab−du)/du. Note that there are no absolute values. A sketch of the data-generating step follows.
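For concreteness, the data-generating step can be sketched as follows (we take the absolute value of the sampled standard deviations to keep them positive, a detail the text above leaves implicit); SciPy's single-linkage routine returns the dendrogram, i.e., the sequence of merges and their heights.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)

def sample_cloud(n_i):
    """One point cloud C_i^k from the process described above."""
    sigma_x, sigma_y = abs(rng.normal(3, 1)), abs(rng.normal(3, 1))
    x = rng.normal(5, sigma_x, size=n_i)
    y = rng.normal(5, sigma_y, size=n_i)
    return np.column_stack([x, y])

# Single-linkage hierarchical clustering dendrogram of the cloud, i.e.,
# the merge tree of its Vietoris-Rips filtration: each row of Z records
# the two merged clusters and the height of the merge.
Z = linkage(sample_cloud(10), method="single")
```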

We repeat the same experiment, this time with ni∈{100,…,109}. Since in this case du requires too much time to be computed exactly, we exploit Proposition 4 and consider the smallest ε>0 such that Pε(TC1i) and Pε(TC2i) have 15 or fewer leaves. We then set dopt(TC1,TC2)=max{ε/2, du(Pε(TC1),Pε(TC2))}.

For ni∈{100,…,109}, we compute the relative difference between dlab and dopt with the formula (dlab−dopt)/dopt. Note, again, that there are no absolute values. A sketch of the search for ε is given below.
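The search for the smallest ε can be sketched as follows, assuming hypothetical helpers prune and n_leaves implementing the pruning operator Pε of Proposition 4 and a leaf count; this is only one possible way to organize the search.

```python
def smallest_pruning_eps(T1, T2, prune, n_leaves, max_leaves=15,
                         lo=0.0, hi=1.0, tol=1e-6):
    """Bisect for the smallest eps leaving at most max_leaves leaves in
    both pruned trees; prune(T, eps) and n_leaves(T) are supplied."""
    def small(eps):
        return (n_leaves(prune(T1, eps)) <= max_leaves
                and n_leaves(prune(T2, eps)) <= max_leaves)
    while not small(hi):          # grow the bracket until hi is feasible
        hi *= 2
    while hi - lo > tol:          # standard bisection on eps
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if small(mid) else (mid, hi)
    return hi
```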

The results of the simulations can be seen in Figure 10. There, we see that, in the context of our data-generating process, the estimate given by dlab is very unreliable, overestimating dI by a median of at least 50–60% of its actual value; in particular, there are some outliers which are completely off, even w.r.t. the values obtained with du and dopt, with errors of more than 3 times the actual value.

    Figure 10.  Descriptive statistics of the relative differences between du, dopt, and dlab, with du being the upper bound of dI we described in Section 9.2, dopt being the pruning-assisted upper bound we obtained using the results in Section 9.1, and dlab being the estimation procedure of [16]. Note that no absolute value appears in the formulas of the relative differences. For more details, see Section 10.1.

The estimates obtained with du in this simulation have a negligible difference from the actual values of dI: across all ni∈{3,…,15}, the maximum value of the relative difference Δ between du and dl was 10^{−7}. Note that, for the other values of ni, we cannot give a lower bound using dl on the pruned trees.

As a conclusive remark, we say that the computational advantages of the labeled approach are immense and potentially adequate even for real-time applications; but, from our simulation, we see that the results need to be taken with care, for the estimates produced are not always good. On the other hand, with the present implementation, the computational cost of our approach becomes prohibitive quite quickly as the number of leaves in the trees increases, even if there are situations, like the one in this simulation, in which the scheme we used for ni≥100 can produce good estimates. We think that interactions between the two approaches could lead to substantial speed-ups: being able to fix the value of some variables in our algorithm by exploiting the labeling scheme of [16] could greatly reduce the dimensionality of the problem and, thus, its computational cost.

Now, we want to verify, experimentally, whether du retains the good stability properties that the interleaving distance inherits from its universal property.

To do so, we pick a standard model in functional data analysis and look at how the distance between the merge trees associated to noisy observations and the merge tree of the original smooth function behaves as we increase the noise in the model. Roughly speaking, the model considers a smooth function f, randomly samples a set of points in the domain where f is evaluated, and adds pointwise noise to these values. The final set of couples is a single datum in the generated dataset.

We generate f by interpolating, with cubic splines, a set of randomly chosen couples in [0,1]×R. The distribution that we use to sample this set of couples is the following: we take 0=x1<⋯<xN=1, N=20, forming a regular grid in [0,1]. Then, yi, i=1,…,N, are sampled independently from a Gaussian with mean 0 and standard deviation 50. The couples {(xi,yi)}i=1,…,N are interpolated to obtain f. Other methods to sample a smooth function could be employed as well.

We sample the noisy observations of f according to the following model, which belongs to a family of standard models for functional data (see [45]). We name this model (Mσf).

(Mσf) Let X be a random variable with uniform density on [0,1], and consider the random variable Y such that Y∣X=x ∼ f(x)+ϵ, with ϵ being an independent Gaussian with mean zero and standard deviation σ.

We obtain each of our partially observed noisy functions by drawing i.i.d. samples from (Mσf). Specifically, for each value of σ∈{0.1,1,10,15,25}, we sample a dataset Dσ where each of the 100 i.i.d. observations (i.e., each partially observed function) is obtained by sampling independently n=50 couples from (Mσf). A sketch of this sampling scheme follows.
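A minimal sketch of this generating scheme, assuming the spline-based construction of f described above, could look as follows.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(1)

# The smooth f: a cubic spline through N = 20 couples on a regular grid
# in [0, 1], with Gaussian ordinates (mean 0, standard deviation 50).
grid = np.linspace(0, 1, 20)
f = CubicSpline(grid, rng.normal(0, 50, size=20))

def sample_observation(sigma, n=50):
    """One partially observed noisy function drawn from (M_sigma_f)."""
    x = np.sort(rng.uniform(0, 1, size=n))
    return x, f(x) + rng.normal(0, sigma, size=n)

D_sigma = [sample_observation(sigma=10.0) for _ in range(100)]
```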

We compute the merge tree Tf of the smooth function f via linear interpolation of {(˜xi,f(˜xi))}i=1,…,300, with {˜xi}i=1,…,300 being a uniform grid on [0,1]. Then, for each {(ai,bi)}j, i=1,…,50, in Dσ, j=1,…,100, we compute the merge tree of the linear interpolation of the sampled couples, without extending it to the whole interval [0,1], and we estimate, via du and dl, the interleaving distance between the merge tree obtained from the smooth f and the one obtained from the noisy observation. As σ, which controls the magnitude of the noise, increases, we are interested in observing the behavior of the distance between the observed and the true merge trees. The sampled partial observations of functions have also been extended to [0,1] via a nearest value extension (NVE) technique, both for plotting purposes and to compute the sup norm between the extended functions, to check the stability of the upper bound du. See Figure 11. A sketch of the merge tree construction for sampled functions is given after this paragraph.
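For sampled 1-D functions, the merge tree can be built with a standard union-find sweep over the heights; the following sketch (our own minimal implementation) extracts the merge events of the piecewise-linear interpolation of the values y.

```python
def merge_events(y):
    """Merge events of the sublevel-set filtration of the piecewise-linear
    interpolation of the samples y: local minima create leaves, and a
    sample higher than both processed neighbors merges two components."""
    parent = list(range(len(y)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    active, events = set(), []
    for i in sorted(range(len(y)), key=lambda j: y[j]):
        roots = {find(j) for j in (i - 1, i + 1) if j in active}
        for r in roots:            # attach i on top of adjacent components
            parent[r] = i
        if len(roots) == 2:        # two components merge at height y[i]
            events.append((y[i], tuple(sorted(roots))))
        active.add(i)
    return events
```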

Figure 11.  Plots related to the simulated datasets featured in Section 10.2. Each of the partially observed functions, generated according to the model (Mσf), has been extended to [0,1] using nearest value extrapolation.

To better portray the stability of du, we benchmark our upper bound against a metric for merge trees with very different stability properties: the edit distance dE defined in [41]. This edit distance can be computed exactly with a computational complexity similar to the ones of the upper and lower bounds we develop in this work, and it satisfies the inequalities:

dI(T1,T2) ≤ dE(T1,T2) ≤ 2(dim(T1)+dim(T2))·dI(T1,T2).

    This implies that dE can grow linearly with the size of the trees.

    Figure 11(d) summarizes the behavior of du and dE in response to increasing levels of noise. More precisely:

● The orange line interpolates, for each value of σ, the median of:

{du(Tf,Tjσ) : Tjσ is the merge tree of {(ai,bi)}j, i=1,…,50, in Dσ, j=1,…,30};

● The green line interpolates, for each value of σ, the median of:

{dE(Tf,Tjσ) : Tjσ is the merge tree of {(ai,bi)}j, i=1,…,50, in Dσ, j=1,…,30};

● The blue line interpolates, for each value of σ, the median of:

{∥f−fjσ∥∞ : fjσ is the NVE on [0,1] of {(ai,bi)}j, i=1,…,50, in Dσ, j=1,…,30}.

The vertical bars in Figure 12 represent 1.5 times the interquartile range (IQR) below and above the median (i.e., the boxplots). In other words, we are checking how far from the true merge tree the observed merge trees tend to fall as noise increases, where "far" is either in terms of du or of dE. This is precisely where stability properties become crucial, as they provide the robustness needed to support the analyst's findings.

Figure 12.  Lines resulting from the interpolation of the medians of the distances dE and du, measured between the merge tree of f and the merge trees obtained from its noisy observations contained in Dσ, for different values of σ. For each value of σ, we also report the sup of the difference between the smooth function and the noisy observations (extended to [0,1]). The dashed line represents the line (σ,6σ), plotted for reference. For each σ, the vertical bars represent the boxplots of the data. We clearly see how the values of dE "explode", while du behaves as we would expect from dI: the universal properties of dI guarantee that dI is controlled by the difference between the functions. See Section 10.2 for more details.

For reference, we also report the plot of the line (σ,6σ): we know that, for every (X,Y) distributed according to model (Mσf), P(Y∈(f(X)−3σ,f(X)+3σ)∣X=x)≈1, for every x∈[0,1]. In particular, for {(Xi,Yi)}i=1,…,50 i.i.d., we have P(maxi=1,…,50∣Yi−f(Xi)∣<6σ)≥0.9. In other words, [0,6σ] is a 0.9 confidence interval for the max difference between the smooth function and the noisy observations on the grid given by the realizations of {Xi}i=1,…,50. Note that, despite this, outside such grid, i.e., outside the points where the functions are observed, the error between the smooth function and the extended noisy observations can be higher. Still, the line (σ,6σ) can serve as a rough reference for the pointwise difference between the smooth functions and the noisy observations.

In Figure 12, we see that the deviation between the true and the observed merge trees behaves as we would expect: the deviation in terms of du increases with the magnitude of the noise σ, being controlled by the sup norm between the smooth function and the extension to [0,1] of the noisy observations. The deviation in terms of dE, instead, explodes as σ increases, as dE depends on a combination of 1) the magnitude of the noise and 2) the number of oscillations of the functions. When sampling and interpolating the partial and noisy observations of f, we are also potentially increasing the number of oscillations of the data w.r.t. the original f: while Tf has 5 leaves, for σ=0.1 the number of leaves ranges from 4 to 8, with a median of 6; for σ=1, it ranges from 4 to 12, with a median of 9; for σ=10, from 9 to 18, with a median of 14; for σ=15, from 11 to 19, with a median of 15; for σ=25, from 11 to 19, with a median of 15. Note that, at most, we can have 19 leaves, given that we sample on a grid of 50 points.

All of this is coherent with the behavior we observe in Figure 12. The line (σ,6σ) further supports the coherence of our findings: for small values of σ, the error introduced by having to extend the observed values from a random grid to [0,1] overshadows the pointwise error; meanwhile, for σ=25, we see that the whole boxplot of the sup norms is below the dashed line.

To conclude this simulation, we double-check the results obtained with du by also computing the relative discrepancies Δ with dl, as in Eq (10.1). In Table 1, we report some percentiles (95th, 90th, 85th, 80th) of the values of Δ as a function of σ, showing that in most cases we have negligible uncertainties in our computations. As σ increases, some non-negligible discrepancies start to appear in a limited number of calculations. However, even in the worst-case scenario, 85% of the distances still have a relative discrepancy between du and dI which is guaranteed to be lower than 6%. No unstable behaviors of du, i.e., instances of du(Tf,Tg)>∥f−g∥∞, were registered. As a result, we can look at the results of this simulation with a very high level of confidence.

Table 1.  A table reporting some percentiles (95th, 90th, 85th, 80th) of the relative discrepancies between the upper and lower bounds, Δ=(du−dl)/du, for the simulation presented in Section 10.2: the different values of σ represent the different magnitudes of noise that we introduce when sampling our observations. The table shows that, in the vast majority of the cases, we are guaranteed a negligible error in our computations. When noise levels are high, and so when the merge trees become more and more different, some discrepancy between du and dl appears.
 σ=0.1 σ=1 σ=10 σ=15 σ=25
95th Perc. <10^{−13} <10^{−14} 0.23 0.22 0.18
90th Perc. <10^{−14} <10^{−14} 0.09 0.06 0.14
85th Perc. <10^{−14} <10^{−14} 10^{−4} <10^{−2} 0.06
80th Perc. <10^{−14} <10^{−15} <10^{−14} <10^{−7} 0.02


Having tested du (and dl) in some controlled, simulated scenarios, we now test it on some benchmark functional datasets, tackling some classification problems.

    The datasets we considered are suitable for our purposes for three reasons:

    1) They are freely available and easily accessible via the scikit-fda python package [44];

    2) Their complexity (in terms of number of statistical units, and number of oscillations of each function) allows the use of our estimation algorithms;

    3) We can use the sup of the difference between the functions to check the stability of du.

To further validate our results, we build two kinds of alternative pipelines: a first one obtained by replacing du between merge trees with dB between the associated persistence diagrams, and a second one using directly the original functions. The general pipeline we follow for du and dB is the following: we compute merge trees/diagrams, take pairwise distances, fit an embedding of the resulting metric space into a Euclidean space with Isomap [6], and use quadratic discriminant analysis (QDA) on the embedded vectors. Since all the functions in each dataset are evaluated on the same grid, we also fit some "Baseline" pipelines, obtained by reducing the vectors' dimensionality with principal component analysis (PCA), to avoid overfitting, and using QDA directly on the resulting vectors. The parameters for the Isomap embeddings (the dimension and the number of points in a neighborhood) and the dimension of the PCA have been selected using leave-one-out cross validation, maximizing accuracy. We report the parameters and the results of the best performing models in Table 2; a minimal sketch of the du/dB pipeline is given after the table.

    Table 2.  Results of the classification tasks considered in Section 10.3. Each column deals with a different task, while each row contains the leave-one-out accuracy, and the best performing parameters obtained with dB, du (or dopt), or the baseline models. For dB and du, we also report the parameters of the Isomap embedding (neigh,dim), while for the Baseline models we report the PCA dimension. Parameters were selected to maximize leave-one-out accuracy.
    Results
    Octane NOx Growth Tecator Beef
    dB 1 0.88 0.84 0.98 0.58
    du(dopt) 1 0.86 0.87 0.98 0.68
    Baseline 1 0.94 0.93 0.97 0.7
    Optimal parameters
    Octane NOx Growth Tecator Beef
    dB (2,8) (14,44) (6,71) (52,209) (12,17)
    du(dopt) (2,8) (4,5) (10,8) (10,181) (12,38)
    Baseline 2 6 2 4 8

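A minimal sketch of the du/dB pipeline, assuming the pairwise distance matrix D and the label vector have already been computed (the parameter values would be the ones selected by cross validation), is the following; for brevity, the Isomap embedding is fit once on the full distance matrix.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

def pipeline_accuracy(D, labels, n_neighbors, n_components):
    """Leave-one-out accuracy of Isomap embedding + QDA on a precomputed
    matrix D of pairwise d_u (or d_B) distances."""
    emb = Isomap(n_neighbors=n_neighbors, n_components=n_components,
                 metric="precomputed").fit_transform(D)
    scores = cross_val_score(QuadraticDiscriminantAnalysis(),
                             emb, np.asarray(labels), cv=LeaveOneOut())
    return scores.mean()
```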

We emphasize that the goal of these simulations is not to promote TDA tools over classical statistical methods, but rather to illustrate the practical behavior of our computational procedures and provide a blueprint for their application. In this regard, comparing du and dl with more traditional and arguably more suitable approaches offers a grounded assessment of their effectiveness. Additionally, we investigate whether the theoretical inequality dB≤dI [9], which suggests that dI has greater discriminative power than dB, also holds practical significance in these benchmark scenarios.

    Accordingly, the selected datasets were not chosen specifically for their suitability to TDA techniques. This contrasts with applications such as those in [42], where functions must be analyzed up to warpings or re-parametrizations (see also [36]). In such cases, it has been shown that classical methods require complex and cumbersome pipelines to achieve reliable results, whereas the straightforward TDA approaches used here remain effective.

    The datasets and the corresponding data analysis problems we consider are the following:

    ● "Octane" dataset [24]: contains near infrared spectra of gasoline samples, with wavelengths ranging from 1102nm to 1552nm with measurements every 2 nm; it contains six outliers to which ethanol was added, as sometimes required by some regulations. We want to build a classifier able to perform outlier detection. The dataset contains 39 observations, with 6 outliers, and the resulting merge trees have between 4 and 5 leaves, with a median of 4 leaves. In Figure 13(a), we see that there is clear separation between the outliers and the other data. Coherently, Table 2 shows that all the different models we considered successfully achieved this task;

    Figure 13.  Plots related to the case studies presented in Section 10.3.

    ● "NOx" dataset [26]: contains hourly measured daily nitrogen oxides (NOx) emissions in the Barcelona area. Since NOx causes ozone formation and contributes to global warning, it is of interest the identification of days with abnormally large emissions to allow the implementation of actions able to control their causes, which are primarily the combustion processes generated by motor vehicles and industries. The observations are labeled depending on if the emission curve was recorded during working days or during the weekend. We aim to reconstruct this labeling via a supervised classification problem. The dataset contains 115 observations, with 79 working days and 39 non-working days, and the resulting merge trees have between 2 and 9 leaves, with a median of 5 leaves. In Figure 13(b), we can visually appreciate that the problem is more challenging compared to the previous one. Despite that, Table 2 shows that we achieve very high accuracy, in cross-validation, with our pipelines, with the baseline approach outscoring the TDA one, and persistence diagrams achieving slightly better results than merge trees;

    ● "Growth" dataset [50]: is a popular dataset in functional data analysis, often referred to as "The Berkeley Growth Study", which contains the height measurements (in cm) of 54 girls and 39 boys between the ages of 1 and 18 years, see Figure 13(c). This dataset has been considered by a number of works, including works dealing with the problem of aligning/re-parametrizing the growth curves, factoring out of the analysis the "personal biological clock" [51] of each individual, to be able to more clearly identify any growth patterns in the data. See [51] and references therein. One interesting problem, often considered, is to compute the first derivative of the growth curves and try to recognize the different growth dynamics between boys and girls. This is also what we do: we smooth the functions via some kernel smoother (with a bandwidth of 3 and with the same kernel employed in [42]), consider the numerical derivatives of the smoothed curves, and try to discriminate boys from girls. The dataset contains 93 observations, 54 girls and 39 boys, and the merge trees of the derivatives have between 4 and 9 leaves, with a median of 5 leaves. In Table 2, we see that all the models achieve good results on this data, with the baseline being the best performing model and merge trees outscoring persistence diagrams;

    ● "Tecator" dataset (https://lib.stat.cmu.edu/datasets/tecator): this publicly available data is collected by a tool called "Tecator Infratec Food and Feed Analyzer", working in the wavelength range 850–1050 nm by the near infrared transmission principle, see Figure 13(d). Each sample contains a curve extracted from meat with different moisture, fat and protein contents. Many different data analysis questions and problems can be set up with this data. Relying on the derivatives of the curves, we consider the problem, as in [27], to discriminate between the curves coming from high fat meat (i.e., fat content >20) and low fat meat (i.e., fat content 20). The dataset contains 215 observations, 77 high fat samples, and 138 low fat samples, and the merge trees of the derivatives have between 2 and 10 leaves, with a median of 5 leaves. Table 2 reports the cross-validation accuracy of the classification models. We see that the TDA approaches, both with du and dB, outperform the baseline model, with merge trees again doing slightly better than persistence diagrams.

    ● "Beef" dataset [5]: contains five classes of mid-infrared spectrograms of beef, taken from a total of 60 cooked samples of minced silverside cuts with different kinds of adulterations, ranging from pure beef to beef adulterated with four different types of offal. This dataset is more complex than the previous ones, especially in terms of the number of oscillations of the spectrograms, with the corresponding trees ranging from 30 up to 64 leaves. As a consequence, we have resorted to the approach presented in Section 10.1, computing dopt instead of du, pruning merge trees so that they are left with at most 14 leaves. Most of the pruned trees have 14, leaves, while approximately 1/3 of the trees have 13 leaves, and two have 12 leaves. The leave-one-out results of this classification case study are reported in Table 2 and show that merge trees perform very similarly to the Baseline method, doing much better than persistence diagrams.

In the first four datasets, du gives very good estimates of dI, with Δ being less than 0.008 for 95% of the computed distances in each dataset. We did observe, for the "Tecator" dataset, 7 instances of du exceeding the sup norm between the functions. Since the values by which du exceeded it were always below 10^{−8}, we believe this is due to numerical errors. For the "Beef" dataset, we cannot give lower bounds, but we checked for unstable behaviors of dopt, and none were recorded.

To conclude, we have shown several case studies in which du produced reliable estimates of dI, validated by controlling the outputs with lower bounds (via dl and dB) and upper bounds (via the sup norm between the functions). The results obtained via the implemented pipelines were benchmarked against established TDA and classical tools, showing equal or superior performances w.r.t. persistence diagrams in all but one situation.

    In this work, we introduced a novel graph-matching formulation for the interleaving distance between merge trees, leveraging global matchings rather than the traditional approach based on ε-good maps. This reformulation not only provides a new perspective on the metric but also facilitates computational advancements by enabling recursive decomposition techniques and constrained formulations inspired by the literature on edit distances. As a first practical result of our approach, we propose upper and lower bounds using a dynamic linear binary programming framework.

    While we leave to future works the problem of adapting the dynamic linear binary programming algorithm to the constrained interleaving distance, conjecturing that it would result in a polynomial time upper bound, our empirical evaluation demonstrated that the presented approach reliably estimates the interleaving distance and provides more controlled error estimates compared to existing methods. Through a comprehensive set of case studies, we validated the practical efficacy of our bounds by benchmarking them against both traditional TDA methods and classical statistical approaches.

However, our method also comes with computational trade-offs: the reliance on integer programming entails a high computational burden, making the approach suitable only for small to moderately sized datasets. Another drawback of the presented approach is that it only provides a-posteriori error bounds: we have no methods to assess the reliability of our tools before actually computing the distance, or to calibrate their precision. In particular, it is not possible to set an arbitrary precision level by choosing the value of some parameter.

    On top of the aforementioned research direction aiming at proving theoretical and computational properties of the constrained interleaving distance, our findings also suggest some additional directions. The relationship between optimal couplings and continuous monotone maps, as highlighted in Section 6, hints at yet another possible formulation of the interleaving distance purely in terms of metric tree mappings. Given that this is among the first works practically employing the interleaving distance, further comparisons between the interleaving distance and other merge tree distances could help elucidate their respective stability and sensitivity properties in different data analysis contexts.

In conclusion, our work provides both theoretical advancements and practical tools for computing and understanding the interleaving distance. Even if computational efficiency remains a challenge, we have provided effective tools to work in small-data regimes, and we believe that we have laid out the main steps for finally making the interleaving distance a readily usable tool for analyzing data in topological data analysis.

The author declares that he has not used Artificial Intelligence (AI) tools in the creation of this article.

This work was carried out as part of the author's PhD thesis, under the supervision of Professor Piercesare Secchi. The author also acknowledges the support of the Wallenberg AI, Autonomous Systems and Software Program (WASP), and of the SciLifeLab and Wallenberg National Program for Data-Driven Life Science (DDLS), which fund the project: Topological Data Analysis of Functional Genome to Find Covariation Signatures.

    The author declares no conflict of interest.

    To facilitate navigating through the manuscript, we prepared some flowcharts (Figure 14), and a table (Table 3), summarizing the structure of our theoretical investigation and the most important pieces of notation we introduce.

    Figure 14.  Flowcharts.
    Table 3.  A table containing the most important functions which are defined throughout the manuscript. Such functions are fundamental, especially for the results contained in the first half of the manuscript.
    Notation Definition
skT:T→T Structural map of a metric tree T;
shifts the points upwards by k∈R
πT:C→VT Restriction of the Cartesian product projection
    ΛTC:VTP(VT) ΛTC(v)=maxv<vπT(C) or 
    φTC:VTVT φTC(x)=min{vVTv>x and #Λ(v)0}
    δTC:VTVT δTC(x)=min{vVTvx and vπT(C)}
    χTC:VTVG χTC(x)=LCA({πG((v,w))vΛT(x)})
    γTC:VTDTCVG argmin{g(w)(v,w)C,v<x} or 
    ηTC:VTVG ηTC(x)=γTC(φTC(x))
    αC:TG The continuous map induced by a coupling C
    αεC:TG The ε-good map induced by a coupling C:
    αεC(x)=skxG(αC(x)) with kx=f(x)+εg(αC(x))
    [lC(x),uC(x)] lC(x)max{vxvπT(C)D};
    uC(x) is either uC(x)=+ or uC(x)=min{vxvπT(C)D}
    T:TVT T(x)=max{vVTvx}
    ϕ:VTVG ϕ(v)=(α(v))
    ψ:VGVT ψ(w)=(β(w))


    We now discuss the computational complexity and practical performance of our algorithms in comparison to those developed in [3] and [25]. To begin with, we note that, to the best of our knowledge, no public implementations of the latter two algorithms are currently available. As a result, we do not provide a case study to evaluate their runtimes or the reliability of their approximations. Similarly, the original works do not include any examples or simulations.

Moreover, these algorithms differ significantly in nature: ours is a BLP-based estimation algorithm equipped with a-posteriori error bounds; [3] offers an approximation within a tree-dependent multiplicative factor; and [25] presents an exact fixed-parameter tractable (FPT) algorithm. Given these fundamental differences, making a direct comparison is far from straightforward.

    Nonetheless, we attempt a comparison based on both theoretical and practical considerations, focusing on selected cases that, while specific, remain meaningful and illustrative.

We start by focusing on the algorithm in [25]. The authors of [25] show that determining whether dI(T,G)≤k can be done in O((N+M+2)^2 log^3(N+M+2) 2^{2τ}τ^{τ+2}) (with N,M being as in Proposition 6), where τ depends on k, T, and G. Thus, in order to make some explicit computations, we pick a situation in which we can parametrize τ in terms of tree complexity. To this end, we consider a merge tree T with full binary tree structure, as in Proposition 6. We denote this tree by Tl, where l=len(Tl), and we assume that the height difference between each node and its child is exactly 1. In other words, the structure of Tl is fully described by l. Since each internal vertex has degree 3, for each k<l, we have τ(k)≥3·2^{k+1}. In fact, k-balls, as defined in [25], are full binary trees of length k. Note that this is neither a worst-case nor a best-case scenario analysis: it is just a reference situation in which we can make some explicit computations. For instance, having the length of the edges decay as lvlTl(v) grows would yield bigger values of τ(k).

Suppose that we want to check whether the interleaving distance is bounded from above by k=1,2,3,4,5. Then, the leading coefficient in the computational cost is, respectively, at least 10^{18}, 10^{43}, 10^{98}, 10^{223}, 10^{501}, which then needs to be multiplied by (N+M+2)^2 log^3(N+M+2). Note that, for k=5, regardless of G, it is already challenging to compute the complexity of the algorithm, let alone run the algorithm itself to decide whether dI(Tl,G)≤k. Plus, we point out that if G has one leaf, we have dI(Tl,G)=l, and having l=4,5 is well within the range of complexity contained in our runtime simulations. See Appendix C.

To conclude this first comparison, we offer a complementary perspective by referencing an FPT algorithm developed in [4] for a specific edit distance between trees. More precisely, [4] addresses the computation of the edit distance between unordered, labeled trees with positive integer costs.

    We stress that the following discussion is intended as a rough, qualitative comparison: to establish a meaningful basis for comparison between the algorithm in [25] and the one in [4], we will consider additional specific scenarios that help align the assumptions and problem settings of the two approaches.

    This comparison is motivated by the following considerations:

1) In Section 9.2.9, we show that computing min˜Wx,y is as expensive as computing the tree edit distance between two unlabeled weighted trees (with the same tree structures as the trees considered to compute the interleaving distance). Plus, the edit distance between unlabeled weighted trees can be framed as an edit distance between unordered labeled trees by setting relabeling, insertion, and deletion costs equal to the absolute difference between the weights of the involved edges (taking the missing weight to be zero if only one edge is involved), and making sure that the label sets of the two trees are disjoint;

2) To the best of our knowledge, [4] is the only work providing an FPT algorithm to exactly compute an edit distance between either unordered or unlabeled trees.

The results in [4] are not explicitly given as polynomials in max{M,N}, but rather focus on the existence of a polynomial upper bound on the time complexity in terms of max{M,N}. However, they do provide an approximation of the leading coefficient of such polynomial: O(2.62^k). We set ak=2.62^k. We want to compare this coefficient with bk=2^{2τ(k)}τ(k)^{τ(k)+2}≥2^{6·2^{k+1}}(3·2^{k+1})^{3·2^{k+1}+2}, which is a lower bound for the leading coefficient of the algorithm in [25], assuming T=Tl. Clearly, directly comparing dI(T,G)<k and dE(T′,G′)<k, with T′ and G′ having the same tree structures as, respectively, T and G, is not fair. In fact, dI is a "sup" kind of distance, collecting only the biggest difference between trees, while dE collects all the differences between the trees, which are added to obtain the final value. Thus, when computing dE, we multiply k by the number of edges in T and G, i.e., we compare dI(T,G)<k with dE(T′,G′)<k(M+N). For N=M=20 and k=1,2,3,4,5, the resulting values of ak(M+N) remain several orders of magnitude smaller than the corresponding bk.

Now, we focus on [3]. The authors of the paper provide an algorithm to approximate dI up to a multiplicative factor which depends on the ratio of the longest edge across T and G over the smallest edge across T and G.

Note that this multiplicative factor depends on T and G, and cannot be controlled or improved by changing some parameters.

Since we know how the value returned by the algorithm in [3] relates to dI, we can get a maximum range for the relative error it introduces, similarly to what was done with Eq (10.1). We compute such error ranges across the first four datasets considered in Section 10.3 to compare them against the ones we obtained with du.

On average, the error ranges obtained in this way on the "Growth", "Octane", "NOx", and "Tecator" datasets are far wider than the ones produced by our bounds: for 95% of our distances, Δ was less than 0.008. Thus, we argue that, whenever feasible, our estimation should be preferred to the techniques in [3].

We now present a simulation study to showcase the computational runtimes of the implementation that was used in the case studies in Section 10. The data-generating pipeline is the same as in Section 10.1, with the cardinality of the point cloud regulating the number of leaves in the trees. For each value of the number of leaves, we sample couples of point clouds and compute du between the associated merge trees. We report the boxplots of the obtained runtimes in Figure 15.

    Figure 15.  Boxplots of the runtimes, as a function of the number of leaves in each merge tree, for the simulation in Appendix C.

    We close the simulation with the following disclaimer: given the nature of the present work, our focus has not been on optimizing the code or the computational environment. For instance, we used a non-commercial version of the Gurobi 12 solver, without having ample RAM dedicated exclusively to these computations. Consequently, the reported runtimes do not accurately reflect the potential efficiency of a fully optimized implementation. See state-of-the-art implementations of metrics with similar complexity, such as [31].

    We consider separately the different combinations arising from the considered vertices belonging to or .

    ● The thesis is obvious if ;

    ● If and , then , with . Otherwise, (C3) is violated since , and so, ;

    ● If and , then with ;

    ● Lastly, suppose and consider the following cases:

    – If , then we have , and, thus, . This entails ;

    – If , then we have , and the thesis clearly follows;

    – Lastly, suppose that and ; then, .

    Given , if , we have a well-defined upper extreme . Moreover, since is totally ordered, is unique. If , i.e., , then .

    Now, consider . Since , then . Thus, also is well-defined. On top of that, since by hypotheses, we are considering only vertices such that , . Thus, is a nondegenerate sequence of edges.

    Suppose and . Clearly, ; so consider . We know , thus, , which is absurd since .

    Lastly, suppose (and so ) and . Then, for any . Let and consider . If for all , we are done. Clearly, there cannot be vertices with which are in . So suppose there is , with . Since , we have , but then , which means for some . Clearly, in there can be no vertex contained in apart from . Thus, , which is absurd.

    We need to check continuity. Consider in . We know that their order relationships is preserved by and . On top of that, yields and the result follows.

    We know that . Thus, are either deleted or coupled. Note that and . If both of them are coupled, then . Suppose is coupled with and is deleted. Then, and . Since and , we obtain the thesis.

    First, we check continuity, and then we check (P1)–(P3).

    is continuous by Proposition 3.

    ● (P1) holds by Proposition 3.

    ● Now, we prove (P2). Suppose we have and set .

    We can suppose and ; otherwise, at least one between and holds. Suppose the second one holds, then and and . Thus, (P2) holds. The same if the first one holds.

    Now, we show that if , we can find such that:

    ;

    ;

    and .

    Note that, in this case, upon calling , we have: by Lemma 2, , and so, , which means that . Thus, .

    We enumerate all the possible situations for ; clearly, the same hold for :

    : then ;

    : then with ; by hypothesis, and ;

    – if and : then .

    ● Now, we prove (P3). If , then . In fact, if with , then , since there are and . Then, we know that there is with .

    We know that ; thus, , and the thesis follows.

We build by subsequently adding couples starting from an empty set. The proof is divided into sections which should help the reader in following the various steps.

    Before starting, following [25], we can take induced by so that and are -compatible.

    Step 1. Leaves of

    Step 1.1. Selecting the coupled leaves

    We consider the following set of leaves:

    (D.1)

    We give a name to the condition:

    so that we can more easily use it during the proof. Note that we can avoid treating the case thanks to .

    The set is the set of leaves which will be coupled by , while all other leaves will be deleted: we add to all the couples of the form with . We characterize those couples with the following proposition.

    Lemma 5. Given , then if and only if . Moreover, for every such that does not hold, there is such that .

    Proof. The first part of the proof reduces to observing that if, and only if, .

    Now, consider such that does not hold. We know there is such that . If , we are done; otherwise, there is such that . Note that . Thus, we can carry on this procedure until we find . Note that ; thus, in a finite number of steps, we are done.

    Step 1.2. Cost bound on couples

    Now, we want to prove the following proposition, which gives an upper bound for the cost of the couples added to .

    Lemma 6. Given , then .

    Proof. Suppose the thesis does not hold. Since , contradicting the thesis means that we have such that:

    .

    Let . If holds, then . Let . Note that . We have . Since , we also have with , which is absurd by Lemma 5.

    Step 1.3. Cost bound on deletions

    In this step, we prove the following proposition which gives an analogous bound to the one of Lemma 6, but for the deleted leaves of .

    Lemma 7. Given , then there exists such that:

    ● There is such that ;

    .

    Proof. Since does not hold for , we use Lemma 5 to obtain such that . However, since is an -good map, we have which implies . Thus, ends the proof.

Lemma 7 implies that, using the notation of the proposition, . Then, implies that . Thus, . Since , we have that the cost of deleting any with is less than .

    Step 2. Leaves of

    Lemma 8. Given , there exists such that:

    ● There is with and ;

    .

    Proof. Consider . Let leaf. We have . If we are done for . If does not hold by Lemma 5, it means that there is such that . We are done since .

    Note that if for some , then, by Lemma 8, . In fact, using the notation of Lemma 8, in this case, we have by definition. As in Step 1.3, we have that the cost of the deletion of any such that and is at most .

    Step 3. Internal vertices

    Now, we need to extend the coupling taking into account the internal vertices of . We will do so after simplifying our merge trees in two different ways: First, we remove all vertices which are deleted with , and then we take out all inessential internal vertices.

    Step 3.1. Pruning

    Let and . We define as the merge tree obtained from deleting the following set of vertices (and the corresponding edges): satisfying both and . Note that, for any , either there is with or, for any leaf below , we can apply Lemma 7 and the consequential observations.

    The tree is obtained from in an analogous way: any time we have satisfying both and , is deleted from , along with the edge .

    Before proceeding, we point out that, by construction, the leaves of are exactly .

    Step 3.2. Restricting

    Thanks to Corollary 1, we have that anytime we delete some vertex in to obtain and that vertex is in the image of , we are sure that also its counterimage is deleted from .

    Now, for every , by construction we have that belongs to an edge removed from if, and only if, is deleted from . Consider . Then, , and thus, is deleted as well. This entails that is deleted as well. All of this put together implies that we can restrict to (the metric tree obtained from ), and its image lies in (obtained from ).

    Step 3.3. -Good restriction

    We define . We want to prove that is still an -good map. Clearly, and still hold upon restricting the domain. We just need to show . Suppose there is such that . We distinguish between two cases: and . Consider scenario : clearly implies that , and so, is deleted; scenario instead means that there is with . Clearly, , and, thus, is a leaf of , which means . This also implies that , and, in particular, , thus condition is satisfied.

    Step 3.4. Properties of and removal of inessential vertices

    We know that is injective. On top of that, we have proved that, if , then for some . Thus, is a bijection.

    From now on we will ignore any vertex such that . Formally, we introduce (and ) obtained from () removing all the vertices such that , (similarly, such that ): consider . We remove from and replace the edges and with the edge . We do this operation recursively until no vertices such that can be found.

    Step 3.5: Coupling and deleting internal vertices

    We start with the following lemma:

    Lemma 9. For every , and , we have and .

    Proof. Let . Then, with being the leaves of . Then, for every , and, thus, . Clearly, for analogous reasons. Thus, . For , we can make an analogous proof.

    Let and . Let and . We know that and , and so .

    Thus, we proceed as follows: we order all the internal vertices of according to their height values and we start from the higher one (the root of ; note that this, in general, is not the root of ), coupling it with the other root (thus, is verified). Consider a lower , and let be the leaves of , (and, thus, ). Let . If the leaves of are , we add the couple , otherwise, we skip which will be deleted, with . Note that . Moreover, by Lemma 9, . Going from the root downward we repeat this procedure for every .

    We clearly have:

    ● (C3) is satisfied;

    ;

    ● If is deleted, then and ;

    ● If is deleted, then and by Lemma 9.

Step 4. Coupling properties and costs

    First, we verify that is a coupling:

    See Step 3.5;

See Step 1 and Step 3.5: we carefully designed the coupling so that no vertex is coupled twice;

    It is explicitly proven in Step 3.5;

In Step 3.4, we remove all vertices such that to obtain and ; all the couples in are either leaves or vertices in and . So, since all leaves of and are coupled, it is impossible to have a vertex belonging to any tree such that .

    Lastly, we verify its costs:

    ● If , then as verified in Step 1.2 for , and in Step 3.5 for being an internal vertex;

● If , with , it is verified that in Step 1.3; if instead , we verify in Step 2; in both cases, we have a vertex lower than which, by construction, is coupled, and the cost of the couple is less than . Thus, the deletion of costs less than or equal to ;

    ● If , with , then we verify in Step 3.5 that the cost of this deletion is less than or equal to .

    This concludes the proof.

    First, we prove the second part of the statement.

Given an optimal coupling such that , we define . Clearly, . For any , consider such that with . By hypothesis, we know that exists for every . Note that .

We want to prove that the extension satisfies . The set is clearly a coupling since, for every , and are disjoint, and the same holds for . We need to consider the costs of different kinds of vertices separately. In particular, we indicate with whenever we have such that for some . If this condition does not hold, we say that holds for . The same definitions also apply to vertices in . For instance, holds for all vertices in and . Similarly, if holds for , then it holds also for all .

    ● Consider such that holds for some ; we have that , and , so depends only on . Thus, by construction, ;

    ● Suppose now holds for ; by construction, in this case, we have , , and, thus, . Suppose : holds for by means of some , and, thus, (thanks to the properties of couplings in ). Since , we then have . Lastly, consider . In this case, we have , and, thus, . Note that in all these situations, does not depend on the chosen extension.

Exchanging the roles of and , we obtain the same results for the vertices of .

The first part of the statement follows by analogous reasoning, dropping the special coupling constraints and focusing on the vertices . In particular, the special coupling constraints are necessary only if .

    1) This is due to the fact that all vertices of are coupled, and, thus, is still a rooted tree and does not alter the order relationships of the remaining vertices;

    2) Since removes at most one leaf from the previous tree and adds no new leaves, it is clear that . Consider now . Define:

    Note that both vertices are well-defined and .

Let . We show that cannot be deleted by the recursive application of (P), which also implies the thesis. Suppose it is deleted.

By construction, is the last leaf, in terms of recursive applications of (P) and among the leaves below , that is deleted. In fact, at any step along the recursion, any sibling of will always have a lower distance to the common father. Thus, cannot be deleted as long as it has siblings, i.e., until all other leaves below are deleted as well. This also means that, at some point along the recursion, becomes the father of . After all the siblings of are removed from the tree, finally, becomes a degree-two vertex with as an only child, and it is removed as well. So, becomes the father of . At this point we have:

    which is absurd as, for to be deleted, we must have ;

Consider ; let . By construction, . We know that a vertex is removed from if, at a certain point along the recursive application of , it becomes a degree-two vertex or a small-weight leaf (a code sketch of this procedure is given after these observations). Thus, the vertex is not removed from unless , which is absurd because then . Hence, . As a consequence, if, and only if, .

The vertex is the first vertex above such that there is a leaf , , with . If , then by point there exists with , which is absurd;

Consider two leaves with , with and . On top of that, suppose . By point (3), we have , which means that is a small-weight leaf, which is absurd;

    Combining points (3) and (4), we see that we only have deletions with and .
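For concreteness, the observations above can be matched against the following sketch of the recursive procedure (P), under explicit assumptions: `weight[v]` is the length of the edge from `v` to its father, a leaf is small-weight when that length is below `eps`, and among removable leaves the one closest to its father is removed first; order-two vertices produced along the way are merged, summing the two edge lengths. The exact formulation of (P) is not reproduced here, so this is an illustration of the behaviour used above, not a verbatim definition.

```python
# Sketch of the recursive pruning (P) under explicit assumptions: weight[v]
# is the length of the edge from v to its father, a leaf is "small-weight"
# when that length is below eps, and among removable leaves the one closest
# to its father goes first.

def prune_recursively(parent, weight, eps):
    parent, weight = dict(parent), dict(weight)
    while True:
        ch = {}
        for c, p in parent.items():
            ch.setdefault(p, []).append(c)
        # small-weight leaves that still have siblings
        cand = [v for v in parent
                if not ch.get(v) and parent[v] is not None
                and weight[v] < eps and len(ch[parent[v]]) > 1]
        if cand:
            v = min(cand, key=weight.get)  # the closest sibling goes first
            del parent[v], weight[v]
            continue
        # merge away order-two vertices, adding the two edge lengths
        v = next((u for u in parent
                  if len(ch.get(u, [])) == 1 and parent[u] is not None), None)
        if v is None:
            return parent, weight
        (c,) = ch[v]
        weight[c] += weight[v]
        parent[c] = parent[v]
        del parent[v], weight[v]

parent = {"r": None, "u": "r", "a": "u", "b": "u"}
weight = {"u": 5.0, "a": 0.1, "b": 4.0}
print(prune_recursively(parent, weight, eps=1.0))
# "a" is removed, then "u" is merged into the edge of "b":
# ({'r': None, 'b': 'r'}, {'b': 9.0})
```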

Consider . Then, can also be seen as a coupling . Let be as in Lemma 3.

    We partition the vertices into three sets:

● The case is not a concern since the cost does not change when considering or ;

● If , then or . We ignore the first case. In the second case, is such that , and so ;

    ● If , then and ; by construction . If , then also is deleted and, since , .

    So, we are left with the case . If , then, again, , for . Thus, we always have .

Exchanging the roles of and , and repeating the same observations, we obtain that, if we consider , we have .

    ● If , then and , so depends only on ;

● If , then it is either unused, if , or it is deleted with . Then the cost of such a deletion is either or . If , then , and so we are done. Otherwise, we simply have , thus implying the lower bound that we needed to prove.

We just explore the different pieces of the cost function to verify the thesis:

    ● In the case of , the cost of coupling and is given by ;

    ● If , we have which is the cost of deleting with . That is, any time with for . Recall that, in this case, we have ;

● Deleting with is taken care of by the remaining part of the cost function. For a vertex , we indicate with its father, and set and . Now, we try to unveil the meaning of and . Recall that deleting with corresponds to having . If , then and ; while if , both and are negative. Consider now . Then, , with being the father of some . Clearly, as well and . In particular, (the same holds for ). Now, we turn to . If , then , and so , entailing . Instead, if , we have by construction with . Then, and so:

    (D.2)

    Given , with and , thanks to Corollary 2, we can approximate with and the costs of the vertices in and . By Lemma 4, then takes care of the vertices and .


