Title: Approximate Labeled Subtree Homeomorphism
1. Approximate Labeled Subtree Homeomorphism
- Ron Pinter
- Oleg Rokhlenko
- Dekel Tsur
- Michal Ziv-Ukelson
2. Subtree Homeomorphism
[Figure: a pattern tree (P) and a text tree (T).]
7. Best Homeomorphism
[Figure: pattern and text trees with two candidate homeomorphisms, one requiring 2 deletions and one requiring 1 deletion.]
8. Approximate Labeled Subtree Homeomorphism (ALSH)
[Figure: a node-to-node similarity matrix (entries 2, -1 and -2) and two candidate text subtrees T1 and T2, annotated with LSH scores 12 and 5.]
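As a concrete reading of the ALSH objective, the Python sketch below checks whether a node mapping f from P to T is a rooted, unordered homeomorphism and then scores it. It assumes the usual characterization (f is injective; for every pattern edge (u, x), f(x) lies strictly below f(u); and the children of u are sent into distinct child branches of f(u)), and it assumes the score is the sum of node-to-node similarities over mapped pairs. The dict-based tree encoding and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Tree = Dict[str, List[str]]  # hypothetical encoding: node -> list of children


def parent_map(children: Tree) -> Dict[str, str]:
    """Map every non-root node to its parent."""
    return {c: p for p, cs in children.items() for c in cs}


def branch_below(t_parent: Dict[str, str], anc: str, node: str) -> Optional[str]:
    """Child of `anc` on the path down to `node`, or None if `node`
    is not a proper descendant of `anc`."""
    while node in t_parent:
        up = t_parent[node]
        if up == anc:
            return node
        node = up
    return None


def is_homeomorphism(p_children: Tree, t_children: Tree, f: Dict[str, str]) -> bool:
    """Assumed rooted, unordered conditions: f is injective, each pattern edge
    (u, x) maps to a downward path, and siblings use distinct branches of f(u)."""
    if len(set(f.values())) != len(f):
        return False
    t_parent = parent_map(t_children)
    for u, xs in p_children.items():
        used = set()
        for x in xs:
            b = branch_below(t_parent, f[u], f[x])
            if b is None or b in used:  # not below f(u), or branch already taken
                return False
            used.add(b)
    return True


def alsh_score(p_label: Dict[str, str], t_label: Dict[str, str],
               f: Dict[str, str], sigma: Callable[[str, str], float]) -> float:
    """Assumed ALSH objective: total similarity over mapped node pairs."""
    return sum(sigma(p_label[u], t_label[v]) for u, v in f.items())
```

With similarity 2 for equal labels and -1 otherwise, mapping P = a(b, c) onto T = a(d(b), c) by a->a, b->b, c->c scores 6; the deleted text node d contributes nothing.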
9. Subtree Homeomorphism Complexity
m = number of vertices in P; n = number of vertices in T
- Rooted trees: $O(nm^{1.5}/\log m)$ [Shamir and Tsur, 1999]
- Unrooted trees: $O(nm^{1.5}/\log m)$ [Shamir and Tsur, 1999]
10. Our Results
ALSH               Rooted    Unrooted
Unordered trees
Ordered trees
m = number of vertices in P; n = number of vertices in T
11. Applications
- Analysis of metabolic pathways
- Semantic queries against semistructured databases (represented, e.g., as XML documents)
- Natural language processing
  - trees represent sentences
  - nodes are labeled by words and sentential forms
12. Agenda
- ALSH for unordered trees
  - rooted unordered
  - unrooted unordered
- Improving time complexity by graph-compression techniques (rooted and unrooted unordered)
- ALSH for ordered trees
  - rooted ordered
  - unrooted ordered
13. Related Work
- M.J. Chung. $O(N^{2.5})$ time algorithms for the subgraph homeomorphism problem on trees. 1987.
- R. Shamir and D. Tsur. Faster subtree homeomorphism. 1999.
- G. Valiente. Constrained tree inclusion. 2003.
- P. Kilpeläinen and H. Mannila. Ordered and unordered tree inclusion. 1995.
- M.A. Steel and T. Warnow. Kaikoura tree theorems: computing the maximum agreement subtree. 1993.
- M. Farach and M. Thorup. Fast comparison of evolutionary trees. 1995.
- M.Y. Kao, T.W. Lam, W.K. Sung, and H.F. Ting. Cavity matchings, label compressions, and unrooted evolutionary trees. 2000.
14. ALSH on Rooted Unordered Trees
Weights for pattern node u (children x1, x2) and text node v (children y1, y2, y3):

        x1     x2     u
  y1    w11    w12    w1u
  y2    w21    w22    w2u
  y3    w31    w32    w3u
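16. ALSH on Rooted Unordered Trees
[Figure: pattern P rooted at u (children x1, x2) and text T rooted at v (children y1, y2, y3). The weights w11, ..., w32 define a weighted assignment on the bipartite graph G between {x1, x2} and {y1, y2, y3}; the figure pairs x1 with y2 and x2 with y3.]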
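The weight table on slide 14 suggests a bottom-up recurrence. Below is a minimal Python sketch under assumptions that are ours, not the slides': S(x, y) is the best score of embedding the pattern subtree rooted at x anywhere inside the text subtree rooted at y, so w_ij = S(x_j, y_i) and column u holds S(u, y_i) (the option of skipping v); mapping u onto v adds the similarity of their labels to a weighted assignment that must cover every child of u. The assignment is solved by brute force over injective maps, which is exponential and only stands in for the Fredman and Tarjan or Gabow and Tarjan algorithms cited on slides 17 and 18. All names are hypothetical.

```python
import math
from itertools import permutations
from typing import Callable, Dict, List, Tuple

NEG_INF = -math.inf
Tree = Dict[str, List[str]]  # hypothetical encoding: node -> list of children


def postorder(children: Tree, root: str) -> List[str]:
    """Iterative post-order: every node appears after all of its children."""
    order, stack = [], [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in children.get(node, []))
    return order


def max_assignment(w: List[List[float]]) -> float:
    """Best assignment of every row (child of u) to a distinct column (child
    of v). Brute force over injective maps; -inf if rows outnumber columns."""
    k = len(w)
    l = len(w[0]) if k else 0
    return max((sum(w[i][c] for i, c in enumerate(cols))
                for cols in permutations(range(l), k)), default=NEG_INF)


def alsh_rooted_unordered(p_children: Tree, p_label: Dict[str, str], p_root: str,
                          t_children: Tree, t_label: Dict[str, str], t_root: str,
                          sigma: Callable[[str, str], float]) -> float:
    """S[u, v]: best score of embedding the pattern subtree of u anywhere
    inside the text subtree of v (assumed recurrence, see lead-in)."""
    S: Dict[Tuple[str, str], float] = {}
    p_nodes = postorder(p_children, p_root)
    for v in postorder(t_children, t_root):  # text children before parents
        ys = t_children.get(v, [])
        for u in p_nodes:
            xs = p_children.get(u, [])
            # Rows: children of u; columns: children of v (slide 14's table, transposed).
            w = [[S[x, y] for y in ys] for x in xs]
            mapped_here = sigma(p_label[u], t_label[v]) + max_assignment(w)
            # Column "u" of the table: skip v and embed all of P_u below one child y.
            S[u, v] = max([mapped_here] + [S[u, y] for y in ys])
    return S[p_root, t_root]
```

On P = a(b, c) and T = a(d(b), c), with similarity 2 for equal labels and -1 otherwise, the sketch returns 6: the root pair scores 2 and the assignment b->d(b), c->c adds 4.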
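17. Time Complexity of ALSH Rooted Unordered Trees
[Figure: a node u_i of P with children x_1, ..., x_{k_i}, and a node v_j of T with children y_1, ..., y_{l_j}.]
- The algorithm computes an assignment for each pair $(u_i, v_j)$, where $u_i \in P$ and $v_j \in T$.
- The assignment for $u_i$ and $v_j$ is computed using the bipartite graph $G_{ij} = (X_i \cup Y_j, E_{ij})$, where $X_i = \{x_1, \dots, x_{k_i}\}$ and $Y_j = \{y_1, \dots, y_{l_j}\}$.
- Fredman and Tarjan [1987] show how to compute AssignmentScore(G) in $O(|V||E| + |V|^2 \log |V|)$ time.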
18. Time Complexity of ALSH Rooted Unordered Trees
Summing up over all $(u_i, v_j)$ node pairs (Observation 1).
Under the similarity assumption, weighted assignment can be solved in $O(V^{0.5} E \log(VC))$ [Gabow and Tarjan, 1989], and ALSH can be solved in $O(m^{1.5} n \log(nC))$.
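One way to arrive at the ALSH bound on this slide, under assumptions the slide does not state: only pairs with $k_i \le l_j$ matter (otherwise the children of $u_i$ cannot all be assigned); on such a pair a scaling-based assignment costs $O(\sqrt{k_i}\, k_i l_j \log(nC))$, with the number of phases governed by the smaller side; and $m \le n$, so that $\log(VC) = O(\log(nC))$. Summing over all pairs:

```latex
\begin{aligned}
\sum_{(i,j):\, k_i \le l_j} k_i^{1.5}\, l_j \log(nC)
  &\le \Big(\sum_i k_i^{1.5}\Big)\Big(\sum_j l_j\Big)\log(nC) \\
  &\le \Big(\sum_i k_i\Big)^{1.5}\Big(\sum_j l_j\Big)\log(nC) \\
  &\le m^{1.5}\, n \log(nC).
\end{aligned}
```

The last two steps use $\sum_i a_i^{1.5} \le (\sum_i a_i)^{1.5}$ for $a_i \ge 0$, together with $\sum_i k_i = m - 1$ and $\sum_j l_j = n - 1$ (every non-root vertex is the child of exactly one vertex).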
19. Unrooted Unordered Trees (naïve approach)
[Figure: unrooted pattern P and text T with nodes u and v, and a 2×4 weighted assignment.]
21. Unrooted Unordered Trees (naïve approach)
O(nm3nm2logn) !!!
22. Unrooted Unordered Algorithm
[Figure: unrooted pattern P and text T with nodes u and v, vertex sets X = {x1, x2, x3} and Y = {y1, y2, y3, y4}, and the 3×4 weighted assignment between them.]
23. Unrooted Unordered Algorithm
[Figure: the same trees and sets X and Y, with vertex xq marked in X.]
24. Unrooted Unordered Algorithm
[Figure: the same trees and sets X and Y, with vertex xq marked in X.]
One augmentation!
25. Unrooted Unordered Algorithm
[Figure: the same trees and sets X and Y, with a different vertex marked as xq.]
26. Unrooted Unordered Algorithm
[Figure: the same trees and sets X and Y, with a different vertex marked as xq.]
One augmentation!
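29. Decremental Property of Weighted Assignment
- Lemma
  - Let $G = (X \cup Y, E)$ be a bipartite graph, and let $G_i = G \setminus \{x_i\}$ for $x_i \in X$.
  - Computing the weighted assignments for the series of bipartite graphs $G_i$, for $i = 1, \dots, |X|$, requires only one weighted assignment on $G$ plus one augmentation per $G_i$.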
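To make the series of graphs in the lemma concrete, the Python sketch below computes the same scores naively, solving every $G_i$ from scratch instead of exploiting the decremental property. It assumes, as slides 22 to 28 suggest, that $G_i$ is $G$ with vertex $x_i$ of X removed (row i of the weight matrix), and it solves each assignment by brute force over injective maps, so it is only usable for very small degrees. Function names are hypothetical.

```python
import math
from itertools import permutations
from typing import List

NEG_INF = -math.inf


def max_assignment(w: List[List[float]]) -> float:
    """Best assignment of every row (vertex of X) to a distinct column
    (vertex of Y), by brute force; -inf if rows outnumber columns."""
    k = len(w)
    l = len(w[0]) if k else 0
    return max((sum(w[i][c] for i, c in enumerate(cols))
                for cols in permutations(range(l), k)), default=NEG_INF)


def series_scores(w: List[List[float]]) -> List[float]:
    """Assignment score of G_i = G minus row vertex x_i, for every i.
    Naive baseline: each G_i is solved from scratch, with no reuse."""
    return [max_assignment(w[:i] + w[i + 1:]) for i in range(len(w))]
```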
30. Compressed Graph
- Motivation
- Assuming a constant-sized label alphabet.
- Using the notion of a clique partition of a bipartite graph [Feder and Motwani, 1991; Shamir and Tsur, 1999].
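31. Compressed Graph
[Figure: P rooted at u with children x1, x2; T rooted at v with children y1, y2, y3; and the bipartite graph between {x1, x2} and {y1, y2, y3}.]
- Each node in the bipartite graph represents a whole subtree.
- With a bounded alphabet, the number of distinct trees is bounded.
- Lemma
  - The number of distinct labeled rooted trees in a forest of n vertices is $O(n / \log n)$.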
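Merging identical neighbor subtrees requires recognizing them. A common approach, assumed here (in the spirit of the Aho, Hopcroft and Ullman tree-isomorphism encoding, and not necessarily the authors' implementation), gives each rooted, unordered, labeled tree a canonical string built from its label and the sorted strings of its children; two subtrees are identical exactly when their strings match. The sketch groups the subtrees rooted at a node's neighbors into classes with multiplicities. Names are hypothetical, and labels are assumed not to contain `(`, `)` or `,`.

```python
from typing import Dict, List

Tree = Dict[str, List[str]]  # hypothetical encoding: node -> list of children


def canonical(children: Tree, label: Dict[str, str], root: str) -> str:
    """Canonical string of the rooted, unordered, labeled subtree at `root`.
    Children's strings are sorted, so sibling order does not matter.
    Recursive for brevity; very deep trees would need an explicit stack."""
    parts = sorted(canonical(children, label, c) for c in children.get(root, []))
    return label[root] + "(" + ",".join(parts) + ")"


def distinct_subtrees(children: Tree, label: Dict[str, str],
                      roots: List[str]) -> Dict[str, int]:
    """Group the subtrees rooted at `roots` (e.g. the neighbors of a node)
    into classes of identical trees: canonical string -> multiplicity."""
    counts: Dict[str, int] = {}
    for r in roots:
        key = canonical(children, label, r)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

For example, if y1 and y2 are both b-nodes with a single c-child and y3 is a lone b-node, the three neighbor subtrees collapse into two classes with multiplicities 2 and 1.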
32. Compressed Graph
[Figure: graph G, the weighted bipartite graph between X = {x1, x2, x3} and Y = {y1, y2, y3, y4} for nodes u in P and v in T.]
33. Compressed Graph
[Figure: graph G and its compressed counterpart, with edge weights and a vertex labeled C.]
34. Compressed Graph
- Lemma
  - The assignment between and can be computed in time
  - where
    - d(u) is the number of neighbors of node u,
    - D(u) is the number of distinct trees in the forest of trees rooted at neighbors of u,
    - c(v) is the number of children of node v.
E V logV
35. Compressed Graph
- Observation 2
  - The sum of vertex degrees in an unrooted tree P is $2(m - 1)$.
- Summing up the work over all pairs (Observations 1 and 2)
36. Time Complexity
- Lemma (similar to Shamir and Tsur, 1999)
- Thus the algorithm computes the optimal ALSH solution for two rooted unordered trees in
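37. ALSH on Rooted Ordered Trees
[Figure: ordered pattern P rooted at u with children x1, x2, and ordered text T rooted at v with children y1, y2, y3, together with the bipartite graph between the two child sequences.]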
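39. ALSH on Rooted Ordered Trees
[Figure: ordered P rooted at u (children x1, x2) and ordered T rooted at v (children y1, y2, y3), with the bipartite graph between the child sequences.]
The main property: NO CROSS-EDGES in the bipartite graph!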
41. ALSH on Rooted Ordered Trees
[Figure: the child sequences x1, x2 and y1, y2, y3.]
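42. ALSH on Rooted Ordered Trees
[Figure: a dynamic-programming matrix with $k_i + 1$ rows (x1, x2, ...) and $l_j + 1$ columns (y1, y2, y3, ...). The first row is initialized to 0 and the rest of the first column to $-\infty$; the cells for (x1, y1) and (x2, y2) add the score of the corresponding pair.]
Observation 1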
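The matrix above, with a first row of 0s and a first column of $-\infty$, reads like a sequence-alignment table. A minimal Python sketch under that reading (an assumption): `w[a][b]` holds the score of embedding the subtree of child $x_{a+1}$ into that of $y_{b+1}$; each cell either leaves $y_b$ unused or matches $x_a$ with $y_b$, so edges never cross and every $x$ must be matched. One pair costs $O(k_i l_j)$, and since $\sum_i \sum_j k_i l_j = (\sum_i k_i)(\sum_j l_j) \le mn$, the total over all pairs is $O(mn)$. The function name is hypothetical.

```python
import math
from typing import List

NEG_INF = -math.inf


def ordered_assignment(w: List[List[float]]) -> float:
    """Best non-crossing assignment of all rows x_1..x_k (children of u, in
    order) to columns y_1..y_l (children of v, in order); -inf if impossible.
    M[a][b] = best score for x_1..x_a using only y_1..y_b."""
    k = len(w)
    l = len(w[0]) if k else 0
    # Row 0: no pattern child placed yet, so skipping text children costs 0.
    # Column 0 (a > 0): pattern children remain but no text child is left: -inf.
    M = [[0.0] * (l + 1)] + [[NEG_INF] * (l + 1) for _ in range(k)]
    for a in range(1, k + 1):
        for b in range(1, l + 1):
            M[a][b] = max(M[a][b - 1],                        # leave y_b unused
                          M[a - 1][b - 1] + w[a - 1][b - 1])  # match x_a with y_b
    return M[k][l]
```

The weights `[[0, 5], [5, 0]]` illustrate the no-cross-edges property: an unordered assignment would score 10 by crossing, while the ordered DP returns 0.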
43. ALSH on Unrooted Ordered Trees (Cyclic Order)
[Figure: a node whose neighbors A, B, C, D, E are cyclically ordered.]
44. Cyclic String Comparison
[Figure: the cyclic string P = ABCDE compared against T on an m × n grid, with a source and the destination column maxima.]
45. Cyclic String Comparison
[Figure: the cyclic sequence A, B, C, D, E and an alignment grid labeled with T and P.]
51. Time Complexity
- Real-number score metric: Maes, 1990
- Rational-number score metric: Schmidt, 1998
52. Time Complexity
Observation 1
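For cyclically ordered neighbors, a straightforward (naive) approach is to try every rotation of the pattern's cyclic neighbor sequence against one fixed linearization of the text's sequence and keep the best non-crossing assignment. This suffices because, in any cyclic-order-preserving assignment, starting the pattern at the element matched to the earliest text position makes the assignment linear. The Python sketch below does exactly that, at a cost of one linear DP per rotation; it does not implement the faster methods of Maes (1990) or Schmidt (1998) cited above. Function names are hypothetical.

```python
import math
from typing import List

NEG_INF = -math.inf


def ordered_assignment(w: List[List[float]]) -> float:
    """Best non-crossing assignment covering every row (linear orders)."""
    k = len(w)
    l = len(w[0]) if k else 0
    M = [[0.0] * (l + 1)] + [[NEG_INF] * (l + 1) for _ in range(k)]
    for a in range(1, k + 1):
        for b in range(1, l + 1):
            # Either leave y_b unused or match x_a with y_b.
            M[a][b] = max(M[a][b - 1], M[a - 1][b - 1] + w[a - 1][b - 1])
    return M[k][l]


def cyclic_assignment(w: List[List[float]]) -> float:
    """Naive cyclic version: try every rotation of the pattern's cyclic
    neighbor sequence (rows) against one fixed linearization of the text's."""
    k = len(w)
    if k == 0:
        return 0.0
    return max(ordered_assignment(w[r:] + w[:r]) for r in range(k))
```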
53. ALSH on Unrooted Ordered Trees (Linear Order)
[Figure: a node whose neighbors A, B, C, D, E are linearly ordered.]
54. ALSH on Unrooted Ordered Trees (Linear Order)
[Figure: dynamic-programming matrix with A, B, C, D, E along one side and y1, y2, ..., yl along the other, computed FORWARD and BACKWARD.]
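55. ALSH on Unrooted Ordered Trees (Linear Order)
[Figure: Forward and Backward dynamic-programming matrices with A, B, C, D, E along one side and y1, y2, ..., yl along the other; searching for the best combination of entries (i-1, j) and (i+1, j+1).]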
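56. ALSH on Unrooted Ordered Trees (Linear Order)
[Figure: Forward and Backward matrices with A, B, C, D, E along one side and columns y1, y2, ..., yj, yj+1, ..., yl; searching for the best combination of entries (i-1, j) and (i+1, j+1).]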
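Reading "searching the best (i-1, j), (i+1, j+1)" as a forward/backward split (an assumption about the method), the Python sketch below finds, for every pattern neighbor $x_i$ that might be the excluded one, the best order-preserving assignment of the remaining neighbors. A forward table covers $x_1, \dots, x_{i-1}$ with a prefix of the text sequence, a backward table covers $x_{i+1}, \dots, x_k$ with the complementary suffix, and the answer is the best split point $j$. Both tables take $O(kl)$ time, after which each exclusion costs $O(l)$. Code indices are 0-based; function names are hypothetical, and at least one row is assumed.

```python
import math
from typing import List

NEG_INF = -math.inf


def forward_table(w: List[List[float]]) -> List[List[float]]:
    """F[a][b]: best order-preserving assignment of rows 0..a-1 into columns 0..b-1."""
    k, l = len(w), len(w[0])
    F = [[0.0] * (l + 1)] + [[NEG_INF] * (l + 1) for _ in range(k)]
    for a in range(1, k + 1):
        for b in range(1, l + 1):
            F[a][b] = max(F[a][b - 1], F[a - 1][b - 1] + w[a - 1][b - 1])
    return F


def backward_table(w: List[List[float]]) -> List[List[float]]:
    """B[a][b]: best order-preserving assignment of rows a..k-1 into columns b..l-1."""
    k, l = len(w), len(w[0])
    B = [[NEG_INF] * (l + 1) for _ in range(k)] + [[0.0] * (l + 1)]
    for a in range(k - 1, -1, -1):
        for b in range(l - 1, -1, -1):
            B[a][b] = max(B[a][b + 1], B[a + 1][b + 1] + w[a][b])
    return B


def best_excluding_each(w: List[List[float]]) -> List[float]:
    """For each row q, the best assignment of all other rows, split at a
    column boundary b: rows before q use columns < b, rows after q use >= b."""
    k, l = len(w), len(w[0])
    F, B = forward_table(w), backward_table(w)
    return [max(F[q][b] + B[q + 1][b] for b in range(l + 1)) for q in range(k)]
```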
57. Time Complexity
- The weighted assignment for each pair (u, v) can be computed in
- Summing up the work over all pairs (u, v) (Observation 1)
58. Acknowledgments
- Seffi Naor
- Amihood Amir
- Gabriel Valiente
- Ydo Wexler
- Carmel Kent