Approximate Labeled Subtree Homeomorphism

- Ron Pinter
- Oleg Rokhlenko
- Dekel Tsur
- Michal Ziv-Ukelson

Subtree Homeomorphism

Pattern

Text

Best Homeomorphism

Pattern

Text

2 deletions

1 deletion

Approximate Labeled Subtree Homeomorphism (ALSH)

[Figure: trees T1 and T2 with a table of node-label similarity scores (entries such as 2, -1, -2); the two LSH scores shown are 12 and 5.]

Subtree Homeomorphism Complexity

m = # of vertices in P, n = # of vertices in T

- Rooted trees: O(n m^1.5 / log m) [Shamir and Tsur, 99]
- Unrooted trees: O(n m^1.5 / log m) [Shamir and Tsur, 99]

Our Results

ALSH Rooted Unrooted

Unordered Trees

Ordered Trees

m = # of vertices in P, n = # of vertices in T

Applications

- Analysis of metabolic pathways
- Semantic queries against semistructured databases (represented as e.g. XML documents)
- Natural language processing
- trees represent sentences
- nodes are labeled by words and sentential forms

Agenda

- ALSH for unordered trees
- rooted unordered
- unrooted unordered
- Improving time complexity by graph compression techniques
- rooted and unrooted unordered
- ALSH for ordered trees
- rooted ordered
- unrooted ordered

Related Work

- M.J. Chung. O(N^2.5) time algorithms for the subgraph homeomorphism problem on trees. 1987.
- R. Shamir and D. Tsur. Faster subtree homeomorphism. 1999.
- G. Valiente. Constrained tree inclusion. 2003.
- P. Kilpeläinen and H. Mannila. Ordered and unordered tree inclusion. 1995.
- M.A. Steel and T. Warnow. Kaikoura tree theorems: Computing the maximum agreement subtree. 1993.
- M. Farach and M. Thorup. Fast comparison of evolutionary trees. 1995.
- M.Y. Kao, T.W. Lam, W.K. Sung, and H.F. Ting. Cavity matchings, label compressions, and unrooted evolutionary trees. 2000.

ALSH on Rooted Unordered Trees

      x1    x2    u
y1    w11   w12   w1u
y2    w21   w22   w2u
y3    w31   w32   w3u
v

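ALSH on Rooted Unordered Trees

[Figure: pattern P with node u and children x1, x2; text T with node v and children y1, y2, y3; Weighted Assignment on the bipartite graph G between {x1, x2} and {y1, y2, y3}, shown alongside the same weight table (w11 ... w3u).]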

Time Complexity of ALSH Rooted Unordered Trees

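[Figure: node ui of P with children x1 ... xki; node vj of T with children y1 ... ylj.]

- The algorithm computes an assignment for each pair (ui, vj), where ui is a node of P and vj is a node of T.
- The assignment for ui and vj is computed using the bipartite graph G = (X ∪ Y, E), where X = {x1, ..., xki} and Y = {y1, ..., ylj}.
- Fredman and Tarjan [87] show how to compute AssignmentScore(G) in O(VE + V^2 log V).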
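The per-pair computation described above can be sketched as a small dynamic program. This is a minimal illustration rather than the authors' implementation: the recurrence, the `deletion` penalty for skipping a text node, the `sim` scoring function, and all identifiers are assumptions made for the example, and the assignment is solved by brute force over permutations instead of the Fredman and Tarjan method.

```python
from itertools import permutations

NEG_INF = float("-inf")

def post_order(kids, root):
    """Return the nodes of a rooted tree with children before parents."""
    order = []
    def visit(node):
        for child in kids.get(node, []):
            visit(child)
        order.append(node)
    visit(root)
    return order

def assignment_score(weights):
    """Best total weight matching every row to a distinct column (brute force)."""
    k = len(weights)
    if k == 0:
        return 0.0
    l = len(weights[0])
    if k > l:
        return NEG_INF  # not enough text children to host all pattern children
    return max(sum(weights[i][c] for i, c in enumerate(cols))
               for cols in permutations(range(l), k))

def alsh_rooted_unordered(p_kids, p_root, t_kids, t_root, sim, deletion=-1.0):
    exact = {}  # exact[(u, v)]: u is mapped onto v itself
    best = {}   # best[(u, v)]: u is mapped onto v or onto a descendant of v
    t_order = post_order(t_kids, t_root)
    for u in post_order(p_kids, p_root):
        for v in t_order:
            ys = t_kids.get(v, [])
            weights = [[best[(x, y)] for y in ys] for x in p_kids.get(u, [])]
            exact[(u, v)] = sim(u, v) + assignment_score(weights)
            # Alternative: delete v and continue into exactly one child of v.
            skip = max((best[(u, y)] + deletion for y in ys), default=NEG_INF)
            best[(u, v)] = max(exact[(u, v)], skip)
    return max(exact[(p_root, v)] for v in t_order)

# Hypothetical example: pattern a(b, c), text 1(2(4), 3).
p_kids = {"a": ["b", "c"]}
p_label = {"a": "A", "b": "B", "c": "C"}
t_kids = {1: [2, 3], 2: [4]}
t_label = {1: "A", 2: "X", 3: "C", 4: "B"}

def sim(u, v):
    return 2 if p_label[u] == t_label[v] else -1

result = alsh_rooted_unordered(p_kids, "a", t_kids, 1, sim)
```

In this example the best mapping sends a to 1, c to 3, and b to 4 after deleting node 2, for a score of 2 + 2 + (2 - 1) = 5.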

Time Complexity of ALSH Rooted Unordered Trees

- Observation 1

and

Summing up all (ui,vj) node pairs

Observation 1

Observation 1

Under the similarity assumption, weighted assignment can be solved in O(V^0.5 E log(VC)) [Gabow and Tarjan, 89], and ALSH can be solved in O(m^1.5 n log(nC)).

Unrooted Unordered Trees (naïve approach)

[Figure: Pattern (P) with node u and Text (T) with node v; Weighted Assignment 2x4.]

O(nm^3 + nm^2 log n) !!!

Unrooted Unordered Algorithm

[Figure sequence: node u of Pattern (P) with neighbors X = {x1, x2, x3} and node v of Text (T) with neighbors Y = {y1, y2, y3, y4}; Weighted Assignment 3x4. Successive frames single out one neighbor xq at a time, each annotated "One augmentation !".]

Decremental Property of Weighted Assignment

- Lemma
- Let
- be a bipartite graph
- for

- Computing the weighted assignments for the

series of bipartite graphs , for

can be done in time

Compressed Graph

- Motivation
- Assuming a constant-sized label alphabet.
- Using the notion of clique partition of a bipartite graph.
- Feder and Motwani 1991, Shamir and Tsur 1999

Compressed Graph

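[Figure: node u of P with neighbors x1, x2 and node v of T with neighbors y1, y2, y3, shown with the corresponding bipartite graph between {x1, x2} and {y1, y2, y3}.]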

- Each node in the bipartite graph represents a whole subtree.
- Bounded alphabet ⇒ the number of distinct trees is bounded.
- Lemma
- The number of distinct labeled rooted trees in a forest of n vertices is O(n / log n).
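Counting the distinct labeled rooted trees in a forest can be illustrated with canonical forms. The sketch below is only an illustration under assumptions: trees are treated as unordered, two subtrees are "the same" when they are isomorphic with matching labels, and the function and variable names are hypothetical.

```python
def canonical_ids(kids, label, roots):
    """Give isomorphic labeled rooted unordered subtrees the same integer id."""
    ids = {}      # canonical key -> integer id
    node_id = {}  # node -> id of the subtree rooted at that node

    def visit(node):
        # Sorting the child ids makes the key independent of child order.
        child_ids = sorted(visit(child) for child in kids.get(node, []))
        key = (label[node], tuple(child_ids))
        node_id[node] = ids.setdefault(key, len(ids))
        return node_id[node]

    for root in roots:
        visit(root)
    return node_id

# Hypothetical forest: the trees rooted at the neighbors y1, y2, y3 of a node v.
kids = {"y1": ["a"], "y2": ["b"]}
label = {"y1": "L", "y2": "L", "y3": "M", "a": "K", "b": "K"}
node_id = canonical_ids(kids, label, ["y1", "y2", "y3"])
distinct = len({node_id[r] for r in ["y1", "y2", "y3"]})
```

Here y1 and y2 root identical trees, so the three neighbors collapse into two distinct trees, which is the kind of repetition a compressed bipartite graph can exploit.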

Compressed Graph

[Figure: Graph G, the weighted bipartite graph between X = {x1, x2, x3} (at node u of P) and Y = {y1, y2, y3, y4} (at node v of T), with edge weights such as 2, 3, 4, 5, 7; a second frame shows it next to a second graph that adds a node C and edges of weight 0.]

Compressed Graph

- Lemma
- The assignment between u and v can be computed in time
- where
- d(u) is the number of neighbors of node u.
- D(u) is the number of distinct trees in the forest of trees rooted at neighbors of u.
- c(v) is the number of children of node v.

E V logV

Compressed Graph

- Observation 2
- The sum of vertex degrees in an unrooted tree P with m vertices is 2m - 2 = O(m).

- Summing up the work over all pairs (u, v) we get

Observation 1

Observation 2
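Observation 2 concerns the sum of vertex degrees in an unrooted tree. A tree with m vertices has m - 1 edges and each edge contributes to two degrees, so the sum is 2(m - 1). The small check below uses a hypothetical five-vertex tree; the names are illustrative only.

```python
def degree_sum(edges):
    """Sum of vertex degrees of an undirected graph given as an edge list."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(degree.values())

# Hypothetical unrooted tree with m = 5 vertices and m - 1 = 4 edges.
edges = [("u", "x1"), ("u", "x2"), ("u", "x3"), ("x1", "z")]
m = 5
total = degree_sum(edges)
```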

Time Complexity

- Lemma
- (similar to Shamir and Tsur 1999)
- Thus the algorithm computes the optimal ALSH

solution for two rooted unordered trees in

ALSH on Rooted Ordered Trees

[Figure: pattern P with node u and ordered children x1, x2; text T with node v and ordered children y1, y2, y3; bipartite graph between the two child sequences.]

The main property: NO CROSS-EDGES in the bipartite graph !!!
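Without cross-edges, the assignment between the two child sequences behaves like a sequence alignment and can be filled in with a simple table. The following is a minimal sketch under assumptions: every xi must be matched, unmatched yj cost nothing, `w[i][j]` stands for the already computed score of pairing x(i+1) with y(j+1), and the function name is hypothetical.

```python
NEG_INF = float("-inf")

def ordered_assignment(w):
    """Best order-preserving matching of all rows into the columns of w."""
    k = len(w)
    l = len(w[0]) if k else 0
    # dp[i][j]: best score matching x1..xi into y1..yj, preserving order.
    dp = [[NEG_INF] * (l + 1) for _ in range(k + 1)]
    for j in range(l + 1):
        dp[0][j] = 0.0  # no pattern children left to place
    for i in range(1, k + 1):
        for j in range(1, l + 1):
            dp[i][j] = max(dp[i][j - 1],                          # leave yj unmatched
                           dp[i - 1][j - 1] + w[i - 1][j - 1])    # match xi with yj
    return dp[k][l]

plain = ordered_assignment([[3, 1, 0], [2, 0, 4]])
crossing = ordered_assignment([[0, 5, 0], [6, 0, 0]])
```

In the second call the unordered optimum would pair x1 with y2 and x2 with y1 for a total of 11, but those two edges cross, so the ordered optimum is 5. Each table costs time proportional to its size, in contrast to a general weighted assignment.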

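ALSH on Rooted Ordered Trees

[Figure: dynamic programming table of size (ki+1) x (lj+1) over the children x1, x2 of u and y1, y2, y3 of v; boundary cells hold 0 and -∞, and the inner cells use the pair scores for (x1, y1) and (x2, y2). The derivation is annotated with Observation 1.]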

ALSH on Unrooted Ordered Trees (Cyclic order)

[Figure: a node whose neighbors A, B, C, D, E are arranged in cyclic order.]

Cyclic String Comparison

[Figure sequence: the neighbors A, B, C, D, E read as the cyclic string ABCDE; a dynamic programming grid of dimensions n and m with axes labeled T and P, a Source, a Destination, and the annotation "Column n's Maxima"; successive frames step through the comparison.]
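A naïve baseline for the cyclic case is to try every rotation of one neighbor sequence and run the linear order-preserving table for each. The sketch below shows only that baseline and is not the method of the works cited in the talk; it assumes every xi must be matched and unmatched yj are free, and all names are hypothetical.

```python
NEG_INF = float("-inf")

def ordered_assignment(w):
    """Best order-preserving matching of all rows into the columns of w."""
    k = len(w)
    l = len(w[0]) if k else 0
    dp = [[NEG_INF] * (l + 1) for _ in range(k + 1)]
    for j in range(l + 1):
        dp[0][j] = 0.0
    for i in range(1, k + 1):
        for j in range(1, l + 1):
            dp[i][j] = max(dp[i][j - 1], dp[i - 1][j - 1] + w[i - 1][j - 1])
    return dp[k][l]

def cyclic_assignment_naive(w):
    """Try each rotation of the columns (the cyclic y sequence)."""
    k = len(w)
    l = len(w[0]) if k else 0
    if k == 0:
        return 0.0
    best = NEG_INF
    for shift in range(l):
        rotated = [row[shift:] + row[:shift] for row in w]
        best = max(best, ordered_assignment(rotated))
    return best

w = [[0, 5, 0], [6, 0, 0]]
linear = ordered_assignment(w)
cyclic = cyclic_assignment_naive(w)
```

Rotating the columns so that y2 comes first lets x1 take y2 and x2 take y1 without crossing, so the cyclic score is 11 while the linear score is 5. The rotation loop multiplies the cost of one table by the number of columns, which is the overhead that specialized cyclic string comparison techniques aim to reduce.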

Time Complexity

- Real-number score metric: Maes 1990
- Rational-number score metric: Schmidt 1998

ALSH on Unrooted Ordered Trees (Linear order)

[Figure: a node whose neighbors A, B, C, D, E are arranged in linear order, and a Dynamic Programming Matrix over y1 y2 ... yj yj+1 ... yl that is filled in two directions, FORWARD and BACKWARD.]

searching the best (i-1, j), (i+1, j+1)
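The forward and backward tables can be combined to score the matching with one xi left out, by pairing forward cell (i-1, j) with backward cell (i+1, j+1). The sketch below is one reading of that idea under assumptions: all remaining xi must be matched in order, unmatched yj are free, `w[i][j]` is a precomputed pair score, and the function name is hypothetical.

```python
NEG_INF = float("-inf")

def leave_one_out_ordered(w):
    """For each row i, best order-preserving matching of all other rows."""
    k = len(w)
    l = len(w[0]) if k else 0
    # F[i][j]: best score matching x1..xi into y1..yj (forward table).
    F = [[NEG_INF] * (l + 1) for _ in range(k + 1)]
    for j in range(l + 1):
        F[0][j] = 0.0
    for i in range(1, k + 1):
        for j in range(1, l + 1):
            F[i][j] = max(F[i][j - 1], F[i - 1][j - 1] + w[i - 1][j - 1])
    # B[i][j]: best score matching xi..xk into yj..yl (backward table, 1-based).
    B = [[NEG_INF] * (l + 2) for _ in range(k + 2)]
    for j in range(1, l + 2):
        B[k + 1][j] = 0.0
    for i in range(k, 0, -1):
        for j in range(l, 0, -1):
            B[i][j] = max(B[i][j + 1], B[i + 1][j + 1] + w[i - 1][j - 1])
    # Without xi: split the y sequence after yj and join the two tables.
    return [max(F[i - 1][j] + B[i + 1][j + 1] for j in range(l + 1))
            for i in range(1, k + 1)]

scores = leave_one_out_ordered([[3, 1, 0], [2, 0, 4]])
```

With x1 left out, x2 alone takes its best column (score 4); with x2 left out, x1 alone scores 3. Both tables are filled once, so each left-out row costs only one pass over the columns.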

Time Complexity

- Weighted assignment for each pair (u,v) can be

computed in

- And summing up the work for all pairs (u,v)

Observation 1

Observation 1

Acknowledgments

- Seffi Naor
- Amihood Amir
- Gabriel Valiente
- Ydo Wexler
- Carmel Kent