
Incremental Maintenance of XML Structural Indexes

- Ke Yi (1), Hao He (1), Ioana Stanoi (2), and Jun Yang (1)
- (1) Department of Computer Science, Duke University
- (2) IBM T. J. Watson Research Center

Motivation

- XML has gained tremendous popularity in recent years
- Used to represent many kinds of data
- Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval
- IBM DB2, Oracle, Microsoft SQL Server
- Tamino, Natix, X-Hive, etc.

Overview

[Figure: example XML data graph of a paper, with 18 numbered nodes labeled paper, section, title, algorithm, exp, proof, about, and uses; the section titles read intro, 1-index, A(k)-index, and experiments]

Label Path Expressions

- Example: /paper/section/algorithm

[Figure: the example data graph, on which the label path expression /paper/section/algorithm is evaluated]
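The following is a minimal Python sketch of evaluating such a label path directly on the data graph. The toy graph, its node numbering, and the function name `eval_label_path` are hypothetical. It handles only child steps from the root, with no wildcards or `//`.

```python
from collections import defaultdict

# Hypothetical toy data graph: node id -> label, plus parent -> child edges.
labels = {1: "paper", 2: "section", 3: "title", 4: "section",
          5: "algorithm", 6: "section", 7: "algorithm"}
edges = [(1, 2), (2, 3), (1, 4), (4, 5), (4, 6), (6, 7)]


def eval_label_path(path, labels, edges, root):
    """Return the ids of dnodes matched by a simple label path such as
    /paper/section/algorithm (child steps only, starting at the root)."""
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
    steps = [s for s in path.split("/") if s]
    # The first step must match the root's own label.
    current = {root} if steps and labels[root] == steps[0] else set()
    for step in steps[1:]:
        # Follow child edges, keeping only children carrying the next label.
        current = {c for n in current for c in children[n] if labels[c] == step}
    return current


print(sorted(eval_label_path("/paper/section/algorithm", labels, edges, root=1)))
# prints [5]; node 7 lies under /paper/section/section/algorithm instead
```

Without an index, each step scans the children of every node matched so far. Structural indexes aim to cut that work.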

Structural Indexes

- Why do we need them?
- Speed up the evaluation of path expressions
- Provide a structural summary of the data graph
- Structural indexes
- DataGuide [Goldman & Widom 97]
- 1-index [Milo & Suciu 99]
- A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04]
- Integration of structural indexes and inverted lists [Kaushik et al. 04]
- Focus on maintenance
- Has a major effect on index efficiency
- Remains an overlooked issue

Outline

[Figure: the example data graph again, whose section titles (intro, 1-index, A(k)-index, experiments) serve as the talk outline]

1-Index Definition

- Constructed by using bisimilarity
- Definition based on stability
- Partition data nodes into index nodes
- dnode: a data node v; inode: the index node I_v containing v
- I_u is an index parent of v if u is a parent of v
- An inode is stable if all of its dnodes have the same index parents
- In a 1-index, all inodes are stable

[Figure: dnode u with an edge to its child v, and the corresponding index edge from inode I_u to inode I_v]
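The stability condition can be checked mechanically. In this sketch, `is_stable` and `inode_of` are hypothetical names. A partition is a map from dnode to inode id, and each inode is assumed to hold dnodes of a single label.

```python
from collections import defaultdict


def is_stable(edges, inode_of):
    """Return True if every inode is stable: all of its dnodes have the same
    set of index parents. Assumes each inode holds a single label."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    # For each inode, collect the distinct index-parent sets of its dnodes.
    parent_sets = defaultdict(set)
    for v, inode in inode_of.items():
        parent_sets[inode].add(frozenset(inode_of[u] for u in parents[v]))
    return all(len(sets) == 1 for sets in parent_sets.values())


# Hypothetical graph: r -> x1 and r -> y -> x2, where x1 and x2 share a label.
edges = [("r", "x1"), ("r", "y"), ("y", "x2")]
merged = {"r": "I_r", "y": "I_y", "x1": "I_x", "x2": "I_x"}
split = {"r": "I_r", "y": "I_y", "x1": "I_x1", "x2": "I_x2"}
print(is_stable(edges, merged))  # False: x1's index parent is I_r, x2's is I_y
print(is_stable(edges, split))   # True
```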

1-Index Example

[Figure: the example data graph (left) and its 1-index (right), with the path /paper/section/algorithm. The 1-index groups the 18 dnodes into the inodes {1}, {2,4,8,13}, {3,5,9,14}, {6,10}, {7}, {11}, {12}, {15,16}, and {17,18}]
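Because the 1-index is built from bisimilarity, a label path can be evaluated on the smaller index graph. The answer is then the union of the matching inodes' extents. The sketch below uses a hypothetical 7-node graph and its 1-index, written out by hand. `eval_on_index` and the inode names are illustrative only.

```python
from collections import defaultdict


def eval_on_index(path, inode_label, index_edges, extent, root_inode):
    """Evaluate a simple label path on the index graph instead of the data
    graph, then answer with the union of the matching inodes' extents."""
    children = defaultdict(list)
    for i, j in index_edges:
        children[i].append(j)
    steps = [s for s in path.split("/") if s]
    current = ({root_inode} if steps and inode_label[root_inode] == steps[0]
               else set())
    for step in steps[1:]:
        current = {j for i in current for j in children[i]
                   if inode_label[j] == step}
    # Map the matched inodes back to data nodes.
    return set().union(*(extent[i] for i in current))


# Hypothetical 1-index of a 7-node graph (paper 1; sections 2, 4, 6; title 3;
# algorithms 5, 7; edges 1->2, 2->3, 1->4, 4->5, 4->6, 6->7).
# Sections 2 and 4 share one inode.
inode_label = {"P": "paper", "S": "section", "T": "title", "G1": "algorithm",
               "S2": "section", "G2": "algorithm"}
extent = {"P": {1}, "S": {2, 4}, "T": {3}, "G1": {5}, "S2": {6}, "G2": {7}}
index_edges = [("P", "S"), ("S", "T"), ("S", "G1"), ("S", "S2"), ("S2", "G2")]
print(eval_on_index("/paper/section/algorithm", inode_label, index_edges,
                    extent, "P"))  # {5}
```

Dnodes that share a 1-index inode are bisimilar, so they have the same incoming label paths. For simple label paths from the root, this answer therefore needs no further check against the data graph.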

1-Index Quality

- Assigning bisimilar dnodes to different inodes
- does not affect correctness,
- but does affect efficiency
- The quality of an index: $\left(\frac{\#\text{inodes}}{\#\text{inodes in the minimum 1-index}} - 1\right) \times 100\%$
- Ideal quality: 0%

[Figure: a valid but non-minimum 1-index of the example graph, in which the inode {2,4,8,13} is split into {2,4} and {8,13}]
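The quality measure above is a one-line computation. Here it is as a small helper, with the hypothetical name `index_quality` and made-up counts for illustration.

```python
def index_quality(num_inodes: int, num_inodes_minimum: int) -> float:
    """Quality in percent: (#inodes / #inodes in the minimum 1-index - 1) * 100.
    0 is ideal; larger values mean more redundant inodes."""
    if num_inodes_minimum <= 0 or num_inodes < num_inodes_minimum:
        raise ValueError("need 0 < num_inodes_minimum <= num_inodes")
    return (num_inodes / num_inodes_minimum - 1) * 100


# Hypothetical counts: a minimum 1-index with 9 inodes, and a valid index in
# which one of those inodes has been split in two (10 inodes).
print(round(index_quality(10, 9), 2))  # 11.11
print(index_quality(9, 9))             # 0.0
```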

Previous Results

- Construction
- The PT algorithm [Paige & Tarjan 87], in time O(m log n), where m is the number of edges and n the number of nodes
- Edge changes
- The propagate algorithm [Kaushik et al. 02]
- Quality of the 1-index after updates
- No guarantee on the quality of the resulting index
- 3–5% after 500 edge insertions in experiments
- Subgraph addition
- Index-reconstruction
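For reference, the minimum 1-index is the coarsest stable partition that refines the label partition. The sketch below computes it by naive repeated refinement, given a dnode-to-label map and an edge list. It is deliberately simple and is not the Paige-Tarjan algorithm cited above, which achieves O(m log n). The name `minimum_1index` is hypothetical.

```python
from collections import defaultdict


def minimum_1index(labels, edges):
    """Coarsest stable partition (the minimum 1-index) by naive refinement.

    Start from the label partition and repeatedly split each block by the
    set of blocks its dnodes' parents fall in, until nothing changes.
    Signature-based sketch, O(n * (n + m)) in the worst case; it is NOT
    the O(m log n) Paige-Tarjan algorithm.
    """
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    block_of = dict(labels)  # Block id per dnode; initially its label.
    while True:
        ids = {}
        # Signature = (current block, set of parent blocks).
        new_block_of = {
            v: ids.setdefault(
                (block_of[v], frozenset(block_of[u] for u in parents[v])),
                len(ids))
            for v in labels
        }
        if len(ids) == len(set(block_of.values())):
            return new_block_of  # No block was split: the partition is stable.
        block_of = new_block_of


labels = {1: "paper", 2: "section", 3: "title", 4: "section",
          5: "algorithm", 6: "section", 7: "algorithm"}
edges = [(1, 2), (2, 3), (1, 4), (4, 5), (4, 6), (6, 7)]
idx = minimum_1index(labels, edges)
print(len(set(idx.values())))  # 6: sections 2 and 4 share an inode
```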

Edge Insertion: An Example (1)

[Figure: three panels. Data Graph: nodes R, A, B, C1–C3, D1–D3. 1-Index: inodes include {C1, C2}, {C3}, {D1, D2}, and {D3}. Split 1: C1 and C2 are separated into their own inodes]

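Edge Insertion: An Example (2)

[Figure: continuing the example. Split 2: D1 and D2 are separated. Merge 1: C2 and C3 are merged into one inode. Merge 2: D2 and D3 are merged, leaving inodes {C1}, {C2, C3}, {D1}, and {D2, D3}]

Indeed, this is the minimum 1-index for the data graph after the update. Not a coincidence!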
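A simplified split-then-merge repair can be sketched as follows. This is an illustrative approximation, not the paper's algorithm. It re-examines the whole partition rather than only the inodes affected by the new edge. The names `split_merge_sketch` and `inode_of` are hypothetical. On acyclic graphs the merge loop ends at the minimum 1-index. On cyclic graphs its simple merge test is not guaranteed to find every possible merge.

```python
from collections import defaultdict


def split_merge_sketch(labels, edges, inode_of):
    """Simplified split/merge repair of a 1-index after edge insertions.

    inode_of maps dnode -> inode id (label-homogeneous); edges is the
    updated edge list. Not the paper's exact algorithm.
    """
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)

    def regroup(key):
        # Renumber inodes 0..k-1 so that dnodes with equal keys share one.
        ids = {}
        return {v: ids.setdefault(key(v), len(ids)) for v in labels}

    def count(part):
        return len(set(part.values()))

    # Split phase: split inodes whose dnodes disagree on index parents.
    while True:
        new = regroup(lambda v: (inode_of[v],
                                 frozenset(inode_of[u] for u in parents[v])))
        if count(new) == count(inode_of):
            break
        inode_of = new
    # Merge phase: merge inodes with the same label and same index parents;
    # repeat, since one merge can make child inodes mergeable too.
    while True:
        new = regroup(lambda v: (labels[v],
                                 frozenset(inode_of[u] for u in parents[v])))
        if count(new) == count(inode_of):
            return inode_of
        inode_of = new


# Hypothetical graph whose 1-index keeps every dnode in its own inode
# (c1 has index parents {a}, c2 has {a, b}, so nothing can be merged).
labels = {"r": "R", "a": "A", "b": "B", "c1": "C", "c2": "C",
          "d1": "D", "d2": "D"}
edges = [("r", "a"), ("r", "b"), ("a", "c1"), ("a", "c2"), ("b", "c2"),
         ("c1", "d1"), ("c2", "d2")]
old = {v: v for v in labels}

# Insert b -> c1: c1 and c2 now share index parents, and then so do d1 and d2.
repaired = split_merge_sketch(labels, edges + [("b", "c1")], old)
print(len(set(repaired.values())))  # 5
```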

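Minimum vs. Minimal Indexes

- Minimum: has the smallest number of inodes
- Minimal: no two inodes can be merged

[Figure: a cyclic data graph with nodes R, A1, A2, B1, B2; its minimum 1-index, with inodes {A1, A2} and {B1, B2}; and a minimal 1-index that is not minimum, in which A1, A2, B1, and B2 each remain in separate inodes]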
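The gap between minimal and minimum needs a cycle. The hypothetical graph below is one such case. The all-singleton partition is a valid 1-index, and merging either same-label pair on its own breaks stability, so it is minimal. Merging both pairs at once, however, gives a valid 3-inode index. The helper names are illustrative.

```python
from collections import defaultdict


def is_stable(edges, inode_of):
    """True if all dnodes of each inode have the same set of index parents."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    parent_sets = defaultdict(set)
    for v, inode in inode_of.items():
        parent_sets[inode].add(frozenset(inode_of[u] for u in parents[v]))
    return all(len(sets) == 1 for sets in parent_sets.values())


def merge_inodes(inode_of, x, y):
    """Merge only the inodes containing dnodes x and y."""
    ix, iy = inode_of[x], inode_of[y]
    return {v: (ix if i == iy else i) for v, i in inode_of.items()}


# Hypothetical cyclic graph: R -> A1, R -> A2, plus cycles A1 <-> B1 and
# A2 <-> B2. A1 and A2 are labeled A; B1 and B2 are labeled B.
edges = [("R", "A1"), ("R", "A2"), ("A1", "B1"), ("B1", "A1"),
         ("A2", "B2"), ("B2", "A2")]
singletons = {v: v for v in ["R", "A1", "A2", "B1", "B2"]}
minimum = {"R": "R", "A1": "A", "A2": "A", "B1": "B", "B2": "B"}

print(is_stable(edges, singletons))                            # True
print(is_stable(edges, merge_inodes(singletons, "A1", "A2")))  # False
print(is_stable(edges, merge_inodes(singletons, "B1", "B2")))  # False
print(is_stable(edges, minimum))                               # True
```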

Quality Guarantee

- Theorem: The split/merge algorithm always maintains a minimal 1-index
- Lemma: For acyclic data graphs, there is a unique minimal 1-index
- The minimum 1-index is always maintained
- For cyclic data graphs, there could be more than one minimal 1-index
- One of them is maintained


A(k)-Index Definition

- k-bisimilarity
- Definition based on stability
- A(0)-index: partition by label
- A(k)-index: an inode is stable if all of its dnodes have the same index parents in the A(k-1)-index
- Only interested in paths of length up to k
- Shown to be much smaller and more efficient than the 1-index [Kaushik et al. 02]
- But no efficient maintenance algorithms are known!
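A(0) through A(k) can be computed level by level, each level from the previous one. The minimal sketch below takes a dnode-to-label map and an edge list. The function name `ak_partitions` and the toy graph are hypothetical.

```python
from collections import defaultdict


def ak_partitions(labels, edges, k):
    """Return [A(0), A(1), ..., A(k)] as dnode -> inode maps.

    A(0) groups dnodes by label; A(i) splits each A(i-1) inode by the set of
    A(i-1) inodes that a dnode's parents belong to (k-bisimilarity).
    """
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    levels = [dict(labels)]  # A(0): the inode id is simply the label.
    for _ in range(k):
        prev = levels[-1]
        ids = {}
        levels.append({
            v: ids.setdefault(
                (prev[v], frozenset(prev[u] for u in parents[v])), len(ids))
            for v in labels
        })
    return levels


# Hypothetical graph: R -> A -> X1 -> Y1 and R -> B -> X2 -> Y2.
labels = {"r": "R", "a": "A", "b": "B", "x1": "X", "x2": "X",
          "y1": "Y", "y2": "Y"}
edges = [("r", "a"), ("r", "b"), ("a", "x1"), ("b", "x2"),
         ("x1", "y1"), ("x2", "y2")]
a0, a1, a2 = ak_partitions(labels, edges, 2)
print(a1["y1"] == a1["y2"])  # True: both have incoming path X.Y of length 1
print(a2["y1"] == a2["y2"])  # False: A.X.Y vs. B.X.Y tell them apart
```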

A(k)-index Example

[Figure: a data graph with nodes R, A, B, and C1–C6, shown with its A(2)-index (here identical to the 1-index), A(1)-index, and A(0)-index]

Maintenance of the A(i)-index requires the information in the A(i-1)-index.


A(k)-index Refinement Tree

[Figure: the same data graph with its A(2), A(1), and A(0) indexes organized as a refinement tree]

- Reduce storage cost
- Reduce maintenance cost
- 0.5%–13% additional storage
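One way to picture the refinement tree is to hold each level A(0), ..., A(k) as a dnode-to-inode map. This is an illustrative representation, not necessarily the paper's storage layout. Since A(i) refines A(i-1), each A(i) inode has exactly one parent inode. A coarser inode's extent could therefore be recovered as the union of its children's extents rather than stored separately. The name `refinement_tree` is hypothetical.

```python
def refinement_tree(levels):
    """Link each A(i) inode to the A(i-1) inode that contains it.

    levels is [A(0), ..., A(k)], each a dnode -> inode map. Because A(i)
    refines A(i-1), every A(i) inode has exactly one parent; the result is a
    list of {A(i) inode: A(i-1) inode} maps for i = 1..k.
    """
    tree = []
    for prev, cur in zip(levels, levels[1:]):
        parent_of = {}
        for v, inode in cur.items():
            parent = parent_of.setdefault(inode, prev[v])
            if parent != prev[v]:
                raise ValueError("A(i) does not refine A(i-1)")
        tree.append(parent_of)
    return tree


# Hypothetical levels for three dnodes labeled C.
levels = [
    {"c1": "C", "c2": "C", "c3": "C"},  # A(0)
    {"c1": 0, "c2": 1, "c3": 1},        # A(1)
    {"c1": 0, "c2": 1, "c3": 2},        # A(2)
]
print(refinement_tree(levels))  # [{0: 'C', 1: 'C'}, {0: 0, 1: 1, 2: 1}]
```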

Quality Guarantee

- Theorem: The split/merge algorithm always maintains the minimum A(k)-index
- Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic, so a minimal A(k)-index is the minimum one


Experiments on Edge Changes

- Datasets
- Real-life: IMDB (272,000 nodes)
- Benchmark: XMark (198,000 nodes)
- Setup
- First delete a portion of existing ID-REF links
- Then do random mixed insertions/deletions
- Compare with
- 1-index: propagate ( reconstruction)
- A(k)-index: recompute affected portion ( reconstruction)

Experiment Results: 1-index

Experiment Results: A(k)-index

[Figure: running times]

Conclusions

- The first solutions for the maintenance (edge and subgraph additions/deletions) of the 1-index and A(k)-index that are both effective and efficient
- Effective: quality guarantee on the resulting index
- Efficient: the algorithms themselves are fast
- Thank you!

Graphical Illustration

[Figure: size of a valid 1-index as it undergoes split and merge operations]

The index can only grow in size due to splitting if merging is not enforced.