Incremental Maintenance of XML Structural Indexes

1
Incremental Maintenance of XML Structural Indexes
  • Ke Yi (1), Hao He (1), Ioana Stanoi (2), and Jun Yang (1)
  • (1) Department of Computer Science, Duke University
  • (2) IBM T. J. Watson Research Center

2
Motivation
  • XML has gained tremendously in popularity in
    recent years
  • Used to represent many kinds of data
  • Major DB vendors are rushing to incorporate
    solutions for native XML storage and
    retrieval
  • IBM DB2, Oracle, Microsoft SQL Server
  • Tamino, Natix, X-Hive,

3
Overview
[Figure: example XML data graph of a paper document, with dnodes numbered 1-18 and labels paper, section, title, intro, algorithm, experiments, exp, proof, about, uses; "1-index" and "A(k)-index" appear as annotations.]
4
Label Path Expressions
Example path expression: /paper/section/algorithm
[Figure: the example data graph (dnodes 1-18) on which the path expression is evaluated.]
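The evaluation of a label path expression on a data graph can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name eval_label_path, the dictionary-based graph encoding, and the toy graph are all assumptions, and the expression is assumed to be anchored at the root.

```python
def eval_label_path(path, root, label, children):
    """Return the set of dnodes reached from `root` by following `path`.

    `label` maps dnode -> label; `children` maps dnode -> list of child dnodes.
    Both encodings are assumptions made for this sketch.
    """
    steps = [s for s in path.split("/") if s]
    if not steps or label[root] != steps[0]:
        return set()
    frontier = {root}
    for step in steps[1:]:
        # Keep only the children whose label matches the next step.
        frontier = {
            c
            for n in frontier
            for c in children.get(n, [])
            if label[c] == step
        }
    return frontier


# Hypothetical toy graph (not the 18-node graph from the slides).
label = {1: "paper", 2: "section", 3: "title",
         4: "algorithm", 5: "section", 6: "algorithm"}
children = {1: [2, 5], 2: [3, 4], 5: [6]}

print(sorted(eval_label_path("/paper/section/algorithm", 1, label, children)))
# [4, 6]
```

Without an index, every step scans the children of every dnode in the frontier, which is the cost a structural index is meant to reduce.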
5
Structural Indexes
  • Why do we need them?
  • Speed up the evaluation of path expressions
  • Provide a structural summary of the data graph
  • Structural indexes
  • DataGuide [Goldman & Widom 97]
  • 1-index [Milo & Suciu 99]
  • A(k)-index [Kaushik et al. 02], D(k)-index [Qun
    et al. 03], M(k)-index [He & Yang 04]
  • Integration of structural indexes and inverted
    lists [Kaushik et al. 04]
  • Focus on maintenance
  • Has a major effect on index efficiency
  • Remains an overlooked issue

6
Outline
[Figure: the example data graph (dnodes 1-18), shown again as the talk outline.]
7
1-Index Definition
  • Constructed by using bisimilarity
  • Definition based on stability
  • Partition data nodes into index nodes
  • Each data node (dnode) v belongs to an index node (inode) Iv
  • Iu is v's index parent if u is v's parent
  • An inode is stable if all of its dnodes have the
    same index parents
  • In a 1-index, all inodes are stable

[Figure: dnode u inside inode Iu is the parent of dnode v inside inode Iv.]
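The stability condition can be checked directly from the definition. The sketch below is an illustration under assumptions: the names index_parents, is_stable, and is_1_index, the graph encoding, and the toy graph are not from the talk. It also assumes that all dnodes in an inode share a label, which follows from bisimilarity but is not stated on the slide.

```python
def index_parents(v, parents, inode_of):
    """Set of inodes that contain some parent of dnode v."""
    return frozenset(inode_of[p] for p in parents.get(v, ()))


def is_stable(extent, parents, inode_of):
    """An inode is stable if all of its dnodes have the same index parents."""
    return len({index_parents(v, parents, inode_of) for v in extent}) <= 1


def is_1_index(partition, label, parents):
    """Check that every inode is label-homogeneous (assumption) and stable."""
    inode_of = {v: i for i, extent in enumerate(partition) for v in extent}
    same_label = all(len({label[v] for v in extent}) == 1 for extent in partition)
    return same_label and all(is_stable(e, parents, inode_of) for e in partition)


# Hypothetical toy graph: R -> A, R -> B, A -> C1, A -> C2, B -> C3.
label = {"R": "R", "A": "A", "B": "B", "C1": "C", "C2": "C", "C3": "C"}
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["A"], "C3": ["B"]}

good = [{"R"}, {"A"}, {"B"}, {"C1", "C2"}, {"C3"}]
bad = [{"R"}, {"A"}, {"B"}, {"C1", "C2", "C3"}]

print(is_1_index(good, label, parents))  # True
print(is_1_index(bad, label, parents))   # False
```

The second partition fails because C1 and C3 sit in the same inode but have different index parents (the inode of A versus the inode of B).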
8
1-Index Example
[Figure: the example data graph (dnodes 1-18) next to its 1-index, with the path expression /paper/section/algorithm. Inode extents shown in the 1-index: {1}, {2,4,8,13}, {3,5,9,14}, {6,10}, {15,16}, {17,18}, {7}, {11}, {12}.]
9
1-Index Quality
  • Assigning bisimilar dnodes to different inodes
  • does not affect correctness,
  • but does affect efficiency
  • The quality of an index:
    quality = (#inodes / #inodes in the minimum 1-index - 1) x 100
  • Ideal quality: 0

[Figure: the 1-index of the example with the section inode {2,4,8,13} split into {2,4} and {8,13}; the other extents are {1}, {3,5,9,14}, {6,10}, {15,16}, {17,18}, {7}, {11}, {12}.]
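As a worked example of the quality measure, the following sketch computes it for two index sizes. The function name index_quality is an assumption, and the counts 9 and 10 are read off the slide's example as far as it can be recovered from the transcript (nine inodes in the minimum 1-index, ten after the section inode is split in two).

```python
def index_quality(num_inodes, num_min_inodes):
    """Percentage by which an index exceeds the minimum 1-index in size.

    0 is ideal; larger values mean more redundant inodes.
    """
    return (num_inodes / num_min_inodes - 1) * 100


print(index_quality(9, 9))             # 0.0
print(round(index_quality(10, 9), 2))  # 11.11
```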
10
Previous Results
  • Construction
  • The PT algorithm [Paige & Tarjan 87], in time O(m
    log n)
  • m: number of edges, n: number of nodes
  • Edge changes
  • The propagate algorithm [Kaushik et al. 02]
  • Quality of the 1-index after update
  • No guarantee on the quality of the resulting index
  • Quality of 3% to 5% after 500 edge insertions in experiments
  • Subgraph addition
  • Index reconstruction
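For intuition about construction, the minimum 1-index can be computed by naive partition refinement: start from the partition by label and keep splitting inodes until every inode is stable. The sketch below is this naive fixpoint, not the PT algorithm, and it does not achieve the O(m log n) bound. The function name, graph encoding, and edge set are assumptions; the edges were chosen so that the resulting groups match the 1-index panel of the edge-insertion example.

```python
def minimum_1_index(label, parents):
    """Naive refinement: split by (current block, set of parent blocks)."""
    block = dict(label)  # start from the partition by label
    while True:
        sig = {
            v: (block[v], frozenset(block[p] for p in parents.get(v, ())))
            for v in label
        }
        ids = {}
        refined = {v: ids.setdefault(sig[v], len(ids)) for v in label}
        # Each round only refines, so an unchanged block count means a fixpoint.
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined
    extents = {}
    for v, b in block.items():
        extents.setdefault(b, []).append(v)
    return sorted(sorted(e) for e in extents.values())


# Assumed edges: R -> A, B; A -> C1, C2; B -> C3; Ci -> Di.
label = {"R": "R", "A": "A", "B": "B",
         "C1": "C", "C2": "C", "C3": "C",
         "D1": "D", "D2": "D", "D3": "D"}
parents = {"A": ["R"], "B": ["R"],
           "C1": ["A"], "C2": ["A"], "C3": ["B"],
           "D1": ["C1"], "D2": ["C2"], "D3": ["C3"]}

print(minimum_1_index(label, parents))
# [['A'], ['B'], ['C1', 'C2'], ['C3'], ['D1', 'D2'], ['D3'], ['R']]
```

Rebuilding the index this way after every update is what incremental maintenance is meant to avoid.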

11
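Edge Insertion: An Example (1)
[Figure: three panels labeled Data Graph, 1-Index, and Split 1, over nodes R, A, B, C1-C3, D1-D3. The 1-index has inodes {C1, C2}, {C3}, {D1, D2}, {D3}; after Split 1 the C inodes are {C1}, {C2}, {C3}, while {D1, D2} and {D3} are unchanged.]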
12
Edge Insertion: An Example (2)
[Figure: three panels labeled Split 2, Merge 1, and Merge 2. After Split 2 all of C1, C2, C3, D1, D2, D3 are singleton inodes; Merge 1 forms {C2, C3}; Merge 2 forms {D2, D3}.]
The result is indeed the minimum 1-index for the data graph
after the update. Not a coincidence!
13
Minimum and Minimal Indexes
  • Minimum: with the smallest number of inodes
  • Minimal: no two inodes can be merged

[Figure: three panels over nodes R, A1, A2, B1, B2. Data graph; Minimum 1-index with inodes {R}, {A1, A2}, {B1, B2}; Minimal 1-index with inodes {R}, {A1}, {A2}, {B1}, {B2}.]
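The difference between minimal and minimum only shows up on cyclic graphs. The sketch below checks minimality by trying every pairwise merge and testing whether the result is still a valid 1-index. The function names, the encoding, and in particular the edge set are assumptions: the transcript preserves only the node names of the slide's example, so the cyclic graph here (A1 and B1 point at each other, as do A2 and B2) is a hypothetical graph with the same node groupings.

```python
from itertools import combinations


def is_valid_1_index(partition, label, parents):
    """Every inode has one label and all its dnodes share index parents."""
    inode_of = {v: i for i, ext in enumerate(partition) for v in ext}
    for ext in partition:
        if len({label[v] for v in ext}) != 1:
            return False
        sigs = {frozenset(inode_of[p] for p in parents.get(v, ())) for v in ext}
        if len(sigs) != 1:
            return False
    return True


def is_minimal(partition, label, parents):
    """Minimal: valid, and merging any two inodes breaks validity."""
    if not is_valid_1_index(partition, label, parents):
        return False
    for i, j in combinations(range(len(partition)), 2):
        rest = [e for k, e in enumerate(partition) if k not in (i, j)]
        if is_valid_1_index(rest + [partition[i] | partition[j]], label, parents):
            return False
    return True


# Hypothetical cyclic graph: R -> A1, A2; A1 <-> B1; A2 <-> B2.
label = {"R": "R", "A1": "A", "A2": "A", "B1": "B", "B2": "B"}
parents = {"A1": ["R", "B1"], "A2": ["R", "B2"], "B1": ["A1"], "B2": ["A2"]}

fine = [{"R"}, {"A1"}, {"A2"}, {"B1"}, {"B2"}]
coarse = [{"R"}, {"A1", "A2"}, {"B1", "B2"}]

print(is_minimal(fine, label, parents))    # True
print(is_minimal(coarse, label, parents))  # True
```

Both partitions are minimal, but only the three-inode one is minimum: in the five-inode index, merging the A inodes alone or the B inodes alone is invalid, and a single pairwise merge cannot do both at once.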
14
Quality Guarantee
  • Theorem: The split/merge algorithm always
    maintains a minimal 1-index
  • Lemma: For acyclic data graphs, there is a unique
    minimal 1-index
  • The minimum 1-index is always maintained
  • For cyclic data graphs, there could be more than
    one minimal 1-index
  • One of them is maintained

15
Outline
[Figure: the example data graph (dnodes 1-18), shown again as the talk outline.]
16
A(k)-Index Definition
  • k-bisimilarity
  • Definition based on stability
  • A(0)-index: partition by label
  • A(k)-index:
  • An inode in the A(k)-index is stable if all of its
    dnodes have the same index parents in the
    A(k-1)-index
  • Only interested in paths of length up to k
  • Shown to be much smaller and more efficient than
    the 1-index [Kaushik et al. 02]
  • But, no efficient maintenance algorithms are
    known!
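The layered definition suggests a direct construction: start from A(0) and refine k times, each time grouping dnodes by their A(i-1) inode and the A(i-1) inodes of their parents. The sketch below follows that reading. The function name a_k_index and the graph encoding are assumptions, and the edge set is a guess chosen so that the C-node groupings at each level agree with the slide's A(k)-index example.

```python
def a_k_index(label, parents, k):
    """Build the A(k)-index by k rounds of refinement starting from labels."""
    block = dict(label)  # A(0): partition by label
    for _ in range(k):
        # A(i): same A(i-1) inode and same set of A(i-1) index parents.
        sig = {
            v: (block[v], frozenset(block[p] for p in parents.get(v, ())))
            for v in label
        }
        ids = {}
        block = {v: ids.setdefault(sig[v], len(ids)) for v in label}
    extents = {}
    for v, b in block.items():
        extents.setdefault(b, []).append(v)
    return sorted(sorted(e) for e in extents.values())


# Assumed edges: R -> A, B; A -> C1; B -> C2, C3; C1 -> C4; C2 -> C5; C3 -> C6.
label = {"R": "R", "A": "A", "B": "B",
         "C1": "C", "C2": "C", "C3": "C", "C4": "C", "C5": "C", "C6": "C"}
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["B"], "C3": ["B"],
           "C4": ["C1"], "C5": ["C2"], "C6": ["C3"]}

for k in range(3):
    print(k, a_k_index(label, parents, k))
# 0 [['A'], ['B'], ['C1', 'C2', 'C3', 'C4', 'C5', 'C6'], ['R']]
# 1 [['A'], ['B'], ['C1'], ['C2', 'C3'], ['C4', 'C5', 'C6'], ['R']]
# 2 [['A'], ['B'], ['C1'], ['C2', 'C3'], ['C4'], ['C5', 'C6'], ['R']]
```

On this graph a third round changes nothing, so A(2) already coincides with the 1-index.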

17
A(k)-index Example
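[Figure: four panels labeled Data graph, A(2) (1-index), A(1), and A(0), over nodes R, A, B, C1-C6. C inodes per level: A(0) has {C1, C2, C3, C4, C5, C6}; A(1) has {C1}, {C2, C3}, {C4, C5, C6}; A(2) has {C1}, {C2, C3}, {C4}, {C5, C6}.]
Maintaining the A(i)-index requires the
information in the A(i-1)-index.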
18
A(k)-index Refinement Tree
[Figure: the same four panels (Data graph, A(2) (1-index), A(1), A(0)), with each A(i) inode linked to the A(i-1) inode it refines.]
19
A(k)-index Refinement Tree
[Figure: the refinement tree over the panels Data graph, A(2), A(1), A(0); index inodes are drawn with only their label (C) rather than their full extents.]
  • Reduce storage cost
  • Reduce maintenance cost

0.5% to 13% additional storage
20
Quality Guarantee
  • Theorem: The split/merge algorithm always
    maintains a minimal A(k)-index
  • Lemma: There is a unique minimal A(k)-index for
    any data graph, acyclic or cyclic
  • Hence the maintained index is the minimum A(k)-index
21
Outline
[Figure: the example data graph (dnodes 1-18), shown again as the talk outline.]
22
Experiments on Edge Changes
  • Datasets
  • Real-life: IMDB (272,000 nodes)
  • Benchmark: XMark (198,000 nodes)
  • Setup
  • First delete a portion of the existing ID-REF links
  • Then do random mixed insertions/deletions
  • Compare with
  • 1-index: propagate (+ reconstruction)
  • A(k)-index: recompute affected portion (+
    reconstruction)

23
Experiment Results: 1-index
24
Experiment Results: A(k)-index
[Figure: running times.]
25
Conclusions
  • The first solutions for the maintenance (edge and
    subgraph additions/deletions) of the 1-index and
    A(k)-index that are both effective and efficient
  • Effective: quality guarantee on the resulting
    index
  • Efficient: the algorithms themselves are fast
  • Thank you!

26
Graphical Illustration
[Figure: index size among valid 1-indexes, with split steps increasing the size and merge steps decreasing it.]
The index can only grow in size due to splitting
if merging is not enforced.