Distributed Database System Part 2 : Integrating heterogeneous data source - PowerPoint PPT Presentation

1 / 59
About This Presentation
Title:

Distributed Database System Part 2 : Integrating heterogeneous data source

Description:

Distributed Database System Part 2 : Integrating heterogeneous data source 2001 5 10 ... – PowerPoint PPT presentation

Number of Views:57
Avg rating:3.0/5.0
Slides: 60
Provided by: ackr
Category:

less

Transcript and Presenter's Notes

Title: Distributed Database System Part 2 : Integrating heterogeneous data source


1
Distributed Database SystemPart 2 Integrating
heterogeneous data source
  • 2001? 5? 10?
  • ???
  • ??????? ????
  • ?????? ? ????? ???

2
Contents
  • Multi-DBMS
  • TP Monitor solution
  • Wrapper/Mediator solution
  • Data Replication

3
Multi-DBMS
4
Introduction
  • Distributed DBMS
  • Top-down approach
  • ?? data source? ?? ???? ???? ?? ??? ???, data?
    ????? ??
  • Multi-DBMS (Federated DBMS)
  • Bottom-up approach
  • ?? ???? heterogeneous? ?? data source?? ??
    schema? ??, ?? interface? ??? ??? ?? ??? ??

5
  • A database system that resides on top of existing
    local database systems(LDBs)
  • Provides a uniform environment in which the user
    can access data from heterogeneous LDBs
  • Maintains a single global database schema that
    integrates the schemas of the LDBs
  • When the global queries are issued,
  • Decomposes and translates the global queries into
    queries for processing by LDBSs.
  • Merges the results from them and generates the
    final result.
  • Supports distributed transaction management over
    LBDs

6
Motivation
  • Distributed Computing Environment
  • Many Different DBMSs have been independently
    developed and used
  • Different data schema
  • Relevant data may be stored in distributed and
    heterogeneous database
  • Access to heterogeneous data sources of
    different DBMS can be required

7
  • A few different ways
  • Converting and migrating all data from one DBMS
    to another
  • The application in one DBMS may need to be
    converted to run in another
  • Using gateways for specific pairs of DBMS
  • Gateway approach does not support transaction
    management

8
History
  • 1980?? ??
  • Research prototype? ??? ??
  • 1990?? ?
  • ?? Multi-DBMS? ??? ?? (UniSQL/M)
  • ???? Multi-DBMS? ?? ? ??
  • ??
  • UniSQL/M??? ??? ?? Multi-DBMS? ???? ??
  • ??? vendor?? ??? ? ??? ???? model? ??? product?
    ??
  • Middleware(TP Monitor, ODBC?), server-to-server
    communication? ?? ???? ???? ??? ??

9
Global Schema
  • Schema used in Multi-DBMS
  • Integration of the schemas exported from LDBs
  • Schema conflicts
  • Arise within the process of integrating different
    local database schemas
  • Resolution techniques for schema conflict
  • Renaming entities and attributes
  • Homogenizing representations for different
    expression, units, or levels of precision
  • Homogenizing attributes
  • Type coercion, defining default values,
    concatenation attributes, and projecting parts of
    a composition hierarchy

10
Global Query Processing
  • Definition
  • To access data that's on multivendor database
    servers
  • Query Processing
  • Multi-DBMS that supports a single common data
    model and a single global query language on top
    of different types of existing systems

11
  • Global Query Processing
  • global query decomposition
  • Normalization, analysis, simplification
  • Relational calculus to relational algebra
    transformation
  • Data localization
  • Query modification
  • local query translation
  • Each subquery is translated to a query or queries
    of the corresponding local database system
  • Send localized queries to a local database system
  • results assembly
  • The local results returned by the subquries are
    combined into the global result

12
Global query
Query decomposition and global optimization
Multi-DBMS Layer
SQ1 SQ2 SQ3 SQn
SQ Sub-Query LQ Local Query
Query translator1
Query translator2
Query translator3
Query translatorn
LQ1 LQ2 LQ3 LQn
Local DBMS Layer
DB1
DB2
DB3
DBn
13
Global Transaction Management
  • Definition
  • To deal with the problems of always keeping
    distributed databases in a consistent state even
    when concurrent accesses and failures occur
  • Multidatabase Transaction Processing
  • Global transaction manager(GTM)
  • submits global transaction operations to the
    local DBMSs through the local transaction agent
  • coordinate two phase commit
  • A set of local transaction manager(LTM)

14
Global Transactions
Ti Tj
GTM
GTM Global Transaction Manager LTM Local
Transaction Manager
LTM
LTM
ti1 tj1
ti2 tj2
Local Transactions
Local Transactions
DBMS
DBMS
15
Multi-DBMS Example (UniSQL/M)
  • Introduction
  • Multi-DBMS for managing a heterogeneous
    collection of relational and object-oriented
    database systems
  • Use SQL/M which is the data definition and
    manipulation language
  • Two component of global schema
  • Proxy transformation schema ( export schema)
  • Virtual Class conceptual schema ( global
    schema)
  • Support UniSQL/X, Oracle, Sybase, Informix

16
  • History
  • 1990? ?? ??? ??? ?? UniSQL? ??
  • MCC?? ????? ????? DBMS ORION? ?? ?? ??
  • ??????? SQL? ??
  • Multidatabase Architecture
  • 1997? 12?, ????????? ?? UniSQL???? ????? ??? ??
    ??
  • 1999? 6?, UniSQL 4.0K ??
  • 2001? ??, UniSQL 5.0 version3?? ????? ????? ????
    ???? ??

17
  • Architecture

Global DBMS
Vclasse
Vclasse
Vclasse
UniSQL/M
Proxy
Proxy
Proxy
Proxy
Proxy
Global Transaction Manager
Local DBMS
Local DBMS
Master
Master
Driver
Driver
Driver
Informix
Oracle
Sybase
18
create vclass COMPANY ( name string,
total_sale integer, net_profit integer,
address string, country string, grade
integer) as select name, total_sale,
net_profit, address, NULL, grade from
DOMESTIC union all select name, total_sale,
net_profit, NULL, country, grade from
FOREIGN
virtual class
proxy
create proxy DOMESTIC on Oracle ( name
string, total_sale integer, net_profit
integer, address string, grade
integer) as select cname, gross_selling,net_gai
n_address,grade from DomesticCompany
create proxy FOREIGN on Sybase ( name
string, total_sale integer, net_profit
integer, country string, grade
integer) as select comp_name,gross_sale,pure_gain
,country, grade from ForeignCompany
Oracle
Sybase
DomesticCompany
ForeignCompany
gross_ selling
net_ gain
comp_ name
gross_ sale
pure_ gain
grade
address
grade
country
cname
19
  • ????
  • Local database ??
  • UniSQL/M? global database ??
  • UniSQL/M? local database?? ???? proxy? ??
  • ??? proxy? ???? vclass? ??
  • Vclass? ???? ?? ???? ??

20
TP Monitor Solution
21
Transaction
  • An ACID unit of work
  • ACID Properties
  • Atomicity
  • Consistency
  • Isolation
  • Durability
  • By Industrial Requirement
  • Example application finance(banking, brokerage,
    ), insurance, healthcare, telecom,
    reservations(hotel, car, rail, air), inventory
    control, retail/distribution ...
  • In DBMS
  • A collection of operations that performs a single
    logical function in a database application

22
Global Transaction
  • Environment Shift
  • Distributed information in network
  • Downsizing
  • Global(Distributed) Transaction Transaction run
    on multiple sites
  • 2 Phase Commit Protocol
  • Coordinate the action of distributed transaction
  • Synchronize updates on multiple site so that they
    either all fail of all succeed (Atomicity)
  • 2 Phase Locking Protocol
  • Isolation
  • Methods
  • Transaction Processing Monitor
  • Multi-DBMS
  • ex) UNISQL ...
  • DBMS vendors approach

23
Two Phase Commit Protocol
c
c
Phase 1
p
p
p
p
p
p
Prepare-to-commit
Ready-to-commit
c
c
Phase 2
p
p
p
p
p
p
Commit or Abort
Complete or Abort
C coordinator P participant
24
Two Type of Transaction Processing
  • TP Heavy
  • Using Transaction Processing Monitor
  • Support transaction on single server or multiple
    heterogeneous server
  • All resource, not just data-centric ones
  • TP Lite
  • Database-centric approach using stored
    procedure, trigger...
  • Sybase integrate some of TP monitor functions
    inside the DBMS engine in 1986
  • Can not support global transaction control

25
Transaction Processing Monitor
  • Definition
  • An operating system for transaction processing
  • Transaction application programmer? transaction?
    ACID ??? ?? ??? ? ??? ??? ?? ???????, ??? ??? ???
    ???? interface? procedure?? ????
  • Multi-resource manager ???? distributed
    transaction processing??? ???? middleware

client
Resource Manager
TP Monitor
client
Resource Manager
client
Resource Manager
client
26
  • ??
  • Process Management
  • Funneling work
  • Monitoring transaction execution
  • Load balancing
  • (Distributed) Transaction Management
  • Distributed transaction processing ??
  • ACID

27
TP Monitor Standard
  • X/Opens DTP(Distributed Transaction Processing)
    Reference Model
  • Define the components of a transaction-based
    system and locate the interface between them
  • 4 Components
  • Application Program
  • Resource Manager
  • Communication Resource Manager
  • Transaction Manager
  • 1993, Version1
  • 1994, Version2
  • ISO-TP(ISO0026)
  • Define the transaction identifiers and the
    two-phase commit protocol in a commit tree

28
  • 1 TM Environment

Application(AP)
XATMI, TxRPC, CPI-C
TX
Transaction Manager (TM)
Resource Manager (RM)
Communication Resource Manager (CRM)
XA
XA
Resource
communication
29
  • n TMs Environment

AP
TM
RM
CRM
OSI-TP
Resource
AP
AP
TM
TM
RM
CRM
RM
CRM
OSI-TP
OSI-TP
Resource
Resource
30
Product
  • Close TP monitors
  • IBM CICS
  • The largest market share by far of any TP monitor
    product
  • Since 1968, to improve the efficient of mainframe
    operating system environment
  • CICS/VSE, CICS/MVS, CICS/400, CICF for AIX, CICS
    for OS/2, CICS for Windows/NT
  • Ported to other UNIX platforms
  • IBM IMS(Information Management System)
  • Used by 40 of the worlds largest company
  • Since late 1960s
  • Online database and transaction processing at a
    time when nearly all data processing was done in
    batch
  • Todays IMS can interoperate with CICS and DB2

31
  • Open TP monitor
  • Novell(BEA) Tuxedo
  • Developed by ATT Bell Laboratory primary to
    service telecommunication applications in 1984
  • Purchased by Novell in 1996
  • The most popular open TP monitor
  • IBM(Transarc) Encina
  • Portable TP Monitor
  • Transarc was founded in 1984 by several
    Carregie-Mellon Univ researchers
  • Purchased by IBM in 1994
  • Microsoft Transaction Server
  • Component-based TP monitor product
  • Object-oriented
  • Easy to use, develop, deploy and management
  • In 1996, MS DTC(Microsoft Distributed Transaction
    Coordinator) was embedded in SQL Server 6.5 and
    Windows/NT

32
  • Transarc Encina
  • Provide distributed transaction processing
    toolkit to application developer
  • TRPC Transaction Remote Procedure Call
    Service LOG Log Service
  • TRAN Distributed Transaction Service VOL
    Volume Service
  • TM/XA Transaction Manager/XA Service
    LOCK Lock Service
  • REC Recovery Service

Application Programming Interface
TRPC
TM/XA
REC
LOCK
TRAN
LOG
VOL
Base Development Environment
33
Wrapper/Mediator Solution
34
Introduction
  • Addressed by Gio Wiederhold(Stanford Univ.) in
    IEEE Computer March 1992
  • Mediators in the Architecture of Future
    Information Systems
  • Wrapper
  • Heterogeneous data source? encapsulate?? ??? ????
    ?? layer? ???? ????
  • Mediator
  • ?? layer? ???? ?? ??? ???? ???, ?? data ??? ??
    ???? ??? ???? ????
  • (A software module that exploits encoded
    knowledge about certain sets or subsets of data
    to create information for a higher layer of
    applications)
  • Wrapper, Mediator? ??? ???? Architecture

Application
Mediator
wrapper
wrapper
wrapper
wrapper
Data Source
Data Source
Data Source
Data Source
35
  • Example UniSQL/M

Mediator
Global DBMS
Vclasse
Vclasse
Vclasse
UniSQL/M
Proxy
Proxy
Proxy
Proxy
Proxy
Global Transaction Manager
Local DBMS
Local DBMS
Master
Master
Driver
Driver
Driver
Informix
Oracle
Sybase
36
Motivation
  • ??? ??? ???? ??? ?? ???? ??
  • ex. Heterogeneous DBMS, Web page in WWW ?
  • ??? ???? ???? ??? ??? ???? ?? ??
  • ex. Multi-DBMS, Intelligent Agent ?
  • ?? ?? ???? ???? ?? ???? ??? ???? access?? ?? ??
  • ex. ODBC, JDBC ?

37
Example Garlic Project
  • IBM Almaden Research Center?? 1995? ?? ??
  • Multimedia information system? ???? ?? ??
  • Heterogeneous database system ? non-database
    system? ???? ?? ??? ??? data? ??
  • DB2 DataJoiner?? product? ???
  • Heterogeneous Database Join
  • Transparent Access to Heterogeneous Data Source
  • Global Optimization
  • Special Data Management/Access

38
  • Architecture
  • Garlic
  • Heterogeneous legacy data? ?? ??? ??? ????
    middleware system
  • Data? ?? ???? ??? ?? ??? ??
  • Repository Wrapper
  • Heterogeneous repository? ??? data? type? ??? ???
    ???? Garlic? ??
  • Garlic ?? protocol ? repository native protocol?
    ??

...
Client
Client
Client
Garlic (mediator)
Metadata
Query Processor
...
Wrapper
Wrapper
Wrapper
Wrapper
Relational DB
Object Oriented DB
Image Archive
Sound Archive
39
  • IBM DB2 Joiner

40
  • Issue
  • Garlic(mediator) ??? ???? ??? modeling? ???
  • Garlic query processor?? query planning? ?? ?
    wrapper?? ???? ??? ??? ??? ???
  • Garlic query processor?? query? ??? ? wrapper?
    ??? ??? ???

41
Data Replication
42
Introduction
  • ?? ???? ??? data item? ?? node? ?? ??? ??
  • ??
  • Replication is often used in many distributed
    system to provide a higher level of Performance,
    Reliability and Availability
  • Highly available system
  • Fault tolerance capabilities
  • Improved response time
  • Load sharing

??? data item?? ??? ??? ????? ???? access
algorithm ??
43
Replica Control Protocol
  • Algorithm that control access to replicated
    data
  • Replica ???? ??? data item?? ? copy
  • ????
  • Data correctness
  • ?? ?? ??
  • Performance
  • Availability ?? ??, ??? operation? ?? ??? ??
  • Response Time Operation? ????? ??? ??? ?? ????
    ?? ??
  • Cost

44
  • Correctness
  • Single-copy serializability ? ????? ??
  • Single-copy
  • ?? system? ??? data item? ???? ??? ??
  • Serializability
  • ??? ???? ?? operation?? ??? ? operation? ???? ??
    ?? ?? ??? ??

45
  • Eager replica update protocol
  • Update all the replicas of an item as part of a
    single transaction
  • Ensure that executions are serializable
  • Non-scalable
  • Lazy replica update protocol
  • Propagate updates to replicas through independent
    transaction after the original transaction commit
  • Become popular with database vendors due to their
    superior performance characteristics
  • If used indiscriminately, result in
    non-serializable execution

46
Primary Copy
  • ??
  • Primary copy replicated? data item??? ??? ?? ?
  • Data? update? ? primary copy? update
  • Primary copy? ??? node?? ?? replica?? ?? update?
    ???
  • ??
  • Read ??? data item? local node? ???, ??? ??
  • Local node? ??? data item? ???, ?? node? read?
    ??
  • Write Primary copy? write ??? ??
  • Primary copy? ??? ?? node? data item? ?????
    propagation

Propagate
Primary Copy
S0
S1
S2
S3
R/W
R
R
R
47
  • Design Idea
  • ? data item? ?? access??? ?? ??? node? primary
    copy? ??
  • Local read? write???? data? ?? access ??? ?? ??
  • ??
  • Large system? ?? primary copy? bottle neck?? ????
  • Replica? ??? ?? ?? node???? read ??
  • ?? write ??
  • Network partition?? ?? ??? primary copy? ??? ? ??
  • Primary copy? ???? ?? node? fail?? ?? node? data
    item? primary copy? ??
  • Network partition? ? ??? primary copy? ???? data
    consistency? ??

48
Quorum Consensus
  • ??
  • Quorum group operation? ???? ??? permission?
    ??? ?? node?? group
  • Quorum set quorum group? ??
  • Read, write? ? operation? quorum group? ?? ??
    node???? permission? ???? ?? ??
  • Version number? ???? ?? ??? write? replica? ???
  • ????
  • Read, write quorum group???? ?? ??? ??? node? ??
  • Write??? ?? quorum group???? ?? ??? ??? node? ??

v 2 x 3
v 2 x 3
v 2 x 3
Quorum Sets R A,B, A,C , B,C , D W
A,B,D, A,C,D, B,C,D
v 1 x 4
v 1 x 4
v 1 x 4
v 1 x 4
A
B
C
D
Read
Write(x3)
49
  • Weighted Voting
  • Quorum set? ???? ??? ??
  • ??? node? vote?? ?? ??
  • ??? operation? ?? ??? quorum? ??? vote? node????
    ???? ??? ??
  • ??? quorum?? ??? system ?? ??? ???
  • Read One Write All (ROWA)
  • Read majority / Write majority

? 1) Read Quorum 1 Write
Quorum 5 ? 2) Read Quorum 3
Write Quorum 3
A
B
C
D
Vote 1
Vote 1
Vote 2
Vote 1
50
  • ??
  • ??? data item? ? ??? ???? write? ???? data
    consistency? ?? ? ? ??
  • Network partition? ?? ??? ?? ??? ???? ?? ??
  • ??
  • Read operation? ??? ??? ?? overhead? ??
  • ?? Node? fail?? ?? ?? system? available?? ?? ???
    ?? ? ??

51
Available Copies
  • ??
  • ??? Read One Write All ??
  • Read operation ??? available replica??? read
  • Write operation ?? available replica? write
  • Directory Oriented Available Copies Method
  • Directory Data item? replica? ??? ?? node??
    list
  • Include(xa) Node A? data item x? replica? ???
    directory? ??
  • exclude(xa) Node A? data item x? replica? ???
    ??? directory? ?? ??
  • Read Operation
  • Directory?? date item x ? ?? node? ??? read ? ??
  • ???? directory??? ?? node? ??? ???
  • Write Operation
  • Directory?? data item x ? ?? ?? node? ??? write
    request
  • ??? node???? write? ???? ??? node? directory??
    ???? write ???

52
  • ??
  • ??? access? read operation? ??? ? ??
  • ?? data availability? ????
  • ??
  • Network partition? ???? ?, data consistency? ????
    ???

x
y
x
y
A
B
C
D
x
y
x
y
A
B
C
D
Network partition
53
Product
  • ?? DBMS ?? replication ??? ??
  • Oracle, Sybase, Informix
  • Tamdem Machine
  • Distributed file system
  • Distributed operation system

54
Conclusion
  • Integrating heterogeneous data source
  • Multi-DBMS
  • TP Monitor solution
  • Wrapper/Mediator solution
  • Data Replication

55
Reference
  • Multi-DBMS
  • Principles of Distributed Database Systems by
    M.Tamer Ozsu and Patrick Valduriez, Prentice
    Hall, 1999
  • Distributed Database Lecture Note(KAIST CS Dept,
    Stanford CS Dept.)
  • UniSQL Homepage (http//www.unisql.com)
  • TP Monitor
  • The Essential Client/Server Survival Guide by
    Robert Orfali, Dan Harkey, and Jeri Edwards, John
    Wiley Sons, Inc., 1996
  • Philip A. Bernstein, Transaction Processing
    Monitors, Communication of the ACM, November
    1990, p75 - 86
  • C. Mohan, Transaction Processing and Distributed
    Computing in the Internet Age, TPDC Talk,
    Presentation TP, 1998
  • Distributed Transaction Processing Reference
    Model, Version 3, The Open Group, 1996
  • Distributed Transaction Processing The TX
    Specification, The Open Group, 1991
  • Distributed Transaction Processing The XA
    Specification, The Open Group, 1995
  • Principles of Transaction Processing for the
    System Professional by Philip A. Bernstein and
    Eric Newcomer, Morgan Kaufmann, 1996

56
  • Wrapper/Mediator
  • Gio Wiederhold, Mediators in the Architecture of
    Future Information Systems, IEEE Computer, March
    1992, p38 - 49
  • Mary Tork Roth, and Peter Schwarz, Dont Scrap
    It, Wrap It! A Wrapper Architecture for Legacy
    Data Source, Proceeding of the 23rd VLDB
    Conference Athens, Greece, 1997
  • Mary Tork Roth, Manish Arya, Laura M. Haas,
    Michael J. Carey, William F. Cody, Ronald Fagin,
    Peter M. Schwarz, Joachim Thomas II and Edward L.
    Wimmers, The Garlic Project, SIGMOD Conf. 1996,
    p557
  • Data Replication
  • Distributed Operating System by Andrew S.
    Tanenbaum, Prentice Hall, 1995
  • Replicated Data Management in Distributed
    Systems, Readings in Distributed Computing
    Systems (T. L. Casavant and M. Singhal, Editors),
    IEEE Computer Society Press, Los Alamitos, C.A.,
    1994, pages 572--591.

57
????1 Middleware
  • Middleware? ?? CTR
  • client? server?? ?? server? server??? high-level
    communication? ????, API? ??? software layer
  • ??
  • ??? ??? ???? ???
  • ?? ????? middleware? porting?? ?? API? ???? ??
  • ?? ??? ????? ??
  • ?? ?? ??????? ?? ??? interface? ??
  • ??
  • Message oriented middleware(MOM), Remote
    Procedure Call(RPC),
  • Object Request Broker(ORB), Distributed
    Computing Environment(DCE),
  • Online Transaction Processing Monitor(OLTP
    Monitor), Database Connectivity Middleware

Client
Server
Server
Client
middleware
Network communication services
58
????2 DBMS Gateway
  • Some database vendors extend two-phase commit to
    multiple databases
  • Oracles Open Gateway
  • Manage the two-phase commit across heterogeneous
    XA-compliant database
  • Gateway between oracle server and other database
    server

client
DBMS A
Oracle Open Gateway
Oracle Server
client
DBMS B
Oracle Open Gateway
client
Client view of oracle server
59
????3 Application Server
  • Oracles Application Server
  • ???? ???? ??? ????(Network Computing
    Architecture)? ?? ????? web server, object
    request broker, TP monitor?? middleware ??? ??
  • Informix support gateway to access distributed
    database
  • IBMs DB2 DataJoiner can access and join tables
    located across multiple data sources

Oracle Server
X/Open XA
WEB client
Oracle Application Server
...
DBMS A
CORBA client
DBMS B
Application logic
...
Write a Comment
User Comments (0)
About PowerShow.com