Title: Distributed Database System Part 2 : Integrating heterogeneous data source
1Distributed Database SystemPart 2 Integrating
heterogeneous data source
- 2001? 5? 10?
- ???
- ??????? ????
- ?????? ? ????? ???
2Contents
- Multi-DBMS
- TP Monitor solution
- Wrapper/Mediator solution
3Multi-DBMS
4Introduction
- Distributed DBMS
- Top-down approach
- ?? data source? ?? ???? ???? ?? ??? ???, data?
????? ?? - Multi-DBMS (Federated DBMS)
- Bottom-up approach
- ?? ???? heterogeneous? ?? data source?? ??
schema? ??, ?? interface? ??? ??? ?? ??? ??
5- A database system that resides on top of existing
local database systems(LDBs) - Provides a uniform environment in which the user
can access data from heterogeneous LDBs - Maintains a single global database schema that
integrates the schemas of the LDBs - When the global queries are issued,
- Decomposes and translates the global queries into
queries for processing by LDBSs. - Merges the results from them and generates the
final result. - Supports distributed transaction management over
LBDs
6Motivation
- Distributed Computing Environment
- Many Different DBMSs have been independently
developed and used - Different data schema
- Relevant data may be stored in distributed and
heterogeneous database - Access to heterogeneous data sources of
different DBMS can be required
7- A few different ways
- Converting and migrating all data from one DBMS
to another - The application in one DBMS may need to be
converted to run in another - Using gateways for specific pairs of DBMS
- Gateway approach does not support transaction
management
8History
- 1980?? ??
- Research prototype? ??? ??
- 1990?? ?
- ?? Multi-DBMS? ??? ?? (UniSQL/M)
- ???? Multi-DBMS? ?? ? ??
- ??
- UniSQL/M??? ??? ?? Multi-DBMS? ???? ??
- ??? vendor?? ??? ? ??? ???? model? ??? product?
?? - Middleware(TP Monitor, ODBC?), server-to-server
communication? ?? ???? ???? ??? ??
9Global Schema
- Schema used in Multi-DBMS
- Integration of the schemas exported from LDBs
- Schema conflicts
- Arise within the process of integrating different
local database schemas - Resolution techniques for schema conflict
- Renaming entities and attributes
- Homogenizing representations for different
expression, units, or levels of precision - Homogenizing attributes
- Type coercion, defining default values,
concatenation attributes, and projecting parts of
a composition hierarchy
10Global Query Processing
- Definition
- To access data that's on multivendor database
servers - Query Processing
- Multi-DBMS that supports a single common data
model and a single global query language on top
of different types of existing systems
11- Global Query Processing
- global query decomposition
- Normalization, analysis, simplification
- Relational calculus to relational algebra
transformation - Data localization
- Query modification
- local query translation
- Each subquery is translated to a query or queries
of the corresponding local database system - Send localized queries to a local database system
- results assembly
- The local results returned by the subquries are
combined into the global result
12Global query
Query decomposition and global optimization
Multi-DBMS Layer
SQ1 SQ2 SQ3 SQn
SQ Sub-Query LQ Local Query
Query translator1
Query translator2
Query translator3
Query translatorn
LQ1 LQ2 LQ3 LQn
Local DBMS Layer
DB1
DB2
DB3
DBn
13Global Transaction Management
- Definition
- To deal with the problems of always keeping
distributed databases in a consistent state even
when concurrent accesses and failures occur - Multidatabase Transaction Processing
- Global transaction manager(GTM)
- submits global transaction operations to the
local DBMSs through the local transaction agent - coordinate two phase commit
- A set of local transaction manager(LTM)
14Global Transactions
Ti Tj
GTM
GTM Global Transaction Manager LTM Local
Transaction Manager
LTM
LTM
ti1 tj1
ti2 tj2
Local Transactions
Local Transactions
DBMS
DBMS
15Multi-DBMS Example (UniSQL/M)
- Introduction
- Multi-DBMS for managing a heterogeneous
collection of relational and object-oriented
database systems - Use SQL/M which is the data definition and
manipulation language - Two component of global schema
- Proxy transformation schema ( export schema)
- Virtual Class conceptual schema ( global
schema) - Support UniSQL/X, Oracle, Sybase, Informix
16- History
- 1990? ?? ??? ??? ?? UniSQL? ??
- MCC?? ????? ????? DBMS ORION? ?? ?? ??
- ??????? SQL? ??
- Multidatabase Architecture
- 1997? 12?, ????????? ?? UniSQL???? ????? ??? ??
?? - 1999? 6?, UniSQL 4.0K ??
- 2001? ??, UniSQL 5.0 version3?? ????? ????? ????
???? ??
17Global DBMS
Vclasse
Vclasse
Vclasse
UniSQL/M
Proxy
Proxy
Proxy
Proxy
Proxy
Global Transaction Manager
Local DBMS
Local DBMS
Master
Master
Driver
Driver
Driver
Informix
Oracle
Sybase
18create vclass COMPANY ( name string,
total_sale integer, net_profit integer,
address string, country string, grade
integer) as select name, total_sale,
net_profit, address, NULL, grade from
DOMESTIC union all select name, total_sale,
net_profit, NULL, country, grade from
FOREIGN
virtual class
proxy
create proxy DOMESTIC on Oracle ( name
string, total_sale integer, net_profit
integer, address string, grade
integer) as select cname, gross_selling,net_gai
n_address,grade from DomesticCompany
create proxy FOREIGN on Sybase ( name
string, total_sale integer, net_profit
integer, country string, grade
integer) as select comp_name,gross_sale,pure_gain
,country, grade from ForeignCompany
Oracle
Sybase
DomesticCompany
ForeignCompany
gross_ selling
net_ gain
comp_ name
gross_ sale
pure_ gain
grade
address
grade
country
cname
19- ????
- Local database ??
- UniSQL/M? global database ??
- UniSQL/M? local database?? ???? proxy? ??
- ??? proxy? ???? vclass? ??
- Vclass? ???? ?? ???? ??
20TP Monitor Solution
21Transaction
- An ACID unit of work
- ACID Properties
- Atomicity
- Consistency
- Isolation
- Durability
- By Industrial Requirement
- Example application finance(banking, brokerage,
), insurance, healthcare, telecom,
reservations(hotel, car, rail, air), inventory
control, retail/distribution ... - In DBMS
- A collection of operations that performs a single
logical function in a database application
22Global Transaction
- Environment Shift
- Distributed information in network
- Downsizing
- Global(Distributed) Transaction Transaction run
on multiple sites - 2 Phase Commit Protocol
- Coordinate the action of distributed transaction
- Synchronize updates on multiple site so that they
either all fail of all succeed (Atomicity) - 2 Phase Locking Protocol
- Isolation
- Methods
- Transaction Processing Monitor
- Multi-DBMS
- ex) UNISQL ...
- DBMS vendors approach
23Two Phase Commit Protocol
c
c
Phase 1
p
p
p
p
p
p
Prepare-to-commit
Ready-to-commit
c
c
Phase 2
p
p
p
p
p
p
Commit or Abort
Complete or Abort
C coordinator P participant
24Two Type of Transaction Processing
- TP Heavy
- Using Transaction Processing Monitor
- Support transaction on single server or multiple
heterogeneous server - All resource, not just data-centric ones
- TP Lite
- Database-centric approach using stored
procedure, trigger... - Sybase integrate some of TP monitor functions
inside the DBMS engine in 1986 - Can not support global transaction control
25Transaction Processing Monitor
- Definition
- An operating system for transaction processing
- Transaction application programmer? transaction?
ACID ??? ?? ??? ? ??? ??? ?? ???????, ??? ??? ???
???? interface? procedure?? ???? - Multi-resource manager ???? distributed
transaction processing??? ???? middleware
client
Resource Manager
TP Monitor
client
Resource Manager
client
Resource Manager
client
26- ??
- Process Management
- Funneling work
- Monitoring transaction execution
- Load balancing
- (Distributed) Transaction Management
- Distributed transaction processing ??
- ACID
27TP Monitor Standard
- X/Opens DTP(Distributed Transaction Processing)
Reference Model - Define the components of a transaction-based
system and locate the interface between them - 4 Components
- Application Program
- Resource Manager
- Communication Resource Manager
- Transaction Manager
- 1993, Version1
- 1994, Version2
- ISO-TP(ISO0026)
- Define the transaction identifiers and the
two-phase commit protocol in a commit tree
28Application(AP)
XATMI, TxRPC, CPI-C
TX
Transaction Manager (TM)
Resource Manager (RM)
Communication Resource Manager (CRM)
XA
XA
Resource
communication
29AP
TM
RM
CRM
OSI-TP
Resource
AP
AP
TM
TM
RM
CRM
RM
CRM
OSI-TP
OSI-TP
Resource
Resource
30Product
- Close TP monitors
- IBM CICS
- The largest market share by far of any TP monitor
product - Since 1968, to improve the efficient of mainframe
operating system environment - CICS/VSE, CICS/MVS, CICS/400, CICF for AIX, CICS
for OS/2, CICS for Windows/NT - Ported to other UNIX platforms
- IBM IMS(Information Management System)
- Used by 40 of the worlds largest company
- Since late 1960s
- Online database and transaction processing at a
time when nearly all data processing was done in
batch - Todays IMS can interoperate with CICS and DB2
31- Open TP monitor
- Novell(BEA) Tuxedo
- Developed by ATT Bell Laboratory primary to
service telecommunication applications in 1984 - Purchased by Novell in 1996
- The most popular open TP monitor
- IBM(Transarc) Encina
- Portable TP Monitor
- Transarc was founded in 1984 by several
Carregie-Mellon Univ researchers - Purchased by IBM in 1994
- Microsoft Transaction Server
- Component-based TP monitor product
- Object-oriented
- Easy to use, develop, deploy and management
- In 1996, MS DTC(Microsoft Distributed Transaction
Coordinator) was embedded in SQL Server 6.5 and
Windows/NT
32- Transarc Encina
- Provide distributed transaction processing
toolkit to application developer - TRPC Transaction Remote Procedure Call
Service LOG Log Service - TRAN Distributed Transaction Service VOL
Volume Service - TM/XA Transaction Manager/XA Service
LOCK Lock Service - REC Recovery Service
-
-
Application Programming Interface
TRPC
TM/XA
REC
LOCK
TRAN
LOG
VOL
Base Development Environment
33Wrapper/Mediator Solution
34Introduction
- Addressed by Gio Wiederhold(Stanford Univ.) in
IEEE Computer March 1992 - Mediators in the Architecture of Future
Information Systems - Wrapper
- Heterogeneous data source? encapsulate?? ??? ????
?? layer? ???? ???? - Mediator
- ?? layer? ???? ?? ??? ???? ???, ?? data ??? ??
???? ??? ???? ???? - (A software module that exploits encoded
knowledge about certain sets or subsets of data
to create information for a higher layer of
applications) - Wrapper, Mediator? ??? ???? Architecture
Application
Mediator
wrapper
wrapper
wrapper
wrapper
Data Source
Data Source
Data Source
Data Source
35Mediator
Global DBMS
Vclasse
Vclasse
Vclasse
UniSQL/M
Proxy
Proxy
Proxy
Proxy
Proxy
Global Transaction Manager
Local DBMS
Local DBMS
Master
Master
Driver
Driver
Driver
Informix
Oracle
Sybase
36Motivation
- ??? ??? ???? ??? ?? ???? ??
- ex. Heterogeneous DBMS, Web page in WWW ?
- ??? ???? ???? ??? ??? ???? ?? ??
- ex. Multi-DBMS, Intelligent Agent ?
- ?? ?? ???? ???? ?? ???? ??? ???? access?? ?? ??
- ex. ODBC, JDBC ?
37Example Garlic Project
- IBM Almaden Research Center?? 1995? ?? ??
- Multimedia information system? ???? ?? ??
- Heterogeneous database system ? non-database
system? ???? ?? ??? ??? data? ?? - DB2 DataJoiner?? product? ???
- Heterogeneous Database Join
- Transparent Access to Heterogeneous Data Source
- Global Optimization
- Special Data Management/Access
38- Architecture
- Garlic
- Heterogeneous legacy data? ?? ??? ??? ????
middleware system - Data? ?? ???? ??? ?? ??? ??
- Repository Wrapper
- Heterogeneous repository? ??? data? type? ??? ???
???? Garlic? ?? - Garlic ?? protocol ? repository native protocol?
??
...
Client
Client
Client
Garlic (mediator)
Metadata
Query Processor
...
Wrapper
Wrapper
Wrapper
Wrapper
Relational DB
Object Oriented DB
Image Archive
Sound Archive
39 40- Issue
- Garlic(mediator) ??? ???? ??? modeling? ???
- Garlic query processor?? query planning? ?? ?
wrapper?? ???? ??? ??? ??? ??? - Garlic query processor?? query? ??? ? wrapper?
??? ??? ???
41Data Replication
42Introduction
- ?? ???? ??? data item? ?? node? ?? ??? ??
- ??
- Replication is often used in many distributed
system to provide a higher level of Performance,
Reliability and Availability - Highly available system
- Fault tolerance capabilities
- Improved response time
- Load sharing
??? data item?? ??? ??? ????? ???? access
algorithm ??
43Replica Control Protocol
- Algorithm that control access to replicated
data - Replica ???? ??? data item?? ? copy
- ????
- Data correctness
- ?? ?? ??
- Performance
- Availability ?? ??, ??? operation? ?? ??? ??
- Response Time Operation? ????? ??? ??? ?? ????
?? ?? - Cost
44- Correctness
- Single-copy serializability ? ????? ??
- Single-copy
- ?? system? ??? data item? ???? ??? ??
- Serializability
- ??? ???? ?? operation?? ??? ? operation? ???? ??
?? ?? ??? ??
45- Eager replica update protocol
- Update all the replicas of an item as part of a
single transaction - Ensure that executions are serializable
- Non-scalable
- Lazy replica update protocol
- Propagate updates to replicas through independent
transaction after the original transaction commit - Become popular with database vendors due to their
superior performance characteristics - If used indiscriminately, result in
non-serializable execution
46Primary Copy
- ??
- Primary copy replicated? data item??? ??? ?? ?
- Data? update? ? primary copy? update
- Primary copy? ??? node?? ?? replica?? ?? update?
??? - ??
- Read ??? data item? local node? ???, ??? ??
- Local node? ??? data item? ???, ?? node? read?
?? - Write Primary copy? write ??? ??
- Primary copy? ??? ?? node? data item? ?????
propagation
Propagate
Primary Copy
S0
S1
S2
S3
R/W
R
R
R
47- Design Idea
- ? data item? ?? access??? ?? ??? node? primary
copy? ?? - Local read? write???? data? ?? access ??? ?? ??
- ??
- Large system? ?? primary copy? bottle neck?? ????
- Replica? ??? ?? ?? node???? read ??
- ?? write ??
- Network partition?? ?? ??? primary copy? ??? ? ??
- Primary copy? ???? ?? node? fail?? ?? node? data
item? primary copy? ?? - Network partition? ? ??? primary copy? ???? data
consistency? ??
48Quorum Consensus
- ??
- Quorum group operation? ???? ??? permission?
??? ?? node?? group - Quorum set quorum group? ??
- Read, write? ? operation? quorum group? ?? ??
node???? permission? ???? ?? ?? - Version number? ???? ?? ??? write? replica? ???
- ????
- Read, write quorum group???? ?? ??? ??? node? ??
- Write??? ?? quorum group???? ?? ??? ??? node? ??
v 2 x 3
v 2 x 3
v 2 x 3
Quorum Sets R A,B, A,C , B,C , D W
A,B,D, A,C,D, B,C,D
v 1 x 4
v 1 x 4
v 1 x 4
v 1 x 4
A
B
C
D
Read
Write(x3)
49- Weighted Voting
- Quorum set? ???? ??? ??
- ??? node? vote?? ?? ??
- ??? operation? ?? ??? quorum? ??? vote? node????
???? ??? ?? - ??? quorum?? ??? system ?? ??? ???
- Read One Write All (ROWA)
- Read majority / Write majority
? 1) Read Quorum 1 Write
Quorum 5 ? 2) Read Quorum 3
Write Quorum 3
A
B
C
D
Vote 1
Vote 1
Vote 2
Vote 1
50- ??
- ??? data item? ? ??? ???? write? ???? data
consistency? ?? ? ? ?? - Network partition? ?? ??? ?? ??? ???? ?? ??
- ??
- Read operation? ??? ??? ?? overhead? ??
- ?? Node? fail?? ?? ?? system? available?? ?? ???
?? ? ??
51Available Copies
- ??
- ??? Read One Write All ??
- Read operation ??? available replica??? read
- Write operation ?? available replica? write
- Directory Oriented Available Copies Method
- Directory Data item? replica? ??? ?? node??
list - Include(xa) Node A? data item x? replica? ???
directory? ?? - exclude(xa) Node A? data item x? replica? ???
??? directory? ?? ?? - Read Operation
- Directory?? date item x ? ?? node? ??? read ? ??
- ???? directory??? ?? node? ??? ???
- Write Operation
- Directory?? data item x ? ?? ?? node? ??? write
request - ??? node???? write? ???? ??? node? directory??
???? write ???
52- ??
- ??? access? read operation? ??? ? ??
- ?? data availability? ????
- ??
- Network partition? ???? ?, data consistency? ????
???
x
y
x
y
A
B
C
D
x
y
x
y
A
B
C
D
Network partition
53Product
- ?? DBMS ?? replication ??? ??
- Oracle, Sybase, Informix
- Tamdem Machine
- Distributed file system
- Distributed operation system
54Conclusion
- Integrating heterogeneous data source
- Multi-DBMS
- TP Monitor solution
- Wrapper/Mediator solution
- Data Replication
55Reference
- Multi-DBMS
- Principles of Distributed Database Systems by
M.Tamer Ozsu and Patrick Valduriez, Prentice
Hall, 1999 - Distributed Database Lecture Note(KAIST CS Dept,
Stanford CS Dept.) - UniSQL Homepage (http//www.unisql.com)
- TP Monitor
- The Essential Client/Server Survival Guide by
Robert Orfali, Dan Harkey, and Jeri Edwards, John
Wiley Sons, Inc., 1996 - Philip A. Bernstein, Transaction Processing
Monitors, Communication of the ACM, November
1990, p75 - 86 - C. Mohan, Transaction Processing and Distributed
Computing in the Internet Age, TPDC Talk,
Presentation TP, 1998 - Distributed Transaction Processing Reference
Model, Version 3, The Open Group, 1996 - Distributed Transaction Processing The TX
Specification, The Open Group, 1991 - Distributed Transaction Processing The XA
Specification, The Open Group, 1995 - Principles of Transaction Processing for the
System Professional by Philip A. Bernstein and
Eric Newcomer, Morgan Kaufmann, 1996
56- Wrapper/Mediator
- Gio Wiederhold, Mediators in the Architecture of
Future Information Systems, IEEE Computer, March
1992, p38 - 49 - Mary Tork Roth, and Peter Schwarz, Dont Scrap
It, Wrap It! A Wrapper Architecture for Legacy
Data Source, Proceeding of the 23rd VLDB
Conference Athens, Greece, 1997 - Mary Tork Roth, Manish Arya, Laura M. Haas,
Michael J. Carey, William F. Cody, Ronald Fagin,
Peter M. Schwarz, Joachim Thomas II and Edward L.
Wimmers, The Garlic Project, SIGMOD Conf. 1996,
p557 - Data Replication
- Distributed Operating System by Andrew S.
Tanenbaum, Prentice Hall, 1995 - Replicated Data Management in Distributed
Systems, Readings in Distributed Computing
Systems (T. L. Casavant and M. Singhal, Editors),
IEEE Computer Society Press, Los Alamitos, C.A.,
1994, pages 572--591.
57????1 Middleware
- Middleware? ?? CTR
- client? server?? ?? server? server??? high-level
communication? ????, API? ??? software layer - ??
- ??? ??? ???? ???
- ?? ????? middleware? porting?? ?? API? ???? ??
- ?? ??? ????? ??
- ?? ?? ??????? ?? ??? interface? ??
- ??
- Message oriented middleware(MOM), Remote
Procedure Call(RPC), - Object Request Broker(ORB), Distributed
Computing Environment(DCE), - Online Transaction Processing Monitor(OLTP
Monitor), Database Connectivity Middleware
Client
Server
Server
Client
middleware
Network communication services
58????2 DBMS Gateway
- Some database vendors extend two-phase commit to
multiple databases - Oracles Open Gateway
- Manage the two-phase commit across heterogeneous
XA-compliant database - Gateway between oracle server and other database
server
client
DBMS A
Oracle Open Gateway
Oracle Server
client
DBMS B
Oracle Open Gateway
client
Client view of oracle server
59????3 Application Server
- Oracles Application Server
- ???? ???? ??? ????(Network Computing
Architecture)? ?? ????? web server, object
request broker, TP monitor?? middleware ??? ?? - Informix support gateway to access distributed
database - IBMs DB2 DataJoiner can access and join tables
located across multiple data sources
Oracle Server
X/Open XA
WEB client
Oracle Application Server
...
DBMS A
CORBA client
DBMS B
Application logic
...