Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms


Slides: 68
Provided by: Fio93

Consistent Global States of Distributed Systems
Fundamental Concepts and Mechanisms
  • CS 249 Project
  • Fall 2005
  • Wing Wong

  • Introduction
  • Asynchronous distributed systems, distributed
    computations, consistency
  • Two different strategies to construct global states:
  • Monitor passively observes the system
  • Monitor actively interrogates the system
    (snapshot protocol)
  • Properties of global predicates
  • Sample applications: deadlock detection and debugging

  • global state: union of local states of
    individual processes
  • many problems in distributed computing require
  • construction of a global state and
  • evaluation of whether the state satisfies some
    predicate F
  • difficulties
  • uncertainties in message delays
  • relative speeds of computations
  • global state obtained can be obsolete,
    incomplete, or inconsistent

Distributed Systems
  • collection of sequential processes p1, p2, …, pn
  • unidirectional communication channels between
    pairs of processes
  • reliable channels
  • messages may be delivered out of order
  • network strongly connected (not necessarily completely)

Asynchronous Distributed Systems
  • no bounds on relative process speeds
  • no bounds on message delays
  • no synchronized local clocks
  • communication is the only possible mechanism for
    synchronization

Distributed Computations
  • distributed program executed by a collection of
    processes
  • each process executes a sequence of events
  • communication through events send(m) and
    receive(m), m as message identifier

Distributed Computations
  • hi = ei^1 ei^2 …
  • local history of process pi
  • canonical enumeration
  • total order imposed by sequential execution
  • hi^k = ei^1 ei^2 … ei^k
  • initial prefix of hi containing the first k events
  • H = h1 ∪ … ∪ hn
  • global history containing all events
  • does not specify relative timing between events

Distributed Computations
  • to order events, define binary relation → to
    capture cause-and-effect
  • e → e′ if and only if e causally precedes e′
  • concurrent events: neither e → e′ nor e′ → e;
    write e ∥ e′
  • distributed computation: partially ordered set
    defined by (H, →)

Distributed Computations
  • e.g. e2^1 → e3^6, e2^2 ∥ e3^6

Global States, Cuts and Runs
  • si^k
  • local state of process pi after event ei^k
  • S = (s1, …, sn)
  • global state of distributed computation
  • n-tuple of local states
  • cut C = h1^c1 ∪ … ∪ hn^cn, or (c1, …, cn)
  • subset of global history H

Global States, Cuts and Runs
  • (s1^c1, …, sn^cn)
  • global state corresponding to cut C
  • (e1^c1, …, en^cn)
  • frontier of cut C
  • set of last events
  • run
  • a total ordering R including all events in the global
    history
  • consistent with each local history

Global States, Cuts and Runs
  • cut C = (5,2,4), cut C′ = (3,2,6)
  • a consistent run R = e3^1 e1^1 e3^2 e2^1 e3^3 e3^4 e2^2 e1^2 e3^5 e1…

  • cut C is consistent if for all events e and e′:
    (e ∈ C) ∧ (e′ → e) ⇒ e′ ∈ C
  • closed under the causal precedence relation
  • consistent global state corresponds to a
    consistent cut
  • run R is consistent if for all events, e → e′
    implies e appears before e′ in R
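
The consistency condition on cuts can be sketched in code. Because a cut is a union of local prefixes, it is closed under causal precedence exactly when no included receive event has its matching send outside the cut. This is a minimal sketch; the `Message` record, 0-based process indices, and the function name are illustrative assumptions, not part of the slides.

```python
from typing import NamedTuple

class Message(NamedTuple):
    sender: int     # index of the sending process (0-based, an assumption)
    send_idx: int   # 1-based position of the send event in the sender's history
    receiver: int   # index of the receiving process
    recv_idx: int   # 1-based position of the receive event in the receiver's history

def is_consistent_cut(cut, messages):
    """cut[i] is the number of events of process i included in the cut."""
    for m in messages:
        received = m.recv_idx <= cut[m.receiver]
        sent = m.send_idx <= cut[m.sender]
        if received and not sent:
            # The cut contains an effect (receive) without its cause (send).
            return False
    return True
```

For example, with a single message sent as the first event of process 0 and received as the second event of process 1, the cut (1, 2) passes the check, while (0, 2) does not.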

  • run R = e^1 e^2 … results in a sequence of global
    states S^0 S^1 S^2 …
  • S^i is obtained from S^(i-1) by some process
    executing event e^i; we say S^(i-1) leads to S^i
  • denote the transitive closure of the leads-to
    relation by ⇝R
  • S′ is reachable from S in run R iff S ⇝R S′

Lattice of Global States
  • lattice: set of all consistent global states,
    along with the leads-to relation
  • S^k1…kn: shorthand for global state (s1^k1, …, sn^kn)
  • k1 + … + kn: level of the state in the lattice

Lattice of Global States
  • path: sequence of global states of increasing
    level (downwards)
  • each path corresponds to a consistent run
  • a possible path: S^00 S^01 S^11 S^21 S^31 S^32 S^42 S^43
    S^44 S^54 S^64 S^65

Observing Distributed Computations
  • processes notify monitor process p0 whenever they
    execute an event
  • monitor constructs observation as the sequence of
    events corresponding to the notification messages
  • problem
  • observation may be inconsistent due to
    variability in notification message delays

Observing Distributed Computations
  • any permutation of run R is a possible
    observation
  • we need:
  • delivery rule at monitor process to restore
    message order
  • we have: First-In-First-Out (FIFO) delivery using
    sequence numbers, for all source-destination pairs
    pi, pj
  • sendi(m) → sendi(m′) ⇒ deliverj(m) → deliverj(m′)
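
FIFO delivery with per-source sequence numbers can be sketched as a small receive buffer. This is a minimal sketch under assumptions: sequence numbers start at 0 for each source, and the class and method names are hypothetical.

```python
class FifoReceiver:
    """Delivers messages from each source in the order they were sent."""

    def __init__(self):
        self.next_seq = {}   # source -> next sequence number expected
        self.pending = {}    # source -> {sequence number: payload}

    def receive(self, source, seq, payload):
        """Buffer one message; return the payloads that become deliverable."""
        buf = self.pending.setdefault(source, {})
        buf[seq] = payload
        expected = self.next_seq.get(source, 0)
        delivered = []
        # Release the longest gap-free run starting at the expected number.
        while expected in buf:
            delivered.append(buf.pop(expected))
            expected += 1
        self.next_seq[source] = expected
        return delivered
```

A message that arrives early is held until every earlier message from the same source has been delivered; messages from different sources are not ordered relative to each other.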

Delivery Rule 1
  • assume:
  • global real-time clock
  • message delays bounded by d
  • each process includes a timestamp (real-time clock
    value) when notifying p0 of local event e
  • DR1: At time t, deliver all received messages
    with timestamps up to t - d in increasing
    timestamp order
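
DR1 can be sketched with a priority queue keyed on timestamps. This is a minimal sketch, assuming the caller supplies the current global time `now`; the class name and the `delta` parameter (the delay bound d) are illustrative.

```python
import heapq

class DR1Monitor:
    """Buffers notifications and releases them according to DR1."""

    def __init__(self, delta):
        self.delta = delta   # upper bound d on message delays
        self.buffer = []     # min-heap of (timestamp, arrival order, event)
        self.arrivals = 0

    def receive(self, timestamp, event):
        heapq.heappush(self.buffer, (timestamp, self.arrivals, event))
        self.arrivals += 1

    def deliver(self, now):
        """Deliver, in timestamp order, everything stamped at most now - delta."""
        out = []
        while self.buffer and self.buffer[0][0] <= now - self.delta:
            out.append(heapq.heappop(self.buffer)[2])
        return out
```

Any notification stamped at most `now - delta` must already have arrived, since delays are bounded by d, so delivering up to that point cannot skip an earlier event.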

Delivery Rule 1
  • let RC(e) denote the value of the global clock when e is
    executed
  • real-time clock satisfies the Clock Condition:
  • e → e′ ⇒ RC(e) < RC(e′)
  • but logical clocks also satisfy the clock condition

Logical Clocks
  • event orderings based on increasing clock values
  • LC(ei) denotes value of logical clock when ei is
    executed by pi
  • each sent message m contains timestamp TS(m)
  • update rules by pi at occurrence of ei
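
A minimal sketch of the update rules, assuming the usual Lamport scheme: the clock is incremented on every internal or send event, and on a receive it is first advanced to the maximum of the local value and TS(m), then incremented. The class and method names are hypothetical.

```python
class LamportClock:
    """Logical clock LC kept by one process."""

    def __init__(self):
        self.value = 0

    def tick(self):
        """Internal or send event; the result is also TS(m) for a send."""
        self.value += 1
        return self.value

    def receive(self, ts):
        """Receive event for a message carrying timestamp ts."""
        self.value = max(self.value, ts) + 1
        return self.value
```

With these rules a receive event always gets a larger clock value than the matching send, which is what makes the clock condition hold across processes.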

Delivery Rule 2
  • replace real-time clock by logical clock
  • need gap-detection property:
  • given events e, e′ where LC(e) < LC(e′),
    determine whether some event e″ exists such that
    LC(e) < LC(e″) < LC(e′)
  • message m is stable at p if no future messages
    with timestamps smaller than TS(m) can be
    received by p

Delivery Rule 2
  • with FIFO, when p0 receives m from pi with
    timestamp TS(m), it can be certain that no other message
    m′ from pi with TS(m′) ≤ TS(m) will arrive
  • message m at p0 is guaranteed stable when p0 has
    received at least one message from all other
    processes with timestamps greater than TS(m)
  • DR2: Deliver all received messages that are
    stable at p0 in increasing timestamp order
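
DR2 can be sketched by tracking the largest timestamp seen from each process. This is a minimal sketch under assumptions: channels to the monitor are FIFO, logical timestamps are at least 1, ties between equal timestamps are broken by sender name, and all names are illustrative.

```python
class DR2Monitor:
    """Delivers notifications once they are stable, in timestamp order."""

    def __init__(self, processes):
        self.latest = {p: 0 for p in processes}  # largest timestamp seen per process
        self.buffer = []                          # (timestamp, sender, event)

    def _is_stable(self, timestamp, sender):
        # Stable once every other process has sent something with a larger timestamp.
        return all(ts > timestamp for p, ts in self.latest.items() if p != sender)

    def receive(self, sender, timestamp, event):
        """Buffer one notification; return the events delivered as a result."""
        self.latest[sender] = max(self.latest[sender], timestamp)
        self.buffer.append((timestamp, sender, event))
        stable = sorted(m for m in self.buffer if self._is_stable(m[0], m[1]))
        self.buffer = [m for m in self.buffer if not self._is_stable(m[0], m[1])]
        return [event for _, _, event in stable]
```

A process that stays silent blocks delivery indefinitely under this rule, which illustrates how DR2 can delay messages unnecessarily.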

Strong Clock Condition
  • DR1, DR2 assume RC(e) < RC(e′) (or LC(e) <
    LC(e′)) ⇒ e → e′
  • recall RC and LC guarantee only the clock condition e →
    e′ ⇒ RC(e) < RC(e′)
  • DR1, DR2 can unnecessarily delay delivery
  • want timing mechanism TC that gives the Strong Clock
    Condition:
  • e → e′ ⇔ TC(e) < TC(e′)

Timing Mechanism 1 - Causal Histories
  • causal history as clock value
  • set of all events that causally precede event e
  • smallest consistent cut that includes e
  • projection of θ(e) on process pi: θi(e) = θ(e) ∩ hi

Timing Mechanism 1 - Causal Histories
  • To maintain causal histories:
  • θ initially empty
  • if ei is an internal or send event:
  • θ(ei) = {ei} ∪ θ(previous local event of pi)
  • if ei is the receive of message m by pi from pj:
  • θ(ei) = {ei} ∪ θ(previous local event of pi) ∪
    θ(corresponding send event at pj)
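
The two maintenance rules translate directly into set unions. This is a minimal sketch in which events are plain strings and histories are frozensets; both function names are hypothetical.

```python
def local_event(event, prev_history):
    """Causal history of an internal or send event."""
    return frozenset({event}) | prev_history

def receive_event(event, prev_history, send_history):
    """Causal history of a receive event, given the history of the matching send."""
    return frozenset({event}) | prev_history | send_history
```

With histories as sets, causal precedence between two distinct events can be checked as proper set inclusion, for example `local_event("a", frozenset()) < receive_event("b", frozenset(), local_event("a", frozenset()))`.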

Timing Mechanism 1 - Causal Histories
  • [figure: causal histories after a new send event e1^5
    and a new receive event e2^3]
Timing Mechanism 1 - Causal Histories
  • can interpret clock comparison as set inclusion
  • e → e′ ⇔ θ(e) ⊂ θ(e′)
  • (why not set membership, e → e′ ⇔ e ∈ θ(e′)?)
  • unfortunately, causal histories grow too rapidly

Timing Mechanism 2 - Vector Clocks
  • note:
  • projection θi(e) = hi^k for some unique k
  • ei^r ∈ θi(e) for all r < k
  • can use the single number k to represent θi(e)
  • θ(e) = θ1(e) ∪ … ∪ θn(e)
  • represent entire causal history by n-dimensional
    vector clock VC(e), where for all 1 ≤ i ≤ n
  • VC(e)[i] = k if and only if θi(e) = hi^k

Timing Mechanism 2 - Vector Clocks
  • To maintain vector clocks:
  • each process pi initializes VC to contain all
    zeros
  • update rules by pi at occurrence of ei
  • VC(ei)[i]: number of events pi has executed up
    to and including ei
  • VC(ei)[j]: number of events of pj that causally
    precede event ei of pi
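
A minimal sketch of vector clock maintenance, assuming the usual update rules: an internal or send event increments the local component, and a receive first takes the componentwise maximum with the message timestamp and then increments the local component. Process indices are 0-based here and the names are hypothetical.

```python
class VectorClock:
    """Vector clock kept by process i in a system of n processes."""

    def __init__(self, n, i):
        self.i = i
        self.v = [0] * n   # all zeros initially

    def local_event(self):
        """Internal or send event; the result is also TS(m) for a send."""
        self.v[self.i] += 1
        return tuple(self.v)

    def receive(self, ts):
        """Receive event for a message carrying vector timestamp ts."""
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        self.v[self.i] += 1
        return tuple(self.v)
```

After a receive, each component j holds the number of events of process j in the causal history of the receive event, matching the interpretation given in the bullets.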

Timing Mechanism 2 - Vector Clocks
  • [figure: causal histories and the corresponding vector
    clocks for a new send event and a new receive event]
Vector Clock Comparison
  • Define the less-than relation:
  • V < V′ ⇔ (V ≠ V′) ∧ (∀k: 1 ≤ k ≤ n: V[k] ≤ V′[k])
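
The less-than relation on vectors is a one-liner in code. This is a minimal sketch in which vectors are tuples of equal length and the function name is hypothetical.

```python
def vc_less(v, w):
    """True when v differs from w and every component of v is at most that of w."""
    return v != w and all(a <= b for a, b in zip(v, w))
```

The relation is only a partial order: for vectors such as (1, 0) and (0, 1), neither is less than the other.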

Properties of Vector Clocks
  • Strong Clock Condition:
  • e → e′ ⇔ VC(e) < VC(e′)
  • Simple Strong Clock Condition: given event ei of
    pi and event ej of pj, i ≠ j
  • ei → ej ⇔ VC(ei)[i] ≤ VC(ej)[i]

Properties of Vector Clocks
  • Test for Concurrency: given event ei of pi and
    event ej of pj
  • ei ∥ ej ⇔ (VC(ei)[i] > VC(ej)[i]) ∧ (VC(ej)[j] >
    VC(ei)[j])
  • Pairwise Inconsistent: given event ei of pi and
    ej of pj, i ≠ j
  • ei, ej are pairwise inconsistent if they cannot belong
    to the frontier of the same consistent cut, i.e. iff
  • (VC(ei)[i] < VC(ej)[i]) ∨ (VC(ej)[j] < VC(ei)[j])
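
The component-wise tests above can be sketched as three small predicates. This is a minimal sketch: each function takes the vector clocks of ei and ej together with the 0-based indices i and j of their processes, and the function names are hypothetical.

```python
def precedes(vc_ei, i, vc_ej):
    """Simple strong clock condition for events of different processes."""
    return vc_ei[i] <= vc_ej[i]

def concurrent(vc_ei, i, vc_ej, j):
    """Neither event is in the causal history of the other."""
    return vc_ei[i] > vc_ej[i] and vc_ej[j] > vc_ei[j]

def pairwise_inconsistent(vc_ei, i, vc_ej, j):
    """The two events cannot both lie on the frontier of one consistent cut."""
    return vc_ei[i] < vc_ej[i] or vc_ej[j] < vc_ei[j]
```

Each test reads only two components per vector, so it costs constant time regardless of the number of processes.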

Properties of Vector Clocks
  • Consistent Cut:
  • frontier contains no pairwise inconsistent events
  • VC(ei^ci)[i] ≥ VC(ej^cj)[i], ∀i, j: 1 ≤ i, j ≤ n
  • Counting: number of events that causally precede ei
  • #(ei) = (Σ j=1..n VC(ei)[j]) - 1
  • e.g. # events = 4 + 1 + 3 - 1 = 7
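
Both properties can be sketched directly from the formulas. This is a minimal sketch in which `frontier[i]` is assumed to hold the vector clock of the last event of process i in the cut (0-based indices); the function names are hypothetical.

```python
def is_consistent_frontier(frontier):
    """frontier[i] is the vector clock of the last event of process i in the cut."""
    n = len(frontier)
    return all(
        frontier[i][i] >= frontier[j][i]
        for i in range(n)
        for j in range(n)
    )

def events_preceding(vc):
    """Number of events that causally precede the event stamped with vc."""
    return sum(vc) - 1
```

For the clock (4, 1, 3), `events_preceding` gives 7, matching the example in the bullets.
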
Properties of Vector Clocks
  • Weak Gap-Detection: given event ei of pi and ej
    of pj,
  • if VC(ei)[k] < VC(ej)[k] for some k ≠ j, there
    exists an event ek such that ¬(ek → ei) ∧ (ek → ej)

Causal Delivery and Vector Clocks
  • assume processes increment local component of VC
    only for events notified to monitor p0
  • p0 maintains set M for messages received but not
    yet delivered
  • suppose we have:
  • message m from pj
  • m′: last message delivered from process pk, k ≠ j

Causal Delivery and Vector Clocks
  • To deliver m, p0 must verify:
  • no earlier message from pj is undelivered (i.e.
    TS(m)[j] - 1 messages have been delivered from pj)
  • no undelivered message m″ from pk such that
    sendk(m′) → sendk(m″) → sendj(m), ∀k ≠ j (i.e.
    TS(m′)[k] ≥ TS(m)[k] for all k ≠ j)

Causal Delivery and Vector Clocks
  • p0 maintains array D[1..n] where D[i] =
    TS(mi)[i], mi being the last message delivered from
    pi
  • e.g. in the figure, delivery of m′ is delayed until m
    is received and delivered

Delivery Rule 3
  • Causal Delivery:
  • for all messages m, m′, sending processes pi, pj
    and destination process pk
  • sendi(m) → sendj(m′) ⇒ deliverk(m) → deliverk(m′)
  • DR3 (Causal Delivery): Deliver message m from
    process pj as soon as
  • D[j] = TS(m)[j] - 1, and
  • D[k] ≥ TS(m)[k], ∀k ≠ j
  • p0 sets D[j] to TS(m)[j] after delivery of m
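
DR3 can be sketched as a monitor that keeps the array D and the set M of undelivered messages. This is a minimal sketch under the stated assumption that processes increment their local vector clock component only for events notified to the monitor; process indices are 0-based and all names are illustrative.

```python
class CausalMonitor:
    """Monitor p0 applying DR3 to notifications stamped with vector clocks."""

    def __init__(self, n):
        self.D = [0] * n   # D[j]: local component of the last message delivered from j
        self.M = []        # received but not yet delivered: (sender, timestamp, payload)

    def _deliverable(self, j, ts):
        return self.D[j] == ts[j] - 1 and all(
            self.D[k] >= ts[k] for k in range(len(ts)) if k != j
        )

    def receive(self, j, ts, payload):
        """Buffer a message from process j; return the payloads delivered as a result."""
        self.M.append((j, tuple(ts), payload))
        delivered = []
        progress = True
        while progress:            # one delivery may unblock other buffered messages
            progress = False
            for msg in list(self.M):
                sender, stamp, data = msg
                if self._deliverable(sender, stamp):
                    self.M.remove(msg)
                    self.D[sender] = stamp[sender]
                    delivered.append(data)
                    progress = True
        return delivered
```

If a notification stamped (1, 1) from process 1 arrives before the notification stamped (1, 0) from process 0, it stays in M until the earlier one has been delivered.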

Causal Delivery and Hidden Channels
  • causal delivery should be applied only to closed systems
  • hidden channels (communication channels external to
    the system) can lead to incorrect conclusions

Active Monitoring - Distributed Snapshots
  • monitor p0 requests the states of the other processes and
    combines them into a global state
  • assume channels implement FIFO delivery
  • channel state χi,j for the channel from pi to pj: messages
    sent by pi but not yet received by pj

Distributed Snapshots
  • notation:
  • INi: set of processes having direct channels to pi
  • OUTi: set of processes to which pi has a channel
  • for each execution of the snapshot protocol,
    process pi records its local state si and the
    states of its incoming channels (χj,i for all pj
    ∈ INi)

Distributed Snapshots
  • Snapshot Protocol (Chandy-Lamport)
  • p0 starts the protocol by sending itself a "take
    snapshot" message
  • when receiving the "take snapshot" message for
    the first time, from process pf:
  • pi records local state si and relays the "take
    snapshot" message along all outgoing channels
  • channel state χf,i is set to empty
  • pi starts recording messages on its other incoming
    channels

Distributed Snapshots
  • Snapshot Protocol (Chandy-Lamport)
  • when receiving the "take snapshot" message beyond
    the first time, from process ps:
  • pi stops recording messages along the channel from ps
  • channel state χs,i is the set of messages that have been
    recorded
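
The two protocol steps can be sketched as a small single-threaded simulation. This is a minimal sketch under assumptions that are not in the slides: the initiator is an ordinary participant rather than a separate monitor, local state is one integer, application messages are integer transfers, and the `Node` and `Network` names are hypothetical.

```python
from collections import deque

MARKER = object()   # stands for the "take snapshot" message

class Node:
    def __init__(self, state):
        self.state = state           # application state (an integer here)
        self.recorded_state = None   # local state once recorded
        self.channel_state = {}      # source -> messages recorded on that channel
        self.recording = set()       # incoming channels still being recorded

class Network:
    def __init__(self, states, links):
        self.nodes = {name: Node(s) for name, s in states.items()}
        self.chan = {link: deque() for link in links}   # FIFO channel per (src, dst)

    def send(self, src, dst, amount):
        self.nodes[src].state -= amount
        self.chan[(src, dst)].append(amount)

    def start_snapshot(self, name):
        self._record(name, first_from=None)

    def _record(self, name, first_from):
        node = self.nodes[name]
        incoming = [s for (s, d) in self.chan if d == name]
        node.recorded_state = node.state
        node.channel_state = {s: [] for s in incoming}
        # The channel the first marker arrived on is recorded as empty.
        node.recording = {s for s in incoming if s != first_from}
        for (s, d) in self.chan:
            if s == name:
                self.chan[(s, d)].append(MARKER)   # relay on all outgoing channels

    def deliver(self, src, dst):
        """Deliver the oldest message on the channel from src to dst."""
        msg = self.chan[(src, dst)].popleft()
        node = self.nodes[dst]
        if msg is MARKER:
            if node.recorded_state is None:
                self._record(dst, first_from=src)
            else:
                node.recording.discard(src)        # stop recording this channel
        else:
            node.state += msg
            if src in node.recording:
                node.channel_state[src].append(msg)
```

In a run where a transfer is already in flight when the snapshot starts, that transfer ends up in the recorded channel state, so the recorded local states plus channel states still add up to the initial total.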

Distributed Snapshots
  • [figure: an execution of the snapshot protocol, marking
    where p1 and p2 are done]
  • dashed arrows indicate "take snapshot" messages
  • constructed global state S^23, χ1,2 = empty, χ2,1 …

Properties of Snapshots
  • let Ss = global state constructed by the protocol,
    Sa = global state when the protocol was initiated,
    Sf = global state when the protocol terminated
  • Ss is guaranteed to be consistent
  • the actual run that the system followed may not pass
    through Ss
  • but ∃ a run R such that Sa ⇝R Ss ⇝R Sf

Properties of Snapshots
  • Sa = S^21
  • Sf = S^55
  • actual run r does not pass through Ss (= S^23)

Properties of Snapshots
  • but S^21 ⇝ S^23 ⇝ S^55

Properties of Global Predicates
  • we now have two methods for global predicate
    evaluation:
  • monitor passively observing runs
  • monitor actively constructing snapshots
  • utility of either approach depends (in part) on
    properties of the predicate

Stable Predicates
  • communication delays ⇒ Ss can only reflect some
    past state of the system
  • stable predicate: once it becomes true, it remains true
  • e.g. deadlock, termination, loss of all tokens,
    unreachable storage
  • if F is stable, then (F is true in Ss) ⇒ (F is
    true in Sf) and (F is false in Ss) ⇒ (F is false
    in Sa)

Stable Predicates
  • deadlock detection through snapshots (p.29, 30)

Stable Predicates
  • deadlock detection using reactive protocol (p.31,

Nonstable Predicates
  • e.g. debugging, checking if queue lengths exceed
    some thresholds
  • Two problems
  • condition may not persist long enough for it to
    be true when the predicate is evaluated
  • if a predicate F is found true, do not know
    whether F ever held during the actual run

Nonstable Predicates
  • e.g. monitoring condition (x = y)
  • 7 states where (x = y) holds
  • but it no longer holds after state S^54
  • e.g. (y - x) = 2
  • condition holds only in S^31 and S^41
  • monitor might detect (y - x) = 2 even if the actual
    run never goes through S^31 or S^41

Nonstable Predicates
  • there is very little value in detecting a nonstable predicate

Nonstable Predicates
  • With observations, can extend predicates
  • Possibly(F): There exists a consistent observation
    O of the computation such that F holds in a
    global state of O
  • Definitely(F): For every consistent observation O
    of the computation, there exists a global state
    of O in which F holds
  • e.g. Possibly((y - x) = 2), Definitely(x = y)

Nonstable Predicates
  • use of extended predicates in debugging: if F
    identifies some erroneous state, then Possibly(F) indicates
    a bug, even if it is not observed during an
    actual run
  • if predicate F is stable, then Possibly(F)

Detecting Possibly and Definitely F
  • detection based on the lattice of consistent
    global states
  • If any global state in the lattice satisfies F,
    then Possibly(F) holds
  • Definitely(F) requires all possible runs to pass
    through a global state that satisfies F

Detecting Possibly and Definitely F
  • Possibly((y - x) = 2)
  • Definitely(y = x) (why?)

Detecting Possibly and Definitely F
  • maintain the set of global states current, at progressively
    increasing levels
  • any member of current satisfies F ⇒ Possibly(F)

Detecting Possibly and Definitely F
  • iteratively construct the set of global states of
    level l reachable without passing through a state that
    satisfies F
  • set empty ⇒ Definitely(F) is true
  • set contains the final state ⇒ ¬Definitely(F)
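
Both detection procedures can be sketched as a level-by-level walk over the lattice. This is a minimal sketch under assumptions: `vcs[i][k]` holds the vector clock of the k-th event of process i, `vcs[i][0]` is the zero vector standing for the initial state, a global state is a tuple of event counts, and the predicate receives that tuple. All names are hypothetical.

```python
def consistent(cut, vcs):
    """Vector clock test for a consistent cut."""
    n = len(cut)
    return all(
        vcs[i][cut[i]][i] >= vcs[j][cut[j]][i]
        for i in range(n)
        for j in range(n)
    )

def successors(cut, vcs):
    """Consistent global states one level below cut in the lattice."""
    for i in range(len(cut)):
        if cut[i] + 1 < len(vcs[i]):
            nxt = cut[:i] + (cut[i] + 1,) + cut[i + 1:]
            if consistent(nxt, vcs):
                yield nxt

def possibly(pred, vcs):
    current = {tuple(0 for _ in vcs)}
    while current:
        if any(pred(c) for c in current):
            return True
        current = {s for c in current for s in successors(c, vcs)}
    return False

def definitely(pred, vcs):
    final = tuple(len(v) - 1 for v in vcs)
    # Keep only states reachable without passing through a state satisfying pred.
    current = {c for c in {tuple(0 for _ in vcs)} if not pred(c)}
    while current:
        if final in current:
            return False
        current = {s for c in current for s in successors(c, vcs) if not pred(s)}
    return True
```

As a usage example, take two processes with one event each, where x goes from 0 to 1 and y from 1 to 2. If the events are concurrent, (x = y) is possible but not definite, because one run executes the event of the second process first. If the first event is a send and the second its receive, every run passes through the state where x = y.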

  • many distributed system problems require
    recognizing certain global conditions
  • two approaches to constructing global states
  • reactive-architecture based
  • snapshot based
  • timing mechanism that captures causal precedence
  • applied to distributed deadlock detection and debugging
  • solutions can be adapted to deal with nonstable
    predicates, multiple observations and failures