
A Principled Information Valuation for
Communications During Multi-Agent Coordination
  • Simon A. Williamson, Enrico H. Gerding, Nicholas
    R. Jennings
  • School of Electronics and Computer Science
  • University of Southampton

Outline
  • Introduction
  • Decentralised Decision Processes
  • Communication Valuations
  • Policy Generation
  • RoboCupRescue as a dec_POMDP_Valued_com
  • Communication Policies
  • Unrestricted Communication
  • Restricted Communication
  • Future Work

Introduction
  • Communication is a restricted resource
  • Team members must evaluate the value of a communication
  • This value is balanced against the cost of communicating
  • We take a decision-theoretic approach, combined with
    information theory, to value communications

RoboCup Rescue
  • Team of ambulance agents must rescue civilians
  • Uncertainties in location of civilians and their
    status - they may be trapped and require many
    ambulances to dig them out
  • Uncertainties in the location of teammates and their
    observations and activities
  • Communication is used to coordinate activities
  • In some regions of the map, agents cannot communicate

Decentralised Decision Processes
  • A decentralised extension to the standard POMDP
    formalisation, with communication, suggested in
    [PT02], [XLZ01] and [ZG03]
  • dec_POMDP_com (Decentralised POMDP with
    Communication) from ZG03
  • Utilise a communication substage - a solution
    defines an action and communication policy

dec_POMDP_com for 2 agents
Communication Valuations
  • Our model calls for an explicit valuation for communication
  • The exact value of a communication can be
    calculated using the decentralised POMDP model
  • Other work models team knowledge in a Bayes net
    and uses that to generate a valuation
  • Teamwork models such as STEAM can also give a
    valuation based on the cost of miscoordination
  • All of these approaches involve modelling the
    team and information propagation, leading to an
    explosion of state variables

  • Agents cannot always communicate in parallel, so
    communication is an action like any other
  • There is an explicit reward function for
    communications which approximates the change in
    expected reward from communicating
  • A weighting between the two reward functions
    calculates this approximation
  • This avoids using the policy generation stage to
    calculate the exact value of communicating

dec_POMDP_Valued_com for 2 agents
Using Information Theory
  • We approximate this calculation using techniques
    from Information Theory
  • This approach follows from work in sensor
    networks where valuations are derived from Fisher
    Information and Kalman Filters
  • This works easily in sensor networks because the
    problem function is based on information theory -
    so individual agents can use it directly
  • In our problem, we use information theory as an
    approximation of the cost of miscoordination
    (having different information)
  • Because of this, the value of information must be
    normalised with respect to the cost of
    miscoordinating in the actual problem
  • Pros: efficient calculation, and reduced
    complexity of policy generation
  • Cons: the normalisation is not easy - we describe
    an empirical validation of this technique

The dec_POMDP_Valued_com again
  • Use KL divergence, as it is efficient when
    combined with Bayesian updating in the POMDP
  • b1 is the belief state of the agent and bh is the
    observation history; N is a normalisation factor
  • This gives the information content of the observation
    history with respect to the agent's current belief
Policy Generation
  • RoboCupRescue is a large problem, with real-time
    constraints on action selection
  • We use an online policy generation algorithm based
    on RTBSS (Real-Time Belief State Search)
  • A search tree is constructed at each action-selection step
Policy Generation
  • We modify the algorithm to consider joint actions
  • If the agents have the same knowledge then they
    will calculate coordinated actions
  • A second modification leverages the dual reward
    function model for communication actions

RCR as dec_POMDP_Valued_com
  • State indicates whether each building contains
    trapped civilians or not, and the location of the agents
  • Actions are behaviours: movement, rescue,
    load/unload and communicate
  • Reward is given for emptying buildings
  • Observations are the local sensing capabilities
    of the ambulances
  • Communication is the history of observations since
    the last communication action
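The factored model described above might be sketched as follows; the class and field names are illustrative assumptions, not the paper's formalisation.

```python
from dataclasses import dataclass
from enum import Enum

class Behaviour(Enum):
    """The four behaviour-level actions available to an ambulance agent."""
    MOVE = "move"
    RESCUE = "rescue"
    LOAD_UNLOAD = "load_unload"
    COMMUNICATE = "communicate"

@dataclass
class RCRState:
    """Illustrative state factorisation for the RoboCupRescue model."""
    building_has_civilians: dict  # building id -> trapped civilians present?
    agent_locations: dict         # agent id -> building id

def emptying_reward(prev: RCRState, curr: RCRState) -> float:
    """Reward for emptying buildings: +1 per building cleared this step."""
    return sum(1.0 for b, had in prev.building_has_civilians.items()
               if had and not curr.building_has_civilians.get(b, False))
```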

Communication Policies
  • Full - communicate all the time when possible
    (with no cost)
  • Zero - never communicate
  • Selective - communication is an action in the
    policy computation, and has a valuation which
    increases with the time since the last communication
  • Valued - communication is an action in the policy
    computation, and a reward is given based on the
    communication sent
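The four policies can be sketched as simple decision and valuation rules. The `rate` parameter and function names are illustrative assumptions, not values from the paper.

```python
def full(can_communicate):
    """Full: communicate whenever the environment allows it (no cost)."""
    return can_communicate

def zero():
    """Zero: never communicate."""
    return False

def selective_valuation(steps_since_last_comm, rate=0.1):
    """Selective: valuation grows with time since the last communication.
    `rate` is an illustrative parameter."""
    return rate * steps_since_last_comm

def valued_valuation(kl_of_message, n_factor):
    """Valued: reward based on the information content of the message sent
    (the normalised KL divergence from the previous slides)."""
    return kl_of_message / n_factor
```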

Valued Communication
  • The comms valuation uses KL divergence
  • The problem reward uses a different function
  • The comms valuation is mixed with the RCR reward function
  • This allows us to experiment with the relative
    importance of communicating
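One plausible form of this mixing, consistent with the alpha weighting reported in the results (alpha = 0 means the agents only communicate, alpha = 1 means they never do), is a convex combination; the exact form used in the paper may differ.

```python
def mixed_reward(task_reward, comm_reward, alpha):
    """Convex combination of the RCR task reward and the KL-based
    communication reward. alpha = 0: only communication is rewarded
    (agents only communicate); alpha = 1: communication is never
    rewarded (agents never communicate). Illustrative assumption."""
    return alpha * task_reward + (1.0 - alpha) * comm_reward
```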

Unrestricted Communication
  • Comparing the 4 communication policies
  • Full
  • Zero
  • Selective
  • Valued
  • We measure the percentage of civilians saved
    over the course of the simulation and at the end
  • Results are averaged over 30 runs
  • There are no restrictions on communication
  • Agents can always communicate at any point on the map
Results 1
  • Full performs best
  • Zero and Selective perform similarly badly
  • Neither can complete the problem

Valued Communication Results
  • We compare performance at the end of the simulation
  • Alpha is varied between 0 and 1
  • At 0 the agents only communicate, and at 1 they
    never communicate

Restricted Communication
  • Comparing 3 communication policies
  • Full
  • Zero
  • Valued
  • We measure the percentage of civilians saved
    over the course of the simulation and at the end
  • Results are averaged over 30 runs
  • Communication is restricted: areas of the rescue
    map are defined as blackout regions where
    communication is not possible
  • Blackout levels of 0, 25, 50, 75 and 99 are tested

Results 2
  • Valued is only marginally affected by
    communication restrictions up to 75
  • It can do better than a naive policy which only
    communicates when possible
  • The shape change reflects the different value of
    information now that it is more expensive
  • The biggest drop is at 99, because of the much
    greater time needed to communicate

Conclusions
  • Information valuation is an efficient mechanism
    for valuing a communication resource
  • It can adapt to a restricted communication
    environment with only a minimal drop in performance

Future Work
  • Generalise the information-theoretic valuation
  • Calculate or learn the normalisation and mixture parameters
  • Investigate other types of communication
    restrictions (restricted bandwidth, etc.)
  • Expand to larger agent teams

References
  • [PT02] David V. Pynadath and Milind Tambe.
    Multiagent teamwork: analyzing the optimality and
    complexity of key theories and models. AAMAS 2002
  • [XLZ01] Ping Xuan, Victor Lesser and Shlomo
    Zilberstein. Communication decisions in
    multiagent cooperation: model and experiments.
    5th International Conference on Autonomous Agents, 2001
  • [ZG03] Shlomo Zilberstein and Claudia V. Goldman.
    Optimizing information exchange in cooperative
    multiagent systems. AAMAS 2003

  • Any Questions?