Title: A Decision-Theoretic Approach to Designing Proactive Communication in Multi-Agent Teamwork
1 A Decision-Theoretic Approach to Designing Proactive Communication in Multi-Agent Teamwork
- Thomas R. Ioerger, Yu Zhang, Richard Volz, John Yen (PSU-IST)
- Dept. of Computer Science
- Texas A&M University
2 Motivation
- Agents share a large amount of knowledge about the teamwork.
- Hard-coded interactions among participants.
- High-frequency message exchange.
- Communication risk.
[Figure labels: Agent, Multi-Agent, Team]
3 Challenging Issues in Designing Communication Protocols
- Each agent has incomplete information, from which uncertainties arise.
- Each agent has different problem-solving capabilities.
- Data are decentralized and the system lacks global control.
- Excessive or unrestricted communication leads to a lack of scalability.
4 Our Approach and Its Contributions
- Proactive Communication
  - OBPC: Reduction of communication load through OBservations.
  - DIP: Dynamic estimation of the probability distribution of Information Production and need.
  - DTPC: Decision-Theoretic determination of communication strategies.
5 Background
- CAST (Collaborative Agents for Simulating Teamwork)
- MALLET (Multi-Agent Logic-based Language for Encoding Teamwork)

(team-plan killwumpus (?w)
  (process
    (seq
      (agent-bind ?ca (constraint (play-role ?ca scout)))
      (DO ?ca (findwumpus ?w))
      (agent-bind ?fi
        (constraint ((play-role ?fi fighter)
                     (closest-to-wumpus ?fi ?w))))
      (DO ?fi (movetowumpus ?w))
      (DO ?fi (shootwumpus ?w)))))

(ioper shootwumpus (?w)
  (pre-cond (wumpus ?w) (location ?w ?x ?y) (dead ?w false))
  (effect (dead ?w true)))
6 Overview
[Figure: CAST overview. Labels: CAST, Team Structure, Teamwork Procedure, KB (repeated six times), Optimal Communication Strategy.]
7 Agent Execution Cycle
[Figure: execution cycle with five stages]
- Observe: Sense
- Predict: Info. need and production
- Decide: Strategy
- Communicate: Information
- Act: Effect
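The cycle can be illustrated with a small loop. This is a minimal Python sketch, not CAST code: the stage order (observe, predict, decide, communicate, act) is an assumption read off the stage names on the slide, and `STAGES`, `run_cycle`, and the handler dictionary are hypothetical names introduced here.

```python
# Assumed stage order; the slide shows a cycle but the extracted text
# does not preserve the arrows between stages.
STAGES = ("observe", "predict", "decide", "communicate", "act")


def run_cycle(handlers, state, rounds=1):
    """Run the five stages in order; each handler maps state -> new state."""
    trace = []
    for _ in range(rounds):
        for stage in STAGES:
            state = handlers[stage](state)
            trace.append(stage)
    return state, trace


# Hypothetical handlers that only count how often each stage ran.
handlers = {
    s: (lambda st, s=s: {**st, s: st.get(s, 0) + 1}) for s in STAGES
}
final_state, trace = run_cycle(handlers, {}, rounds=2)
```

In a real agent each handler would do actual work (sensing, estimating needs, choosing a strategy); here they only record that they ran, which keeps the control flow visible.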
8 Syntax of Observability
<observability> ::= (CanSee <viewing>) | (BelieveCanSee <believer> <viewing>)
<viewing>       ::= <observer> <observable> <cond>
<believer>      ::= <agent>
<observer>      ::= <agent>
<observable>    ::= <property> | <action>
<cond>          ::= (<property>)
<property>      ::= (<property-name> <object> <args>)
<action>        ::= (DO <doer> (<operator-name> <args>))
<object>        ::= <agent> | <non-agent>
<doer>          ::= <agent>
9 Example Observability Rules
- (CanSee ca (location ?o ?x ?y)
      (location ca ?xc ?yc) (location ?o ?x ?y)
      (inradius ?x ?y ?xc ?yc rca)
  ) // The carrier can see the location property of an object.
- (CanSee ca (DO ?fi (shootwumpus ?w))
      (play-role fighter ?fi) (location ca ?xc ?yc)
      (location ?fi ?x ?y) (adjacent ?xc ?yc ?x ?y)
  ) // The carrier can see the shootwumpus action of a fighter.
- (BelieveCanSee ca fi (location ?o ?x ?y)
      (location fi ?xi ?yi) (location ?o ?x ?y)
      (inradius ?x ?y ?xi ?yi rfi)
  ) // The carrier believes the fighter is able to see the location property of an object.
- (BelieveCanSee ca fi (DO ?f (shootwumpus ?w))
      (play-role fighter ?f) (≠ ?f fi) (location ca ?xc ?yc)
      (location fi ?xi ?yi) (location ?f ?x ?y)
      (inradius ?xi ?yi ?xc ?yc rca) (inradius ?x ?y ?xc ?yc rca)
      (adjacent ?x ?y ?xi ?yi)
  ) // The carrier believes the fighter is able to see the shootwumpus action of another fighter.
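To make the two CanSee rules concrete, here is a minimal Python sketch of how their conditions might be evaluated on a grid. The slides do not define `inradius` or `adjacent`, so the distance measures below (Chebyshev distance and 4-neighbourhood) are assumptions, and `Located`, `can_see_location`, and `can_see_shoot` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Located:
    """An agent or object with a grid position."""
    name: str
    x: int
    y: int


def in_radius(x, y, xc, yc, r):
    # Assumption: grid (Chebyshev) distance; the slides leave this open.
    return max(abs(x - xc), abs(y - yc)) <= r


def adjacent(xc, yc, x, y):
    # Assumption: adjacency means sharing an edge (4-neighbourhood).
    return abs(xc - x) + abs(yc - y) == 1


def can_see_location(observer, obj, radius):
    """Mirrors the first rule: the object lies within the observer's radius."""
    return in_radius(obj.x, obj.y, observer.x, observer.y, radius)


def can_see_shoot(observer, fighter):
    """Mirrors the second rule: the shooting fighter is adjacent to the observer."""
    return adjacent(observer.x, observer.y, fighter.x, fighter.y)


carrier = Located("ca", 5, 5)
wumpus = Located("w1", 7, 6)
fighter = Located("fi", 5, 6)
```

The BelieveCanSee rules would follow the same pattern, except that the observing agent evaluates the conditions from the other agent's believed position.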
10 Proactive Communication Based on Observation
- ProactiveTell
  - A provider reasons about what information it will have.
  - A provider reasons about whether to deliver a piece of information once it has that information.
- ActiveAsk
  - A needer reasons about what information it will need.
  - A needer reasons about whether to ask for a piece of information when it needs it.
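One way to read these two behaviours in terms of observability is sketched below in Python. The conditions are an assumption, not the authors' stated rules: a provider stays silent when it believes the needer can observe the information itself, and a needer asks only for what it can neither observe nor already knows. Both function names are hypothetical.

```python
def should_proactive_tell(has_info, needer_needs_it, believes_needer_can_see):
    """Provider side: tell only if the needer needs the information and is
    not believed to be able to observe it directly (assumed rule)."""
    return has_info and needer_needs_it and not believes_needer_can_see


def should_active_ask(needs_info, can_see_it, already_known):
    """Needer side: ask only for information that is needed, not directly
    observable, and not already in the knowledge base (assumed rule)."""
    return needs_info and not can_see_it and not already_known
```

Under this reading, observation replaces messages whenever an agent can see, or is believed to see, the relevant fact.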
11 Evaluation
Multi-Agent Wumpus World
- 20 wumpuses, 8 pits, and 20 piles of gold per world.
- 1 carrier and 3 fighters compose a team.
- The team goal is to kill wumpuses and get the gold without being killed.
- 5 randomly generated worlds with 20×20 cells.
12 Decision-Theoretic Proactive Communication
- Strategies
- Utility Function
- Cost Function
- Value Function
- Decision-Making
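The decision-making step can be illustrated with the standard maximum-expected-utility rule. This is a minimal Python sketch under that assumption; the slides do not spell out the selection rule, and the probabilities and utilities below are made-up numbers for the two provider strategies of situation PA.

```python
def expected_utility(outcomes):
    """outcomes: iterable of (probability, utility) pairs for one strategy."""
    return sum(p * u for p, u in outcomes)


def choose_strategy(options):
    """Return the strategy name with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name]))


# Hypothetical numbers: telling has a certain, moderate payoff; silence pays
# off only if the needer turns out not to need the information in time.
options = {
    "ProactiveTell": [(1.0, 0.6)],
    "Silence": [(0.3, 0.9), (0.7, 0.1)],
}
best = choose_strategy(options)
```

With these numbers the expected utility of Silence is 0.3 × 0.9 + 0.7 × 0.1 = 0.34, so ProactiveTell (0.6) is selected.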
13 Decision-Making on Situation PA
Situation PA: the provider produces a new piece of information.
[Figure: decision tree. Moves shown: a-b ProactiveTell, a-b Silence, b-a Accept, b-a Wait, b-a Silence, b-a ActiveAsk.]
Legend: a = provider, b = needer, e = end.
14 DM on Situation PB
Situation PB: the provider receives a request for a piece of information.
[Figure: decision tree. Moves shown: a-b Reply, a-b WaitUntilNext.]
15 DM on Situation NA
Situation NA: the needer needs a piece of information.
[Figure: decision tree. Needer moves shown: b-a ActiveAsk, b-a Wait, b-a Silence. Provider moves shown: a-b Reply, a-b WaitUntilNext, a-b ProactiveTell, a-b Silence.]
Legend: t = transfer.
16 DM on Situation NB
Situation NB: the needer receives a piece of information.
[Figure: decision tree. Move shown: b-a Accept.]
17 Utility Function
- Parameters in the utility function
  - I: the information about which communication occurs
  - t: time of decision-making
  - t1: time at which I is needed
  - t2: time at which the value used for I is produced
  - SU: situation at t
  - S: strategy available at SU
  - M: the set of messages involved in obtaining I
  - E: environment state at t
- U(I, t, t1, t2, SU, S, M, E) = V(I, t, t1, t2, SU, S) − C(M)
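As a small worked example, the Python sketch below treats utility as the value of the information minus the cost of the messages used to obtain it. That reading of the formula is an assumption (the operators are not legible in the slide text), and the numbers are hypothetical.

```python
def utility(value: float, message_cost: float) -> float:
    """U = V - C(M): value of the information minus the cost of its messages."""
    return value - message_cost


# Hypothetical numbers for a provider comparing two strategies:
# telling delivers more value but costs a message; silence is free.
u_tell = utility(value=0.75, message_cost=0.25)
u_silence = utility(value=0.25, message_cost=0.0)
```

Here telling yields 0.5 and silence 0.25, so sending the message is worth its cost in this made-up case.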
18 Value Function
- V(I, t, t1, t2, SU, S) is composed of:
  - T(I, t, t1, t2, SU, S) // Timeliness
  - R(I, t, t1, t2, SU, S) // Relevance
19 Timeliness Function
- Timeliness
  - Whether agents use a value that can be produced in time when they need I.
  - d(I, t, t1, t2, SU, S) = max(0, t2 − t1)
  - ft(d(I, t, t1, t2, SU, S)), s.t. ft(x) < ft(y) if y < x
  - T(I, t, t1, t2, SU, S) = ft(d(I, t, t1, t2, SU, S))
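The timeliness definition can be sketched in Python as follows. The delay is read as max(0, t2 − t1), and the slide only requires ft to be decreasing in the delay; the exponential form and the `decay` parameter are assumptions chosen for illustration.

```python
import math


def delay(t1: float, t2: float) -> float:
    """d = max(0, t2 - t1): how long after the need the value is produced."""
    return max(0, t2 - t1)


def timeliness(t1: float, t2: float, decay: float = 0.5) -> float:
    """T = ft(d), with ft(x) = exp(-decay * x).

    Assumption: any strictly decreasing ft satisfies the slide's constraint;
    the exponential is just one convenient choice.
    """
    return math.exp(-decay * delay(t1, t2))
```

A value produced before it is needed has zero delay and full timeliness; the later it arrives after the need, the lower the score.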
20 Relevance Function
- Relevance
  - Unprocessed, most recent, important
  - P(I, t, t1, t2, SU, S) = Pr(I ∧ t ∧ t1 ∧ t2 ∧ no other value for I was produced in the interval [t1, t2] | S ∧ SU)
  - frI(P(I, t, t1, t2, SU, S)), s.t. frI(x) < frI(y) if x < y
  - R(I, t, t1, t2, SU, S) = frI(P(I, t, t1, t2, SU, S))
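One way to obtain such a probability is sketched below in Python. It assumes, purely for illustration, that new values of I are produced by a Poisson process with a known rate, so the chance that no newer value appeared over a gap of length g is exp(−rate × g). The slides do not state this model; `prob_still_current` and `relevance` are hypothetical names, and frI is taken to be the identity, which is increasing as required.

```python
import math


def prob_still_current(rate: float, t_produced: float, t_needed: float) -> float:
    """Probability that no newer value of I was produced between production
    and need, under an assumed Poisson production process."""
    gap = max(0.0, t_needed - t_produced)
    return math.exp(-rate * gap)


def relevance(p: float) -> float:
    """R = frI(P), with frI assumed to be the identity (an increasing function)."""
    return p
```

The longer a value has been sitting before it is used, the more likely it has been superseded, and the lower its relevance.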
21 Cost Function
- C(Mi) = 0 if Mi = ∅
- C(Mi) = k1 + k2 × len(Mi) otherwise
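The cost function can be sketched in Python as a fixed overhead plus a per-length charge. Reading the formula as k1 + k2 × len(Mi) is an assumption, as are the default constants and the choice to measure length in characters.

```python
def message_cost(message, k1: float = 1.0, k2: float = 0.1) -> float:
    """C(Mi): zero for an empty message (e.g. Silence), otherwise a fixed
    overhead k1 plus k2 per unit of message length.

    k1 and k2 are hypothetical defaults, not values from the slides.
    """
    if not message:
        return 0.0
    return k1 + k2 * len(message)
```

For example, a 10-character message costs 1.0 + 0.1 × 10 = 2.0 with these defaults, while staying silent costs nothing.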
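22 Expected Utility
[Table: times t1 and t2 for each strategy. Rows and case conditions:]
- P.ProactiveTell
- P.Silence: T
- P.Reply
- P.WaitUntilNext
- N.ActiveAsk: one case if a chooses Reply, one case if a chooses WaitUntilNext
- N.Silence
- N.Wait: one case if a chooses ProactiveTell; T if a chooses Silence
- N.Accept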
23 Strategies
Situation PA: the provider produces I. ProactiveTell? Silence?
[Figure: timeline split at the current time t into a Known and an Unknown part. Event labels: unfulfilled need, next production, last need aware of, last sent, last not sent.]
24 Strategies
Situation PB: the provider receives a request for I. Reply? WaitUntilNext?
[Figure: timeline split at the current time t into a Known and an Unknown part. Event labels: next production, last production.]
25 Strategies
Situation NA: the needer needs I. ActiveAsk? Wait? Silence?
[Figure: timeline split at the current time t into a Known and an Unknown part. Event labels: next production, most recent production, last I received.]
26 Strategies
Situation NB: the needer receives I.
Accept
27 Summary
- Advantages of the approach: it allows agents to make intelligent choices of communication policy based on
  - frequencies of needs, of sensing, and of information change
  - costs of messages, plus penalties for delays in action or for acting with incorrect information
28 Criteria for Applicable Domains
- There are information needs among the team.
- Agents can communicate.
- There is uncertainty in the environment.
  - Stochastic properties of the teamwork process.
  - Agents have incomplete/disjoint knowledge about the world.
- The team acts under critical time constraints, so proactive assistance becomes important.