Title: Uncertainty in AI
1Uncertainty in AI
- Outline
- Introduction
- Basic Probability Theory
- Probabilistic Reasoning
- Why should we use probability theory?
- Dutch Book Theorem
2Sources of Uncertainty
- Information is partial
- Information is not fully reliable.
- Representation language is inherently imprecise.
- Information comes from multiple sources and it is conflicting.
- Information is approximate.
- Non-absolute cause-effect relationships exist
3Basic Probability
- Probability theory enables us to make rational decisions.
- Which mode of transportation is safer?
- Car or Plane?
- What is the probability of an accident?
4Basic Probability Theory
- An experiment has a set of potential outcomes, e.g., throwing a die.
- The sample space of an experiment is the set of all possible outcomes, e.g., {1, 2, 3, 4, 5, 6}.
- An event is a subset of the sample space.
- {2}
- {3, 6}
- even = {2, 4, 6}
- odd = {1, 3, 5}
5Probability as Relative Frequency
- An event has a probability.
- Consider a long sequence of experiments. If we look at the number of times a particular event occurs in that sequence, and compare it to the total number of experiments, we can compute a ratio.
- This ratio is one way of estimating the probability of the event.
- P(E) ≈ (# of times E occurred) / (total # of trials)
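The ratio above can be sketched as a small simulation. This is only an illustration: the fair die, the seed, and the function name `relative_frequency` are assumptions, not part of the slides.

```python
import random

def relative_frequency(event, trials, rng):
    """Estimate P(event) as (# of times event occurred) / (total # of trials)."""
    hits = 0
    for _ in range(trials):
        outcome = rng.randint(1, 6)  # one throw of a (assumed fair) six-sided die
        if outcome in event:
            hits += 1
    return hits / trials

rng = random.Random(0)   # fixed seed, chosen arbitrarily for repeatability
even = {2, 4, 6}         # the event "even" as a subset of the sample space
estimate = relative_frequency(even, 10_000, rng)
print(estimate)          # expected to be close to 0.5 for a fair die
```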
6- Example
- 100 attempts are made to swim a length in 30 secs. The swimmer succeeds on 20 occasions; therefore the estimated probability that the swimmer can complete the length in 30 secs is
- 20/100 = 0.2
- Failure: 1 - 0.2 = 0.8
- The experiments, the sample space and the events must be defined clearly for probability to be meaningful.
- What is the probability of an accident?
7Theoretical Probability
- Principle of Indifference: Alternatives are always to be judged equiprobable if we have no reason to expect or prefer one over the other.
- Each outcome in the sample space is assigned equal probability.
- Example: throw a die
- P(1) = P(2) = ... = P(6) = 1/6
8Law of Large Numbers
- As the number of experiments increases, the relative frequency of an event more closely approximates the theoretical probability of the event,
- if the theoretical assumptions hold.
- Buffon's Needle for Computing π
- Draw parallel lines 1 inch apart on a plane
- Throw a 1-inch needle on the plane
- P(needle crossing a line) = 2/π
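A rough simulation of Buffon's needle, as a sketch only. It assumes the needle's center distance to the nearest line and its angle are uniformly distributed; note that sampling the angle already uses `math.pi`, so this illustrates the law of large numbers rather than computing π from scratch.

```python
import math
import random

def crossing_fraction(throws, rng):
    """Fraction of 1-inch needles that cross one of the lines spaced 1 inch apart."""
    crossings = 0
    for _ in range(throws):
        d = rng.uniform(0.0, 0.5)              # distance from needle center to nearest line
        theta = rng.uniform(0.0, math.pi / 2)  # acute angle between needle and the lines
        if d <= 0.5 * math.sin(theta):         # half the needle reaches the line
            crossings += 1
    return crossings / throws

rng = random.Random(1)  # arbitrary seed for repeatability
fraction = crossing_fraction(100_000, rng)
pi_estimate = 2 / fraction  # since P(crossing) = 2/pi
print(fraction, pi_estimate)
```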
9Large Number Reveals Untruth in Assumptions
- Results of 1,000,000 throws of a die
- Number:    1     2     3     4     5     6
- Fraction: .155  .159  .164  .169  .174  .179
10Axioms of Probability Theory
- Suppose P(.) is a probability function, then
- 1. for any event E, 0 ≤ P(E) ≤ 1.
- 2. P(S) = 1, where S is the sample space.
- 3. for any two mutually exclusive events E1 and E2,
- P(E1 ∪ E2) = P(E1) + P(E2)
- Any function that satisfies the above three axioms is a probability function.
11Joint Probability
- Let A, B be two events; the joint probability of both A and B being true is denoted by P(A, B).
- Example
- P(spade) is the probability of the top card being a spade.
- P(king) is the probability of the top card being a king.
- P(spade, king) is the probability of the top card being both a spade and a king, i.e., the king of spades.
- P(king, spade) = P(spade, king) ???
12Properties of Probability
- 1. P(¬E) = 1 - P(E)
- 2. If E1 and E2 are logically equivalent, then
- P(E1) = P(E2).
- E1: Not all philosophers are more than six feet tall.
- E2: Some philosopher is not more than six feet tall.
- Then P(E1) = P(E2).
- 3. P(E1, E2) ≤ P(E1).
13Conditional Probability
- The probability of an event may change after knowing another event.
- The probability of A given B is denoted by P(A | B).
- Example
- P(W = space): the probability that a randomly selected word from an English text is "space"
- P(W = space | previous W = outer): the probability of "space" if the previous word is "outer"
14Example
- A: the top card of a deck of poker cards is the king of spades
- P(A) = 1/52
- However, if we know
- B: the top card is a king
- then the probability of A given that B is true is
- P(A | B) = 1/4.
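The two numbers above can be checked by enumerating a standard 52-card deck. This is a minimal sketch; the helper `prob` and the card encoding are illustrative choices, not from the slides.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spade", "heart", "diamond", "club"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely top cards

def prob(event, given=None):
    """P(event | given) by counting cards; 'given' restricts the sample space."""
    space = DECK if given is None else [c for c in DECK if given(c)]
    return Fraction(sum(1 for c in space if event(c)), len(space))

def is_king(card):
    return card[0] == "K"

def is_king_of_spades(card):
    return card == ("K", "spade")

p_a = prob(is_king_of_spades)                         # P(A)
p_a_given_b = prob(is_king_of_spades, given=is_king)  # P(A | B)
print(p_a, p_a_given_b)  # 1/52 1/4
```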
15How to Compute P(A | B)?
[Figure: events A and B]
16Business Students
- Of 100 students completing a course, 20 were business majors. Ten students received A's in the course, and three of these were business majors. Suppose A is the event that a randomly selected student got an A in the course, and B is the event that a randomly selected student is a business major. What is the probability of A? What is the probability of A after knowing B is true?
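One way to work the numbers, using the definition P(A | B) = P(A, B) / P(B). The variable names are illustrative.

```python
from fractions import Fraction

total_students = 100
business_majors = 20   # students for whom B is true
a_grades = 10          # students for whom A is true
business_with_a = 3    # students for whom both A and B are true

p_a = Fraction(a_grades, total_students)                # P(A)
p_b = Fraction(business_majors, total_students)         # P(B)
p_a_and_b = Fraction(business_with_a, total_students)   # P(A, B)
p_a_given_b = p_a_and_b / p_b                           # P(A | B) = P(A, B) / P(B)

print(p_a, p_a_given_b)  # 1/10 3/20
```

Knowing B raises the probability of A from 0.10 to 0.15.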
17Probabilistic Reasoning
- Evidence
- What we know about a situation.
- Hypothesis
- What we want to conclude.
- Compute
- P(Hypothesis | Evidence)
18Credit Card Authorization
- E is the data about the applicant's age, job, education, income, credit history, etc.
- H is the hypothesis that the credit card will provide positive return.
- The decision of whether to issue the credit card to the applicant is based on the probability P(H | E).
19Medical Diagnosis
- E is a set of symptoms, such as coughing, sneezing, headache, ...
- H is a disorder, e.g., common cold, SARS, flu.
- The diagnosis problem is to find an H (disorder) such that P(H | E) is maximum.
20- Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.
- Please rank the following statements by their probability, using 1 for the most probable and 8 for the least probable.
- a. Linda is a teacher in elementary school.
- b. Linda works in a bookstore and takes yoga classes.
- c. Linda is active in the feminist movement.
- d. Linda is a psychiatric social worker.
- e. Linda is a member of the League of Women Voters.
- f. Linda is a bank teller.
- g. Linda is an insurance salesperson.
- h. Linda is a bank teller and is active in the feminist movement.
21Example
A patient takes a lab test and the result comes back positive. The test has a false negative rate of 2% and a false positive rate of 3%. Furthermore, 0.8% of the entire population have this cancer. What is the probability of cancer if we know the test result is positive?
22Bayes Theorem
- If P(E2) > 0, then P(E1 | E2) = P(E2 | E1) P(E1) / P(E2)
- This can be derived from the definition of
conditional probability.
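A sketch of Bayes' theorem applied to the lab-test example above. It assumes the figures in that example are percentages (false negative rate 2%, false positive rate 3%, prevalence 0.8%) and expands the denominator over "cancer" and "no cancer"; the function name `posterior` is illustrative.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) = P(E | H) P(H) / P(E), with P(E) expanded over H and not-H."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

p_cancer = 0.008                 # assumed reading: 0.8% of the population
p_pos_given_cancer = 1 - 0.02    # 1 - false negative rate (assumed 2%)
p_pos_given_healthy = 0.03       # false positive rate (assumed 3%)

p_cancer_given_pos = posterior(p_cancer, p_pos_given_cancer, p_pos_given_healthy)
print(round(p_cancer_given_pos, 3))  # 0.209
```

Under these assumptions a positive result still leaves cancer at roughly 21%, because the disease is rare.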
23The Three-Card Problem
- Three cards are in a hat. One is red on both sides (the red-red card). One is white on both sides (the white-white card). One is red on one side and white on the other (the red-white card). A single card is drawn randomly and tossed into the air.
- a. What is the probability that the red-red card was drawn? (RR)
- b. What is the probability that the drawn card lands with a white side up? (W-up)
- c. What is the probability that the red-red card was not drawn, assuming that the drawn card lands with a red side up? (not-RR | R-up)
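The three questions can be answered by enumerating the six equally likely (card, face-up side) pairs. A minimal sketch; the encoding and the helper `prob` are illustrative.

```python
from fractions import Fraction

CARDS = {"RR": ("R", "R"), "WW": ("W", "W"), "RW": ("R", "W")}
# Each card is equally likely, and each of its two sides is equally likely to land up.
OUTCOMES = [(name, up) for name, sides in CARDS.items() for up in sides]

def prob(event, given=lambda o: True):
    """P(event | given) by counting equally likely outcomes."""
    space = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in space if event(o)), len(space))

p_rr = prob(lambda o: o[0] == "RR")                        # (a)
p_w_up = prob(lambda o: o[1] == "W")                       # (b)
p_not_rr_given_r_up = prob(lambda o: o[0] != "RR",
                           given=lambda o: o[1] == "R")    # (c)
print(p_rr, p_w_up, p_not_rr_given_r_up)  # 1/3 1/2 1/3
```

Note that (c) comes out as 1/3, not the intuitive 1/2: two of the three red faces belong to the red-red card.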
24Fair Bets
- A bet is fair to an individual I if, according to the individual's probability assessment, the bet will break even in the long run.
- The following three bets are fair:
- Bet (a): Win 4.20 if RR,
- lose 2.10 otherwise,
- since you believe P(RR) = 1/3.
- Bet (b): Win 2.00 if W-up,
- lose 2.00 otherwise,
- since you believe P(W-up) = 1/2.
- Bet (c): Win 4.00 if R-up and not-RR,
- lose 4.00 if R-up and RR,
- neither win nor lose if not-R-up,
- since you believe P(not-RR | R-up) = 1/2.
25Dutch Book
- The bets that you accepted have an interesting property:
- No matter what card is drawn in the three-card problem, and no matter how it lands, you are guaranteed to lose money.
- This is called a Dutch Book.
26Verification
- There are three possible outcomes:
- 1. Some card other than red-red is drawn, and it lands with white side up. That is, W-up and not-RR.
- 2. Some card other than red-red is drawn, and it lands with a red side up. That is, R-up and not-RR.
- 3. The red-red card is drawn, and it lands (of course) with a red side up. That is, R-up and RR.
- outcome     1       2       3
- a.        -2.10   -2.10   +4.20
- b.        +2.00   -2.00   -2.00
- c.         0.00   +4.00   -4.00
- total     -0.10   -0.10   -1.80
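The verification can be reproduced mechanically from the terms of bets (a), (b) and (c). This sketch works in cents to avoid rounding; the function name and outcome encoding are illustrative.

```python
def payoffs_in_cents(r_up, rr):
    """Payoffs of bets (a), (b), (c) for one outcome, in cents (negative = loss)."""
    bet_a = 420 if rr else -210          # win 4.20 if RR, lose 2.10 otherwise
    bet_b = -200 if r_up else 200        # win 2.00 if W-up, lose 2.00 otherwise
    if not r_up:
        bet_c = 0                        # neither win nor lose if not-R-up
    else:
        bet_c = -400 if rr else 400      # lose 4.00 if RR, win 4.00 if not-RR
    return bet_a, bet_b, bet_c

OUTCOMES = {
    1: dict(r_up=False, rr=False),  # W-up and not-RR
    2: dict(r_up=True, rr=False),   # R-up and not-RR
    3: dict(r_up=True, rr=True),    # R-up and RR
}
totals = {k: sum(payoffs_in_cents(**v)) for k, v in OUTCOMES.items()}
print(totals)  # {1: -10, 2: -10, 3: -180}
```

Every total is negative, so the bettor loses in all three outcomes.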
27The Dutch Book Theorem
- Suppose that an individual I is willing to accept any bet that is fair for I. Then a Dutch book can be made against I if and only if I's assessment of probability violates the Bayesian axiomatization.
28Independence Intuition
- Events are independent if one has nothing whatever to do with the others. Therefore, for two independent events, knowing that one happened does not change the probability of the other event happening.
- One toss of a coin is independent of another toss (assuming it is a regular coin).
- The price of tea in England is independent of the result of a general election in Canada.
29Independent or Dependent?
- Getting a cold and getting a cat allergy.
- Miles per gallon and acceleration.
- Size of a person's vocabulary and the person's shoe size.
30Independence Definition
- Events A and B are independent iff
- P(A, B) = P(A) × P(B)
- which is equivalent to
- P(A | B) = P(A) and
- P(B | A) = P(B)
- when P(A, B) > 0.
- T1: the first toss is a head.
- T2: the second toss is a tail.
- P(T2 | T1) = P(T2)
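The definition can be checked on the two-toss example by enumerating the four outcomes. A small sketch, assuming a fair coin; the helper `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

SPACE = list(product("HT", repeat=2))  # (first toss, second toss), equally likely

def prob(event):
    """P(event) by counting equally likely outcomes."""
    return Fraction(sum(1 for o in SPACE if event(o)), len(SPACE))

def t1(o):
    return o[0] == "H"   # T1: the first toss is a head

def t2(o):
    return o[1] == "T"   # T2: the second toss is a tail

p_joint = prob(lambda o: t1(o) and t2(o))      # P(T1, T2)
independent = p_joint == prob(t1) * prob(t2)   # P(A, B) = P(A) x P(B)?
p_t2_given_t1 = p_joint / prob(t1)             # P(T2 | T1)
print(independent, p_t2_given_t1, prob(t2))    # True 1/2 1/2
```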
31Conditional Independence
- Dependent events can become independent given certain other events.
- Example:
- Size of shoe
- Age
- Size of vocabulary
- Two events A, B are conditionally independent given a third event C iff
- P(A | B, C) = P(A | C)
32Conditional Independence: Definition
- Let E1 and E2 be two events; they are conditionally independent given E iff
- P(E1 | E, E2) = P(E1 | E),
- that is, the probability of E1 is not changed after knowing E2, given E is true.
- Equivalent formulations
- P(E1, E2 | E) = P(E1 | E) P(E2 | E)
- P(E2 | E, E1) = P(E2 | E)
33Example Play Tennis?
Predict playing tennis when <sunny, cool, high, strong>. What probability should be used to make the prediction? How to compute the probability?
34Probabilities of Individual Attributes
- Given the training set, we can compute the
probabilities
35Naïve Bayes Method
- Knowledge Base contains
- A set of hypotheses
- A set of evidences
- Probability of an evidence given a hypothesis
- Given
- A subset of the evidences known to be present in a situation
- Find
- the hypothesis with the highest posterior probability P(H | E1, E2, ..., Ek).
- The probability itself does not matter so much.
36Naïve Bayes Method
- Assumptions
- Hypotheses are exhaustive and mutually exclusive
- H1 ∨ H2 ∨ ... ∨ Hk
- ¬(Hi ∧ Hj) for any i ≠ j
- Evidences are conditionally independent given a hypothesis
- P(E1, E2, ..., Ek | H) = P(E1 | H) ... P(Ek | H)
- P(H | E1, E2, ..., Ek)
- = P(E1, E2, ..., Ek, H) / P(E1, E2, ..., Ek)
- = P(E1, E2, ..., Ek | H) P(H) / P(E1, E2, ..., Ek)
37Naïve Bayes Method
- The goal is to find the H that maximizes P(H | E1, E2, ..., Ek).
- Since
- P(H | E1, E2, ..., Ek) = P(E1, E2, ..., Ek | H) P(H) / P(E1, E2, ..., Ek)
- and P(E1, E2, ..., Ek) is the same for different hypotheses,
- maximizing P(H | E1, E2, ..., Ek) is equivalent to maximizing P(E1, E2, ..., Ek | H) P(H) = P(E1 | H) ... P(Ek | H) P(H).
- Naïve Bayes Method
- Find a hypothesis that maximizes P(E1 | H) ... P(Ek | H) P(H).
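The selection rule above can be sketched in a few lines. The knowledge base below (priors and likelihoods for "cold" and "flu") is entirely hypothetical, made up only to exercise the function; `naive_bayes` is an illustrative name.

```python
def naive_bayes(priors, likelihoods, evidence):
    """Return the hypothesis maximizing P(E1 | H) ... P(Ek | H) P(H), plus all scores."""
    scores = {}
    for h, prior in priors.items():
        score = prior
        for e in evidence:
            score *= likelihoods[h][e]  # conditional independence assumption
        scores[h] = score
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical knowledge base (numbers invented for illustration).
priors = {"cold": 0.7, "flu": 0.3}
likelihoods = {
    "cold": {"coughing": 0.5, "sneezing": 0.8, "headache": 0.2},
    "flu": {"coughing": 0.6, "sneezing": 0.3, "headache": 0.8},
}

best, scores = naive_bayes(priors, likelihoods, ["coughing", "headache"])
print(best, scores)
```

The scores are not normalized, which matches the remark that the probability itself does not matter so much; only the ranking of hypotheses is used.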
38Example Play Tennis
- P(+ | sunny, cool, high, strong) vs.
- P(- | sunny, cool, high, strong)
- P(sunny | +) P(cool | +) P(high | +) P(strong | +) P(+) vs.
- P(sunny | -) P(cool | -) P(high | -) P(strong | -) P(-)
39Application Spam Detection
- Spam
- Dear sir, We want to transfer to overseas (
126,000.000.00 USD) One hundred and Twenty six
million United States Dollars) from a Bank in
Africa, I want to ask you to quietly look for a
reliable and honest person who will be capable
and fit to provide either an existing
- Legitimate email
- "Ham", for lack of a better name.
40- Hypotheses: Spam, Ham
- Evidence: a document
- The document is treated as a set (or bag) of words
- Knowledge
- P(Spam)
- The prior probability of an e-mail message being spam.
- How to estimate this probability?
- P(w | Spam)
- The probability that a word is w if we know it is chosen from a spam message.
- How to estimate this probability?
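One plausible answer to both questions is relative frequency over labeled messages. The sketch below assumes a tiny, made-up training set and plain counting; words never seen in a class get probability 0 under this estimate, which practical filters usually smooth.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical labeled messages, invented for illustration.
TRAINING = [
    ("spam", "transfer million dollars bank"),
    ("spam", "honest person bank transfer"),
    ("ham", "meeting agenda attached"),
    ("ham", "lunch meeting tomorrow"),
    ("ham", "bank statement attached"),
]

label_counts = Counter(label for label, _ in TRAINING)
word_counts = {label: Counter() for label in label_counts}
for label, text in TRAINING:
    word_counts[label].update(text.split())  # bag of words per class

# P(Spam), P(Ham): fraction of messages carrying each label.
priors = {label: Fraction(n, len(TRAINING)) for label, n in label_counts.items()}

def p_word(w, label):
    """P(w | label): fraction of word occurrences in that class equal to w."""
    return Fraction(word_counts[label][w], sum(word_counts[label].values()))

print(priors["spam"], p_word("bank", "spam"), p_word("bank", "ham"))  # 2/5 1/4 1/9
```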
41Limitations of Naïve Bayesian
- Cannot handle composite hypotheses well
- Suppose the hypotheses are independent of each other
- Consider a composite hypothesis
- How to compute the posterior probability?
42 43- but this is a very unreasonable assumption
- Need a better representation and a better
assumption
- E and B are independent.
- But when A is given, they are (adversely) dependent because they become competitors to explain A.
- P(B | A, E) << P(B | A)
- E explains away A
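The explaining-away effect can be seen numerically on a small model in which E and B are independent causes of A. All numbers below are hypothetical, picked only so that either cause alone makes A likely.

```python
from itertools import product

# Hypothetical parameters (not from the slides).
P_E = 0.1
P_B = 0.1
P_A_GIVEN = {  # P(A = true | E, B)
    (True, True): 0.95,
    (True, False): 0.90,
    (False, True): 0.90,
    (False, False): 0.01,
}

def joint(e, b, a):
    """P(E = e, B = b, A = a) with E and B independent a priori."""
    pe = P_E if e else 1 - P_E
    pb = P_B if b else 1 - P_B
    pa = P_A_GIVEN[(e, b)] if a else 1 - P_A_GIVEN[(e, b)]
    return pe * pb * pa

WORLDS = list(product([True, False], repeat=3))

def prob(event, given):
    """P(event | given) by summing the joint over all eight worlds."""
    den = sum(joint(*w) for w in WORLDS if given(*w))
    num = sum(joint(*w) for w in WORLDS if given(*w) and event(*w))
    return num / den

p_b_given_e = prob(lambda e, b, a: b, lambda e, b, a: e)          # equals P(B)
p_b_given_a = prob(lambda e, b, a: b, lambda e, b, a: a)
p_b_given_a_e = prob(lambda e, b, a: b, lambda e, b, a: a and e)
print(p_b_given_e, p_b_given_a, p_b_given_a_e)
```

With these numbers, observing A raises the probability of B to about 0.5, and additionally learning E drops it back to about 0.1.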
44- Cannot handle causal chaining
- Ex. A: weather of the year
- B: cotton production of the year
- C: cotton price of next year
- Observed: A influences C
- The influence is not direct (A -> B -> C)
- P(C | B, A) = P(C | B): instantiation of B blocks the influence of A on C
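The blocking claim can be checked on a toy chain A -> B -> C. The conditional probabilities below are hypothetical; only the chain structure comes from the slide.

```python
from itertools import product

# Hypothetical parameters for the chain A -> B -> C.
P_A = 0.3
P_B_GIVEN_A = {True: 0.9, False: 0.2}   # P(B = true | A)
P_C_GIVEN_B = {True: 0.8, False: 0.1}   # P(C = true | B)

def joint(a, b, c):
    """P(A = a, B = b, C = c) = P(a) P(b | a) P(c | b)."""
    pa = P_A if a else 1 - P_A
    pb = P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a]
    pc = P_C_GIVEN_B[b] if c else 1 - P_C_GIVEN_B[b]
    return pa * pb * pc

WORLDS = list(product([True, False], repeat=3))

def prob(event, given):
    """P(event | given) by summing the joint over all eight worlds."""
    den = sum(joint(*w) for w in WORLDS if given(*w))
    num = sum(joint(*w) for w in WORLDS if given(*w) and event(*w))
    return num / den

p_c_given_a = prob(lambda a, b, c: c, lambda a, b, c: a)
p_c_given_not_a = prob(lambda a, b, c: c, lambda a, b, c: not a)
p_c_given_b = prob(lambda a, b, c: c, lambda a, b, c: b)
p_c_given_b_a = prob(lambda a, b, c: c, lambda a, b, c: b and a)
print(p_c_given_a, p_c_given_not_a)   # A influences C when B is unknown
print(p_c_given_b, p_c_given_b_a)     # equal once B is known
```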
45Summary
- Basics of Probability Theory
- Experiment, sample space, events
- Axioms and properties
- Joint Probability
- Conditional Probability
- Probabilistic Reasoning
- Bayes Theorem
- Dutch Book Theorem
- Independence and Conditional Independence
- Naïve Bayes Method