1 / 45

Uncertainty in AI

- Outline
- Introduction
- Basic Probability Theory
- Probabilistic Reasoning
- Why should we use probability theory?
- Dutch Book Theorem

Sources of Uncertainty

- Information is partial
- Information is not fully reliable.
- Representation language is inherently imprecise.
- Information comes from multiple sources and it is

conflicting. - Information is approximate
- Non-absolute cause-effect relationships exist

Basic Probability

- Probability theory enables us to make rational

decisions. - Which mode of transportation is safer
- Car or Plane?
- What is the probability of an accident?

Basic Probability Theory

- An experiment has a set of potential outcomes,

e.g., throw a dice - The sample space of an experiment is the set of

all possible outcomes, e.g., 1, 2, 3, 4, 5, 6 - An event is a subset of the sample space.
- 2
- 3, 6
- even 2, 4, 6
- odd 1, 3, 5

Probability as Relative Frequency

- An event has a probability.
- Consider a long sequence of experiments. If we

look at the number of times a particular event

occurs in that sequence, and compare it to the

total number of experiments, we can compute a

ratio. - This ratio is one way of estimating the

probability of the event. - P(E) ( of times E occurred)/(total of trials)

- Example
- 100 attempts are made to swim a length in 30

secs. The swimmer succeeds on 20 occasions

therefore the probability that a swimmer can

complete the length in 30 secs is - 20/100 0.2
- Failure 1-.2 or 0.8
- The experiments, the sample space and the events

must be defined clearly for probability to be

meaningful - What is the probability of an accident?

Theoretical Probability

- Principle of IndifferenceAlternatives are always

to be judged equiprobable if we have no reason to

expect or prefer one over the other. - Each outcome in the sample space is assigned

equal probability. - Example throw a dice
- P(1)P(2) ... P(6)1/6

Law of Large Numbers

- As the number of experiments increases the

relative frequency of an event more closely

approximates the theoretical probability of the

event. - if the theoretical assumptions hold.
- Buffons Needle for Computing p
- Draw parallel lines 1 inch apart on a plane
- Throw a 1-inch needle on the plane
- P( needle crossing a line )2/p

Large Number Reveals Untruth in Assumptions

- Results of 1,000,000 throws of a die
- Number 1 2 3 4 5 6
- Fraction .155 .159 .164 .169 .174 .179

Axioms of Probability Theory

- Suppose P(.) is a probability function, then
- 1. for any event E, 0P(E) 1.
- 2. P(S) 1, where S is the sample space.
- 3. for any two mutually exclusive events E1 and

E2, - P(E1 È E2) P(E1) P(E2)
- Any function that satisfies the above three

axioms is a probability function.

Joint Probability

- Let A, B be two events, the joint probability of

both A and B being true is denoted by P(A, B). - Example
- P(spade) is the probability of the top card

being a spade. - P(king) is the probability of the top card being

a king. - P(spade, king) is the probability of the top

card being both a spade and a king, i.e., the

king of spade. - P(king, spade)P(spade, king) ???

Properties of Probability

- 1. P(ØE) 1 P(E)
- 2. If E1 and E2 are logically equivalent, then
- P(E1)P(E2).
- E1 Not all philosophers are more than six feet

tall. - E2 Some philosopher is not more that six feet

tall. - Then P(E1)P(E2).
- 3. P(E1, E2)P(E1).

Conditional Probability

- The probability of an event may change after

knowing another event. - The probability of A given B is denoted by

P(AB). - Example
- P( Wspace ) the probability of a randomly

selected word from an English text is space - P( Wspace Wouter) the probability of space

if the previous word is outer

Example

- A the top card of a deck of poker cards is a

king of spade - P(A) 1/52
- However, if we know
- B the top card is a king
- then, the probability of A given B is true is
- P(AB) 1/4.

How to Compute P(AB)?

B

A

Business Students

- Of 100 students completing a course, 20 were

business major. Ten students received As in the

course, and three of these were business majors.,

suppose A is the event that a randomly selected

student got an A in the course, B is the event

that a randomly selected event is a business

major. What is the probability of A? What is the

probability of A after knowing B is true?

Probabilistic Reasoning

- Evidence
- What we know about a situation.
- Hypothesis
- What we want to conclude.
- Compute
- P( Hypothesis Evidence )

Credit Card Authorization

- E is the data about the applicant's age, job,

education, income, credit history, etc, - H is the hypothesis that the credit card will

provide positive return. - The decision of whether to issue the credit card

to the applicant is based on the probability

P(HE).

Medical Diagnosis

- E is a set of symptoms, such as, coughing,

sneezing, headache, ... - H is a disorder, e.g., common cold, SARS, flu.
- The diagnosis problem is to find an H (disorder)

such that P(HE) is maximum.

- Linda is 31 years old, single, outspoken, and

very bright. She majored in philosophy. As a

student, she was deeply concerned with issues of

discrimination and social justice, and also

participated in antinuclear demonstrations. - Please rank the following statements by their

probability, using 1 for the most probable and 8

for the least probable. - a. Linda is a teacher in elementary school.
- b. Linda works in a bookstore and takes yoga

classes. - c. Linda is active in the feminist movement.
- d. Linda is psychiatric social worker.
- e. Linda is a member of the League of Women

Voters. - f. Linda is a bank teller.
- g. Linda is an insurance salesperson.
- h. Linda is a bank teller and is active in the

feminist movement.

Example

A patient takes a lab test and the result comes

back positive. The test has a false negative rate

of 2 and false positive rate of 3. Furthermore,

0.8 of the entire population have this

cancer. What is the probability of cancer if we

know the test result is positive?

Bayes Theorem

- If P(E2)gt0, then P(E1E2)P(E2E1)P(E1)/P(E2)
- This can be derived from the definition of

conditional probability.

The Three-Card Problem

- Three cards are in a hat. One is red on both

sides (the red-red card). One is white on both

sides (the white-white card). One is red on one

side and white on the other (the red-white card).

A single card is drawn randomly and tossed into

the air. - a. What is the probability that the red-red card

was drawn? (RR) - b. What is the probability that the drawn cards

lands with a white side up? (W-up) - c. What is the probability that the red-red card

was not drawn, assuming that the drawn card lands

with the a red side up. (not-RRR-up)

Fair Bets

- A bet is fair to an individual I if, according to

the individual's probability assessment, the bet

will break even in the long run. - The following three bet are fair
- Bet (a) Win 4.20 if RR
- lose 2.10
- otherwise. since you believe P(RR)1/3
- Bet (b) Win 2.00 if W-up
- lose 2.00
- otherwise. since you believe P(W-up)1/2
- Bet (c) Win 4.00 if R-up and not-RR
- lose 4.00 if R-up and RR
- neither win nor lose if not-R-up.
- since you believe P(not-RRR-up)1/2

Dutch Book

- The bets that you accepted have an interesting

property - No matter what card is drawn in the three-card

problem, and no matter how it lands, you are

guaranteed to lose money. - This is called a Dutch Book

Verification

- there are three possible outcomes
- 1. Some card other than red-red is drawn, and it

lands with white side up. That is, W-up and

not-RR - 2. Some card other than red-red is drawn, and it

lands with a red side up. That is, R-up and

not-RR. - 3. The red-red card is drawn, and it lands (of

course) with a red side up. That is, R-up and RR. - 1 2 3
- a. 2.10 2.10 4.20
- b. 2.00 2.00 2.00
- c. 0.00 4.00 4.00
- total 0.10 0.10 1.80

The Dutch Book Theorem

- Suppose that an individual I is willing to accept

any bet that is fair for I. Then a Dutch book can

be made against I if and only if I's assessment

of probability violates Bayesian axiomatization.

Independence Intuition

- Events are independent if one has nothing

whatever to do with others. Therefore, for two

independent events, knowing one happening does

change the probability of the other event

happening. - one toss of coin is independent of another coin

(assuming it is a regular coin). - price of tea in England is independent of the

result of general election in Canada.

Independent or Dependent?

- Getting cold and getting cat-allergy
- Mile Per Gallon and acceleration.
- Size of a persons vocabulary the persons shoe

size.

Independence Definition

- Events A and B are independent iff
- P(A, B) P(A) x P(B)
- which is equivalent to
- P(AB) P(A) and
- P(BA) P(B)
- when P(A, B) gt0.
- T1 the first toss is a head.
- T2 the second toss is a tail.
- P(T2T1) P(T2)

Conditional Independence

- Dependent events can become independent given

certain other events. - Example,
- Size of shoe
- Age
- Size of vocabulary
- Two events A, B are conditionally independent

given a third event C iff - P(AB, C) P(AC)

Conditional IndependenceDefinition

- Let E1 and E2 be two events, they are

conditionally independent given E iff - P(E1E, E2)P(E1E),
- that is the probability of E1 is not changed

after knowing E2, given E is true. - Equivalent formulations
- P(E1, E2E)P(E1E) P(E2E)
- P(E2E, E1)P(E2E)

Example Play Tennis?

Predict playing tennis when ltsunny, cool, high,

stronggt What probability should be used to make

the prediction? How to compute the probability?

Probabilities of Individual Attributes

- Given the training set, we can compute the

probabilities

Naïve Bayes Method

- Knowledge Base contains
- A set of hypotheses
- A set of evidences
- Probability of an evidence given a hypothesis
- Given
- A sub set of the evidences known to be present in

a situation - Find
- the hypothesis with the highest posterior

probability P(HE1, E2, , Ek). - The probability itself does not matter so much.

Naïve Bayes Method

- Assumptions
- Hypotheses are exhaustive and mutually exclusive
- H1 v H2 v v Hk
- (Hi Hj) for any i?j
- Evidences are conditionally independent given a

hypothesis - P(E1, E2,, EkH) P(E1H)P(EkH)
- P(H E1, E2,, Ek)
- P(E1, E2,, Ek, H)/P(E1, E2,, Ek)
- P(E1, E2,, EkH)P(H)/P(E1, E2,, Ek)

Naïve Bayes Method

- The goal is to find H that maximize P(HE1, E2,,

Ek) - Since
- P(HE1, E2,, Ek) P(E1, E2,, EkH)P(H)/P(E1,

E2,, Ek) - and P(E1, E2,, Ek) is the same for different

hypotheses, - Maximizing P(HE1, E2,, Ek) is equivalent to

maximizing P(E1, E2,, EkH)P(H)

P(E1H)P(EkH)P(H) - Naïve Bayes Method
- Find a hypothesis that maximizes

P(E1H)P(EkH)P(H)

Example Play Tennis

- P( sunny, cool, high, strong) vs.
- P(- sunny, cool, high, strong)
- P(sunny)P(cool)P(high)P(strong)P() vs.
- P(sunny-)P(cool-)P(high-)P(strong-)P(-)

Application Spam Detection

- Spam
- Dear sir, We want to transfer to overseas (

126,000.000.00 USD) One hundred and Twenty six

million United States Dollars) from a Bank in

Africa, I want to ask you to quietly look for a

reliable and honest person who will be capable

and fit to provide either an existing - Legitimate email
- Ham for lack of better name.

- Hypotheses Spam, Ham
- Evidence a document
- The document is treated as a set (or bag) of

words - Knowledge
- P(Spam)
- The prior probability of an e-mail message being

a spam. - How to estimate this probability?
- P(wSpam)
- the probability that a word is w if we know w is

chosen from a spam. - How to estimate this probability?

Limitations of Naïve Bayesian

- Cannot handle hypotheses of composite hypotheses

well - Suppose are independent of

each other - Consider a composite hypothesis
- How to compute the posterior probability

- Using the Bayes Theorem

- but this is a very unreasonable assumption
- Need a better representation and a better

assumption

E and B are independent But when A is given, they

are (adversely) dependent because they become

competitors to explain A P(BA, E) ltltP(BA) E

explains away of A

- Cannot handle causal chaining
- Ex. A weather of the year
- B cotton production of the year
- C cotton price of next year
- Observed A influences C
- The influence is not direct (A -gt B -gt C)
- P(CB, A) P(CB) instantiation of B blocks

influence of A on C

Summary

- Basics of Probability Theory
- Experiment, sample space, events
- Axioms and prosperities
- Joint Probability
- Conditional Probability
- Probabilistic Reasoning
- Bayes Theorem
- Dutch Book Theorem
- Independence and Conditional Independence
- Naïve Bayes Method