CS 70 - Lecture 20 - Mar 7, 2011 - 10 Evans

Goal for today (Note 12): Conditional Probability

Here is an example that we would like to understand:

  A pharmaceutical company is marketing a new, inexpensive test for a certain
  medical condition. According to clinical trials, the test has
  the following properties:
    1. When applied to an affected person, the test comes up 
       positive in 90% of cases, and negative in 10% ("False negatives")
    2. When applied to a healthy person, the test comes up
       negative in 80% of cases and positive in 20% ("False positives")
  Suppose that 5% of the US population has the condition. 
  In other words, a random person has a 5% chance of being affected.
  When a random person is tested and comes up positive, what is the
  probability that the person actually has the condition?

This is an example of conditional probability: what is the 
probability of event A (person is affected) given that we know
event B occurs (the person tests positive). We write this P(A|B),
the probability of A given B.

Def: P(A|B) = P(A inter B)/P(B), provided P(B) > 0

Justification: Let S be the original sample space, and P() the
original probability function on S.  Since we know B occurs, 
we have a new sample space, namely B subset S. What is the
new probability function? The new probabilities should keep the same
relative sizes as the old ones, i.e. P(x|B) proportional to P(x) for
x in B, and they must satisfy
  1 = sum_{x in B} P(x|B),
so the natural choice is P(x|B) = P(x)/P(B).
So if A subset B is any event in the new sample space B,
then P(A|B) = sum_{x in A} P(x|B) = sum_{x in A} P(x)/P(B)
            = P(A)/P(B)
What if A is not a subset of B? If x in A but x not in B,
then clearly P(x|B) = 0; if B occurs then x cannot occur.
Thus we finally get P(A|B) = P(A inter B)/P(B).

Ex: Suppose we toss 3 balls into 3 bins
ASK&WAIT: What is P(first bin empty)?
ASK&WAIT: What is P(second bin empty | first bin empty)?

Ex: Roll two fair dice, what is P(rolling a 6 | sum of dice is 10)?
    (Here "rolling a 6" means at least one die shows a 6.)

Ex: Flip two fair coins, what is P(second is head | first is head)?
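
For small sample spaces we can check such answers by brute-force
enumeration. Here is a minimal Python sketch (the helper name cond_prob
is ours, not part of the notes) for the dice example above:

  # Check P(rolling a 6 | sum of dice is 10) over all 36 outcomes.
  from fractions import Fraction
  from itertools import product

  space = list(product(range(1, 7), repeat=2))  # all (die1, die2) pairs, uniform

  def cond_prob(A, B):
      # P(A|B) = |A inter B| / |B| on a uniform finite sample space
      B_outcomes = [w for w in space if B(w)]
      return Fraction(sum(A(w) for w in B_outcomes), len(B_outcomes))

  print(cond_prob(lambda w: 6 in w, lambda w: sum(w) == 10))  # 2/3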

Returning to medical testing, let N = US population.
The population consists of 4 groups:
  1) TP (true positives)  |TP|=90% of  5% of N = ( 9/200)*N, P(TP)=9/200
  2) FP (false positives) |FP|=20% of 95% of N = (19/100)*N, P(FP)=19/100
  3) TN (true negatives)  |TN|=80% of 95% of N = (76/100)*N, P(TN)=76/100
  4) FN (false negatives) |FN|=10% of  5% of N = ( 1/200)*N, P(FN)=1/200
Now let A = {person is affected} = TP U FN
        B = {person tests positive} = TP U FP
        A inter B = TP
  and finally P(A|B) = P(TP)/P(TP U FP) 
                     = (9/200)/(9/200 + 19/100) = 9/47 ~ .19
So if a random person tests positive, there is only a 19% chance
that they really have it.
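
A quick sanity check of this arithmetic with exact fractions (a Python
sketch, not part of the original notes):

  # Recompute P(A|B) = P(TP)/(P(TP)+P(FP)) for the medical test.
  from fractions import Fraction

  p_tp = Fraction(90, 100) * Fraction(5, 100)   # 9/200, affected and positive
  p_fp = Fraction(20, 100) * Fraction(95, 100)  # 19/100, healthy but positive
  print(p_tp / (p_tp + p_fp))         # 9/47
  print(float(p_tp / (p_tp + p_fp)))  # ~0.19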

ASK&WAIT: What is P(B|A) = P(person tests positive | person is affected)?

ASK&WAIT: What is P(test correct when given to random person)?

ASK&WAIT: Let a "phony test" simply declare everyone healthy
          what is P(phony test correct when given to a random person)?

Here is another way to describe what we just did, as an example
of Bayesian Inference, a widely used technique in artificial intelligence:

Given a noisy observation (results of the test), we want to determine
the probability of something we cannot directly measure (whether the
person actually has the condition). So in general we know (can measure)
the following:

  P(A) = probability that the event A occurs ("prior knowledge")
       = .05 (5% of population has disease)
  P(B|A) = probability of event B, which we can measure, given A
         = probability that person tests positive, given they have the disease
  P(B|not A) = probability of B given that A does not occur
         = probability that person tests positive, given they do not have the disease

We want to "infer" the following, which we cannot easily measure
  P(A|B) = probability that A occurs given B
         = probability that person really is sick, given that they test positive

ASK&WAIT: Why is this harder to measure than the first 3 items?

Here are the formulas that generalize what we did for the medical test:

Thm (Bayes' Rule)
  P(A|B) = P(B|A) * P(A) / P(B) 
  proof:   P(A|B) = P(A inter B) / P(B)  ... by definition
                  = P(A inter B)*(P(A)/P(A)) / P(B)
                  = (P(A inter B)/P(A)) * P(A) / P(B)
                  = P(B|A) * P(A) / P(B) ... again by definition

This reduces computing P(A|B) to knowing P(B|A) and P(A), which we have, and
  P(B) = P(a random person tests positive)
which we do not. So we need one more formula:

Thm (Total Probability Rule)
  P(B) = P(B|A)*P(A) + P(B|not A)*(1-P(A))
Here "not A" is the complement of the set A
  proof:  P(B) = P(B inter A) + P(B inter (not A) ) 
                     ... since (B inter A) and (B inter (not A)) are
                         disjoint and their union is B
               = P(B|A)*P(A) + P(B|not A)*P(not A)
                    ... by definition of conditional probability
               = P(B|A)*P(A) + P(B|not A)*(1-P(A))

Altogether, combining these last two theorems, we get

 P(A|B) = P(B|A)*P(A)/P(B)
        = P(B|A)*P(A)/[P(B|A)*P(A) + P(B|not A)*(1-P(A))]
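
This combined formula is easy to package as code. Here is a minimal
Python sketch (the function name posterior is ours); we check it
against the two examples below:

  from fractions import Fraction

  def posterior(p_b_given_a, p_b_given_not_a, p_a):
      # P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|not A)(1-P(A))]
      p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
      return p_b_given_a * p_a / p_b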

Ex: For our medical testing example
    P(B|A) = P(infected person tests positive) = .9
    P(B|not A) = P(healthy person tests positive) = P("false positive") = .2
    P(A) = P(random person is infected)
         = fraction of whole population with disease = .05
 so P(A|B) = P(person who tests positive is sick)
           = (.9*.05)/(.9*.05 + .2*.95)
           ~ .19

Ex: Suppose there are two bowls of cookies.
Bowl #1 has 10 CCCs and 30 plain cookies.
Bowl #2 has 20 CCCs and 20 plain cookies.
Joe picks a random bowl, and then a random cookie from the bowl.
It turns out to be plain. What is the probability Joe picked Bowl #1?

  A = {Joe picked Bowl #1}
  B = {Joe picked a plain cookie}
We want P(A|B) = P(Joe picked Bowl #1, given that he picked a plain cookie)
We know
  P(B|A) = P(Joe picked a plain cookie, given that he chose Bowl #1) = 30/40
  P(B|not A) = P(Joe picked a plain cookie, given that he chose Bowl #2) = 20/40
  P(A) = .5
So P(A|B) = P(B|A)*P(A)/[P(B|A)*P(A) + P(B|not A)*(1-P(A))]
          = (.75*.5)/(.75*.5 + .5*.5)
          = .6
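
Using the posterior sketch from above, both examples check out:

  print(posterior(Fraction(9, 10), Fraction(1, 5), Fraction(1, 20)))  # 9/47 ~ .19
  print(posterior(Fraction(3, 4), Fraction(1, 2), Fraction(1, 2)))    # 3/5 = .6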


Def: Two events A and B are independent if P(A inter B) = P(A)*P(B)

EX:  flip two coins, A = {HH, TH}, B = {HH, HT}, A inter B = {HH}
     P(A) = 1/2 = P(B), P(A inter B) = 1/4 = P(A)*P(B),
     so A and B are independent.

Prop: If A and B are independent, then P(A|B) = P(A) and P(B|A) = P(B)
      In words: "If A and B are independent, knowing that B occurred tells
      you nothing about A, and vice versa"
Proof: P(A|B) = P(A inter B)/P(B) = P(A)*P(B)/P(B) = P(A)
       P(B|A) = P(A inter B)/P(A) = P(A)*P(B)/P(A) = P(B)
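
Checking independence on a small uniform sample space is also easy to
automate; here is a Python sketch (the helper name is ours) that
reproduces the two-coin example above, and can be reused for the
questions below:

  from fractions import Fraction
  from itertools import product

  def independent(A, B, space):
      # test P(A inter B) == P(A)*P(B) exactly, by counting
      n = len(space)
      pa = Fraction(sum(A(w) for w in space), n)
      pb = Fraction(sum(B(w) for w in space), n)
      pab = Fraction(sum(A(w) and B(w) for w in space), n)
      return pab == pa * pb

  coins = list(product("HT", repeat=2))
  print(independent(lambda w: w[1] == "H", lambda w: w[0] == "H", coins))  # True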

ASK&WAIT: Throw 3 balls into 3 bins, are 
          A = {first bin empty} and B = {second bin empty} independent?
ASK&WAIT: Throw 2 dice, are
          A = {rolling a 6} and B = {sum=10} independent?
ASK&WAIT: Throw 2 dice, are
          A = {sum even}, B = {first die even} independent?

Independence of A and B means P(A inter B) = P(A)*P(B) is easy to
compute from P(A) and P(B). We want to extend this notion to
more than 2 independent events A1,A2,...,Ak, so that for any
subset of these events, say A1, A2, A5 and A7, we can compute
P(A1 inter A2 inter A5 inter A7)
  = P(A1) * P(A2) * P(A5) * P(A7)
Here is how to extend the definition of independence so this is true.

Def: Events A1, A2, ... , An are mutually independent if
     for every i and every subset J of {1,2,...,n} - {i},
     P(Ai | inter_{j in J} Aj) = P(Ai)
     i.e. Ai does not depend on any combination of the other events

Thm: P(B inter A) = P(B)*P(A|B)
Proof: follows from definition of P(A|B)

Thm: P(A1 inter A2 inter ... inter An) = 
     P(A1) * P(A2|A1) * P(A3|A1 inter A2) * P(A4| A1 inter A2 inter A3)
           * ... * P(An | A1 inter A2 inter ... inter An-1 )

Ex:  P(A1 inter A2 inter A3 inter A4)
   = (P(A1 inter A2 inter A3 inter A4)/P(A1 inter A2 inter A3))*
      P(A1 inter A2 inter A3)
   = (P(A1 inter A2 inter A3 inter A4)/P(A1 inter A2 inter A3))*
     (P(A1 inter A2 inter A3)/P(A1 inter A2))*
      P(A1 inter A2)
   = (P(A1 inter A2 inter A3 inter A4)/P(A1 inter A2 inter A3))*
     (P(A1 inter A2 inter A3)/P(A1 inter A2))*
     (P(A1 inter A2)/P(A1)) * P(A1)
   = P(A4 | A1 inter A2 inter A3)*
     P(A3 | A1 inter A2)*
     P(A2 | A1)*
     P(A1)

Proof of general case: induction on n
       Base case: n=1: P(A1)=P(A1)
       Induction step: Assume 
          P(A1 inter ... inter An-1)
                = P(A1) * ... * P(An-1 | A1 inter ... inter An-2)
       Then P(A1 inter ... inter An) 
          = P(A1 inter ... inter An-1) * P(An | A1 inter ... inter An-1)
          = P(A1) * ... * P(An-1 | A1 inter ... inter An-2) *
            P(An | A1 inter ... inter An-1)     (by induction, as desired)

Corollary: Suppose A1, A2, ... , An are mutually independent. Then
           P(A1 inter A2 inter ... inter An) = P(A1)*P(A2)*...*P(An)
  Proof: in above proof, each
           P(Ai | A1 inter ... inter Ai-1) = P(Ai) by mutual independence

EX: Toss a fair coin 3 times. Let A={HHH}, A1={Hxx}, A2={xHx}, A3={xxH}
    A = A1 inter A2 inter A3
    P(A) = P(A1) * P(A2|A1) * P(A3|A1 inter A2)
         = P(A1) * P(A2)    * P(A3)
         = 1/2   *  1/2     * 1/2
         = 1/8 as expected
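
As a sanity check, the chain-rule factors can be verified by
enumeration; a minimal Python sketch (helper names ours):

  from fractions import Fraction
  from itertools import product

  space = list(product("HT", repeat=3))  # 8 equally likely 3-toss outcomes

  def cond_prob(A, B):
      # P(A|B) by counting on the uniform space
      Bs = [w for w in space if B(w)]
      return Fraction(sum(A(w) for w in Bs), len(Bs))

  A1 = lambda w: w[0] == "H"
  A2 = lambda w: w[1] == "H"
  A3 = lambda w: w[2] == "H"
  everything = lambda w: True

  p = (cond_prob(A1, everything)
       * cond_prob(A2, A1)
       * cond_prob(A3, lambda w: A1(w) and A2(w)))
  print(p)  # 1/8
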
EX: Toss a biased coin 3 times, with P(H) = p
ASK&WAIT: what is P(A)?