```CS 70 - Lecture 23 - Mar 14, 2011 - 10 Evans

Goals for today (read Note 14):
random variables
expectation (average, mean) of random variables

DEF: Let S be the sample space of a given experiment, with probability
function P. A _random variable_ is a function f:S -> Reals.

EX 1: Flip a biased coin once, S1 = {H,T}, P1(H) = p,
f1(x) = {1 if x=H, -1 if x=T}
f1 = amount you win (if f1>0) (or lose, if f1<0) if you bet \$1 on H.

EX 2: Flip a biased coin n times, S2 = {all sequences of H, T of length n}
f2(x) = #H - #T = #H - (n-#H) = 2*#H - n
f2 = amount you win (or lose) if you bet \$1 on H on each flip

EX 3: Let S3 = result of rolling a die once, P3(any face) = 1/6
Let f3(die) = value on top of die (an integer from 1 to 6)

EX 4: Let S4 = result of rolling a pair of red and blue dice 24 times
= { ((1,1),(1,1),...,(1,1)),...,((6,6),(6,6),...,(6,6))}
<----- 24 times ------>     <----- 24 times ------>
ASK&WAIT:   What is P4(x) for any x in S4?
Let f4(x) = { +1 if a pair of sixes appears in x }
{ -1 otherwise                       }
We can interpret f4 as the amount of money we win (or lose) by
betting on getting a pair of sixes

EX 5: S5 = {US population}, P5(person x in S5) = 1/|S5|,
Let f5(person x in S) = { +1 if x has a particular disease }
{  0 if x does not                 }

EX 6: Suppose you have a pile of n graded homework assignments to hand
back to a class. But you shuffle them randomly, so all permutations
are equally likely, and hand them back in that order, so each student
gets a random homework. How many students get their own homework back?

Let the students be named {1,2,...n}.
S6 = {all permutations of 1 to n}, P6(any permutation) = 1/n!
Let a particular permutation of (1,...,n) be
sigma = (sigma(1),sigma(2),...,sigma(n));
then student 1 gets homework sigma(1), student 2 gets homework sigma(2),
and student i gets homework sigma(i). So student i gets her own homework
back if and only if i = sigma(i).

g_i(sigma) = 1 if i = sigma(i), and 0 otherwise
ASK&WAIT: What random variable tells us how many students get their
own homework back?

EX: Suppose you flip a fair coin, and win \$1 if it comes up H,
lose \$1 if it comes up T
ASK&WAIT: What is the "average" amount you expect to win after N flips?

DEF: Given S, P and random variable f, the _Expected Value_
(also called Mean or Average) of f is
E(f) = sum_{all x in S} P(x)*f(x)

This is the "average" value of f one gets if one repeats the experiment
a great number of times.
EX 1: With S1, P1, f1 as before, (flip coin once, bet \$1 on H)
E(f1) = (+1)*(p) + (-1)*(1-p) = 2*p-1
= 0 if coin fair (p=1/2)
Imagine betting \$1 on getting H. Then E(f1) is the amount you expect to
win (if E(f1)>0) or lose (E(f1)<0) on the bet. If E(f1)=0, you break even

EX 2: With S2, P2 and f2 as before, (flip coin N times, bet \$1 on H)
If we flip a coin N times, we expect E(f2) to be the amount we win
betting \$1 on each flip to get H; and intuitively this should be
N*E(f1) = N*(2*p-1)
Formally, we get
E(f2) = sum_{sequences x of n Hs and Ts} f2(x)*P2(x)
= sum_{sequences x of n Hs and Ts} (#H-#T in x)*P2(x)
looks complicated, but later we will see that our intuition was right,
and there is an easier way to do it that matches our intuitive approach

EX 3: With S3, P3, f3 as before, (roll die once)
E(f3) = (1/6)*1 + (1/6)*2 + ... + (1/6)*6 = 21/6 = 7/2
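On any finite sample space the definition of E(f) can be checked by brute force. Here is a minimal Python sketch (the helper name `expectation` is my own) that recomputes EX 1 and EX 3:

```python
from fractions import Fraction

def expectation(space, P, f):
    """E(f) = sum over all outcomes x in the space of P(x)*f(x)."""
    return sum(P(x) * f(x) for x in space)

# EX 1: biased coin, bet $1 on H; the bias p = 1/3 is an arbitrary choice
p = Fraction(1, 3)
E1 = expectation(['H', 'T'],
                 lambda x: p if x == 'H' else 1 - p,
                 lambda x: 1 if x == 'H' else -1)
assert E1 == 2 * p - 1            # matches E(f1) = 2*p - 1

# EX 3: roll one fair die, f3 = value on top
E3 = expectation(range(1, 7), lambda x: Fraction(1, 6), lambda x: x)
assert E3 == Fraction(7, 2)       # matches E(f3) = 7/2
```

Using `Fraction` keeps the arithmetic exact, so both checks hold with equality rather than up to rounding.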

EX 5: With S5, P5, f5 as before, (choose random person, are they sick?)
E(f5) = sum_{persons x} f5(x)*P5(x)
= sum_{sick persons x} f5(x)*P5(x) + sum_{healthy persons x} f5(x)*P5(x)
= sum_{sick persons x} 1*(1/|S5|) + sum_{healthy persons x} 0*(1/|S5|)
= P(random person is sick)

EX 6: S6, P6, g as before (return homeworks at random, who gets their own back)
then E(g) = average number of students who get back their
own homework. Looks like a sum over n! terms, again we need a better approach...

EX 4: With S4, P4, f4, (roll red/blue dice 24 times, bet on pair of sixes)
it seems like you need to sum over all 6^48 sequences.
We need a simpler way:

DEF: P(f=r) = sum_{all x in S such that f(x)=r} P(x)

EX 1:  With S1, P1 and f1 as before (flip coin once, bet \$1 on H)
P1(f1=1) = P1(H) = p, P1(f1=-1) = P1(T) = 1-p
EX 2:  With S2, P2 and f2 as before (flip coin N times, bet \$1 on H)

EX 3:  With S3, P3 and f3 as before (roll die once)
P3(f3=k) = 1/6 for k=1,2,...,6 and P3(f3=k)=0 otherwise

EX 4:  With S4, P4 and f4 as before (roll red/blue dice 24 times, bet on pair of sixes)
P4(f4=1) = sum_{all x in which a pair of sixes appears} P4(x)
= P4(a pair of sixes appears)

EX 5:  With S5, P5 and f5 as above, (choose random person, are they sick?)

Thm: E(f) = sum_{numbers r in range of f} r*P(f=r)
Proof: We write the proof for S finite; the same argument works for S countably infinite.
Let {r1,r2,...,rk} be numbers in range of f, and write
S = S1 U S2 U ... U Sk where
Si = {x in S such that f(x)=ri}
and so P(Si) = P(f=ri)
Note that all Si are pairwise disjoint, so we can write
E(f) = sum_{x in S} f(x)*P(x)
= sum_{x in S1} f(x)*P(x) + sum_{x in S2} f(x)*P(x)
+ ... + sum_{x in Sk} f(x)*P(x)
= sum_{x in S1} r1*P(x) + sum_{x in S2} r2*P(x)
+ ... + sum_{x in Sk} rk*P(x)
Look at one term:
sum_{x in Si} ri*P(x) = ri * sum_{x in Si} P(x)
= ri * P(Si)
= ri * P(f=ri)
so E(f) = r1*P(f=r1) + r2*P(f=r2) + ... + rk*P(f=rk)
= sum_{number r in range of f} r*P(f=r)
as desired.
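The theorem is easy to sanity-check numerically: on a small instance of EX 2 (here n=3 flips with an arbitrary bias p=2/5), the outcome-by-outcome sum and the grouped-by-value sum agree. A sketch:

```python
from fractions import Fraction
from itertools import product

p = Fraction(2, 5)                         # arbitrary bias, for illustration
n = 3
S = list(product('HT', repeat=n))          # all 2^n sequences of H and T
P = lambda x: p ** x.count('H') * (1 - p) ** x.count('T')
f = lambda x: x.count('H') - x.count('T')  # f2: win/loss betting $1 on H

# Left side: the definition, summing over every outcome x in S
lhs = sum(P(x) * f(x) for x in S)

# Right side: group outcomes by the value r = f(x), as in the theorem
vals = {f(x) for x in S}
rhs = sum(r * sum(P(x) for x in S if f(x) == r) for r in vals)

assert lhs == rhs == n * (2 * p - 1)   # also matches the EX 2 intuition
```

The final equality n*(2*p-1) is the intuitive answer from EX 2, which the notes justify properly later via linearity of expectation.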

EX 3: With S3, P3 and f3 as above, (roll die once)
E(f3) = sum_{k=1 to 6} k*P(f=k) = sum_{k=1 to 6} k*(1/6) = 7/2 as before

EX 4: With S4, P4, f4 as above (roll red/blue dice 24 times, bet \$1 on pair of sixes.)
E(f4) is the average amount one wins (if E(f4)>0) or loses (if E(f4)<0)
every time one plays.
E(f4) = sum_{numbers r in range of f} r*P(f4=r)
= +1*P4(getting pair of sixes) + (-1)*P4(not getting pair of sixes)
= P4(getting pair of sixes) - P4(not getting pair of sixes)
ASK&WAIT:     What is P4(not getting pair of sixes)?
P4(getting pair of sixes) = 1 - P4(not getting pair of sixes)
~ 1-.5086 = .4914
and E(f4) = .4914 - .5086 = -.0172, so you lose in the long run

Note: In 1654 the gambler Gombaud asked Fermat and Pascal whether
this was a good bet, inadvertently starting the field of
probability theory
Note: If we do 25 rolls instead of 24,
P4(not getting a pair of sixes) drops to (35/36)^25 ~ .4945
P4(getting pair of sixes) grows to .5055, so it is a good bet.
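The numbers above are quick to verify: the 24 (or 25) rolls are independent, so the per-roll miss probability 35/36 just multiplies. A sketch:

```python
# P(no double six in one roll of two dice) = 35/36; rolls are independent,
# so the miss probabilities multiply across rolls.
p_miss_24 = (35 / 36) ** 24
p_miss_25 = (35 / 36) ** 25

E_24 = (1 - p_miss_24) - p_miss_24   # E(f4) with 24 rolls: slightly negative
E_25 = (1 - p_miss_25) - p_miss_25   # same bet with 25 rolls: slightly positive

assert round(p_miss_24, 4) == 0.5086
assert round(p_miss_25, 4) == 0.4945
assert E_24 < 0 < E_25
```

So one extra roll flips the sign of the expected winnings, which is exactly what made Gombaud's question interesting.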

EX 5: Let S5, P5, f5 be as above. (pick random person, are they sick?)
E(f5) = (+1)*P(f5=1) + 0*P(f5=0)
= P(f5=1) = P(person sick)
This is a special case of the following lemma:

Lemma: Let S be a sample space, E subset S any event, and
f(x) = {1 if x in E     }
{0 if x not in E }
Then E(f) = P(E)
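The lemma is easy to check on a small example; here E is the (arbitrarily chosen) event "a fair die rolls an even number":

```python
from fractions import Fraction

# Uniform sample space: the six faces of a fair die
S = range(1, 7)
P = lambda x: Fraction(1, 6)
event = {2, 4, 6}                            # E = "roll is even"
indicator = lambda x: 1 if x in event else 0 # f from the lemma

E_f = sum(P(x) * indicator(x) for x in S)    # E(f) by definition
P_E = sum(P(x) for x in event)               # P(E) directly
assert E_f == P_E == Fraction(1, 2)
```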

EX 2: S2, P2, f2 as above (flip coin N times, bet \$1 on H)
E(f2) = expected win betting \$1 on a coin N times
= sum_{i=-N to N} i*P2(getting i=#H-#T)
= sum_{i=-N to N, i+N even} i*C(N,(N+i)/2)*p^((N+i)/2)*(1-p)^((N-i)/2)
still isn't simple, so need a new idea:

Thm: Let S and P be a sample space and probability function, and
let f and g be two random variables. Then
E(f+g) = E(f) + E(g)
Proof: Let h=f+g be a new random variable.
Then E(h) = sum_{x in S} h(x)*P(x)
= sum_{x in S} (f(x)+g(x))*P(x)
= sum_{x in S} f(x)*P(x) + sum_{x in S} g(x)*P(x)
= E(f)                   + E(g)

Corollary: Let S and P be as above, and h = f1 + f2 + ... + fn
Then E(h) = E(f1) + E(f2) + ... + E(fn)

EX 2: Let S2, P2, f2 be as before (flip coin N times, bet \$1 on H)
Then we can write
f2 = g1 + g2 + ... + gN where
gi(x) = { +1 if i-th flip = H }
{ -1 if i-th flip = T }
and E(f2) = E(g1) + E(g2) + ... + E(gN)
For any i, E(gi) = (+1)*P(H) + (-1)*P(T) = p - (1-p) = 2*p-1
so E(f2) = N*(2*p-1)
which matches our original intuition about making N independent
bets in a row (whew!)
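Linearity can be checked against the direct (messy) sum over the distribution of i = #H - #T from EX 2 above; the two agree exactly. A sketch (the helper name is my own):

```python
from fractions import Fraction
from math import comb

def E_f2_direct(N, p):
    """E(f2) via the distribution of i = #H - #T; i + N must be even."""
    return sum(i * comb(N, (N + i) // 2)
                 * p ** ((N + i) // 2) * (1 - p) ** ((N - i) // 2)
               for i in range(-N, N + 1) if (N + i) % 2 == 0)

p = Fraction(3, 10)                        # arbitrary bias
for N in (1, 2, 5, 10):
    assert E_f2_direct(N, p) == N * (2 * p - 1)   # matches the linearity answer
```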

EX 6: Let S6, P6 be as before (return homework at random, how many get their own back?)
Let sigma be a permutation,
g_i(sigma) = 1 if sigma(i)=i, 0 otherwise
= 1 if i-th student gets her own homework back
g(sigma) = sum_{i=1 to n} g_i(sigma)
= number of students getting their own homework back
E(g) = E(sum_{i=1 to n} g_i)
= sum_{i=1 to n} E(g_i)
So all we need is the probability that the i-th student gets the right homework:
E(g_i) = P(student i gets right homework)
= (# permutations where student i gets right homework)/n!
= (# permutations of other (n-1) homeworks)/n!
= (n-1)! / n! = 1/n
Thus E(g) = sum_{i=1 to n} 1/n = 1
So the answer is 1 student, independent of the class size n.

EX 4: Let S4, P4, and f4 be as before (roll red/blue dice 24 times, bet \$1 on pair of sixes)
Suppose you also make the side bet
that you win \$2 if at least 8 fives come up, and lose \$2.5 if
fewer than 8 fives come up. Is this joint bet worth making?
Answer: Let g(x) = { +2 if at least 8 fives come up in x  }
{ -2.5 if at most 7 fives come up in x }
P(g=+2) = P(at least 8 fives)
= sum_{i=8 to 48} C(48,i) * (1/6)^i * (5/6)^(48-i)
~ .55992
P(g=-2.5) = P(at most 7 fives)
= 1 - P(at least 8 fives)
= 1 - .55992 = .44008
E(g) ~ +2*.55992 - 2.5*.44008 ~ .0196
Then the value of the joint bet f4+g is
E(f4+g) = E(f4)+E(g) ~ -.0172+.0196 = .0024
and being positive, is worth making (barely)
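The side-bet numbers can be reproduced directly: 24 rolls of two dice give 48 independent dice, so the number of fives is binomial with parameters 48 and 1/6. A sketch:

```python
from math import comb

# 24 rolls of a pair of dice = 48 independent dice; count the fives.
p_at_least_8 = sum(comb(48, i) * (1 / 6) ** i * (5 / 6) ** (48 - i)
                   for i in range(8, 49))

E_g = 2 * p_at_least_8 - 2.5 * (1 - p_at_least_8)   # side bet, about +.0196
E_f4 = (1 - (35 / 36) ** 24) - (35 / 36) ** 24      # double-six bet, about -.0172
E_joint = E_f4 + E_g                                # linearity of expectation

assert abs(p_at_least_8 - 0.55992) < 1e-4
assert E_joint > 0                                  # barely worth making
```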

EX: Suppose you shoot at a target, and miss it with probability p each
time you try. What is the expected number of times you have to try
before getting a hit?
S = { H, MH, MMH, MMMH, .... }
P( MM...MH ) = p^#M * (1-p)
f( MM...MH ) = #shots = #M + 1
We want E(f) = sum_{m=0}^infinity (m+1)*p^m*(1-p)
Recall    sum_{m=0}^infinity p^m = 1/(1-p)
so d/dp ( sum_{m=0}^infinity p^m ) = d/dp ( 1/(1-p) )
or    sum_{m=0}^infinity m*p^(m-1) = 1/(1-p)^2
or    sum_{m=0}^infinity m*p^m*(1-p) = p/(1-p)
so sum_{m=0}^infinity (m+1)*p^m*(1-p) =
p/(1-p) + (1-p)/(1-p) = 1/(1-p)
so E(f) = 1/P(hit)
So if P(M)=.99, you need to take 1/(1-.99) = 100 shots on average to hit
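The series manipulation can be double-checked numerically: truncating the sum sum_{m>=0} (m+1)*p^m*(1-p) at enough terms matches 1/(1-p) to high accuracy (the helper name and truncation length are my own):

```python
def expected_shots(p_miss, terms=10000):
    """Truncated series for E(# shots until first hit), miss prob = p_miss."""
    return sum((m + 1) * p_miss ** m * (1 - p_miss) for m in range(terms))

# Compare the truncated series to the closed form 1/(1-p) = 1/P(hit)
for p_miss in (0.5, 0.9, 0.99):
    assert abs(expected_shots(p_miss) - 1 / (1 - p_miss)) < 1e-6
```

For p_miss = .99 the tail beyond 10000 terms is astronomically small (it decays like .99^m), so the truncation error is negligible.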
```