CS 70 - Lecture 26 - Mar 28, 2011 - 10 Evans

Goals for today (read Note 15):
quick review of random variables and expectation
important distributions

DEF: Let S be the sample space of a given experiment, with probability
function P. A _random variable_ is a function f:S -> Reals.

EX 1: Flip a biased coin once, S1 = {H,T}, P1(H) = p,
f1(x) = {1 if x=H, -1 if x=T}
f1 = amount you win (if f1 = 1) (or lose, if f1 = -1) if you bet $1 on H.

EX 2: Flip a biased coin n times, S2 = {all sequences of H, T of length n}
P2(x) = p^#H * (1-p)^(n-#H)   where #H = #Heads in x, n-#H = #Tails
f2(x) = #H - #T = #H - (n-#H) = 2*#H - n
f2 = amount you win (or lose) if you bet $1 on H on each flip

DEF: Given S, P and random variable f, the _Expected Value_
(also called Mean or Average) of f is
E(f) = sum_{all x in S} P(x)*f(x)

EX 1: E(f1) = 1*p + (-1)*(1-p) = 2*p-1
EX 2: E(f2) = sum over all 2^n sequences x of n Heads and Tails, of
(2*#H-n) * p^#H*(1-p)^(n-#H)
prefer something simpler to sum

DEF: P(f=r) = sum_{all x in S such that f(x)=r} P(x)

DEF: We call the set of pairs of values
{(r,P(f=r)) for all r in the range of f}
the distribution of f.

Thm 1: E(f) = sum_{numbers r in range of f} r*P(f=r)

EX 1: P(H) = P(f1=1) = p, P(T) = P(f1=-1) = 1-p
so E(f1) = 1*p + (-1)*(1-p) = 2*p-1, same as before

EX 2: P(f2 = r) = P(#H - #T = r) = P(2*#H - n = r)
= P(#H = (n+r)/2)
= { C(n,(n+r)/2) * p^((n+r)/2) * (1-p)^((n-r)/2)  if (n+r) is even
{ 0 otherwise,   eg when n=1 and r=0
so E(f2) = sum_{r = -n to n by steps of 2}
r * C(n,(n+r)/2) * p^((n+r)/2) * (1-p)^((n-r)/2)
This only has n+1 terms to sum instead of 2^n as before,
but we still want something simpler.
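Both ways of computing E(f2) can be checked numerically. Here is a small Python sketch (the function names are mine, not from the note) comparing the 2^n-term sum over all sequences with the (n+1)-term sum over the distribution:

```python
from itertools import product
from math import comb

def expect_bruteforce(n, p):
    # Sum P2(x)*f2(x) over all 2^n sequences of H/T
    total = 0.0
    for seq in product("HT", repeat=n):
        h = seq.count("H")
        total += (2 * h - n) * p**h * (1 - p)**(n - h)
    return total

def expect_distribution(n, p):
    # Sum r*P(f2=r) over the n+1 possible values of #H;
    # C(n,h) counts the sequences with exactly h Heads
    return sum((2 * h - n) * comb(n, h) * p**h * (1 - p)**(n - h)
               for h in range(n + 1))
```

Both agree (and match the closed form n*(2*p-1) obtained below via linearity).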

Thm 2: If g_1(x),...,g_n(x) are random variables, then
g(x) = sum_{i=1 to n} g_i(x) is another random variable
with expectation
E(g) = sum_{i=1 to n} E(g_i)

EX 2: Let g_i(x) = {+1 if the i-th toss is Heads
{-1 if the i-th toss is Tails
= how much you win or lose on the i-th toss
so f2(x) = sum_{i=1 to n} g_i(x)
= total you win or lose on all n tosses
so E(f2) = sum_{i=1 to n} E(g_i)
Note that g_i is the same as f_1, how much you win or
lose on 1 toss, so E(g_i) = E(f_1) = 2*p-1, and
E(f2) = n*(2*p-1)
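The linearity result can also be sanity-checked by simulation; a quick sketch (names are my own) estimating E(f2) empirically:

```python
import random

random.seed(0)  # make the run reproducible

def simulate_f2(n, p, trials=50_000):
    # Average winnings over many repetitions of the n-flip game:
    # +1 for each Head, -1 for each Tail
    total = 0
    for _ in range(trials):
        total += sum(1 if random.random() < p else -1 for _ in range(n))
    return total / trials

# Theory says E(f2) = n*(2*p-1), e.g. 10*(2*0.7-1) = 4 for n=10, p=0.7
```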

End of review!

Recall that the distribution of a random variable is the set
{(r,P(f=r)) where r is in the range of f}

There are certain important distributions that come up repeatedly,
that we need to recognize. The first we have seen already:

DEF: if P(f=r) = C(n,r) * p^r * (1-p)^(n-r) for r=0,...,n,
then we say f has a binomial distribution, and abbreviate this as
f ~ Bin(n,p)

Geometric Distribution:

EX: Suppose you shoot at a target, and hit with probability p each
time you try. What is the expected number of times you have to try
before getting a hit?
S = { H, MH, MMH, MMMH, .... }
P( MM...MH ) = (1-p)^#M * p
f( MM...MH ) = #shots = #M + 1
so P(f=r) = P(r = #M+1) = (1-p)^(r-1) * p

DEF: We say that f has the geometric distribution with parameter p
if f:S-> {1,2,3,...} and
P(f=r) = (1-p)^(r-1) * p
We abbreviate this by f ~ Geom(p)

Check that 1 = sum_{r=1 to infinity} P(f=r)
= sum_{r=1 to infinity} (1-p)^(r-1) * p
= p * sum_{r=1 to infinity} (1-p)^(r-1)
= p * 1/(1-(1-p)) = p / p = 1 - ok!

We want E(f) = sum_{r=1}^infinity r*(1-p)^(r-1)*p
Start from geometric sum:    sum_{r=1}^infinity (1-p)^(r-1) = 1/p
so d/dp ( sum_{r=1}^infinity (1-p)^(r-1) ) = d/dp ( 1/p )
or    sum_{r=1}^infinity -(r-1)*(1-p)^(r-2) = -1/p^2
or    sum_{r=1}^infinity (r-1)*(1-p)^(r-2)*p = 1/p
Shifting the index (replace r-1 by r; the r=0 term contributes 0) gives
sum_{r=1}^infinity r*(1-p)^(r-1)*p = 1/p
so E(f) = 1/p = 1/P(hit)
So if P(hit)=.01, you need to take 1/.01 = 100 shots on average to hit
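This is easy to check by simulation; a minimal sketch (the helper name shots_until_hit is mine):

```python
import random

random.seed(1)  # reproducible runs

def shots_until_hit(p):
    # Count shots until the first hit; each shot hits with probability p
    shots = 1
    while random.random() >= p:
        shots += 1
    return shots

trials = 100_000
mean_shots = sum(shots_until_hit(0.25) for _ in range(trials)) / trials
# Theory: E(number of shots) = 1/p = 4 for p = 0.25
```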

Here is another way to compute E(f), for random variables whose
values can be positive integers N+ = {1,2,3...}

Thm: Let f:S->N+ be a random variable. Then
E(f) = sum_{i=1 to infinity} P(f >= i)
Proof: Write P(f=i) as p_i, so
E(f) = sum_{i=1 to infinity} i * p_i
= p_1 + (p_2 + p_2) + (p_3 + p_3 + p_3) + ...
= (p_1 + p_2 + p_3 + p_4 + ...)
+(      p_2 + p_3 + p_4 + ...)
+(            p_3 + p_4 + ...)
+(                  p_4 + ...) + ...
= P(f >= 1) + P(f >= 2) + P(f >= 3) + P(f >= 4) + ...

Ex: When P(f=r) = (1-p)^(r-1)*p, then
P(f>=i) = sum_{r = i to infinity} (1-p)^(r-1)*p
= (1-p)^(i-1) * p * sum_{r=0 to infinity} (1-p)^r
= (1-p)^(i-1) * p * (1/(1-(1-p)))
= (1-p)^(i-1)
so E(f) = sum_{i=1 to infinity} (1-p)^(i-1)
= 1/(1-(1-p)) = 1/p as before

Summarizing we have
Thm: If f has a geometric distribution with parameter p, that is
P(f=r) = (1-p)^(r-1)*p, then
P(f >= i) = (1-p)^(i-1)
E(f) = 1/p
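Both identities can be verified numerically by truncating the infinite sums at a large R, since the tail (1-p)^R is negligible; a small Python check (variable names are mine):

```python
p, R = 0.3, 2000
pmf = [(1 - p)**(r - 1) * p for r in range(1, R + 1)]    # P(f=r)
E_direct = sum(r * pr for r, pr in enumerate(pmf, start=1))
E_tail = sum((1 - p)**(i - 1) for i in range(1, R + 1))  # sum of P(f >= i)
tail_5 = sum(pmf[4:])                                    # P(f >= 5) by direct summation
```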

Besides the number of shots until we hit a target, there are
various other examples:
number of runs before a system fails
number of retransmissions of a packet before it is successfully sent, etc.

Ex: Coupon Collector, revisited
We buy cereal boxes, each of which contains one of n random baseball cards.
What is the expected number of boxes we have to buy to have at least one
of each card?

To define the sample space carefully, let the different baseball cards be
numbered from 1,...,n, so the sample space is
S = {all sequences w of numbers from 1 to n such that
each number appears at least once
the last number in the sequence appears exactly once}
The random variable whose expectation we want is
f(w) = length of w
which is rather complicated to compute directly.
So instead we write f(w) as a sum of other random variables,
each of which has a geometric distribution, and just add their expectations.

Let f_i(w) = number of additional boxes we need to buy to get the
i-th distinct card in w, after getting the (i-1)-st distinct card.
Ex: suppose n=3, and w = 111221123. Then
f_1(w) = 1 (the first box always contains a card we haven't seen yet)
f_2(w) = 3 (since we need 3 more boxes to get the first card besides 1)
f_3(w) = 5 (since we need 5 more boxes to get the first card besides 1 and 2)
Then clearly f(w) = length of w = f_1(w) + f_2(w) + ... + f_n(w)
and so E(f) = sum_{i=1 to n} E(f_i)

We claim that each f_i has a geometric distribution:
f_1 is trivially geometric with p = probability that the first card
is different than previous ones = 1
f_2 is geometric with p = probability that the next card
is different from the 1 already gotten = (n-1)/n
f_3 is geometric with p = probability that the next card
is different from the 2 already gotten = (n-2)/n
... f_i is geometric with p = probability that the next card
is different from the i-1 already gotten = (n-(i-1))/n

Thus E(f_i) = 1/((n-(i-1))/n) = n/(n-i+1)
and E(f) = sum_{i=1 to n} n/(n-i+1)
= n * sum_{j=1 to n} 1/j

We can approximate this sum by an integral:
integral_{1 to n+1} dx/x <= sum_{j=1 to n} 1/j <= 1 + integral_{1 to n} dx/x
or
ln(n) <= ln(n+1) <= sum_{j=1 to n} 1/j <= 1 + ln(n)
In fact, it is known that
sum_{j=1 to n} 1/j ~ ln(n) + gamma
where gamma = .5772... is called Euler's constant.

Finally we get E(f) ~ n*(ln(n) + gamma).
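A quick numerical check of the coupon-collector result, comparing the exact sum n*H_n, the approximation n*(ln(n) + gamma), and a simulation (the function name is mine):

```python
import math
import random

random.seed(2)  # reproducible runs

def coupon_boxes(n):
    # Buy boxes (each containing a uniformly random card 0..n-1)
    # until all n distinct cards have appeared
    seen = set()
    boxes = 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        boxes += 1
    return boxes

n = 50
exact = n * sum(1 / j for j in range(1, n + 1))   # n * H_n
approx = n * (math.log(n) + 0.5772156649)         # n * (ln(n) + gamma)
trials = 10_000
empirical = sum(coupon_boxes(n) for _ in range(trials)) / trials
```

For n=50 the exact value is about 225 boxes, and the approximation is within about half a box of it.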

Recall that last time we looked at this problem, we showed
that if you bought >= n*ln(2*n) boxes, then the probability of
having at least one of each card is >= 1/2, so we got a similar result.

Poisson Distribution:

Suppose we throw n balls into n/lambda bins, where lambda is a constant.
We want to know the probability that i balls land in a particular bin,
when n is very large. The exact distribution is binomial, with
the probability of landing in bin 1 (say) equal to 1/(n/lambda) = lambda/n, so

P(i balls land in bin 1) = C(n,i) *  (lambda/n)^i * (1- lambda/n)^(n-i)

The problem is that all we know about n is that it is very large;
P(0 balls land in bin 1) = C(n,0) * (1 - lambda/n)^n
= (1 - lambda/n)^((n/lambda)*lambda)
As n grows toward infinity, x = lambda/n shrinks to zero, and we know
lim_{x -> 0} (1 - x)^(1/x) = 1/e = 1/2.71828... = .367...
Therefore
lim_{n -> infinity} P(0 balls land in bin 1) = (1/e)^lambda = exp(-lambda)
More generally, we want simple expressions for
p_i = lim_{n -> infinity} P(i balls land in bin 1)
= lim_{n -> infinity} C(n,i) * (lambda/n)^i * (1 - lambda/n)^(n-i)
We know p_0 = exp(-lambda), and the easiest way to figure out the rest
is to look at the ratios
p_i / p_{i-1} = lim_{n -> infinity}
[ C(n,i)   * (lambda/n)^i     * (1 - lambda/n)^(n-i)     ] /
[ C(n,i-1) * (lambda/n)^(i-1) * (1 - lambda/n)^(n-(i-1)) ]
= lim_{n -> infinity}
[ C(n,i) / C(n,i-1) ] * (lambda/n) * (1 - lambda/n)^(-1)
= lim_{n -> infinity}
[ (n!/(i!*(n-i)!)) / (n!/((i-1)!*(n-i+1)!)) ] * (lambda/n) * (1 - lambda/n)^(-1)
= lim_{n -> infinity}
[ (n-i+1)/i ] * (lambda/n) * (1 - lambda/n)^(-1)
= (lambda/i) * lim_{n -> infinity} [ (n-i+1)/n ] * (1 - lambda/n)^(-1)
= lambda/i
or p_i = (lambda/i)*p_{i-1}
Starting from p_0, this recurrence yields:
p_0 = exp(-lambda)
p_1 = exp(-lambda)*lambda
p_2 = exp(-lambda)*lambda^2/2
p_3 = exp(-lambda)*lambda^3/(3*2)
p_4 = exp(-lambda)*lambda^4/(4*3*2)
...
p_i = exp(-lambda)*lambda^i/i!

Let's check that
1 = sum_{i=0 to infinity} p_i
= sum_{i=0 to infinity} exp(-lambda) * lambda^i/i!
= exp(-lambda) * sum_{i=0 to infinity} lambda^i/i!
= exp(-lambda) * exp(lambda) = 1 as desired.
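The recurrence p_i = (lambda/i)*p_{i-1} also gives a stable way to compute the Poisson probabilities without evaluating large factorials; a short sketch checking it against the closed form and the sum-to-1 property:

```python
from math import exp, factorial

lam = 1.2
p = [exp(-lam)]                # p_0 = exp(-lambda)
for i in range(1, 50):
    p.append(p[-1] * lam / i)  # p_i = (lambda/i) * p_{i-1}

closed = [exp(-lam) * lam**i / factorial(i) for i in range(50)]
total = sum(p)                 # should be ~1 (the tail beyond i=49 is negligible)
```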

DEF: A random variable f for which
P(f=i) = exp(-lambda)*lambda^i/i!
for i=0,1,2,... is said to have a Poisson distribution
with parameter lambda, or more briefly f ~ Poiss(lambda)

If f ~ Poiss(lambda), we compute its expectation as follows:

E(f) = sum_{i=0 to infinity} i * exp(-lambda)* lambda^i/i!
= exp(-lambda) * sum_{i=1 to infinity} lambda^i/(i-1)!
= lambda * exp(-lambda) * sum_{i=1 to infinity} lambda^(i-1)/(i-1)!
= lambda * exp(-lambda) * exp(lambda)
= lambda
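A numerical check that the mean is lambda, truncating the series where the terms have become negligible:

```python
from math import exp

lam = 2.5
p = [exp(-lam)]                # Poisson probabilities via the recurrence
for i in range(1, 80):
    p.append(p[-1] * lam / i)

mean = sum(i * pi for i, pi in enumerate(p))  # approximates E(f)
```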

If you plot P(f=i) versus i, it increases until i ~ lambda,
and then decreases.

The Poisson distribution is useful for modeling "rare events",
as the following example shows.

Ex: Suppose a web server gets an average of 100K requests/day.
You want to make sure to have enough computers to handle most
bursts in activity. One computer takes 1 second to handle one request.
How many computers do you need to handle bursts in requests?

The "rare event" here is not a request arriving at the web server,
but one person choosing to make a request, which we assume is
done independently by each person (maybe not true if the web server
is handling news events and everyone wants to find out about
some big event at the same time).

In this case we have some large but unknown number of people n,
and some tiny but unknown probability p that each person will make
a request in any 1 second period of time, so the probability
that i people make a request in a 1 second period is binomial:
P(i people out of n make a request) = C(n,i) * p^i * (1-p)^(n-i)
But we don't know n or p, so what do we do? Since there are
100K requests/day, there are an average of
100K/(24*3600) ~ 1.2 = lambda
requests per second. In other words, we can directly measure
the expectation E(f) of the binomial random variable
f = number of people making a request in a 1 second period
That is lambda = E(f) = n*p. So we can measure the product n*p even
though we don't know n or p individually. Thus
P(i people out of n make a request)
= C(n,i) * (lambda/n)^i * (1-lambda/n)^(n-i)
This is the expression we used to introduce the Poisson distribution;
for n large (and the number of people is certainly large),
this gets close to
P(i people out of n make a request) ~ exp(-lambda) * lambda^i / i!
For lambda = 1.2, we get the following table:
i  P(i)  Sum
0  .301  .301
1  .361  .663
2  .217  .879
3  .087  .966
4  .026  .992
5  .006  .998
So if you have 5 servers, there is a 99.8% chance
that you can handle all the requests that arrive
in any given 1-second window.
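The table can be reproduced with the same recurrence used earlier for the Poisson probabilities; a short Python sketch (variable names are mine):

```python
from math import exp

lam = 1.2
p = exp(-lam)        # P(f=0)
cum = 0.0
rows = []
for i in range(6):
    cum += p
    rows.append((i, round(p, 3), round(cum, 3)))
    p *= lam / (i + 1)   # advance to P(f=i+1) via p_i = (lambda/i)*p_{i-1}
```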