CS 70 - Lecture 28 - Apr 1, 2011 - 10 Evans

Goals for today: Variance and Standard Deviation (Please read Note 16)

Mean and standard deviation: Some of you asked what the mean and standard
deviation of the scores on the midterm were, so you already understand that
  1) the mean tells you the average score
  2) the standard deviation (std) tells you how many people were "close to
     the average", i.e. expect many scores to lie in the range from
     mean - x*std to mean + x*std, where x = 1 or 2; if your score is
     < mean - 3*std you know you're in trouble, and if your score is
     > mean + 3*std you feel really good.

Now it is time to define this carefully:

DEF: Let S be a sample space, P a probability function, f a random variable,
and E(f) the expectation of f. Let g = (f - E(f))^2. Then the variance of f
is defined as
   V(f) = E(g) = sum_{e in S} g(e)*P(e) = sum_{e in S} (f(e) - E(f))^2 * P(e)
The standard deviation of f is defined as
   sigma(f) = ( V(f) )^(1/2)

It may seem that what we want is E(|f - E(f)|), but this is generally hard
to compute, so instead we compute V(f) = E((f - E(f))^2), which tells us how
far we expect (f - E(f))^2 to be from zero, and then take the square root
to get sigma(f).

Ex: Flip a fair coin n times, betting $1 on H each time, and let
f(x) = #H - #T = amount you win or lose. We know E(f) = 0, but how far from
breaking even can we expect to be?
Let f = f_1 + f_2 + ... + f_n where
   f_i = { +1 if i-th flip is H
         { -1 if i-th flip is T
Then f^2 = sum_{i=1 to n} f_i^2 + sum_{i,j = 1 to n, i neq j} f_i*f_j, so
   E(f^2) = sum_{i=1 to n} E(f_i^2) + sum_{i,j = 1 to n, i neq j} E(f_i*f_j)
Now f_i^2 = 1 always, so E(f_i^2) = 1, and
   f_i*f_j = { +1 if f_i =  f_j (prob 1/2)
             { -1 if f_i = -f_j (prob 1/2)
so E(f_i*f_j) = (+1)*.5 + (-1)*.5 = 0, and
   E(f^2) = V(f) = n   and   sigma(f) = sqrt(n)
In other words, we can expect to be up or down about sqrt(n) dollars.

Ex: What about using a biased coin in the last example, with P(H) = p?
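We can check the claim sigma(f) = sqrt(n) empirically. Here is a short Python sketch (not part of the original notes; the function names are made up for illustration) that simulates the fair-coin game many times and compares the sample mean and sample standard deviation of the winnings against E(f) = 0 and sqrt(n):

```python
import random

def simulate_winnings(n, trials, seed=0):
    """Flip a fair coin n times per trial, betting $1 on H each flip;
    return the list of net winnings f = #H - #T over all trials."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        f = sum(1 if rng.random() < 0.5 else -1 for _ in range(n))
        results.append(f)
    return results

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # V(f) = average of (f - E(f))^2, matching the definition above
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

ws = simulate_winnings(n=400, trials=10000)
print("sample mean:", mean(ws))              # should be near E(f) = 0
print("sample std :", variance(ws) ** 0.5)   # should be near sqrt(400) = 20
```

With n = 400 the typical gain or loss is about sqrt(400) = 20 dollars, even though the expected winnings are zero.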
Need a theorem to simplify this:

Thm: V(f) = E(f^2) - (E(f))^2
Proof: V(f) = E((f - E(f))^2)
            = E(f^2 - 2*f*E(f) + (E(f))^2)
            = E(f^2) - E(2*f*E(f)) + E((E(f))^2)
            = E(f^2) - 2*E(f)*E(f) + (E(f))^2
            = E(f^2) - (E(f))^2

Ex: Flip a biased coin n times, betting $1 on H each time, so
f = f_1 + ... + f_n. Proceeding as above, we need
   E(f_i^2) = E(1) = 1
   E(f_i*f_j) = (+1)*(p^2 + (1-p)^2) + (-1)*(2*p*(1-p))
              = 4*p^2 - 4*p + 1 = (2*p-1)^2
so E(f^2) = n*1 + (n^2-n)*(2*p-1)^2 and E(f) = n*(2*p-1), so
   V(f) = E(f^2) - (E(f))^2 = n - n*(2*p-1)^2 = 4*n*p*(1-p)
and sigma(f) = 2*sqrt(n*p*(1-p)) = sqrt(n) if p = 1/2.

Ex: Roll a fair die, f = value on top of die.
   E(f) = (1/6)*(1+2+3+4+5+6) = 7/2
   V(f) = E(f^2) - (E(f))^2 = (1/6)*(1^2 + ... + 6^2) - (7/2)^2
        = 91/6 - 49/4 = 35/12

Ex: Choose a number from {1,2,...,n} with equal probability 1/n.
   E(f) = (1/n)*(1+...+n) = (n+1)/2
   V(f) = (1/n)*(1^2+...+n^2) - ((n+1)/2)^2
        = (1/n)*(n/6 + n^2/2 + n^3/3) - (n^2/4 + n/2 + 1/4)
        = (n^2 - 1)/12
   sigma(f) ~ n/sqrt(12) ~ .29*n

Ex: Suppose we have a list of n distinct items L(1),...,L(n), and want an
algorithm that takes an input x known to be on the list, and returns i if
L(i) = x. An obvious algorithm is "linear search":
   i = 0
   repeat i = i+1 until L(i) = x or i = n+1
Suppose x is chosen at random from the list, with equal probabilities.
What is the expectation of the operation count C of this algorithm, i.e.
how many times is the line "i = i+1" executed? If we run this algorithm
many times, E(C) tells us how long it will take "on average", and sigma(C)
tells us how variable the running time will be. Since C has the same
distribution as the last example, the answer is the same:
E(C) = (n+1)/2 and sigma(C) ~ .29*n.

Ex: Suppose f ~ Poiss(lambda), that is, P(f = r) = exp(-lambda)*lambda^r/r!
Then
   E(f^2) = sum_{r=1 to infinity} r^2 * exp(-lambda)*lambda^r/r!
          = exp(-lambda) * sum_{r=1 to infinity} r*lambda^r/(r-1)!
To see how to do this sum, we start with
   exp(lambda) = sum_{r=0 to infinity} lambda^r/r!
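The formula V(f) = E(f^2) - (E(f))^2 is easy to verify exactly in Python using rational arithmetic. This sketch (not part of the original notes) computes E(f) and V(f) for f uniform on {1,...,n} and checks the die example and the closed forms derived above:

```python
from fractions import Fraction

def uniform_mean_var(n):
    """Exact E(f) and V(f) for f uniform on {1,...,n},
    computed via V(f) = E(f^2) - (E(f))^2."""
    p = Fraction(1, n)                                 # each value has prob 1/n
    ef = sum(Fraction(k) * p for k in range(1, n + 1))       # E(f)
    ef2 = sum(Fraction(k * k) * p for k in range(1, n + 1))  # E(f^2)
    return ef, ef2 - ef ** 2

# Fair die: n = 6 should give E(f) = 7/2 and V(f) = 35/12.
print(uniform_mean_var(6))

# Check E(f) = (n+1)/2 and V(f) = (n^2-1)/12 for a few n.
for n in (2, 10, 100):
    ef, vf = uniform_mean_var(n)
    assert ef == Fraction(n + 1, 2)
    assert vf == Fraction(n * n - 1, 12)
```

Using Fraction avoids any floating-point roundoff, so the equalities above are exact, not approximate.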
and manipulate until we get what we want:
   lambda * exp(lambda) = sum_{r=0 to infinity} lambda^(r+1)/r!
   d/d lambda ( lambda * exp(lambda) )
      = d/d lambda ( sum_{r=0 to infinity} lambda^(r+1)/r! )
      = sum_{r=0 to infinity} (r+1)*lambda^r/r!
Multiply this by lambda to get
   lambda * d/d lambda ( lambda * exp(lambda) )
      = sum_{r=0 to infinity} (r+1)*lambda^(r+1)/r!
      = sum_{r=1 to infinity} r*lambda^r/(r-1)!
as desired. Simplifying, we get
   E(f^2) = exp(-lambda) * lambda * d/d lambda ( lambda * exp(lambda) )
          = exp(-lambda) * lambda * ( 1*exp(lambda) + lambda*exp(lambda) )
          = exp(-lambda) * exp(lambda) * ( lambda + lambda^2 )
          = lambda + lambda^2
and, since E(f) = lambda,
   V(f) = E(f^2) - (E(f))^2 = lambda, so V(f) = E(f).

Now we quantify the notion that the value of a random variable f(x) is
unlikely to fall very far from E(f):

Thm (Chebyshev's Inequality): Let f be a random variable. Then
   P( |f - E(f)| >= r ) <= V(f)/r^2
Letting r = z*sigma(f), we can also write this as
   P( |f - E(f)| >= z*sigma(f) ) <= 1/z^2
In words: the probability that the value of a random variable f is farther
than z times the standard deviation sigma(f) from its mean value E(f) can be
no larger than 1/z^2, which decreases as z increases. Here is a table:

    z    P( |f - E(f)| >= z*sigma(f) )
   ---   -----------------------------
    1    <= 1       (trivial bound on a probability!)
    2    <= 1/4   = .25
    5    <= 1/25  = .04
   10    <= 1/100 = .01

We will prove this as a corollary of another useful result:

Thm (Markov's Inequality): Let g(x) be a nonnegative random variable. Then
   P( g >= s ) <= E(g)/s
Proof: E(g) = sum_r r * P(g = r)
            = sum_{r < s} r*P(g=r) + sum_{r >= s} r*P(g=r)
           >= sum_{r >= s} r*P(g=r)   ... since the omitted sum is nonnegative
           >= sum_{r >= s} s*P(g=r)
            = s * sum_{r >= s} P(g=r)
            = s * P(g >= s)

Proof of Chebyshev: Apply Markov to g(x) = (f(x) - E(f))^2, which by
construction has E(g) = V(f), yielding
   V(f)/s >= P( g >= s ) = P( (f - E(f))^2 >= s )
Now let s = r^2, so
   V(f)/r^2 >= P( (f - E(f))^2 >= r^2 ) = P( |f - E(f)| >= r )
as desired.

How good is Chebyshev's inequality, i.e. how close to
P( |f - E(f)| >= z*sigma(f) ) can 1/z^2 be?
Ex: Again consider rolling a fair die 100 times, and computing the sum h.
Then E(h) = 350 and sigma(h) ~ 17. Comparing the actual
P( |h - 350| >= z*17 ) with the bound 1/z^2 from Chebyshev yields the table
below: Chebyshev is much too large for large z. (Later we will get a much
better approximation from the Central Limit Theorem.)

    z    P( |h - E(h)| >= z*sigma(h) ) from Chebyshev    More accurate
   ---   --------------------------------------------    -------------
    1    <= 1                                              ~ .4
    2    <= .25                                            ~ .05
    5    <= .04                                            ~ 10^(-6)
   10    <= .01                                            ~ 10^(-25)
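For moderate z the "more accurate" column can be estimated by Monte Carlo. Here is a sketch (my illustration; the function name is made up) that rolls 100 dice many times and counts how often the sum lands at least z standard deviations from 350, using sigma(h) = sqrt(100 * 35/12) ~ 17:

```python
import random

def tail_prob(z, trials=10000, n=100, seed=1):
    """Estimate P(|h - E(h)| >= z*sigma(h)) by Monte Carlo, where
    h is the sum of n fair die rolls, E(h) = 3.5*n, V(h) = n*35/12."""
    rng = random.Random(seed)
    mean = 3.5 * n
    sigma = (n * 35 / 12) ** 0.5   # ~ 17.08 for n = 100
    hits = 0
    for _ in range(trials):
        h = sum(rng.randint(1, 6) for _ in range(n))
        if abs(h - mean) >= z * sigma:
            hits += 1
    return hits / trials

for z in (1, 2):
    print(z, tail_prob(z), "vs Chebyshev bound", 1 / z**2)
```

The estimates at z = 1 and z = 2 land well below the Chebyshev bounds of 1 and .25, as the table indicates. (For z = 5 or 10 the true probabilities are far too small to observe by direct simulation; that is where the Central Limit Theorem becomes the right tool.)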