CS 70 - Lecture 15 - Feb 23, 2011 - 10 Evans

Goals for Note 10 (note: we are skipping Note 9):
  Counting
  Preparation for Probability Theory (most of the rest of the course):
    analyzing the "average" behavior of algorithms (quicksort)
    designing faster algorithms that "flip coins" or take random guesses (hashing)
    how much of Artificial Intelligence works
      (deducing the "most likely" meaning of a sentence, CS188)
    communicating reliably over a noisy channel (EE126)
    building control systems that work well despite noise (Kalman filters)

Start with (and mostly do) discrete probability:
  If I flip a fair coin once, what is the chance of heads?
    1 outcome giving heads / 2 possible equally likely outcomes = .5
  If I flip a fair coin twice, what is the chance of 2 heads?
    Need to count the number of ways you can get 2 heads: {HH}
    and divide by the number of equally likely outcomes: {HH,HT,TH,TT}
    so 1/4 = 25%
  If I flip a fair coin 100 times, what is the chance I get exactly 50 heads?
    Need to count how many sequences of 100 flips get exactly 50 heads,
    and divide by the number of sequences of 100 flips altogether
    (each of which is equally likely) (about 8%)

ASK&WAIT: If you insist that people creating passwords for your new website
  use passwords consisting of 6 upper case letters, how long would it take
  a hacker to break in, if s/he could try 1 password / microsecond?
ASK&WAIT: What about 8 upper case letter passwords?
ASK&WAIT: What about 8 character passwords with at least one nonletter?

Notation:
  set S = {list of distinct items}
  S union T = union of S and T
            = set of all items in S or in T or both (with any copies removed)
  S intersect T = intersection of S and T = set of items in both S and T
  |S| = cardinality of S = #items in S

Counting Principles

1) The Sum Rule:
  EX: If you have to do one project for a class, and are given one list with
      2 projects and another with 3 different projects, how many different
      projects do you have to choose from?
      2+3 = 5
  The Sum Rule (formally):
    Suppose we have to choose one task to do, either T1 or T2.
    Let S1 be the set of n1 ways to do task T1, and S2 the set of n2 ways
    to do task T2, where S1 and S2 are disjoint.
    The set of ways to do either T1 or T2 is S1 U S2.
    The number of ways to do either T1 or T2 is
      |S1 U S2| = |S1| + |S2| = n1 + n2

2) The Product Rule:
  EX: If you have to do two projects for a class, the first one chosen from
      a list of 2 projects, and the second one chosen from a list of
      3 projects, how many different pairs of projects could you turn in?
      S1 = {p1,p2}, S2 = {pa,pb,pc}
      pairs = {(p1,pa),(p1,pb),(p1,pc),(p2,pa),(p2,pb),(p2,pc)} = S1 x S2
      2*3 = 6 different pairs
  The Product Rule (formally):
    Suppose we have two tasks to do, T1 and T2, with S1, n1, S2, n2 as above.
    The set of ways to do both T1 and T2 is S1 x S2.
    The number of ways to do both T1 and T2 is
      |S1 x S2| = |S1| * |S2| = n1 * n2
    (remember S1 x S2 is the set of all pairs {(x1,x2) : x1 in S1, x2 in S2})

3) The Extended Product Rule:
  If S1 is the set of n1 ways to do T1, S2 the set of n2 ways to do T2, ... ,
  Sm the set of nm ways to do Tm, then the set of ways to do T1,T2,...,Tm
  is S1 x S2 x ... x Sm, which has n1*n2*...*nm elements.
  ASK&WAIT: How many bit strings of length 9 are there?
  ASK&WAIT: How many different license plates are there if all consist of
    3 letters followed by 3 digits?
  ASK&WAIT: How many different computer passwords are there if they must be
    8 characters long, upper case letters only?
  ASK&WAIT: How many different computer passwords are there if they may be
    6-8 characters long, each an upper or lower case letter or a digit?
  ASK&WAIT: What if there must be at least one letter and at least one digit?
  ASK&WAIT: How many ways can you shuffle a deck of 52 cards?
  EX: How many ways can a class of 100 students be divided into
      2-student teams?
      2 students {s1,s2} -> 1 way
      4 students {s1,s2,s3,s4} -> 3 ways
        {(s1,s2),(s3,s4)}, {(s1,s3),(s2,s4)}, {(s1,s4),(s2,s3)}
      How do we get a simple formula for any even n?
      Suppose there are P(n-2) pairings of n-2 students, whose names are
      1, 2, ... , n-2; now add students n-1 and n.
      What pairings are possible?
      Take student n, and choose any other student m to make the pair.
      That leaves n-2 students, with P(n-2) possible pairings.
      m can take on n-1 values, so there are (n-1)*P(n-2) possible pairings.
      Result: recurrence P(n) = (n-1)*P(n-2), with P(2) = 1
      Are we sure we have counted every possibility exactly once?
        Use induction: assume P(n-2) is correct.
        In the construction, we get (n-1) groups of P(n-2) pairings each,
        where n is paired with a different m in each group.
        So no pairing appears in more than one group.
        And no pairing can appear twice in one group because all P(n-2)
        pairings of the remaining n-2 students are different, by induction.
        And each pairing has to appear in one group, depending on the
        partner of n.
      ASK&WAIT: What is a closed form formula for P(n)?
      For n=100: P(100) = 99*97*95*...*3*1 \approx 2.7e+78

          n     P(n)
          2     1
          4     3
          6     15
          8     105
         10     945
         20     6.5e+08
         40     3.2e+23
         60     2.9e+40
        100     2.7e+78
        150     6.1e+130

      (number of atoms in universe once thought to be about 1e80)

4) Inclusion-Exclusion Principle:
  EX: How many 8-bit strings either start with 1 or end with 00?
      S1 = {1xxxxxxx, x = any bit}, S2 = {xxxxxx00, x = any bit}
      We want |S1 U S2|. But S1 and S2 overlap: S1 inter S2 = {1xxxxx00}
      So we count |S1| = 2^7, |S2| = 2^6, and |S1 inter S2| = 2^5.
      But |S1|+|S2| > |S1 U S2| because S1 inter S2 has been counted twice,
      so we subtract it:
        |S1 U S2| = |S1| + |S2| - |S1 inter S2| = 2^7 + 2^6 - 2^5 = 160
  The Inclusion-Exclusion Principle (formally):
    Suppose we have two tasks, T1 and T2, with S1, n1, S2, n2 as above,
    except S1 and S2 may intersect.
    The set of ways to do either T1 or T2 is S1 U S2.
    The number of ways to do either T1 or T2 is
      |S1 U S2| = |S1| + |S2| - |S1 inter S2|
  EX: How many numbers with at most 3 decimal digits are divisible
      by 3 or by 4?
  Inclusion-Exclusion with 3 tasks:
    Suppose you have 3 tasks, in sets S1, S2, S3, which might overlap.
    Then |S1 U S2 U S3| = |S1| + |S2| + |S3|
                          - |S1 inter S2| - |S2 inter S3| - |S1 inter S3|
                          + |S1 inter S2 inter S3|
  EX: How many numbers with at most 3 decimal digits are divisible
      by 3, 4 or 5?

5) Pigeonhole Principle:
  If k+1 or more objects (pigeons) are placed in k boxes (holes),
  then at least one box contains 2 or more objects.
  (proof by contradiction: if each box had at most one object,
   there would only be k or fewer objects, a contradiction)
  EX: In any group of 27 English words, at least 2 begin with the same
      letter, since there are only 26 letters.
  ASK&WAIT: How large a group of people do you need to be sure that two of
    them have the same first and last initials?
  ASK&WAIT: How many times do you have to shuffle a deck of cards, to be sure
    that the cards are in exactly the same order at least twice?

6) Generalized Pigeonhole Principle:
  If N or more objects (pigeons) are placed in k boxes (holes),
  then at least one box contains ceiling(N/k) or more objects.
  (proof by contradiction: if each box had at most ceiling(N/k)-1 objects,
   there would be at most k*(ceiling(N/k)-1) < k*((N/k + 1) - 1) = N objects,
   a contradiction)
  EX: N=k+1 implies ceiling(N/k) = ceiling((k+1)/k) = 2,
      the usual Pigeonhole Principle
  ASK&WAIT: There are 199 students enrolled in CS70. At least how many have
    to receive the same letter grade (A,B,C,D,F)?
  EX: Given any set S of n+1 positive integers less than or equal to 2*n,
      one of them must divide another one:
      (ex: n=5, if S={2 3 5 7 9 10}, then 5 | 10)
      Proof: Let S = {a(1),a(2),...,a(n+1)}.
        Write a(i) = 2^(k(i)) * q(i), where q(i) is odd.
        So q(1),...,q(n+1) are n+1 positive odd integers between 1 and 2n-1,
        and there are only n such values, namely 1,3,5,...,2n-1.
        So by the pigeonhole principle, q(i)=q(j)=q for some i and j
        with i different from j.
        Thus a(i) = 2^(k(i)) * q and a(j) = 2^(k(j)) * q.
        If k(i)>k(j) then a(j)|a(i), else a(i)|a(j).
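  A quick numerical check of the divisibility claim (a sketch added for
  illustration, not part of the lecture; the function names odd_part,
  dividing_pair and claim_holds are made up here). It follows the proof:
  strip the powers of 2 from each number, and stop as soon as two numbers
  share the same odd part q.

```python
from itertools import combinations


def odd_part(a):
    """Return the odd q such that a = 2**k * q (a must be a positive integer)."""
    while a % 2 == 0:
        a //= 2
    return a


def dividing_pair(s):
    """Return (x, y) from s with x < y and the same odd part, so x | y.

    Returns None if all odd parts are distinct (the pigeonhole argument
    says this cannot happen for n+1 numbers chosen from 1..2n).
    """
    seen = {}  # odd part -> smallest element seen so far with that odd part
    for a in sorted(s):
        q = odd_part(a)
        if q in seen:
            return (seen[q], a)  # same q, smaller power of 2 divides larger
        seen[q] = a
    return None


def claim_holds(n):
    """Brute force: check every (n+1)-subset of {1,...,2n}. Only for small n."""
    return all(
        dividing_pair(s) is not None
        for s in combinations(range(1, 2 * n + 1), n + 1)
    )


print(dividing_pair({2, 3, 5, 7, 9, 10}))  # the n=5 example: (5, 10)
print(all(claim_holds(n) for n in range(1, 7)))
```

  Note that n numbers are not enough: for n=5 the set {6,7,8,9,10} has five
  distinct odd parts (3,7,1,9,5), and dividing_pair returns None for it.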
  ASK&WAIT: Assuming California has 36M people, at least how many of them
    must have the same 3 initials and were born on the same day of the same
    month (but not necessarily in the same year)?
    For example, Arnold B. Casey (ABC), born 29 Feb 1956
             and Abigail B. Chen (ABC), born 29 Feb 1992

7) Permutations
  DEF: a permutation of a set S of n distinct objects is an ordered list
       of these objects
  DEF: an r-permutation is an ordered list of r distinct elements of S
  EX: S={1,2,3},
      all permutations = {(1,2,3),(2,1,3),(1,3,2),(2,3,1),(3,1,2),(3,2,1)}
      all 2-permutations = {(1,2),(2,1),(1,3),(3,1),(2,3),(3,2)}
  DEF: the number of r-permutations of a set S with n elements is P(n,r)
  Theorem: P(n,r) = n*(n-1)*(n-2)*...*(n-r+1) = n!/(n-r)!
  Proof: (product rule): there are n ways to choose the first in the list,
         n-1 ways to choose the second, ... , n-r+1 ways to choose the rth
  EX: P(3,3) = 3*2*1 = 6, P(3,2) = 3*2 = 6
  EX: How many different ways can a salesman visit 8 cities?
      P(8,8) = 8! = 40320
  EX: How many different ways can 10 horses in a race win, place and show
      (come in first, second, third)?
      P(10,3) = 10*9*8 = 720

8) Combinations
  DEF: an r-combination from a set S is simply an unordered subset of
       r elements from S
  EX: S={1,2,3}, all 2-combinations = {{1,2},{1,3},{2,3}}
      Comparing to all 2-permutations, we see we ignore order.
  DEF: C(n,r) = number of r-combinations from a set with n elements
  Theorem: C(n,r) = n! / [ (n-r)! r! ]
  Proof: the set of all r-permutations can be formed from the set of all
         r-combinations by taking all r! orderings of each r-combination,
         so P(n,r) = r! * C(n,r), and
         C(n,r) = P(n,r)/r! = n! / [ (n-r)! r! ]
                = n*(n-1)*(n-2)*...*(n-r+1)/r!
  EX: C(3,2) = P(3,2)/2! = 6/2 = 3
  DEF: C(n,r) is also called a binomial coefficient, written (n \\ r),
       pronounced "n choose r"
  Note that C(0,0) = 0!/(0!*0!) = 1; C(n,0) = C(n,n) = 1
  Corollary: C(n,r) = C(n,n-r)
  Proof: C(n,r) = n!/[(n-r)! r!]
                = n!/[r! (n-r)!]
                = n!/[(n-(n-r))! (n-r)!]
                = C(n,n-r)
  EX: C(3,1) = C(3,2) = 3
  DEF: Pascal's triangle: row n lists C(n,0), C(n,1), ... , C(n,n)

    C(0,0)
    C(1,0) C(1,1)
    C(2,0) C(2,1) C(2,2)
    C(3,0) C(3,1) C(3,2) C(3,3)
    C(4,0) C(4,1) C(4,2) C(4,3) C(4,4)
    C(5,0) C(5,1) C(5,2) C(5,3) C(5,4) C(5,5)
    C(6,0) C(6,1) C(6,2) C(6,3) C(6,4) C(6,5) C(6,6)
    ...

                                            row sum
                  1                            1
                1   1                          2
              1   2   1                        4
            1   3   3   1                      8
          1   4   6   4   1                   16
        1   5  10  10   5   1                 32
      1   6  15  20  15   6   1               64

  Note that to get any entry, you sum its neighbors to the left above and
  right above.
  Theorem: C(n,r) = C(n-1,r-1) + C(n-1,r)   (Pascal's identity)
  Proof: need to show
      n!/[(n-r)! r!] = (n-1)!/[(n-r)! (r-1)!] + (n-1)!/[(n-r-1)! r!]
    multiply by (n-r)! r!/(n-1)! to get
      n = r!/(r-1)! + (n-r)!/(n-r-1)!
    or n = r + (n-r), which is true
  Theorem: sum_{r=0}^n C(n,r) = 2^n   (note row sums of Pascal's triangle)
  Proof: 2^n = number of subsets of a set S with n elements
             = sum_{r=0}^n number of subsets of size r of S
             = sum_{r=0}^n C(n,r)

  Binomial Theorem: (x+y)^n = sum_{r=0}^n C(n,r) * x^r * y^(n-r)
  EX: (x+y)^0 = 1
      (x+y)^1 = 1x + 1y
      (x+y)^2 = 1x^2 + 2xy + 1y^2
      (x+y)^3 = 1x^3 + 3x^2y + 3xy^2 + 1y^3
      (x+y)^4 = 1x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + 1y^4
  Proof: (x+y)^n = (x+y)*(x+y)*...*(x+y)   n times
    What is the coefficient of x^r*y^(n-r)?
    If we multiply out the whole expression, we get one x^r*y^(n-r) term
    for each subset of r factors (x+y) out of n from which we choose x,
    and the number of such subsets is C(n,r).
  EX: What is the coefficient of x^12 y^13 in (2x-3y)^25?
      (2x-3y)^25 = sum_{r=0}^25 C(25,r) (2x)^r (-3y)^(25-r)
                 = sum_{r=0}^25 C(25,r) 2^r (-3)^(25-r) x^r y^(25-r)
                 = ... - C(25,12) 2^12 3^13 x^12 y^13 ...
                 = ... - 25!/(12! 13!) 2^12 3^13 x^12 y^13 ...
                 = ... - 3.4.. 10^16 x^12 y^13 ...
  ASK&WAIT: How many bit strings contain exactly 5 zeros and 14 ones,
    if each zero is immediately followed by 2 ones?
  ASK&WAIT: Show that C(2n,2) = 2C(n,2) + n^2
  ASK&WAIT: Show that sum_{k=1}^n k*C(n,k) = n*2^(n-1)

  EX: How many ways can a class of n students be divided into
      m person teams? We assume n = q*m, so there are q teams.
  Solution: Let us start to write down all the ways of dividing the students
    into q teams of m students each by writing down all n! ways of ordering
    the n students, and just saying the first m students are the first team,
    the 2nd m students are the second team, and so on.
    But it is clear that we have counted the same sets of teams too many
    times. Let us try to divide n! by the number of copies of the same set
    of teams.

    First, it is clear that no matter what the order of the first m students
    in the list is, we get the same team. Since there are m! such orders, we
    have counted sets of teams which differ only in the order within the
    first team m! times too often, so we should divide by m!.
    Similarly, the order of the 2nd group of m students does not matter,
    so we should divide by m! again. The same argument applies to each of
    the q groups of m students, so we should divide by m! q times.

    But we are still not done, because the team consisting of the first
    m students could appear in any of the q possible positions, as could the
    second group of m students, and so on. In other words, we have still
    counted the same set of teams q! times too often, because the teams can
    appear in q! possible orders and still represent the same set of teams.
    So we have to divide by q! also.

    All in all, we get n!/( (m!)^q q! ).

  EX: How many different desserts can you make out of 4 scoops of ice cream,
      each of which may be chocolate (C), vanilla (V) or strawberry (S)?
      Here are the 15 possibilities:
        CCCC VVVV SSSS
        CCCV VVVC SSSC
        CCCS VVVS SSSV
        CCVS VVCS SSCV
        CCVV VVSS CCSS
      Here is a more systematic way to get the answer: we will represent
      each dessert by a sequence of 4 stars (representing the 4 scoops) and
      2 bars (dividing the stars into 3 groups: C, V and S).
      Here are some examples:
        **|*|*   represents 2 C's, 1 V and 1 S
        *|**|*   represents 1 C, 2 V's and 1 S
        *|***|   represents 1 C, 3 V's and 0 S's
        |****|   represents 0 C's, 4 V's and 0 S's
        ||****   represents 0 C's, 0 V's and 4 S's
        etc.
      The idea is that every sequence of 4 stars and 2 bars represents
      exactly one dessert. How many such sequences are there?
      We take 6 possible positions (for 4 stars and 2 bars) and choose
      2 of them for the bars.
      There are C(6,2) = 6!/(2! 4!) = 15 ways to do this.

  Here is the general result:
  Theorem: Suppose I have n types of objects ("flavors"). How many different
    collections ("desserts") consisting of r objects ("scoops") are there,
    if a type may be used more than once and order does not matter?
    The answer is C(n+r-1,n-1).
  Proof: The idea is the same as before: each sequence of r stars ("scoops")
    and (n-1) bars represents a possible collection. There are C(n+r-1,n-1)
    ways to pick n-1 places out of r+n-1 locations to put the bars.
  EX: If I have n=3 flavors of ice cream, and make desserts of r=4 scoops,
      there are C(n+r-1,n-1) = C(3+4-1,3-1) = C(6,2) = 15 different desserts.

  EX: How many anagrams are there of the word "mammal"?
      Recall that an anagram is a distinct ordering of the letters.
      Here are some smaller examples:
        the word "the": The 6 anagrams are the, teh, eth, eht, het, hte
        the word "see": The 3 anagrams are see, ese, ees
      Here are different ways to try to solve this problem for the word
      "mammal", followed by the general result:

  Solution 1:
    Pick 3 locations for the m's
    Pick 2 of the remaining locations for the 2 a's
    Pick the remaining location for the l
    By the product rule, the number of ways to pick locations is
        C(6,3)   ... for the m's
      * C(3,2)   ... for the a's
      * C(1,1)   ... for the l
      = 20*3*1 = 60

  Solution 2:
    Pick 1 location for the l
    Pick 3 of the remaining locations for the m's
    Pick the remaining 2 locations for the a's
    By the product rule, the number of ways to pick locations is
        C(6,1)   ... for the l
      * C(5,3)   ... for the m's
      * C(2,2)   ... for the a's
      = 6*10*1 = 60, the same answer (whew!)
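  The two counts above (15 desserts, 60 anagrams of "mammal") are small
  enough to check by brute force. The following is only a sketch added for
  illustration; the helper names count_desserts and
  count_anagrams_brute_force are made up here, and it assumes Python 3.8 or
  later for math.comb.

```python
from itertools import combinations_with_replacement, permutations
from math import comb


def count_desserts(flavors, scoops):
    """Enumerate every unordered choice of `scoops` scoops, repeats allowed."""
    return sum(1 for _ in combinations_with_replacement(flavors, scoops))


def count_anagrams_brute_force(word):
    """List all orderings of the letters and keep only the distinct ones."""
    return len(set(permutations(word)))


# Desserts: direct enumeration versus the stars-and-bars count C(n+r-1, n-1).
desserts = count_desserts("CVS", 4)
stars_and_bars = comb(3 + 4 - 1, 3 - 1)

# "mammal": Solution 1 (m's, then a's, then l) and Solution 2 (l, m's, a's).
solution_1 = comb(6, 3) * comb(3, 2) * comb(1, 1)
solution_2 = comb(6, 1) * comb(5, 3) * comb(2, 2)
brute_force = count_anagrams_brute_force("mammal")

print(desserts, stars_and_bars)
print(solution_1, solution_2, brute_force)
```

  Brute force over all 6! = 720 orderings is fine here, but it grows like n!,
  so it is only useful as a check on short words.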
  Solution 3:
    Let us start by labeling the m's as m1, m2 and m3, and the a's as a1 and
    a2, so we can distinguish them. So now we have 6 distinct symbols,
    m1,a1,m2,m3,a2,l, and the number of ways to order them is 6!.
    But clearly we have counted some orderings as distinct that we should
    not, so let's try to divide by the number of copies.
    For example, consider all the orderings where the first 3 characters are
    m's, and the last three are a1,a2,l. There are clearly 3! = 6 such
    orderings, since m1,m2,m3 can appear in the first three positions in any
    order, but they all yield the same anagram. This argument that we are
    counting each anagram 3! times works no matter where the 3 m's appear,
    so we should divide the number of orderings by 3! to account for the
    3 m's. Similarly, we should divide by 2! to account for the two a's.
    This yields 6!/(3! 2!) = 60, the same answer (whew!)

  Solution 3 is the one that generalizes to arbitrary anagrams:
  Theorem: Suppose we have
      n(1) copies of symbol 1
      n(2) copies of symbol 2
      ...
      n(k) copies of symbol k
    Let n = n(1) + n(2) + ... + n(k). Then the number of distinct anagrams
    constructed from these symbols is
      n! / [ n(1)! n(2)! n(3)! ... n(k)! ]
  Proof: Consider all n! permutations of the n symbols (with copies labeled
    so we can distinguish them). Some of these are identical as anagrams:
      Given a permutation, all n(1)! permutations with symbol 1
        in the same positions are identical
      Given a permutation, all n(2)! permutations with symbol 2
        in the same positions are identical
      ...
      Given a permutation, all n(k)! permutations with symbol k
        in the same positions are identical
    Therefore, we need to divide n! by n(1)!*n(2)!*...*n(k)! to get
    the correct number of anagrams.
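  The theorem translates directly into a few lines of code. This is a
  minimal sketch added for illustration (the function name count_anagrams is
  made up here): count the copies n(i) of each symbol, then divide n! by
  each n(i)! in turn.

```python
from collections import Counter
from math import factorial


def count_anagrams(word):
    """Return n! / (n(1)! n(2)! ... n(k)!) for the symbols in `word`."""
    total = factorial(len(word))          # n! orderings of labeled symbols
    for copies in Counter(word).values():
        total //= factorial(copies)       # divide out the n(i)! duplicates
    return total


# The three words used as examples in these notes.
for w in ("the", "see", "mammal"):
    print(w, count_anagrams(w))
```

  For "mammal" the counts are n(m)=3, n(a)=2, n(l)=1, so the function
  computes 6!/(3! 2! 1!) = 60, matching Solutions 1-3. Each integer division
  is exact because every partial quotient is itself a count of orderings.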