CS271: RANDOMNESS & COMPUTATION

INSTRUCTOR: Alistair Sinclair (sinclair@cs, 677 Soda)
LECTURES: Tuesday, Thursday 9:30-11:00 in 310 Soda
OFFICE HOURS: Monday 1:00-2:00, Tuesday 11:00-12:00 in 677 Soda
TA: Greg Valiant (gvaliant@eecs, 615 Soda)
OFFICE HOURS: Monday 2:00-3:00 in 611 Soda, Friday 2:00-3:00 in 751 Soda

Recent Announcements

• (12/17) Problem Set 3 has been graded. Sample solutions are posted below. Graded solutions can be picked up from my office after Jan 3rd. Grades have also been assigned for the class. Everybody did pretty well. Have a good break and Happy New Year!
• (12/3) The timing of office hours this Monday (Dec 5) will change. AS office hour will be 3-4 (instead of 1-2). GV office hour will be 10-12 (instead of 2-3). Also, AS office hour on Tuesday (Dec 6) will change from 11-12 to 12-1.
• (11/30) Problem Set 3 is posted below; it covers the material in Lectures 21 to 27, and is due by 5pm on Wednesday, December 14. As always, start early!
• (11/22) Problem Set 2 has been graded. Sample solutions are posted below. Those who did not pick up their graded solutions in class today can do so during office hours or at next Tuesday's lecture. Happy Thanksgiving!
• (11/10) Yet another (minor) correction for Q5(b). As somebody pointed out, it is not in fact "obvious" that connectivity is a monotonically increasing property, since adding a new point can actually cause a connected graph to become disconnected in this model. (You might like to check this!) So the translation from the PPP to the original model as given below isn't quite correct. However, the translation can still be made via the following observation. The number of points in the PPP will be exactly n with probability e^{-n}(n^n)/(n!) > c/sqrt{n} for a constant c. So if we show that in the PPP the graph is connected with probability much less than 1/sqrt{n} (in fact it will be exponentially small) then we can conclude that in the original n-point model Pr[G is connected] -> 0. So the bottom line is that it's still fine for you to follow the outline in the hint below; it's just that the justification for transferring to the PPP is different.
• (11/9) The current hint for Q5(b) on HW2 is misleading. Here is a modified hint that contains some simplifications and corrections; apologies for the confusion. [Note: If you have come up with an alternative argument for this part that does not use the hint, that is fine.]
Hint for Q5(b)
1. First, you can make your life a lot easier by assuming that the points are distributed in the unit square according to a Poisson Point Process (PPP) of intensity n. This means that the number of points in any subregion A has a Poisson distribution with parameter n x area(A), and the numbers of points in disjoint subregions are independent. (This independence makes things much simpler.) Since the property of being connected is obviously monotonically increasing with the number of points, it follows by exactly the same argument as in the proof of Theorem 14.7 in Lecture 14 that Pr[G is connected] <= 4 x Pr'[G is connected], where Pr denotes the probability in the original n-point model, and Pr' denotes the probability in the PPP model. Thus we can work in the PPP model and show that Pr'[G is connected] -> 0.
2. Since we're working in the PPP model, you will probably need a Chernoff-type bound for a Poisson r.v. You may assume that a Poisson r.v. X satisfies exactly the same form of tail bounds as the Angluin bounds for a binomial r.v., as given in Corollary 13.3. (This bound for the upper tail follows immediately by substituting \lambda=\beta\mu in the bound you derived in Q1(c) of the present HW. The bound for the lower tail follows by a completely analogous argument.)
3. The strategy outlined in the original hint is still valid, except that condition (iii) in the definition of a "bad" set of discs should be modified slightly as follows. Condition (iii) should read: "The intersection of D_5 - D_3 with each disc of radius 1.5r centered at a set of points spaced equally at distance 0.01r around the boundary of D_3 contains at least (k+1) points." This is the same as the previous condition, except that the radius of the discs is a bit smaller and (most importantly) the number of discs involved is small (actually constant). In addition to verifying the claimed lower bound on the probability that a given set of three discs is bad, you should explain clearly why the presence of a bad set of discs ensures that G is not connected.
• (11/8) On HW2, there are a couple of problems with Q5(b) as currently stated. Some additional hints/corrections will be provided shortly. In the meantime, you are encouraged to work on the other problems first. Also, in Q6 it is possible to improve on the constant 8 in the denominator of the exponent; so if you get a better constant - correctly justified - then that is fine. Finally, in Q1(c) you should ignore the point about the lower tail.
• (11/8) In Q2 of HW2, to avoid possible confusion, note that the factor \omega(n) is not necessary to obtain the result of the "Deduce" part. (In fact, a constant will do instead.)
• (11/8) The venue for Greg's replacement office hour this week (Thursday 2-3pm) is his office, 615 Soda.
• (11/7) There is a typo in Q5(a) of HW2. The radius should be \sqrt{(10 log n)/n}, rather than (10 log n)/n. Apologies for the confusion.
• (11/7) It's been pointed out to me that the deadline for HW2 is on 11/11, a holiday. Therefore, the deadline is extended to 5pm Monday 11/14. For the same reason, Greg's second office hour this week will move from Friday 2-3 to Thursday 2-3, venue TBA.
• (10/31) Here are the venues for Greg Valiant's office hours: Monday 2-3pm in 611 Soda; Friday 2-3pm in 751 Soda.
• (10/29) There will be NO LECTURE next Thursday (November 3rd). Tuesday's lecture will take place as usual. Please use the time to work on Problem Set 2.
• (10/29) Problem Set 2 is posted below; it covers the material in Lectures 13 to 20, and is due by 5pm on Friday, November 11. As always, start early!
• (10/29) Some people have asked how the hw scores translate to final course grades, and in particular whether low hw scores could lead to somebody failing the class. Basically the grading scheme will be set so that anybody who has made a decent attempt at all three problem sets will pass the class. (Thus, for example, nobody is in any danger of failing based on HW1.)
• (10/29) The class now has a TA: Greg Valiant, email gvaliant@eecs. Starting this coming week (Monday October 31), Greg will hold office hours on Mondays and Fridays from 2 to 3pm. If you need to speak to Greg and are unable to make either of those times, you can send him email to arrange an alternative time.
• (10/24) Problem Set 1 has been graded. Sample solutions are posted below. Graded solutions will be returned in class tomorrow, or can be picked up during office hours.
• (10/4) Assessment: The class grade will be based on three Problem Sets, the first of which is posted below. The last Problem Set will be due after the end of classes. There will be no final exam.
• (10/1) The first Problem Set is posted below; it covers the material in Lectures 1 to 11, and is due by 5pm on Friday, October 14. Start early!
• (9/7) Following the move to a larger classroom, all waitlisted students have now been admitted.
• (9/2) THE CLASSROOM HAS BEEN CHANGED TO 310 SODA, STARTING WITH THE NEXT LECTURE (TUESDAY SEPT 6).
• (8/25) Our classroom may be changed as a result of overcrowding. Please watch this space for announcements.
• (8/25) The class is oversubscribed. If you decide to drop the class, please de-register immediately so that another student can be admitted. If the class remains full, it may be necessary to limit enrollment to graduate students. If you plan to audit the class (i.e., come to lectures, but not do assessed work or receive a letter grade), you should enroll in the class with the S/U option.
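For anyone who wants to sanity-check the two probabilistic facts used in the Q5(b) hints above, namely the estimate Pr[N = n] = e^{-n}n^n/n! > c/sqrt{n} for a Poisson(n) variable N, and the claim that a Poisson r.v. obeys the same Angluin-style upper-tail bound as a binomial, here is a small numerical sketch. (This is an informal illustration only, not part of the assignment; the function names and parameter choices below are mine.)

```python
import math

def poisson_pmf(mu, k):
    # Pr[X = k] for X ~ Poisson(mu), computed in log space for stability
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def poisson_upper_tail(mu, k, terms=400):
    # Pr[X >= k], summing the pmf upward (avoids cancellation from 1 - CDF)
    p = poisson_pmf(mu, k)
    total = 0.0
    for i in range(k, k + terms):
        total += p
        p *= mu / (i + 1)
    return total

def angluin_bound(mu, beta):
    # Claimed upper-tail bound: Pr[X >= beta*mu] <= (e^(beta-1) / beta^beta)^mu
    return (math.exp(beta - 1) / beta ** beta) ** mu

# Fact 1: Pr[Poisson(n) = n] = e^{-n} n^n / n! behaves like 1/sqrt(2*pi*n),
# so in particular it exceeds c/sqrt(n) for a suitable constant c.
for n in (10, 100, 1000):
    ratio = poisson_pmf(n, n) * math.sqrt(2 * math.pi * n)
    assert 0.9 < ratio < 1.0   # by Stirling, ratio -> 1 from below

# Fact 2: the Angluin-style bound dominates the actual Poisson upper tail.
mu = 20.0
for beta in (1.5, 2.0, 3.0):
    k = math.ceil(beta * mu)
    assert poisson_upper_tail(mu, k) <= angluin_bound(mu, beta)
```

Summing the tail upward from k, rather than computing 1 minus the CDF, keeps the computation accurate even when the tail probability is tiny, which is exactly the regime the hint cares about.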

Description

One of the most remarkable developments in Computer Science over the past 30 years has been the realization that allowing computers to toss coins can lead to algorithms that are more efficient, conceptually simpler and more elegant than their best-known deterministic counterparts. Randomization has since become such a ubiquitous tool in algorithm design that any kind of encyclopedic treatment in one course is impossible. Instead, I will attempt to survey several of the most widely used techniques, illustrating them with examples taken from both algorithms and random structures. A tentative and very rough course outline, just to give you a flavor of the course, is the following:
• Elementary examples: e.g., checking identities, fingerprinting and pattern matching, primality testing.
• Moments and deviations: e.g., linearity of expectation, universal hash functions, second moment method, unbiased estimators, approximate counting.
• The probabilistic method: e.g., threshold phenomena in random graphs and random k-SAT formulas; Lovász Local Lemma.
• Chernoff/Hoeffding tail bounds: e.g., Hamilton cycles in a random graph, randomized routing, occupancy problems and load balancing, the Poisson approximation.
• Martingales and bounded differences: e.g., Azuma's inequality, chromatic number of a random graph, sharp concentration of Quicksort, optional stopping theorem and hitting times.
• Random spatial data: e.g., subadditivity, Talagrand's inequality, the TSP and longest increasing subsequences.
• Random walks and Markov chains: e.g., hitting and cover times, probability amplification by random walks on expanders, Markov chain Monte Carlo algorithms.
• Miscellaneous additional topics as time permits: e.g., statistical physics, reconstruction problems, rigorous analysis of black-box optimization heuristics,...
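As a small taste of the first topic above (checking identities), here is a sketch of Freivalds-style randomized verification of a matrix product. This is purely illustrative and not drawn from the course materials; the function name, the choice of 0/1 random vectors, and the number of trials are my own.

```python
import random

def freivalds(A, B, C, trials=20):
    """Randomized check that A * B == C for n x n integer matrices.

    Each trial multiplies both sides by a random 0/1 vector x and compares
    A(Bx) with Cx, costing O(n^2) per trial instead of the O(n^3) needed
    to compute A*B outright. A wrong product is caught with probability
    at least 1/2 per trial, so the error probability is at most 2^-trials.
    """
    n = len(A)
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n)]
        Bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        ABx = [sum(A[i][j] * Bx[j] for j in range(n)) for i in range(n)]
        Cx = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
        if ABx != Cx:
            return False   # witness found: A*B definitely differs from C
    return True            # probably correct

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[19, 22], [43, 50]]       # the correct product A*B
wrong = [[19, 22], [43, 51]]   # one entry off
assert freivalds(A, B, C)
assert not freivalds(A, B, wrong)
```

Note the one-sided error typical of such checkers: a "False" answer is always correct, while a "True" answer is wrong only with exponentially small probability in the number of trials.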

Prerequisites

Mathematical maturity, and a solid grasp of undergraduate material on Algorithms and Data Structures, Discrete Probability and Combinatorics. If you are unsure about the suitability of your background, please talk to me before committing to the class.

Registration

Following department policy, all students - including auditors - are requested to register for the class. Auditors should register S/U; an S grade will be awarded for class participation and satisfactory scribe notes. If there is excessive demand for the class, it may be necessary to limit enrollment to full-time graduate students. Those who decide to drop the class are requested to do so promptly so that others may take their place.

Suggested References

There is no required text for the class, and no text that covers more than about one third of the topics. However, the following books cover significant portions of the material, and are useful references.
• Noga Alon and Joel Spencer, The Probabilistic Method (3rd ed.), Wiley, 2008.
• Svante Janson, Tomasz Łuczak and Andrzej Ruciński, Random Graphs, Wiley, 2000.
• Geoffrey Grimmett and David Stirzaker, Probability and Random Processes (3rd ed.), Oxford Univ Press, 2001.
• Michael Mitzenmacher and Eli Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge Univ Press, 2005.
• Rajeev Motwani and Prabhakar Raghavan, Randomized Algorithms, Cambridge Univ Press, 1995.

Scribe Notes

Scribe notes for all lectures will be posted on this web page shortly after each lecture. These will be based on (edited versions of) scribe notes from previous offerings of the class. In some cases, where there is substantial new material, I may request volunteers to write scribe notes for occasional classes.

Assessment etc.

The assessment mechanism will depend on the final composition of the class and will be announced later. A major (and possibly the only) component will be a small number of sets of homework exercises distributed through the semester. You are encouraged to do the exercises sprinkled through the scribe notes as we go along, as these will ensure that you absorb the material in real time and should make the homeworks more manageable. If the class is not too large, students may also be asked to present a paper at the end of the semester.