(Administrivia: relation to CS276, CS261; office hours; workload; etc.)

How to break simple substitution ciphers:
- Frequency analysis (e has prob. 0.12; t,a,o,i,n,s,h,r have 0.06-0.09; d,l 0.04; c,u,m,w,f,g,y,p,b 0.015-0.028; v,k,j,x,q,z < 0.01)
- Digraph analysis (most common, in order: th, he, in, er, an, re, ed, on, es, st, en, at, to, nt, ha, nd, ou, ea, ng, as, or, ti, is, et, it, ar, te, se, hi, of)
- Trigraphs (the, ing, and, her, ere, ent, tha, nth, was, eth, for, dth)
- Word patterns (e.g., WXYYXYYXZZX probably corresponds to MISSISSIPPI)
- Probable plaintext (e.g., pick the word "mathematics" and drag it through the ciphertext)
- Vowel-consonant contacts (the low-frequency letters are probably consonants; vowels contact them a lot, so letters that often appear next to them are probably vowels; then iterate)
Q: relate this to Hidden Markov models

Example (plaintext, then ciphertext):

  do not worry about your difficulties in mathematics. i assure you that mine are greater.
    - albert einstein

  kt vtx ztrra pltex ater kuiiuoeyxumd uv gpxnmgpxuod. u pdderm ate xnpx guvm prm jrmpxmr.
    - pylmrx muvdxmuv

  Key, written as ciphertext letters (top row) over plaintext letters (bottom row), sorted first by ciphertext and then by plaintext:
    abcdefghijklmnopqrstuvwxyz
    ykqsupmjfgdbehcazrvoinxtlw

    plokmijnuhbygvtfcrdxeszwaq
    abcdefghijklmnopqrstuvwxyz

  The above ciphertext has 86 letters in total.

  Frequency analysis of the above ciphertext (count: letters):
    10: x m
     9: u
     8: r p
     6: t
     5: v e d
     3: g a
     2: y o n l k i
     1: z j

  Digraph analysis of the above ciphertext:
     4: uv px
     3: te rm
     2: xu xn xm uo mu mr gp er at

  Trigraph analysis of the above ciphertext:
     2: gpx muv ate

  Word patterns (ciphertext word: candidate plaintext words):
    kuiiuoeyxumd: difficulties missionaries
    gpxnmgpxuod: amalgamated amalgamates mathematics
    u: a i
  Only "difficulties" and "mathematics" fit exactly: "missionaries" has s in positions 3 and 12, where the ciphertext has i and d, and "amalgamated"/"amalgamates" have a in positions 1 and 3, where the ciphertext has g and x.

How to break polyalphabetic substitution (extended Vigenere) ciphers:
- Determine the period:
  - Kasiski's method: a repeated plaintext segment (e.g., a trigraph) whose occurrences are a multiple of the period apart encrypts to the same ciphertext segment, so the repeat leaks through into the ciphertext. Look for repeated ciphertext segments, note the distances between them, and take the gcd (using a method that tolerates accidental repeats, e.g., looking for a factor shared by most of the distances rather than all of them).
  - Index of Coincidence: see HAC, Stinson.
    If r.v. X is some text (n characters' worth) and i,j are independent uniform r.v.'s on {1,...,n}, let f(X) = Pr[X_i = X_j].
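    Then f(X) ~ 0.065 if X is random English text, and f(X) = 1/26 ~ 0.038 if X is random white noise. A simple substitution only relabels letters, so it leaves f unchanged, and we can estimate f(X) from a simple-substitution ciphertext x by computing \sum_a p_a^2, where p_a denotes the fraction of characters of x that are equal to a. This gives a test to distinguish encryptions of English text from white noise. To find the period, guess t, split the ciphertext into the t subsequences x_j, x_{j+t}, x_{j+2t}, ... (j = 1,...,t), and apply the test to each: if the guess is correct, each subsequence is a simple substitution of English and should score near 0.065; otherwise the scores should be closer to 0.038.
  - Autocorrelation: compute g(t) = \sum_i 1_{x_i = x_{i+t}}. If t is a multiple of the period, x_i and x_{i+t} are encrypted with the same substitution, so we expect g(t) to be large (say, g(t) ~ 0.065 (n-t)); otherwise, g(t) should be smaller (say, g(t) ~ 0.038 (n-t)).
- Then split the ciphertext into t separate simple substitutions and solve each separately, possibly using digraphs to link them together.

How to break two-time pads:
- Text-xors: if c1 = p1 xor k and c2 = p2 xor k, then c1 xor c2 = p1 xor p2, so the pad drops out. Pick a crib (a probable word), drag it through p1 xor p2 (XOR it in at each offset and check whether the result looks like text), and extend any hits on both ends.

How to break linear ciphers:
- Linear algebra (e.g., given enough known plaintext/ciphertext pairs, solve a system of linear equations for the key).

=> Lessons: types of attack (ciphertext-only, ...); complexity of attack (data complexity, workfactor, ...)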
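The frequency, index-of-coincidence, and autocorrelation tests described above can be sketched in a few lines of Python. This is a minimal illustration, not a solver: it assumes the alphabet is a-z with case, spaces, and punctuation ignored, and the names `letters_only`, `frequencies`, `index_of_coincidence`, `column_ic`, `autocorrelation`, and the demo helper `vigenere_encrypt` are hypothetical. The demo helper uses ordinary shifts, a special case of the extended Vigenere cipher, chosen only because it makes a test ciphertext easy to produce.

```python
from collections import Counter


def letters_only(text: str) -> str:
    """Lowercase the text and keep only a-z; spaces and punctuation are dropped."""
    return "".join(ch for ch in text.lower() if "a" <= ch <= "z")


def frequencies(text: str) -> Counter:
    """Letter counts, to compare against the English frequency table."""
    return Counter(letters_only(text))


def index_of_coincidence(text: str) -> float:
    """f(x) = sum_a p_a^2, i.e. Pr[x_i == x_j] for independent uniform i, j."""
    x = letters_only(text)
    n = len(x)
    if n == 0:
        return 0.0
    # Integer numerator, so permuting the letter counts (as a simple
    # substitution does) gives exactly the same float.
    return sum(c * c for c in Counter(x).values()) / (n * n)


def column_ic(text: str, t: int) -> float:
    """Average IC over the t columns x_j, x_{j+t}, x_{j+2t}, ... (t >= 1).

    A value near 0.065 suggests each column is a simple substitution of English.
    """
    x = letters_only(text)
    return sum(index_of_coincidence(x[j::t]) for j in range(t)) / t


def autocorrelation(text: str, t: int) -> int:
    """g(t) = number of positions i with x_i == x_{i+t}."""
    x = letters_only(text)
    return sum(1 for i in range(len(x) - t) if x[i] == x[i + t])


def vigenere_encrypt(text: str, key: str) -> str:
    """Hypothetical demo helper: shift Vigenere with period len(key)."""
    x = letters_only(text)
    k = letters_only(key)
    return "".join(
        chr((ord(c) - 97 + ord(k[i % len(k)]) - 97) % 26 + 97)
        for i, c in enumerate(x)
    )
```

On the Einstein ciphertext, `frequencies` reproduces the counts in the frequency table above (x and m 10 times each, u 9 times, ...), and `index_of_coincidence` gives 564/7396 ~ 0.076 for both the plaintext and the ciphertext, since a simple substitution only permutes the letter counts. To guess a period one would tabulate `column_ic(ct, t)` for t = 1, 2, ... and take the smallest t scoring near 0.065, cross-checking with `autocorrelation`; on texts this short the statistics are noisy.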
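For the two-time pad, the crib-dragging step can be sketched as follows, assuming both messages were XORed with the same byte-string pad. The names `xor_bytes`, `crib_drag`, and `looks_like_text` are hypothetical, and the "looks like text" filter is a deliberately crude assumption; in practice one would inspect the candidates by eye or score them with digraph statistics.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, truncated to the shorter input."""
    return bytes(x ^ y for x, y in zip(a, b))


def crib_drag(c1: bytes, c2: bytes, crib: bytes) -> list[tuple[int, bytes]]:
    """Slide `crib` along c1 xor c2 (= p1 xor p2, because the shared pad cancels).

    At each offset, if the crib really sits at that offset in one plaintext,
    the returned bytes are the corresponding fragment of the other plaintext.
    """
    diff = xor_bytes(c1, c2)
    return [
        (offset, xor_bytes(diff[offset:offset + len(crib)], crib))
        for offset in range(len(diff) - len(crib) + 1)
    ]


def looks_like_text(fragment: bytes) -> bool:
    """Crude plausibility filter (an assumption): lowercase letters, space, . , '"""
    return all(ch in b"abcdefghijklmnopqrstuvwxyz .,'" for ch in fragment)


if __name__ == "__main__":
    pad = bytes((37 * i + 11) % 256 for i in range(14))  # fixed demo pad, reused twice
    c1 = xor_bytes(b"attack at dawn", pad)
    c2 = xor_bytes(b"meet me at ten", pad)
    for offset, fragment in crib_drag(c1, c2, b"attack"):
        if looks_like_text(fragment):
            print(offset, fragment)
```

In the demo, the crib "attack" at offset 0 yields the fragment b"meet m" of the other message. Guessing how such a fragment continues (here, "meet me") and XORing the guess back into c1 xor c2 extends the recovered text in the first message too, which is the "extend on both ends" step.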