CS70 - Lecture 11 - Feb 11, 2011 - 10 Evans

Goal for Note 6:

Cryptography: when you type your password into a website,

why can't anyone else who might see it (as it goes by on the

network) decode it?

Any message (character string) is converted to a number M

(a long message is converted to a sequence of numbers).

What happens when a Sender wants to send a secret message to a Receiver:

The Sender takes message M and encrypts it to get the

encrypted message C = f_enc(M)

The Sender sends C to the Receiver. Anyone may "intercept" C on its way.

The Receiver decrypts C to get the original message M = f_dec(C).

For this to work as the Sender and Receiver desire:

f_enc and f_dec have to be inverses of one another, i.e.

M = f_dec(f_enc(M)) for all M

It is easy for the Sender to evaluate f_enc

It is easy for the Receiver to evaluate f_dec

It is very hard for anyone other than the Receiver to evaluate f_dec.

The harder it is, the better the secrecy.

Two kinds of cryptography:

Private key (traditional): need one "Key" for both f_enc and f_dec

where K=Key is a shared secret between Sender, Receiver

EX: xor: C = f_enc(M) = M xor K

think of M, C, K as bit strings of the same length

M = f_dec(C) = C xor K

ASK&WAIT: Why are f_enc and f_dec inverses?

hard to break if K is random and used only once (a "one-time pad")
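A minimal sketch of the xor scheme above, assuming M and K are byte strings of the same length. The helper name xor_bytes and the sample message are illustrative, not part of the notes:

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings of equal length, byte by byte."""
    if len(data) != len(key):
        raise ValueError("message and key must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"STOP"
key = secrets.token_bytes(len(message))  # fresh random key K, meant to be used once

ciphertext = xor_bytes(message, key)     # C = f_enc(M) = M xor K
recovered = xor_bytes(ciphertext, key)   # M = f_dec(C) = C xor K

print(recovered == message)
```

The same function serves as both f_enc and f_dec because (M xor K) xor K = M xor (K xor K) = M xor 0 = M.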

EX: Original Washington/Moscow hotline worked this way

EX: crypt command in UNIX, uses algorithm from German Enigma machine

used in World War II, which was broken by Turing

Secrecy depends on keeping K a secret known only to Sender, Receiver

so only they can evaluate f_enc and f_dec

Disadvantage: if 1000 people want to talk to one another in secret,

need 999*1000/2 = 499,500 secret keys (one per pair), so all pairs can talk; too many keys!

Public key: any Sender can do f_enc, but only one Receiver can do f_dec

Advantage: for 1000 people to talk in secret, each person has his/her

own secret f_dec, but can just publish the corresponding f_enc

EX: RSA (Rivest/Shamir/Adleman)

Need: 1) large number n that is product of two different large primes p*q=n

large means at least 200 to 400 decimal digits

We will assume our message satisfies 0 <= M < n;

longer messages can be broken into smaller parts and sent separately.

2) integer e that is relatively prime to (p-1)*(q-1)

3) integer d = multiplicative inverse of e mod (p-1)*(q-1)

Everyone knows n and e, but only Receiver knows d

Then for message M, C = f_enc(M) = M^e mod n is the encrypted message

For encrypted message C, M = f_dec(C) = C^d mod n is the decrypted message

EX: Try 2537=n=p*q=43*59, e=13, message = STOP = (ST,OP)=(1819,1415)

using the position of each letter in the alphabet (A=00, B=01, ..., Z=25). Then encrypted message

= ( 1819^13 mod 2537 , 1415^13 mod 2537 ) = ( 2081, 2182 ).

To decrypt we use d = 937 and compute

( 2081^937 mod 2537 , 2182^937 mod 2537 ) = (1819,1415)
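The worked example can be checked with a short sketch. It assumes the letter encoding A=00, ..., Z=25 with two letters per block; the function names (to_blocks, f_enc, f_dec) are illustrative. Python's built-in three-argument pow computes a power mod n:

```python
p, q = 43, 59
n = p * q          # 2537
e, d = 13, 937     # public and private exponents from the example

def to_blocks(text):
    """Encode an even-length uppercase string, two letters per block (A=00, ..., Z=25)."""
    nums = [ord(ch) - ord("A") for ch in text]
    return [100 * nums[i] + nums[i + 1] for i in range(0, len(nums), 2)]

def f_enc(M):
    return pow(M, e, n)   # M^e mod n

def f_dec(C):
    return pow(C, d, n)   # C^d mod n

blocks = to_blocks("STOP")
cipher = [f_enc(M) for M in blocks]
plain = [f_dec(C) for C in cipher]

print(blocks)   # [1819, 1415]
print(cipher)   # [2081, 2182]
print(plain)    # [1819, 1415]
```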

We will show that f_enc and f_dec are inverses of one another shortly.

But first, why is f_enc() easy and f_dec() hard to evaluate?

f_enc() requires only repeated multiplications and remainders mod n

(at most about 2*log_2(e) multiplications, using repeated squaring),

both of which are easy, even if M and n are large.

f_dec() equally easy if we know d, which only the Receiver knows.
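One standard way to keep the exponentiation cheap is repeated squaring. The sketch below is an illustration under that assumption (the notes do not say which method is intended), and the name mod_pow is hypothetical. It never handles a number larger than n^2:

```python
def mod_pow(base, exponent, modulus):
    """Compute base**exponent mod modulus by repeated squaring."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # lowest bit of the exponent is set
            result = (result * base) % modulus
        base = (base * base) % modulus        # square for the next bit
        exponent >>= 1
    return result

print(mod_pow(1819, 13, 2537))    # 2081, matching the example above
print(mod_pow(2081, 937, 2537))   # 1819
```

The loop runs once per bit of the exponent, so even a 400-digit exponent needs only about 1300 iterations.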

Why is it hard to figure out d? All you have to do is

1) factor n=p*q

2) use Euclidean algorithm to compute d so d*e ==1 mod (p-1)*(q-1)

But 1) is very hard: Best algorithms would take billions of years

if n has 400 digits. And any other known algorithm to compute d

leads to computing p and q too. So quality of encryption depends on

large integers being very hard to factor. If you figure out an

algorithm to factor quickly, you can become rich or famous.
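For a toy modulus such as n = 2537, both steps are trivial, which is why real keys need very large n. A sketch, with illustrative names (trial_factor, extended_gcd, private_exponent); trial division stands in for "factoring" and is hopeless for 400-digit n:

```python
import math

def trial_factor(n):
    """Step 1: find p, q with p*q == n by trial division (feasible only for tiny n)."""
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            return p, n // p
    raise ValueError("n has no nontrivial factor")

def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y

def private_exponent(e, p, q):
    """Step 2: find d with d*e == 1 mod (p-1)*(q-1)."""
    phi = (p - 1) * (q - 1)
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e is not relatively prime to (p-1)*(q-1)")
    return x % phi

p, q = trial_factor(2537)
print(p, q)                         # 43 59
print(private_exponent(13, p, q))   # 937
```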

Proof that f_dec() is inverse of f_enc requires

Fermat's Little Theorem (proof later)

If p is prime and p does not divide a evenly, then a^(p-1) == 1 mod p

Corollary: If p is prime then for any nonnegative integers a and b, a^(1+b*(p-1)) == a mod p

Proof:

Case 1: If p divides a, then a^(1+b*(p-1)) == 0 == a mod p

Case 2: if p does not divide a, then by Fermat's Little Theorem

a^(1+b*(p-1)) == a*(a^(p-1))^b == a*1^b == a mod p

Proof that f_dec(f_enc(M)) = M, where 0 <= M < p*q:

f_dec(f_enc(M)) = f_dec(M^e mod n) = (M^e)^d mod n = M^(e*d) mod n.

Since e*d == 1 mod (p-1)*(q-1), we can write e*d = 1 + m*(p-1)*(q-1) for some m.

Then by using the Corollary twice (once for p with b = m*(q-1), once for q with b = m*(p-1)) we get

M^(e*d) = M^(1 + m*(p-1)*(q-1)) == M mod p

M^(e*d) = M^(1 + m*(p-1)*(q-1)) == M mod q

Thus both p and q divide (M - M^(e*d)), and since they are

different primes, their product n = p*q also divides (M-M^(e*d)), i.e.

M^(e*d) == M mod n

or M = M^(e*d) mod n as desired.
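As a sanity check (not a substitute for the proof), the identity can be verified by brute force for the toy key n = 2537, e = 13, d = 937, including the messages M that are divisible by p or q. The variable names are illustrative:

```python
p, q = 43, 59
n = p * q
e, d = 13, 937

# e*d = 1 + m*(p-1)*(q-1) for some integer m
m, remainder = divmod(e * d - 1, (p - 1) * (q - 1))
print(m, remainder)   # 5 0

# f_dec(f_enc(M)) == M for every 0 <= M < n
all_round_trip = all(pow(pow(M, e, n), d, n) == M for M in range(n))
print(all_round_trip)   # True
```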

To finish cryptography, we need a proof of Fermat's Little Theorem:

Thm: If p is prime and p does not divide a evenly, then a^(p-1) == 1 mod p

Here are some "numerical experiments" to devise a proof conjecture:

consider integers 1 <= i < p, for some prime p, say p=7.

Try multiplying them by any integer mod p, see what you get:

               1  2  3  4  5  6
*2 mod 7  =>   2  4  6  1  3  5
*3 mod 7  =>   3  6  2  5  1  4
*4 mod 7  =>   4  1  5  2  6  3
*5 mod 7  =>   5  3  1  6  4  2
*6 mod 7  =>   6  5  4  3  2  1

ASK&WAIT: What is the pattern?

Can see same pattern for any prime p

Conjecture (proven shortly): given any prime p and any 1 <= a < p,

the numbers a*1 mod p, a*2 mod p , ... a*(p-1) mod p are

all different, i.e. just a permutation of 1,...,p-1
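The numerical experiment is easy to repeat for other primes. A small sketch, with the illustrative helper name multiples_mod_p:

```python
def multiples_mod_p(a, p):
    """Return [a*1 mod p, a*2 mod p, ..., a*(p-1) mod p]."""
    return [(a * i) % p for i in range(1, p)]

p = 7
for a in range(1, p):
    row = multiples_mod_p(a, p)
    is_permutation = sorted(row) == list(range(1, p))
    print(f"*{a} mod {p} => {row}  permutation: {is_permutation}")
```

Trying a non-prime modulus such as 6 shows the pattern failing: multiplying 1,...,5 by 2 mod 6 gives 2 4 0 2 4.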

Now take their product:

(p-1)! = (a*1 mod p) * (a*2 mod p) * ... * (a*(p-1) mod p)

or

(p-1)! == (a*1)*(a*2)*...*(a*(p-1)) mod p

== a^(p-1) * (p-1)! mod p

Suppose we could "divide by" (p-1)!;

would get 1 == a^(p-1) mod p as desired

Now let's do proof carefully:

Proof of Conjecture: suppose 1 <= x,y < p , x neq y

so -(p-1) <= x-y <= p-1, x neq y

so p does not divide x-y

so p does not divide a*(x-y), since p is prime and divides neither a nor x-y

so a*x mod p neq a*y mod p

In other words, a*1 mod p, a*2 mod p , ... , a*(p-1) mod p

all different as conjectured.

So now we have (p-1)! == a^(p-1)*(p-1)! mod p, and want

to conclude 1 == a^(p-1) mod p

ASK&WAIT: What did we prove last time that lets us do this?

Since p is prime, gcd((p-1)!, p) = 1. Thus (p-1)!*x == 1 mod p has a unique solution x; multiply both sides by x to get

(p-1)!*x == a^(p-1)*(p-1)!*x mod p

or

1 == a^(p-1)*1 mod p

as desired. This completes the proof of Fermat's Little Theorem.

You can show even more, that (p-1)! == -1 mod p (Wilson's Theorem)
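Both statements can be spot-checked numerically for small primes. This is only a check on a hand-picked list, not a proof, and the function names are illustrative:

```python
from math import factorial

def fermat_holds(p):
    """Check a^(p-1) == 1 mod p for every 1 <= a < p."""
    return all(pow(a, p - 1, p) == 1 for a in range(1, p))

def wilson_holds(p):
    """Check (p-1)! == -1 mod p, i.e. the remainder is p-1."""
    return factorial(p - 1) % p == p - 1

for p in [2, 3, 5, 7, 11, 13, 43, 59]:
    print(p, fermat_holds(p), wilson_holds(p))
```

For a composite modulus such as 15, both checks fail.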

For RSA to be useful, we need to find a lot of large primes.

It turns out that there are so many primes, you can just

pick numbers randomly and test if they are prime;

there are enough primes that chances are you won't have

to test too many random numbers before finding one.

Def: pi(n) = the number of primes <= n

Ex: pi(20) = |{2,3,5,7,11,13,17,19}| = 8

Theorem (Prime Number Theorem): The limit as n -> infinity of

pi(n) / (n/ log_e n) = 1

EX:    n        pi(n)    n/log_e(n)    pi(n)/(n/log_e n)
     10^1           4           4.3          .92
     10^2          25          21.7         1.15
     10^3         168         144.8         1.16
     10^4        1229        1085.7         1.13
     10^5        9592        8685.9         1.10
     10^6       78498       72382.4         1.08
     10^7      664579      620420.7         1.07
     10^8     5761455     5428681.0         1.06

The point is that the ratio in the last column is slowly approaching 1
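The first rows of the table can be reproduced with a sieve of Eratosthenes. This sketch stops at 10^6 to stay fast; the names sieve and prime_count are illustrative:

```python
import math

def sieve(limit):
    """Return a bytearray where entry k is 1 if k is prime, for 0 <= k <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"               # 0 and 1 are not prime
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # cross out i*i, i*i + i, ... up to limit
            is_prime[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return is_prime

def prime_count(n, is_prime):
    """pi(n) = number of primes <= n, using a precomputed sieve."""
    return sum(is_prime[: n + 1])

flags = sieve(10**6)
for k in range(1, 7):
    n = 10**k
    count = prime_count(n, flags)
    estimate = n / math.log(n)
    print(f"10^{k} {count:>8} {estimate:>10.1f} {count / estimate:.2f}")
```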

So about what fraction of 200 decimal digit numbers are prime?

# 200 digit primes / # 200 digit numbers

= ( pi(10^200) - pi(10^199) ) / (10^200 - 10^199 )

~ ( 10^200/log_e(10^200) - 10^199/log_e(10^199) ) / (10^200 - 10^199)

~ .002 or about 1 out of 500

So if you pick 500 random 200 digit numbers,

there is a reasonable chance that one is prime.
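The estimate can be evaluated directly. The sketch divides numerator and denominator by 10^(digits-1) so that only small numbers appear; prime_fraction is an illustrative name, and the result relies on the Prime Number Theorem approximation rather than an exact count:

```python
import math

def prime_fraction(digits):
    """Estimate the fraction of numbers with the given digit count that are prime."""
    ln10 = math.log(10)
    upper = 10 / (digits * ln10)        # 10^digits / log_e(10^digits), in units of 10^(digits-1)
    lower = 1 / ((digits - 1) * ln10)   # 10^(digits-1) / log_e(10^(digits-1)), same units
    return (upper - lower) / 9          # 10^digits - 10^(digits-1) = 9 * 10^(digits-1)

fraction = prime_fraction(200)
print(fraction)        # about 0.0022
print(1 / fraction)    # about 460, i.e. roughly 1 out of 500

# chance that at least one of 500 independent random 200 digit numbers is prime
chance = 1 - (1 - fraction) ** 500
print(chance)          # roughly two in three
```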

But we still need a quick test that a particular number is

prime. We already said that trying to factor big numbers

is too expensive (which is why we can use RSA safely in

the first place!), so we need something cheaper.

It turns out that Fermat's Little Theorem tells us (almost)

all we need: any prime p satisfies a^(p-1) == 1 mod p, which

is cheap to test for some randomly chosen a not divisible by p;

if a^(p-1) is not == 1 mod p, we are sure p is not prime.

But if a^(p-1) == 1 mod p for enough randomly chosen a, we have

strong evidence that p is prime. This is not quite enough

(there is a set of nonprimes, called "Carmichael numbers", that

pass this test for every a relatively prime to them; the smallest

is 561 = 3*11*17), but the test can be improved to identify

primes reliably.
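A minimal sketch of the test described above, assuming the bases a are drawn uniformly at random. The function name fermat_test and the default of 20 trials are illustrative choices, and this is the basic test, not the improved one the notes allude to:

```python
import math
import random

def fermat_test(n, trials=20):
    """Return False if n is certainly composite, True if n passed every trial."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, n - 1)   # random base with 2 <= a <= n-2
        if pow(a, n - 1, n) != 1:
            return False                 # Fermat's Little Theorem fails: n is not prime
    return True                          # strong evidence, not proof

print(fermat_test(59))               # True: a prime always passes
print(pow(2, 2536, 2537) != 1)       # True: base 2 exposes 2537 = 43*59 as composite

# The Carmichael number 561 = 3*11*17 passes for every base relatively prime to it.
carmichael_fools_test = all(
    pow(a, 560, 561) == 1 for a in range(1, 561) if math.gcd(a, 561) == 1
)
print(carmichael_fools_test)         # True
```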