CS 61B: Lecture 23
Friday, October 16

Transposition Tables:  Using a Dictionary to Speed Game Trees
=============================================================
An inefficiency of unadorned game tree search is that some grids can be reached through many different sequences of moves, and so the same grid might be evaluated many times at great expense. To save some of this cost, we maintain a record of previously encountered grids, called a "transposition table." We implement this table using a dictionary (possibly implemented as a hash table). Each time we compute a grid's score, we insert into the dictionary an object that consists of the grid and its associated score. The "key" that allows us to look up an object like this later is a grid (without its associated score).

Each time chooseMove() evaluates a grid, it should first check whether the grid is in the transposition table; if so, its score is returned immediately. Otherwise, its score is evaluated recursively and stored in the transposition table.

However, we do not store grids that occur below a certain (experimentally determined) depth in the tree, because high-depth nodes are too numerous and are repeated too rarely.

DICTIONARIES
============
Let's leave aside game trees for a few moments and consider a simpler problem. Suppose you have a set of two-letter words and their definitions. You want to be able to look up the definition of any word, very quickly.

For simplicity, assume that for each word, there is a Java object containing the word and its definition. The two-letter word is the "key" that addresses the object.

Since there are 26 English letters, there are 26 * 26 = 676 possible two-letter words. To implement a dictionary, we declare an array of 676 references, all initially set to null. To insert a Definition into the dictionary, we define a function hash() that maps each two-letter word (key) to a unique integer between 0 and 675.
We use this integer as an index into the table, and make the corresponding table position refer to the Definition object. A simple implementation appears below. Note that "Definition" is a subclass of "Word", and hash() is a method that maps each two-letter word to a different integer.

  public class Word {
    protected String wordString;
  }
  // Note:  A Word is a key.

  public class Definition extends Word {
    String defString;
  }
  // Note:  a Definition contains a key.

  public class WordDictionary {
    private Definition[] defTable = new Definition[676];

    // Assumes the key is exactly two lowercase letters, 'a' through 'z'.
    private static int hash(String key) {
      return 26 * (key.charAt(0) - 'a') + (key.charAt(1) - 'a');
    }

    public void insert(Definition newDef) {
      defTable[hash(newDef.wordString)] = newDef;
    }

    public Definition find(Word findWord) {
      return defTable[hash(findWord.wordString)];
    }

    public void remove(Word killWord) {
      defTable[hash(killWord.wordString)] = null;
    }
  }

We can store a dictionary of Tic Tac Toe grids similarly. Each of the nine squares contains an X, contains an O, or is empty; hence, there are 3^9 = 19683 possible grids, each of which is the key to a different position in the dictionary's table.

Returning to our dictionary: What if we want to store every English word, regardless of length? The table "defTable" must be long enough to accommodate pneumonoultramicroscopicsilicovolcanoconiosis, which at 45 letters is often cited as the longest word in the English language. Unfortunately, declaring an array of length 26^45 is out of the question. English has fewer than one million words, so we should be able to do much better.

Hash Tables (the most common implementation of dictionaries)
-----------
Suppose n is the number of keys (words) whose definitions we want to store, and suppose we use a table of size s, where s is perhaps a little larger than n. We change hash() into a "hash function" that maps any key to a number between 0 and s-1. Here is a good example that works on any String.
  private static int hash(String key) {
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++) {
      hashVal = (128 * hashVal + key.charAt(i)) % s;      // s is the table size
    }
    return hashVal;
  }

Observe that inside the loop, "hashVal" is the remainder from something divided by s; hence, it is guaranteed to be in the range between zero and s-1. (We're assuming that s is not so large as to allow hashVal to overflow; that is, 128 * (s-1) plus the largest character value must fit in an int.)

With our new hash() method, no matter how long and variegated the keys are, we can map them into a table whose size is not much greater than the actual number of items we want to store. However, we've created a new problem: several keys might be hashed to the same position in the table (hash(key1) == hash(key2)). This circumstance is called a "collision." How do we prevent items from being erased in favor of other items?

To deal with collisions, we use a simple idea called "chaining." Instead of having each element of the table refer to one item, we have it refer to a linked list of items, called a "chain". All the keys that hash to a given position reside somewhere in the linked list at that position.

           |-------|-------|-------|-------|-------|-------|-------|----
defTable > |   *   |   *   | null  |   *   | null  |   *   |   *   | ...
           |---|---|---|---|-------|---|---|-------|---|---|---|---|----
               v       v               v               v       v
            ------- -------         -------         ------- -------
   chains > |bloog| |tongo|         |prubb|         |zoiks| |yeeha|
            |  *  | | null|         |  *  |         | null| | null|
            ---|--- -------         ---|---         ------- -------
               v                       v
            -------                 -------
            |mozza| <-- Reference   |tuber|
            | null|     to an item  | null|
            -------                 -------

insert(): Hash the item's key to establish its position in the table. Search the linked list for another item having the same key. If such an item exists, replace it with the new item; otherwise, insert the new item into the list.
find(): Hash the key to establish its position in the table. Search the linked list for an item with the given key. Return it if found; otherwise, return null or throw an exception.
remove(): Hash the key to establish its position in the table. Search the linked list for an item with the given key. Remove it from the list if found; otherwise, do nothing or throw an exception.

As long as the linked lists are short (e.g., n ~ s, so the average number of items per position is about one) and the hash function spreads the keys evenly, all these operations take O(1) time on average. However, if the table is too small for the number of items (n >> s), performance will be dominated by linked list operations and will degenerate to O(n) time (albeit with a much smaller constant factor than if you used a single linked list). A precise analysis requires more probability theory than we want to use here.

Hash Functions
--------------
Hash functions are a bit of a black art. The ideal hash function would map each key to a uniformly distributed random integer from zero to s-1. (By "random", I don't mean that the function is different each time; a given key always hashes to the same integer. I mean that two different keys, however similar, will hash to independently chosen values, so the probability they'll collide is 1/s.) Unfortunately, this ideal seems impossible to obtain.

The best way to understand good hash functions is to understand why bad hash functions are bad. Here are some examples of bad hash functions on Words.

[1] Sum up the ASCII values of the characters. Unfortunately, the sums fall in a narrow range: no lowercase letter exceeds 122 ('z'), so even the 45-letter word above sums to at most 45 * 122 = 5490. In a table sized for hundreds of thousands of words, all the Definitions will be bunched up at the beginning of the table.

[2] Use the first three letters of a word, in a table with 26^3 entries. Unfortunately, words beginning with "pre" are much more common than words beginning with "xzq", and the former will be bunched up in one long list. This does not approach our uniformly distributed ideal.

[3] Consider the "good" hash() function written out above. Suppose the table size s is 128. Then the return value is just the last character of the word: at each iteration, 128 * hashVal is a multiple of 128 and vanishes when we take the remainder modulo 128. This makes for a lousy hash function.
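Item [3] is easy to check directly. The sketch below uses a hypothetical demo class, HashModDemo, whose hash() is the lecture's function adapted to take the table size s as a parameter (an assumption for the demo; in the lecture, s is a field). It hashes four words that all end in 't', once with s = 128 and once with the prime 101 (an arbitrary choice for illustration).

```java
// Hypothetical demo class: compares the lecture's hash() under two table sizes.
class HashModDemo {
  // The lecture's string hash, with the table size s passed in as a parameter.
  static int hash(String key, int s) {
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++) {
      hashVal = (128 * hashVal + key.charAt(i)) % s;
    }
    return hashVal;
  }

  public static void main(String[] args) {
    // All four words end in 't'.
    String[] words = {"cat", "hat", "bat", "that"};
    for (String w : words) {
      System.out.println(w + ":  s = 128 -> " + hash(w, 128)
          + ",  s = 101 -> " + hash(w, 101));
    }
  }
}
```

With s = 128 every word hashes to 116, the character code of 't'; with s = 101 the four words land in positions 65, 74, 43, and 96. We deliberately avoided the prime 127: since 128 % 127 == 1, each step with s = 127 reduces to (hashVal + key.charAt(i)) % 127. The function then just sums the characters modulo 127, which is bad hash function [1] in disguise, and anagrams such as "cat" and "act" collide. That is one concrete reason to prefer a multiplier other than 128.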
For this reason, it's best to choose a prime number for your table length s; that way, each "% s" (remainder) operation mixes up the bits of hashVal. (In the interest of mixing bits, it's also a good idea to replace "128" with a prime multiplier.)

Assuming s is prime, why is the hash() function presented above good? Because we can find no obvious flaws, and it seems to work well in practice. (I told you it was a black art.)

Of course, an ideal hash function should also be really, really fast. See Weiss Figure 19.2 for a faster hash function that uses the slow % operator only once.

Resizing Hash Tables
--------------------
Sometimes we can't predict in advance how many items we'll need to store. If the number of items n grows significantly larger than the table size s, we are in danger of losing constant-time performance.

One option is to enlarge the hash table when n becomes too large (that is, when n > cs for some constant c, which you should determine experimentally for each application; you can't derive c from first principles, but c = 1 will always do if you don't need perfectly optimized performance). "Enlarge" means that you allocate a new table (typically about twice the length; if you're following the advice above, round up to a prime), then walk through all the items in the old table and rehash them into the new. Take note: you CANNOT just copy the items (or linked lists) from the old table to the new, because the hash function depends on the table size, so an item's position in the new table will generally differ from its position in the old one. You have to rehash each item separately.

You can also shrink hash tables (e.g., when n < cs/4) if you think the freed-up memory will benefit something else. (Practical examples of this being worth the effort are rare.)

Obviously, an operation that causes a hash table to resize itself will take more than O(1) time. Nevertheless, each enlargement multiplies the table size by a constant factor (about two), so the total time spent rehashing is proportional to the number of insertions, and the average over the long run is still O(1) time per operation.

Postscript on Probing
---------------------
A well-known alternative to chaining, which does not require the use of linked lists, is called "probing".
Probing is mildly interesting, and you can read about it in Weiss if you are moved to do so. Chaining is nearly always preferred to probing in practice. Why? First, probing does not allow n to exceed s; the hash table is full when n = s, and must be enlarged to accommodate more entries. Second, probing is often slower (and never faster) than chaining, because there are more collisions.

Probing has one advantage over chaining: it uses less memory (perhaps by up to a factor of two), because it does not use linked lists. However, if your application is pushing the memory envelope enough for this difference to matter, the advantage will be partly offset by another fact. A chaining hash table can keep accepting entries, one at a time, until it fills memory. A probing hash table cannot, because when you enlarge it, you have to create the new array before destroying the old one.
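The postscript's first point, that a probing table is full once n = s, is easy to see in code. Below is a minimal sketch of linear probing, the simplest probing scheme. The lecture doesn't say which probing variant Weiss covers, so choosing linear probing is our assumption. LinearProbingTable is a hypothetical class: it reuses the lecture's string hash(), stores bare String keys, and omits remove(), because deleting under probing needs extra care (for example, "tombstone" markers) so that later searches don't stop early.

```java
// Hypothetical class: a minimal linear-probing table of non-null String keys.
class LinearProbingTable {
  private final String[] keys;   // keys[i] == null means slot i is empty
  private int n = 0;             // number of keys currently stored

  LinearProbingTable(int s) {    // s = table size; assumed to be at least 1
    keys = new String[s];
  }

  // The lecture's string hash, reduced modulo the table size s.
  private int hash(String key) {
    int s = keys.length;
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++) {
      hashVal = (128 * hashVal + key.charAt(i)) % s;
    }
    return hashVal;
  }

  // Inserts key if absent; returns false only if all s slots are taken by other keys.
  boolean insert(String key) {
    int s = keys.length;
    int i = hash(key);
    for (int probes = 0; probes < s; probes++) {
      if (keys[i] == null) {         // empty slot: claim it
        keys[i] = key;
        n++;
        return true;
      }
      if (keys[i].equals(key)) {     // key already present
        return true;
      }
      i = (i + 1) % s;               // collision: probe the next slot
    }
    return false;                    // probed every slot; the table is full
  }

  // Follows the same probe sequence; reaching an empty slot ends the search.
  boolean find(String key) {
    int s = keys.length;
    int i = hash(key);
    for (int probes = 0; probes < s && keys[i] != null; probes++) {
      if (keys[i].equals(key)) {
        return true;
      }
      i = (i + 1) % s;
    }
    return false;
  }

  int size() {
    return n;
  }

  public static void main(String[] args) {
    LinearProbingTable table = new LinearProbingTable(3);
    for (String w : new String[] {"bloog", "tongo", "prubb", "zoiks"}) {
      System.out.println(w + " inserted? " + table.insert(w) + "  (n = " + table.size() + ")");
    }
  }
}
```

With s = 3, the first three inserts succeed and the fourth returns false, because n has reached s. A chaining table in the same situation would simply lengthen one of its chains.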