TOC PREV NEXT

Program — Hashing exercises


Goals

This assignment gives you practice devising and evaluating hash functions, and using the java.util.Hashtable class.

Related quizzes

Hashing.

Readings

Weiss (either edition), chapter 20.

The document "Real-world hash functions".

Miscellaneous information

In this course, we are concerned only with chained hash tables. Techniques discussed in Weiss sections 20.3 and 20.4 that store colliding elements elsewhere in the table are not used frequently enough to be of interest here.

Examples here use the java.util.Hashtable class. Online documentation of this class is at

	http://java.sun.com/j2se/1.5/docs/api/index.html

There are two potentially confusing aspects of the java.util.Hashtable class.

Two alternatives to Hashtable introduced in recent versions of Java are HashMap (similar to Hashtable) and HashSet. Consult the online documentation for more information about these classes. You may use them instead of Hashtable if you wish.

Problem

Do the exercises described on the following pages.

Hashing exercises

Exercise 1

The program in ~cs47b/lib/HashTest.java and Hashtable.java reads words from a file and stores them into a hash table of linked lists. The name of the file and the size of the table are specified as command-line arguments. When the input has all been read, the program prints statistics about the hash table:

Find out, by testing each of the four hash functions in Hashtable.java on all hash table sizes between 1040 and 1050, which function performs best and which performs worst on the strings in the file ~cs47b/lib/words. (The file ~cs47b/lib/testHash is a shell script intended to simplify the data collection.) In particular, find out what table sizes the functions work well for and what table sizes lead to poor performance. In general, the lower the average number of comparisons, the better, and for tables with averages approximately equal, the lower the standard deviation, the better.

The four hash functions are given below. In the first three formulas, ck is the kth character of the word to be hashed, where k ranges from 1 to n, the length of the word.

  1. (a variant on the ETH algorithm) h0 = 1; hk = ck * ((hk-1 mod 257)+1);
    H = hn mod
    the table size.
  2. (a variant on the GNU-cpp algorithm) h0 = 0; hk = 4hk-1+ck;
    H =
    the last 31 bits of hn, mod the table size.
  3. (a variant on the GNU-cc1 algorithm) h0 = n; hk = 613hk-1+ck;
    H =
    last 30 bits of hn, mod the table size.
  4. The String hashCode member function.

Exercise 2

The code in ~cs47b/lib/TTThashTest.java stores all possible Tic-tac-toe boards into a java.util.Hashtable object. Add methods to the TTTBoard class that test two ways to hash Tic-tac-toe boards. In one, you should convert the board to a String and then use the String hashCode function. In the other, you should interpret the Tic-tac-toe board as a nine-digit base 3 numeral and return the corresponding integer as the hash value. Compare the performance of these two implementations.

In many Java environments, String methods are "native", that is, they are supplied in the machine language of the computer on which the program is executing. The data you collect in this exercise should indicate whether or not this is the case on the computer you're using.

Exercise 3

Two CS 47B students have a jug filled with eight liters of Jolt Cola that they wish to divide evenly between them. They have two empty jugs with capacities of five and three liters respectively. These jugs are unmarked and no other measuring device is available. How can they accomplish the equal division?

The file ~cs47b/lib/JugSolver.fw.java contains part of a program to solve this problem. The program recursively searches to find a sequence of steps (each step pours one jug into another) that produces the desired amount. The program has a flaw, namely that it doesn't keep track of configurations that it has seen before; infinite recursion results.

Without changing any of the existing code (except perhaps to change the value of the DEBUGGING variable), add statements that fix the program. In particular, you are to use a java.util.Hashtable object to keep track of configurations already seen. Test your completed program on the arguments

	8 5 3 8 0 4

which correspond to the problem stated above. Indicate, on the program listing you bring to the Self-Paced Center for grading, what you added to the framework code.

Checklist

HashTest.java
public class HashTest {

	static BufferedReader reader (String fileName) throws Exception {
		// Code omitted to save paper.
	}
	
	// Arguments are the file of words and the table size.
	public static void main (String [ ] args) throws Exception {
		if (args.length != 2) {
			System.err.println ("Wrong number of arguments.");
			System.exit (1);
		}
		BufferedReader wordReader;
		int tableSize = Integer.parseInt (args[1]);
		Hashtable table = new Hashtable (tableSize);
		String word;
		
		// Load the words into the table.
		int wordCount = 0;
		wordReader = reader (args[0]);
		do {
			try {
				word = wordReader.readLine ();
			} catch (Exception e) ...
			if (word == null) {
				break;
			} else {
				wordCount++;
				table.put (word, new Integer(wordCount));
			}
		} while (true);
		
		// Now see how long it takes to look up all the words.
		wordReader = reader (args[0]);
		long startTime = System.currentTimeMillis ( );
		do {
			try {
				word = wordReader.readLine ();
			} catch (Exception e) ...
			if (word == null) {
				break;
			} else {
				boolean result = table.containsKey (word);
			}
		} while (true);
		long finishTime = System.currentTimeMillis ( );
		System.out.println ("Time to hash " + wordCount + " words is " 
			+ (finishTime-startTime) + " milliseconds.");
		table.printStatistics ( );
	}
}
Hashtable.java
public class Hashtable {

	private static boolean DEBUGGING = false;
	private LinkedList [] myTable;
	
	public Hashtable (int size) {
		myTable = new LinkedList [size];
		for (int k=0; k<size; k++) {
			myTable[k] = new LinkedList ( );
		}
	}
	
	private static long hash (String key) {
		// Uncomment one of the following lines to use the corresponding hash function.
		// return hash1 (key);
		// return hash2 (key);
		// return hash3 (key);
		// return Math.abs (key.hashCode ( ));
	}
	
	// Slight variation on the ETH hashing algorithm
	private static int MAGIC1 = 257;
	private static long hash1 (String key) {
		long h = 1;
		for (int k=0; k<key.length(); k++) {
			h = ((h % MAGIC1)+1) * (int) key.charAt(k);
		}
		return h;
	}
	
	// Slight variation on the GNU-cpp hashing algorithm
	private static int MAGIC2 = 4;
	private static long hash2 (String key) {
		long h = 0;
		for (int k=0; k<key.length(); k++) {
			h = MAGIC2 * h + (int) key.charAt(k);
		}
		return h << 1 >>> 1;
	}
	
	// Slight variation on the GNU-cc1 hashing algorithm
	private static int MAGIC3 = 613;
	private static long hash3 (String key) {
		long h = key.length();
		for (int k=0; k<key.length(); k++) {
			h = MAGIC3 * h + (int) key.charAt(k);
		}
		return h << 2 >>> 2;
	}
	
	// Add the key to the table. The value is included just for compatibility with
	// the Hashtable class in java.util.
	public void put (String key, Integer value) {
		int h = (int) (hash (key) % myTable.length);
		if (!myTable[h].find (key)) {
			myTable[h].insert (new Info (key, value));
			if (DEBUGGING) {
				System.out.println ("Inserting " + key);
			}
		} else {
			System.err.println (key + " already in table.");
		}
	}
	
	// Return true if key is in the table, and false otherwise.
	public boolean containsKey (String key) {
		int h = (int) (hash (key) % myTable.length);
		return (myTable[h].find(key));
	}
	
	// Print statistics about the table:
	//		the maximum length of a collision list;
	//		the optimal length of a collision list;
	//		the average number of comparisons needed for a successful search;
	//		the standard deviation for the number of comparisons needed for
	//		 a successful search.
	public void printStatistics ( ) {
		// Code omitted to save paper.
	}
}
// Auxiliary classes follow in the file; code is omitted to save paper.
TTThashTest.java
import java.util.*;

public class TTThashTest {
// Measure the time to put all possible Tic-tac-toe boards into the hash table.

	public static void main (String [ ] args) {
		Hashtable table = new Hashtable ( );
		long startTime = System.currentTimeMillis ( );
		for (int k=0; k<19683; k++) {
			TTTboard b = new TTTboard (k);
			table.put (b, new Integer (k));
		}
		long finishTime = System.currentTimeMillis ( );
		System.out.println ("Time to insert all Tic-tac-toe boards = "
			+ (finishTime-startTime));
	}
}
TTTboard.java
public class TTTboard {

	private StringBuffer myBoard;
	
	// Initialize a blank Tic-tac-toe board.
	public TTTboard ( ) {
		myBoard = new StringBuffer ("         ");
	}
	
	// Initialize a Tic-tac-toe board from the given value
	// by interpreting the value in base 3 and representing
	// a 0 as a blank, a 1 as an X, and a 2 as an O.
	public TTTboard (int base3) {
		myBoard = new StringBuffer ("         ");
		for (int k=8; k>=0; k--) {
			int digit = base3 % 3;
			switch (digit) {
			case 0:
				myBoard.setCharAt (k, BLANK); 
				break;
			case 1: 
				myBoard.setCharAt (k, 'X'); 
				break;
			case 2:
				myBoard.setCharAt (k, 'O'); 
				break;
			}
			base3 = base3/3;
		}
	}
	
	final char BLANK = ' ';
	
	// More methods appear here, not particularly relevant to this exercise.
	
	// Print the board.
	public void print ( ) {
		// Code omitted to save paper.
	}
}
JugSolver.fw.java
import java.util.*;
public class JugSolver {

	private int desired;
	private int capacity [ ];
	
	public JugSolver (int amt0, int amt1, int amt2, int d) {
		capacity = new int [3];
		capacity[0] = amt0;
		capacity[1] = amt1;
		capacity[2] = amt2;
		desired = d;
	}
	
	// Try to solve the puzzle, starting at configuration b.
	public boolean tryPouring (JugContents b) {
		debugPrint (b.toString ( ));
		if (b.jugs[0] == desired) {
			debugPrint ("Jug 0 contains " + b.jugs[0]);
			return true;
		} else if (b.jugs[1] == desired) {
			debugPrint ("Jug 1 contains " + b.jugs[1]);
			return true;
		} else if (b.jugs[2] == desired) {
			debugPrint ("Jug 2 contains " + b.jugs[2]);
			return true;
		}
		
		// You add some code here.
		for (int i=0; i<3; i++) {
			for (int j=0; j<3; j++) {
				if (i!=j && tryPouring (b.pour (i, j))) {
					System.out.println ("Pouring from jug " + i 
						+ " to jug " + j);
					return true;
				}
			}
		}
		return false;
	}
	
	// Arguments are three jug capacities, plus the contents of jugs 0 and 1,
	// plus the desired amount.
	public static void main (String [ ] args) throws Exception {
		if (args.length != 6) {
			System.err.println ("Wrong number of arguments.");
			System.exit (1);
		}
		JugSolver puzzle = new JugSolver (Integer.parseInt (args[0]),
			Integer.parseInt (args[1]), Integer.parseInt (args[2]),
			Integer.parseInt (args[5]));
		JugContents init 
			= puzzle.new JugContents (Integer.parseInt (args[3]),
				Integer.parseInt (args[4]), 0);
		System.out.println (puzzle.tryPouring (init));
	}
	
	private static boolean DEBUGGING = false;
	
	private static void debugPrint (String s) {
		if (DEBUGGING) {
			System.out.println (s);
		}
	}
	
	static int min (int x, int y) {
		return x < y? x: y;
	}
	
	class JugContents {
	
		public int jugs[];	// Contents of the three jugs.
		
		public JugContents (int amt0, int amt1, int amt2) {
			jugs = new int [3];
			jugs[0] = amt0;
			jugs[1] = amt1;
			jugs[2] = amt2;
		}
		
		public JugContents (JugContents b) {
			jugs = new int [3];
			jugs[0] = b.jugs[0];
			jugs[1] = b.jugs[1];
			jugs[2] = b.jugs[2];
		}
		
		// Return the result of pouring as much as possible from jug from to jug to.
		public JugContents pour (int from, int to) {
			JugContents afterPour = new JugContents (this);
			int amtToCanGet = capacity[to] - jugs[to];
			int amtFromCanSupply = jugs[from];
			int amtPoured = min (amtToCanGet, amtFromCanSupply);
			debugPrint ("Pouring " + amtPoured 
				+ " from jug " + from + " to jug " + to);
			afterPour.jugs[from] -= amtPoured;
			afterPour.jugs[to] += amtPoured;
			return afterPour;
		}
		
		public String toString ( ) {
			return "Configuration = (" + jugs[0] + "," 
				+ jugs[1] + "," + jugs[2] + ")";
		}
		
		// You add more code to this class.
	}
}

TOC PREV NEXT