Final Exam

In case you've forgotten the questions, here's the final exam in
PostScript and
PDF.

**Problem 1.** (8 points) **A Miscellany.**

**a.**
No. Simply setting the `head` field to `null` makes
the listnodes unreachable, so a mark & sweep garbage collector
can reclaim all of them.

**b.**
This hash code does not differentiate between two different cells
for which *i* + *j* is the same,
so moving a chip along a diagonal does not change a board's hash code.
Thus, collisions are likely.

**c. **
Use eight hash tables--one for each category.
You can tell what category a word is in (if any)
simply by checking for its presence in each of the eight hash tables.
(Each hash table's size needs to be chosen proportionately to the
number of words in each category, so that each table will have the same
load factor as one big hash table would. That way, the eight hash
tables offer O(1)-time operations and
take up the same amount of memory as one big table would.)

**d.**

`
Second
Fourth
`

(Optionally, you can also include an exception message at the end. Since this is just a code fragment, we don't know whether or not the exception is caught somewhere.)

**e.**
Insertion sort and quicksort.
(We gave a full point to people who also circled
bucket sort, because a bucket sort implementation might inspect the
range of the input keys before deciding how many buckets to use.)

**f.**
8.

**Problem 2.** (6 points) **Operations on Data Structures.**

I will try to draw these solutions sometime in the next week.

**a.**

**b.**

**c.**

**Problem 3.** (12 points) **Sorting.**

**a.**

5 9 7 4 0 2 8 8

5 8 7 4 0 2 9 8

5 8 7 4 0 2 8 9

0 8 7 4 5 2 8 9

0 2 7 4 5 8 8 9

0 2 4 7 5 8 8 9

0 2 4 5 7 8 8 9

**b.**
All of the keys might go into the first list
(e.g. if all the keys are equal).
Then this quicksort algorithm will call itself recursively with the same list
it started with, and it will repeatedly call itself forever,
never terminating.

**c.**
The worst-case running time is in Θ(*n*^{2}), and
the best-case running time is in Θ(*n* log *n*).

**d.**
Replace the binary search tree with a splay tree.
If the input array is already sorted, then after each insertion,
the root has no right child
(because the root is the largest key inserted so far).
Each time a key is inserted, it becomes the new right child of the root,
then it is splayed to the root with a single zig rotation.
Thus, each insertion takes O(1) time.

**e.**
If radix sort uses 512 buckets, it inspects 9 bits of each key per pass
(as 512 = 2^{9}).
Java `int`s are 32 bits long, so radix sort will need to do four passes
of bucket sort or counting sort.
If radix sort uses 256 buckets, it inspects 8 bits per pass, and still only
needs to do four passes.
However, sorting with 256 buckets is a bit faster per pass,
and takes less memory,
whereas there is no advantage to using 512 buckets at all.
Therefore, you would never choose 512 buckets over 256 buckets for 32-bit keys.

(It's worth mentioning one misconception we saw on the exams.
Some people thought that it's an inherently bad thing
if the final pass of radix sort doesn't have 9 bits left to inspect.
It's not.
Depending on the number of items *n* you are sorting,
it might make a lot of sense to use 2048 = 2^{11} buckets,
even though the final pass will only have 10 bits left to inspect.
This is the smallest number of buckets that allows you to do just three passes,
so it's the right choice for some values of *n*.)

**f.**
Here are two different correct answers.

- A key observation is that
`bottomUpHeap`is a comparison-based algorithm that transforms any array into a binary heap in O(*n*) time. Therefore, if the Professor is correct, then we can create an O(*n*)-time comparison-based sorting algorithm (for*any*list of keys, not just a binary heap) by putting`bottomUpHeap`and the Professor's algorithm together. But an O(*n*)-time comparison-based sorting algorithm cannot exist. (Note that this argument would not work if`bottomUpHeap`had Θ(*n*log*n*) worst-case running time; its linear running time is central to the argument.) - Although the items in a heap are partially sorted, the
*leaves*of a heap are not. For example, if all the internal nodes of a heap are zeros, and all the leaves are unique positive numbers, we can permute the leaves any way we like, and it's still a valid binary heap. At least half the nodes in a binary tree are leaves, so there are at least (*n*/2)! such permutations. So sorting the keys in an*n*-key binary heap is at least as hard as sorting*n*/2 keys from scratch, which takes Ω((*n*/2) log (*n*/2)) = Ω(*n*log*n*) time.

**g.**
This problem is known as *topological* sorting of
a directed acyclic graph.
The key observation is that we can only output a vertex after all the
vertices that point to it have already been written to the output.
So assign each vertex two fields:
`indegree`, which records the indegree of the vertex
(ignoring self-edges), and
`prerequisites`, which records the number of vertices that point to this
vertex and have been written to the output.
We can initialize every `prerequisites` field to zero, and
initialize every `indegree` field by looping through all the edges once.
Then we create a list *L*, which will contain all the vertices for which
`indegree = prerequisites`.
Initialize *L* with all the vertices that have an indegree of zero.
Repeat the following step until *L* is empty:
remove a vertex from *L*, write it to the output,
and increment the `prerequisites` field of every vertex in its
adjacency list.

**Problem 4.** (5 points) **Augmented Binary Search Trees.**

There are several ways to answer this problem, and the three parts of the solution need to match each other, so here's just one possibility.

**a.**
At each node, store the size of the subtree rooted at that node.

**b.**
Search for the key *k*
(in the usual manner for the `find` method in a binary search tree).
As you walk down the tree, at every node *n* where you go to
*n*'s right child,
add the size of the subtree rooted at *n*'s *left* child to
a running total, plus one for *n*.

When you reach the node that contains *k*, treat it the same way:
add the size of the subtree rooted at that node's left child,
plus one for *k*. Return the sum.

**c.**
First, `insert` the new key *k* as usual.
If *k* is already in the tree, do nothing.
Otherwise, create a new leaf for *k* and
set its `size` field to one.
Walk back up the tree (or down the tree a second time)
and increment the size field of all *k*'s ancestors.

**Problem 6.** (6 points) **Asymptotic Analysis.**

**a.**
O(*x* log *y* + 4^{y}).
(Important note:
you cannot replace 2^{2y} with 2^{y}.
Constants in exponents cannot be tossed out.)

**b.**
We divide the proof into two separate cases:
either *x* ≥ *y*, or *x* < *y*.

In the first case, we have

for all valuesx^{2}+ 10xy+y^{2}

≤x^{2}+ 10x^{2}+y^{2}

≤ 11 (x^{2}+y^{2})

The second case, where *x* < *y*, is symmetric.

for all valuesx^{2}+ 10xy+y^{2}

≤x^{2}+ 10y^{2}+y^{2}

≤ 11 (x^{2}+y^{2})

**Problem 6.** (6 points) **Amortized Analysis.**

**a.**
The smallest possible amortized cost of `enqueue` is $2.
For `dequeue` and `multiDequeue`,
we accepted answers of $0 or $1.

**b.**
This accounting ensures that the following invariant always holds:
the number of dollars in the bank is (at least)
the number of items in the queue.
We can verify that each operation maintains this invariant as follows.

An `enqueue` operation increases the size of the queue by one item.
It costs one actual dollar and puts one dollar in the bank,
which is enough to maintain the invariant.

A `dequeue` operation decreases the size of the queue by one item.
Thus, the operation can withdraw one dollar from the bank, which is enough
to pay for the operation's actual cost, giving an amortized cost of zero.
(Some people charged an amortized cost of $1 to cover the cost of
throwing an exception if the queue is empty.
We weren't picky about this when grading.)

A `multiDequeue` operation decreases the size of the queue by either
its parameter `i` or the number of items in the queue, whichever is
less. The actual cost of the operation is also the lesser of these two,
so the operation withdraws from the bank exactly the number of dollars
needed to cover its actual cost.
Thus, the amortized cost is zero.
(Again, one might charge $1 amortized to cover exceptions on an empty queue.)

By induction on the sequence of operations, all the operations maintain the invariant that the number of dollars in the bank is (at least) as great as the number of items in the queue. Therefore, the bank balance never becomes negative.

**Problem 7.** (7 points) **Index Sort.**

public static Thing[] indexSort(Thing[] x) { // Returns sorted copy of x. Thing[] y; y = new Thing[x.length]; for (int i = 0; i < x.length; i++) { int index = 0; for (int j = 0; j < x.length; j++) { if ((x[j].key < x[i].key) || ((x[j].key == x[i].key) && (j < i))) { index++; } } y[index] = x[i]; } return y; }

Mail inquiries to cs61b@cory.eecs