CS 61B: Lecture 26 Monday, April 1 Representing Binary Trees ------------------------- Recall that a binary tree is a rooted tree wherein no node has more than two children. Additionally, every child is either a "left child" or a "right child" of its parent, even if its parent has only one child. In the most popular binary tree representation, each tree node has three references to neighboring tree nodes: a "parent" reference, and "left" and "right" references for the two children. (For some algorithms, the "parent" references are unnecessary; for a few, the "child" references are unnecessary.) Each node also has an "element" reference. public class BinaryTreeNode { | public class BinaryTree { Object element; | BinaryTreeNode root; BinaryTreeNode parent; | int size; BinaryTreeNode left; | } BinaryTreeNode right; | public void inorder() { if (left != null) { left.inorder(); } this.visit(); if (right != null) { right.inorder(); } } } ================================================ + BINARY TREE | -------------------- + =============== |--- ---- | + + ||.|root size|14| | + + |-+- ---- | + + --|----------------- + + v BinaryTree object + + ----- + + | * | + + ----- ------------ + + Root node => |add| | parent | + + ----- ------------ + + |.|.| | element | + + /---\ ------------ + + / ^^ \ |left|right| + + v / \ v ------------ + + ---/- -\--- structure of + + | . | | . | BinaryTreeNodes + + ----- ----- + + |sub| |div| + + ----- ----- + + >|.|.| |.|.|< + + / /--|- -|--\ \ + + / / ^| |^ \ \ + + / v |v v| v \ + + --+-- --+-- --+-- --+-- + + | . | | . | | . | | . | + + ----- ----- ----- ----- + + | 6 | | 5 | | 9 | | 3 | + + ----- ----- ----- ----- + + |*|*| |*|*| |*|*| |*|*| + + ----- ----- ----- ----- + ================================================ BINARY SEARCH TREES =================== An _ordered_dictionary_ is a dictionary in which the keys have a total order, just like in a heap. You can insert, find, and remove items, just as with a hash table. But unlike a hash table, you can quickly find the item with minimum or maximum key, or the item nearest another item in the total order. Hence, an ordered dictionary does anything a dictionary or binary heap can do and more, albeit more slowly. The simplest implementaton of an ordered dictionary is a binary search tree, wherein items are maintained in a (somewhat) sorted order. 18 The "left subtree" of a node is the subtree rooted at the / \ node's left child; "right subtree" is defined similarly. 12 25 A binary search tree satisfies the "binary search tree / \ / \ invariant": for any node X, every key in the left subtree of 4 15 20 30 X is less than or equal to X's key, and every key in the / / \ / right subtree of X is greater than or equal to X's key. You 1 13 17 28 can verify this in the search tree at left: for instance, \ \ \ the root is 18, its left subtree (rooted at 12) contains 3 14 29 numbers from 1 to 17, and its right subtree (rooted at 25) contains numbers from 20 to 30. When a node has only one child, that child is unambiguously either a left child or a right child, depending on whether its key is smaller or larger than its parent's key. An inorder traversal of a binary search tree visits the nodes in sorted order. In this sense, a search tree maintains a sorted list of items. However, operations on a search tree are usually more efficient than the same operations on a sorted linked list. Let's consider several, using the dictionary interface described by Goodrich and Tamassia, Section 8.1. For the following code, assume that each node has both a "key" field and an "element" field. [1] Object findElement(Object k); public Object findElement(Object k) { BinaryTreeNode node = root; // Start at the root. while (node != null) { // Repeatedly compare search if (isLessThan(k, node.key) { // key k with current node; if node = node.left; // k is smaller, go to the left } else if (isGreaterThan(k, node.key) { // child; if k is larger, go to node = node.right; // the right child. Stop when } else { /* The keys are equal */ // we find a match (success; return node.element; // return the item) or reach a } // null pointer (failure; } // return NO_SUCH_KEY). return NO_SUCH_KEY; } This method only finds exact matches. What if we want to find the smallest key greater than or equal to k, or the largest key less than or equal to k? Fortunately, when searching downward through the tree for a key k that is not in the tree, we are certain to encounter both - the node containing the smallest key greater than k (if any key is greater) - the node containing the largest key less than k (if any key is less). See Footnote 1 for an explanation why. +--+ For instance, suppose we search for the key 27 in the example |18| tree (at left). Along the way, we encounter the keys 25 and /--\--+ 28, which are the nearest keys less and greater than 27. 12 |25| / \ +/-\--+ 4 15 20 |30| Methods like closestKeyBefore() and closestElemAfter() keep / / \ +-/+-+ track (with each iteration, starting from the root) of the 1 13 17 |28| key nearest k (in the right direction) encountered so far. \ \ +-\+ If we reach a null pointer (indicating that k is not in the 3 14 29 tree), we return the nearest lesser/greater key/element. [2] Object minElement(); Object minKey(); Object maxElement(); Object maxKey(); minElement() and minKey() are very simple. Start at the root. Repeatedly go to the left child until you reach a node with no left child. That node has the minimum key. If the tree is empty, return NO_SUCH_KEY. maxelement() and maxKey are the same, except that you only go to the right child. In the example tree, observe the locations of the minimum (1) and maximum (30) keys. [3] void insertItem(Object k, Object e); insertItem() starts by following the same path through the tree as findElement(). (findElement() works _because_ it follows the same path as insertItem().) When it reaches a null reference, replace the null with a new node referencing the item (k, e). If insertItem finds a node that already has the key k, it puts it the new element in the left subtree of the older one. (We could just as easily choose the right subtree; it doesn't matter.) [4] Object remove(Object k); remove() is the most difficult operation. First, find a node with key k using the same algorithm as findElement(). Return NO_SUCH_KEY if k is not in the tree; otherwise, let n be the first node with key k. If n has no children, we easily detach it from its parent. If n has one child, replace n with its child; n's parent and n's child become directly attached to each other. If n has two children, however, we have to be a bit more clever. Let x be the node in n's right subtree with the smallest key. Remove x; since x has the minimum key in the subtree, x has no left child and is easily removed. Finally, replace n's item with x's item. x has the closest key to k that isn't smaller than k, so the binary search tree invariant still holds. 18 18 18 / \ / \ / \ 12 25 12 25 12 25 / \ / \ / \ / \ / \ / \ 4 15 20 30 -insertItem(2)-> 4 15 20 30 -remove(30)-> 4 15 20 28 / / \ / / / \ / / / \ \ 1 13 17 28 1 13 17 28 1 13 17 29 \ \ \ \ \ \ \ \ 3 14 29 3 14 29 3 14 / / 2 2 18 18 +--/ \ / \ |12| 25 13 25 /-\+ / \ / \ / \ -remove(12)-> 4 15 20 28 -> 4 15 20 28 /+-/+ \ \ / / \ \ 1 |13| 17 29 1 14 17 29 \+-\+ \ 3 14 3 / / 2 2 To ensure you understand the binary search tree operations, especially remove(), I recommend you inspect Goodrich and Tamassia's code on page 391. Be aware that Goodrich and Tamassia use sentinel nodes for the leaves of their binary trees; I think these waste an unjustifiable amount of space. Running Times of Binary Search Tree Operations ---------------------------------------------- o o In a perfectly balanced binary tree (left) with \ / \ depth d, the number of nodes n is 2^(d+1) - 1. o o o (See Footnote 2.) Therefore, no node has depth \ /\ /\ greater than O(log n). The running times of o o o o o findElement(), insertItem(), and remove() are all \ /\ /\ /\ /\ proportional to the depth of the last node encountered, so o oo oo oo oo they all run in O(log n) time on a perfectly balanced tree. \ o On the other hand, it's easy to form a severely imbalanced tree like \ the one at right, wherein these operations will often take linear time. o There's a vast middle ground of binary trees that are reasonably well-balanced, albeit certainly not perfectly balanced, for which search tree operations will run in O(log n) time. You may need to resort to experiment to determine whether any particular application will use binary search trees in a way that tends to generate somewhat balanced trees or not. Binary search trees offer O(log n) performance on insertions of randomly chosen or randomly ordered keys (with high probability). Unfortunately, it is easy to obliviously fill a tree with items that happen to be already sorted. In this circumstance, the disastrously imbalanced tree depicted at right will be the result. Technically, all operations on binary search trees are O(n), because we must consider the worst possible case. For this reason, researchers have developed a variety of algorithms for keeping search trees balanced. Prominent examples include 2-3-4 trees (which we'll cover on Wednesday), splay trees, and B-trees. =============================================================================== Footnote 1: When we search for a key k not in the binary search tree, why are we guaranteed to encounter the two keys that bracket it? Let x be the smallest key in the tree greater than k. Because k and x are "adjacent" keys, the result of comparing k with any other key y in the tree is the same as comparing x with y. Hence, findElement(k) will follow exactly the same path as findElement(x) until it reaches x. (After that, it may continue downward.) The same argument applies to the largest key less than k. Footnote 2: A perfectly balanced binary tree has 2^i nodes at depth i, where d 0 <= i <= d. Hence, the total number of nodes is Sum 2^i = 2^(d+1) - 1. i=0