Programming Puzzle Gone Wrong

A programming interview question from the Daily Coding Problem email list. Here’s a non-hand-wavy explanation of a way to solve this problem.

Daily Coding Problem: Problem #736 [Easy]

Given a complete binary tree, count the number of nodes in faster than O(n) time. Recall that a complete binary tree has every level filled except the last, and the nodes in the last level are filled starting from the left. “Complete” means: every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible. It can have between 1 and 2h nodes at the last level h.


The “complete binary tree” phrase means that this problem refers to binary trees with the heap shape. It doesn’t say “binary search tree” or “priority queue” or “heap sort” or any other phrase about the values of the data carried by nodes in the tree, just the shape of the tree.

The desired answer is a count of the number of nodes in the tree. The phrase “faster than O(n) time” in this context means to without touching every node. An obvious solution to counting a tree’s nodes is to traverse the tree, which touches every node in the tree.

Method

I’ll illustrate using a 4-level complete binary tree, below. I’ve labeled every edge, left child pointers are labeled 0, right child pointers are labeled 1.

numbered binary tree

The leaf nodes are labeled by a concatenation of the edges’ labels in the path from the root to the leaf node. Follow all left hand edges (labeled 0), you get to the left-most leaf node, label 000. Follow all right hand edges (labeled 1), you get to the right-most leaf node, label 111. If you consider the labels as binary representation of numbers, the leaf nodes are labeled in numerical order from left to right.

If you’ve got a sorted array or list, you can do a binary search for a specific value by doing O(log n) number of comparisons. Note that the n here is the number of leaf nodes, 8 in the tree above.

You can find any particular leaf node by label: treat the label as a series of 0 and 1 bits. Follow the left pointer if the current bit is 0, follow the right pointer if the current bit is 1.

Using binary search to find the leaf node with the largest label (considered as a number). You can find it in O(log m) time, where m is the number of leaf nodes.

That gives you the number of leaf nodes in the last level of the tree. The number of nodes in the rest of a tree of depth h is 2h-1 - 1. That’s a small constant time, so you can find the number of nodes in a tree in O(log m) complexity, and m is only a little bit more than half of n, maximum.

Do you see where I went wrong calculating time complexity?

Source code for a program that counts tree nodes in just this fashion.

Count the number of node touches

count of node accesses vs number of nodes

The above diagram shows that my method of “searching” the deepest rank of the tree touches more nodes than the O(n) criterion. I was lead astray by the (false) promise of the O(log m) complexity of binary search. For search and sort algorithms, the thing counted (m) and used in the O() complexity is value comparisons. The edge-labeling and probing algorithm above does O(log m) probes into the tree, each “value comparison” in a pure binary search corresponds to a determination of depth, a probe into the tree. Each probe touches about the left-depth of the tree number of nodes.

Figure out the actual complexity

Binary search of the deepest rank should take O(log2 m) probes into the tree, where m is the number of nodes in the bottom rank. For a full binary tree, m = 2d-1, d is the depth of the tree. Again for a full binary tree, d = log2(n+1), where n is the number of nodes in the full binary tree.

log2(m) = log2( 2d-1) = d - 1

d - 1 = log2(n+1) - 1

You get to throw away the “- 1” part when determining big-O complexity, that’s just a constant. The number of node accesses should be O(log2(n+1)) for this algorithm, which is a little greater than O(log2(n)). My code must have a large constant factor on the complexity, since for n < 45, it touches more than n nodes.