CS 4: Lecture 28
                            Wednesday, May 3, 2006

PERFORMANCE
===========
As you've probably seen in the optimizer in your Lunar Lander code, computers
can sometimes be slow.  Working programmers spend a lot of time designing their
programs to run fast.  Sometimes you can speed up a program by using a more
sophisticated algorithm.  For example, in Lecture 7 we saw that you can often
perform a simulation in many fewer timesteps, with the same accuracy, if you
use the trapezoid rule instead of Euler's method.

Java provides a method called System.currentTimeMillis() that you can use to
measure how long a portion of your code takes.

  long startTime = System.currentTimeMillis();
  ... [do some work] ...
  long workTime = System.currentTimeMillis() - startTime;

Now workTime is the amount of time spent doing the work, measured in
milliseconds.

Let's compare the time required to perform a matrix multiplication operation,
in Java and Matlab.  Suppose you have two nxn matrices A and B, and you want to
compute C = AB.  You need to compute

         n
  C   = sum A   B
   ij   k=1  ik  kj

for every i, j in the range 1...n.

In Java, you could use the following loop.

  /* Return the matrix product C = AB, where A and B are n-by-n matrices. */
  public static double[][] matmul (double[][] A, double[][] B) {
    int n = A.length;
    double[][] C = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
          C[i][j] += A[i][k] * B[k][j];
        }
      }
    }
    return C;
  }
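
If you want to reproduce the measurement yourself, here is one way to wrap the
matmul method above in a small timing harness, using System.currentTimeMillis()
as described earlier.  (The class name MatmulTimer and the random test matrices
are illustrative choices, not part of the lecture code.)

  public class MatmulTimer {
    /* matmul exactly as defined above. */
    public static double[][] matmul (double[][] A, double[][] B) {
      int n = A.length;
      double[][] C = new double[n][n];
      for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
          for (int k = 0; k < n; k++) {
            C[i][j] += A[i][k] * B[k][j];
          }
        }
      }
      return C;
    }

    public static void main (String[] args) {
      int n = 360;                  // the same size used in the timings below
      double[][] A = new double[n][n];
      double[][] B = new double[n][n];
      for (int i = 0; i < n; i++) { // fill with arbitrary values
        for (int j = 0; j < n; j++) {
          A[i][j] = Math.random();
          B[i][j] = Math.random();
        }
      }

      long startTime = System.currentTimeMillis();
      double[][] C = matmul(A, B);
      long workTime = System.currentTimeMillis() - startTime;
      System.out.println("matmul took " + workTime + " milliseconds.");
    }
  }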

In Matlab, the same loop would look like this.

  for i = 1 : n
    for j = 1 : n
      for k = 1 : n
        C(i, j) = C(i, j) + A(i, k) * B(k, j);
      end
    end
  end

But remember that Matlab has matrix operations built in.  So it's much simpler
to just write

  C = A * B;

How long do these matrix multiplications take?  If n = 360,
- the triply-nested Java loop takes about 1.7 seconds.
- The triply-nested Matlab loop takes about 400 seconds.
- The built-in Matlab matrix multiply takes less than a tenth of a second.

Why?  Matlab is an interpreted language, and the interpreter runs quite slowly
compared to the partly-compiled Java code.  But Matlab's built-in matrix
multiply runs highly optimized code, which the numerical experts at The
MathWorks (the company behind Matlab) have slaved over to make it as fast as
possible.  It's probably written directly in
machine language, and optimized for the specific microprocessor you're using.

Big-Oh Notation (upper bounds on running time)
----------------------------------------------
All three matrix multiplications have one thing in common, though.  They
execute n^3 iterations of the inner loop (which performs one multiplication and
one addition).  For really large values of n, the running time is approximately
c n^3, where c is a constant...but the constant c is different for each of the
three implementations.

We say they all run in O(n^3) time.  This is called _big-oh_notation_.

We know the constant c will change if we buy a faster computer, or use a
different language or compiler.  Big-oh notation expresses the speed of an
algorithm independently of a specific implementation on a specific machine,
ignoring constant factors (which get smaller and smaller as technology
improves).  One of the nice things about an algorithm's big-oh running time is
that you can figure it out by counting how many operations the algorithm
performs.  It doesn't matter what computer you use.

Suppose an algorithm for processing a retail store's inventory takes
(10,000 + 10 n) ms to process n transactions.  That's
  - 10,000 milliseconds to read the initial inventory from disk, and then
  - 10 milliseconds to process each transaction (items acquired or sold).

We say this program runs in O(n) time.  Why?  Even though 10,000 >> 10, the
"10 n" term will be more important if the number of transactions is very large.
(If n is one million, the 10 n term contributes 10,000,000 of the 10,010,000
total milliseconds.)
Big-oh notation ignores all the terms except the term that grows fastest as
n increases.  Big-oh is an example of "asymptotic notation", so called because
it's concerned with how the running time behaves as n -> infinity.

Let n be the size of a program's _input_ (in bits or data words or whatever).
Let T(n) be a function that specifies the algorithm's running time, given an
  input of size n.
Let f(n) be another function--preferably a simple function like f(n) = n.

Formally, T(n) is in O( f(n) ) if

               T(n)
      lim      ---- < infinity.
  n->infinity  f(n)

For example,

              10,000 + 10n
      lim     ------------ = 10,
  n->infinity       n

so 10,000 + 10n is in O(n).  But

              10,000 + 10n
      lim     ------------ = infinity,
  n->infinity       1

so 10,000 + 10n is not in O(1).
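
One way to get a feel for these two limits is to print the ratio T(n) / f(n)
for larger and larger n.  Here is a short Java sketch (the class and method
names are made up for illustration) for T(n) = 10,000 + 10 n.

  public class BigOhDemo {
    /* The retail-inventory running time from above, in milliseconds. */
    public static double T (double n) {
      return 10000 + 10 * n;
    }

    public static void main (String[] args) {
      /* As n grows, T(n) / n settles down near 10, so T(n) is in O(n).
         T(n) / 1 just keeps growing, so T(n) is not in O(1). */
      for (double n = 10; n <= 1e9; n *= 10) {
        System.out.println("n = " + n + "   T(n)/n = " + T(n) / n
                           + "   T(n)/1 = " + T(n));
      }
    }
  }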

Efficiency
----------
Often, the best way to make a program faster is to use an algorithm with a
better asymptotic running time.

(1)  Searching for a number in a sorted array of ints takes O(n) time if you
     search the whole array from left to right, but only O(log n) time if you
     use binary search.

(2)  Determining all the prime numbers in 2...n takes O(n^2) time if you do it
     the easy way, trying every number against every possible divisor.  The
     Sieve of Eratosthenes, which you implemented for the midterm, takes
     O(n log n) time.

(3)  Computing a Fibonacci number using the recursive algorithm you used in lab
     takes

            1 + sqrt(5)  n    ~         n
       O( ( ----------- )  )  ~  O(1.618 )
                 2

     time, which grows exponentially--that means _incredibly_ quickly.  But the
     iterative algorithm takes only O(n) time, as the sketch below illustrates.
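
Here is one way to write both Fibonacci algorithms in Java, with a bit of
timing code so you can watch the exponential algorithm fall behind.  (This is
a sketch for illustration; your lab code may have looked somewhat different.)

  public class Fibonacci {
    /* The recursive algorithm:  each call spawns two more calls, so the
       number of calls grows roughly like 1.618^n. */
    public static long fibRecursive (int n) {
      if (n < 2) {
        return n;
      }
      return fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    /* The iterative algorithm:  one loop iteration per number, so O(n) time. */
    public static long fibIterative (int n) {
      if (n == 0) {
        return 0;
      }
      long prev = 0;
      long curr = 1;
      for (int i = 2; i <= n; i++) {
        long next = prev + curr;
        prev = curr;
        curr = next;
      }
      return curr;
    }

    public static void main (String[] args) {
      int n = 40;                  // big enough to feel the difference

      long startTime = System.currentTimeMillis();
      System.out.println("recursive:  " + fibRecursive(n));
      System.out.println("  took " +
                         (System.currentTimeMillis() - startTime) + " ms");

      startTime = System.currentTimeMillis();
      System.out.println("iterative:  " + fibIterative(n));
      System.out.println("  took " +
                         (System.currentTimeMillis() - startTime) + " ms");
    }
  }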

ALGORITHM ANALYSIS
==================
Problem #1:  Given an array of n integers, find the smallest.
Algorithm #1:  Maintain a variable "min" that stores the smallest integer
                       scanned so far.
               Set "min" to the first element of the array.
               Scan through the rest of the array, updating "min" as necessary.

  The running time is in O(n), because all we do is scan the whole array once.
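
  In Java, Algorithm #1 might look like the following sketch (it assumes the
  array holds at least one element).

    public static int findMin (int[] a) {
      int min = a[0];                    // smallest integer scanned so far
      for (int i = 1; i < a.length; i++) {
        if (a[i] < min) {
          min = a[i];
        }
      }
      return min;
    }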

Problem #2:  Given a set of p points, find the pair closest to each other.
Algorithm #2:  Calculate the distance between each pair; return the minimum.

  There are p (p - 1) / 2 pairs, and each pair takes constant time to examine.
  Therefore, the running time is in O(p^2).

  Hint:  The code itself strongly suggests the running time, because it
         contains a doubly-nested loop, with both loops iterating over the
         input points.

         double minDistance = point[0].distance(point[1]);

         /* Visit a pair (i, j) of points. */
         for (int i = 0; i < numPoints; i++) {
           /* We require that j > i so that each pair is visited only once. */
           for (int j = i + 1; j < numPoints; j++) {
             double thisDistance = point[i].distance(point[j]);
             if (thisDistance < minDistance) {
               minDistance = thisDistance;
             }
           }
         }