# Developer notes

Here follows some (not yet so well organized) documentation and developer notes for our ideas about an ANSI CL linear algebra library.

## Matrix creation

Q: How do I create a matrix?

1. Using a library function, as in Matlab: ones, zeros, constant, identity, gallery function, etc.
2. Read a matrix in from a file (e.g. in MatrixMarket dense format -- we have C code for that already)
3. Call some function that operates on a matrix and returns a matrix, e.g. factorization (not in place)

## Dense matrix storage

Q: How is the data in a matrix stored?

We give a full discussion of the 2-D Lisp array vs. 1-D Lisp array vs. 1-D C array question in the Design Alternatives document.

LAPACK supports a number of storage schemes for specialized matrix structures. Triangular matrices may be unit-diagonal or no. Triangular, symmetric, Hermitian, banded, tridiagonal, etc. matrices may use full or packed storage. LAPACK uses a special kind of packed storage for orthogonal matrices from e.g. a QR or SVD routine. Depending on the storage type, we should dispatch to the appropriate BLAS / LAPACK routine. In some cases routines may not be available: we should either write our own (difficult, especially if we want optimized versions) or make a full-storage copy and call the BLAS / LAPACK on that, then copy back if necessary.

Matlab does things like examine all the entries of a matrix to figure out the optimal matrix solver routine (e.g. if upper triangular, use a backward solve). However, there is no way in Matlab to "tag" a matrix as e.g. upper triangular, in order to save that cost. Tagging is prone to errors and if correctness is to be preserved, any matrix update routine must check and potentially modify the tags. For example, if we tag a matrix A as upper triangular, then get a matrix view of the lower triangle and add nonzero elements to it, A is no longer upper triangular.

What may make sense instead of a user-controlled tagging scheme is a system-controlled scheme. It's pretty rare that a user creates an upper triangular matrix knowing that it's upper triangular. Usually such matrices come from a factorization routine. The routine can tag the representation. If the user then modifies the representation, the tag is lost and it becomes a regular matrix.

What may be a good idea is to expose an interface to expensive things like runtime tests. For example:

(solve A b)
may do a full O(n^3) LU factorization of A to solve Ax = b, but
(solve (expose-structure A) b)
may be able to tag A as e.g. upper triangular and thus let solve dispatch to the appropriate more efficient routine. Subsequent modifications of A then "lose" the structure tag.

This explicit-tuning principle works like Rich Vuduc's OSKI sparse matrix code. Tuning operations on sparse matrices may be expensive, so the user must initiate them explicitly. In a tight loop with small matrices, O(n^2) tests for structure have a significant cost and should not be undertaken without a user request.

Later on, if we want sparse matrix support, we can exploit OSKI, which supports a rich set of sparse matrix operations, as well as my own Sparse Matrix Converter for type conversions and I/O. We can also call in some sparse factorization packages like SuperLU.

## Matrices and matrix views

Q: What is the difference between a matrix and a matrix view?

You can think of a MATRIX as "a pointer to the actual matrix." When it is garbage-collected, the data goes away. If a function returns a matrix, that returned object is independent of any objects given to the function.

A matrix view (MATVIEW), in contrast, is like a reference to some submatrix of a MATRIX. It doesn't store any actual data but only gives you a means to access that data as if it were conceptually a matrix in itself. If you modify an element of a MATVIEW, it modifies the corresponding element in the MATRIX from which the MATVIEW was derived. If a MATVIEW is GC'd, nothing happens to the original data or the MATRIX which wraps that data.

There are two kinds of matrix views (supported thus far): window, and strided. A window matrix view is a block of elements in the matrix. The 2-d indices of the elements in the block are consecutive. A strided matrix view takes elements separated by a constant stride0 in the column index and a constant stride1 in the row index from the original matrix. Other matrix view types may be supported if there is interest.

The base class MATRIX-LIKE encompasses both MATRIX and MATVIEW objects.

### Some developer ideas about matrix views

I think it's necessary for a matrix view to have a "pointer" to the original matrix. For example, suppose that we are building a smart Matlab compiler that tries to compute factorizations in place without copying results, whenever possible. We have a script that says the following:

[Q, R] = qr (A);
[L, U] = lu (A(2:4, 2:4));
A(2:4, 2:4) is a WINDOW-MATVIEW of A. The first line may choose to compute the QR factorization of A "in place," that is, overwriting A. The second line takes a subview of A and computes with it. That means the subview needs to know about the original matrix so it can tell the compiler to save a copy of the original matrix (???).

Note that it's not useful to do the dependency analysis in the interpreter, because it would require backtracking: e.g. if we compute [Q, R] = qr(A) and then try to read an element of A, we would need to have saved a copy of A anyway so that we can recover it. We shouldn't rely on the factorization being accurate (so that we can reassemble A = Q*R).

## How do I iterate over all the elements of a matrix (view)?

Calling MREF (the equivalent of AREF for MATRIX-LIKE objects) on a MATRIX-LIKE may involve a lot of indexing overhead which can slow down your code. If all you want to do is iterate over the elements of a MATVIEW, you can use series functions on the output of the SCAN-MATVIEW method, which scans a MATVIEW or MATRIX to produce a corresponding series. The scanners amortize the indexing overhead over the entire scan of the matrix.

## What is WITH-COPIES?

BLAS and LAPACK functions require their matrix arguments to be in a certain format: column-oriented (row-oriented is allowed for the CBLAS interface) and contiguous, not strided. It may be more efficient (and certainly easier to code) to copy a strided matrix view into contiguous storage first and use the BLAS or LAPACK on it there, then copy the results back into the matrix view, rather than trying to write a BLAS or LAPACK view that works directly on our strided matrix view (or whatever obscure matrix view we may come up with). However, we don't always need to copy: if we have a nice column-oriented window matrix view, we can run LAPACK on it directly. Furthermore, we don't want to copy all the arguments -- just those that the LAPACK function will modify.

WITH-COPIES is a system for abstracting all of that away. For each BLAS or LAPACK function that we want to wrap, we give it a "function template" which says which arguments are MATRIX-LIKE objects and which arguments are modified by the function. The system uses that information to parse a BODY of code given to WITH-COPIES. It determines which MATRIX-LIKE objects need to be copied, which need to be restored after the BODY executes, and does the variable substitution (in the BODY) and code generation to make this happen.

Suppose that "f" is one of the functions that we have registered as "ours." The function f is called as: (f A B), in which the first argument is marked as "output" and the second argument is marked as "input." Then we can write the following:

(WITH-COPIES (A B) (f A B))
and it expands to something like:
(let ((#:g1 (optional-copy A))
(#:g2 (optional-copy B)))
(prog1
(progn
(f #:g1 #:g2))
(optional-restore B #:g2)))

We employ a helper function, MARK-OUTPUT-VARIABLES, to go through the body of code and mark the output variables. In order for this to work, the property list of the function "f" must contain a PARAMETER-ATTRIBUTES key which is a list of the tags for the arguments of f. We include this as part of the template of f (f itself is generated by a code generator from a template).

## CBLAS notes

• If all the matrices have the same orientation, we can adjust appropriately by specifying the CBLAS_ORDER argument. (Note: this doesn't work for LAPACK, there may not be CBLAS-like wrappers available! In that case we have to handle things manually, e.g. operating on the transpose and swapping around the output arguments.) If any input matrix has an order different from the "prevailing order," we specify the TRANSA argument accordingly and swap its array dimensions in the CBLAS function call.
• Users do not need to specify the TRANSPOSE arguments because this can be handled through the TRANSPOSE functionality of MATVIEW. So our functions won't have transpose arguments.
• I had an idea whereby we could handle transposes automatically by parsing the CBLAS headers (e.g. using SWIG, or parsing a hand-created CFFI -- don't try to write a C parser). Any argument called "TransA" goes with an argument "A", annd so on. Any argument called "incX" goes with an argument "X", and so on. This would let us set up the transposing and strides automatically, so that you don't have to do it by hand for each CBLAS routine. The problem is that this doesn't work for standard LAPACK unless you have CLAPACK-style wrappers available. Furthermore, LAPACK functions tend not to have transposes for the output arguments.
• Say there are >= output arguments (e.g. DGEEV -- left and right eigenvectors). If they have different orientations, the only logical thing I can think of is to compute them in-place in the wrong order, and then use a transpose-in-place to get them into the right orientation. If they are matrix views, we can't go back and change the orientation of the entire matrix which they view. DGESVD is another interesting case: It returns U and V^T rather than V. U is m x m and V is n x n, so V^T is just V in row order. We can support this by changing the orientation of the output matrix from column order to row order after we compute it, but this only works if we are given a MATRIX for the output argument VT. A better option is to compute the transpose in place; we have code for this (that works for general matrices, not just square ones as in IBM's ESSL).

## Type promotion rules

This issue is discussed somewhat in the Design Conflicts document (in the "works like math" section). There are some tricky issues involved with matrix views. In particular, promotion of an output matrix view would require promotion of the entire output matrix, which violates the idea of a "matrix view" as a "look inside" a matrix that isn't able to alter the properties of the original matrix. We may resolve this by requiring that output arguments may only be type-promoted if they are MATRIX objects. Alternately, in an interactive system, we can give a warning and ask the user what to do.

I think we may have to define a new datatype so that we can specify the type of the data in the matrix as part of the syntax. Otherwise the defmethods won't work right. CLtL2 gives an example of using DEFTYPE in this way to define a "square 2-d array" type:

(defun equidimensional (a)
(or (< (array-rank a) 2)
(apply #'= (array-dimensions a))))
(deftype square-matrix (&optional type size)
"SQUARE-MATRIX includes all square 2-d arrays."
`(and (array ,type (,size ,size))
(satisfies equidimensional)))
So we can say something like:
(deftype *matrix-like* (&optional type nrows ncols)
`(and matrix-like (satisfies (or (= (* (nrows A) (ncols A)) 0)
(typep (mref A 0 0) type)))))
and then make the defmethods specify (*matrix-like* double-float) or whatever type as the types of the matrix arguments. Note that zero-element matrices and matrix views have any data type you want, and we don't evaluate (mref A 0 0) unless we know that there is at least one element in the matrix (view). Note that we don't ask for (array-element-type (my-array A)) because (a.) zero-element Lisp arrays may be tricky (I'm not sure), and (b.) we might use a C array instead of a Lisp array.

## Code generation

The BLAS and LAPACK support four different datatypes: single-float, double-float, (complex single-float) and (complex double-float). We can generate DEFMETHODs for all different types of matrices by using the same "method template" that we use for WITH-COPIES. It's easy to find the right BLAS / LAPACK routine for a given datatype because it begins with a letter corresponding to the datatype: S for single-float, D for double-float, C for (complex single-float) and Z for (complex double-float). I've written some code sketches for this.

The BLAS and LAPACK wrapper functions that we support are put into an alist mapping from method name to method template. We could even store them in a file, though this could be a security risk if people mess with the file, no?

### Dependency analysis (for a "smart Matlab compiler")

Here follows discussion of a "smart Matlab compilation" technique that tries to preserve Matlab copy semantics while avoiding copying whenever possible.

Types of accesses:

• (Matlab would copy the input, but our system wants to overwrite instead.)
2. overwrite-write: A(2:4, 2:4) = [0 1; 2 3];
• (Matlab would (sometimes) overwrite)
• (Matlab would never modify the read part, and neither would we.)
We can classify datapath links as follows. A (plain) read can be followed by anything.
[Q, R] = qr (A);
[L, U] = lu (A);

This requires saving a copy of the dependent variable the first time.
2. overwrite-read -> overwrite-write: [Q, R] = qr (A); A(2:4, 2:4) = [0 1; 2 3]; These require saving a copy of the dependent variable, or at least a copy of the subview that is required. Of course if we are in a function and we don't have inter-procedural analysis, we have to return a whole copy of A.
[Q, R] = qr (A);
print (A);

This requires saving a copy of the dependent variable the first time.
[Q,R] = qr (A);
E = eig (Q); (???)

OK, no special handling needed.
5. overwrite-write -> overwrite-write:
[Q, R] = qr (A);
[Q, S] = qr (B);

Cleverness can avoid computing Q the first time (e.g. with a QR factorization, we can avoid computing Q explicitly or at all; with an eigenvalue decomposition, we can avoid computing the eigenvectors). Otherwise there's nothing we need to do.
[Q, R] = qr (A);
print (Q);

OK, no special handling needed.

Dependency analysis is pretty easy in straight-line code. We may not know if we have a loop unless we macro-expand all the way and look for primitives (DO, GO, etc.). User-defined functions can be optimized if we ask the user to make assertions about inputs and outputs, or we can do inter-procedural analysis ourselves (operate on blocks -- find their inputs and outputs). For example, in Allegro CL v. 8.0 on MacOS X PPC:

CL-USER> (macroexpand `(dotimes (i n) (setf (aref A i) 0)))
(block nil
(let ((#:g251 n) (i 0))
(tagbody
#:Tag4
(cond ((>= i #:g251) (return-from nil (progn))))
(tagbody (setf (aref A i) 0))
(psetq i (1+ i))
(go #:Tag4))))
t
and
CL-USER> (macroexpand-1 `(dotimes (i n) (setf (aref A i) 0)))
(do ((#:g252 n) (i 0 (1+ i))) ((>= i #:g252))  (setf (aref A i) 0))
t
If we want to do this thoroughly, we'll have to trace out the control flow. User-defined functions may even have return-froms that make things messy. But eventually we have to make simplifying assumptions. These can be governed by a conservativity level so that the user can choose to sacrifice safety for speed.

## "Matlab in Lisp"

Parse a Matlab file and convert it into a PROGN of Lisp statements. This can then be compiled (using the standard Lisp compiler) which gives you, therefore, compiled Matlab code! We can also use our own code optimization tools on the PROGN before we feed it into COMPILE.

It's too ambitious to try to replicate all of Matlab in Lisp, especially the GUI stuff and the toolboxes. Nevertheless, we can bring in a lot of useful functionality from e.g. the GSL (though that's GPL'd, be careful) or other libraries.