Design alternatives
Underlying storage
-
Lisp k-D array:
Advantages:
- Already supports general tensor access
- Some Lisps optimize accesses with ROW-MAJOR-AREF
Disadvantages:
- Row-oriented only
- Displaced arrays only support a 1-D model, which makes window
matrix views difficult
- GC's (especially multithreaded GC's) may move the data around in
the middle of a BLAS / LAPACK call (some Lisps support pinning an
array as "static" so the GC doesn't move it around). Question:
What guarantees can we make about data locations given a
multithreaded application and garbage collector?
-
Many Lisps don't implement k-D arrays (for k > 1) efficiently. In
particular, AREFs may not be efficent, and the data may not be
stored contiguously in memory.
- Lisp 1-D array:
Advantages:
- Can choose column- or row-oriented
- Displaced arrays work if we handle k-D indexing ourselves (NLISP
code "array-data-address.lisp" shows you how to get pointer from
Lisp array, even if the array is displaced)
Disadvantages:
-
Boxed vs. unboxed issue and copying overhead for BLAS / LAPACK
- We have to handle the indexing ourselves, which may be slow
(though macros / code transformations may be able to fix this)
-
GC's (especially multithreaded GC's) may move the data around in
the middle of a BLAS / LAPACK call (some Lisps support pinning an
array as "static" so the GC doesn't move it around)
-
Displaced arrays aren't guaranteed by CLtL2 to be simple arrays
(this may be different in the ANSI standard???)
- C 1-D array:
Advantages:
- GC's don't move it around
- Lisp finalizations can take care of deallocating it
automatically
- "Displaced arrays" are easy and don't change the array type
Disadvantages:
-
We have to handle the indexing ourselves, which may be slow
(though macros / code transformations may be able to fix this)
-
Array accesses via the FFI may be slow (function calls,
branching?). We have to benchmark the FFI and pick the right Lisp
compiler parameters.
-
Lisp finalizations aren't part of the CL standard and differ from
implementation to implementation. Of the implementations
currently supported by CFFI, only ECL lacks user hooks for finalization.
-
In some cases, we may want to allocate the arrays on the stack
(e.g. for short fixed-length vectors). ALLOCA-based approaches may
not be portable, and they also incur the cost of a function call.
(There be dragons here.)
Level 1 (lowest Lisp level) interface generation
- Parse the Fortran comment headers and function declarations
Advantages:
- Automatically get in/out/inout parameter status
- Can get parameter types as well
- Adapts to LAPACK / BLAS interface changes (just reparse)
- Avoids bugs from hand-coding CFFI interfaces
- SWIG may not be able to handle Fortran well, and CLAPACK may
not be so well maintained
- rif is working on this :)
Disadvantages:
- Special cases may require hand intervention (depends on how much
time we want to spend writing the parser)
- Need to write a new parser for C libraries
-
Fortran-to-C conventions may differ from machine to machine.
- Write the CFFI interfaces ourselves
Advantages:
- Lets us decide exactly what we want to include, rather than
pulling everything in at once (which may introduce memory overhead,
depening on the underlying FFI implementation), including the
aux. LAPACK functions that users don't need to call.
- Handles special cases
Disadvantages:
- Time-consuming
- Error-prone
- Must write an interface every time a new external function must be
referenced
- Fortran-to-C conversion must be done manually -- may even have to
write wrappers (though this process can be automated).
- Fortran->C conventions may differ from machine to machine.
- Run SWIG on the CBLAS and CLAPACK headers
Advantages:
- We get a parser (SWIG) for free, and it's pretty well
maintained
- No need to worry about Fortran-to-C conventions (the CBLAS /
CLAPACK authors do that work for us)
Disadvantages:
- Not every machine may have a CBLAS / CLAPACK available
- CLAPACK may not be so well maintained / updated (David Bindel was
responsible before, and he's pretty busy). If the next version of
LAPACK comes out, it may be a while before they release C bindings
(???).
Level 2 interface
- Support general tensors
Advantages:
- Can specialize for matrices and vectors (using a macro technique
like C++ templates, so TENSOR-RANK is a compile-time constant), so
that we get a standard matrix / vector interface for free
- Lets tensor-happy people do cool stuff they want to do :)
Disadvantages:
- We may not have factorization libraries for general tensors. This
means we have to exclude e.g. LU, QR factorizations of general
tensors. This means we have to use CLOS for those factorization
functions in the medium level (???).
- May complicate the design of Level 2 (or maybe not?)
Notes:
- If we do this, we'll have Level 2 be general tensors and "Level
2.5" be matrices and vectors, specialized at compile time. A Level
3 Matlab-like interface can be built atop Level 2.5.
Higher-precision arithmetic
The chief issue here is getting the full LAPACK-like functionality
without rewriting it all by hand.
- Option 1: Let the LAPACK developers do it
-
LAPACK e-mail list says that the next version of LAPACK will
include arbitrary precision versions of all the LAPACK routines.
The arithmetic will be implemented using Fortran 95 modules. Much
of the source code will be automatically generated.
-
Will we be able to link to the Fortran 95 modules from Lisp? How
much FFI work will we have to do to call the arbitrary-precision
routines from Lisp? We could automate some of that work using a
tool like SWIG, but giving "bigfloats" an intuitive arithmetic
interface may require a lot of hand coding. Luckily we can borrow
the design from UC Berkeley Prof. Richard Fateman's code for
calling MPFR (a
different arbitrary-precision floating-point arithmetic library)
from Lisp.
-
When will the next LAPACK release be? They haven't done much
testing of the arbitrary-precision stuff, from what I understand.
- Option 2: Do it ourselves
-
Use Prof. Fateman's Lisp interface to MPFR (or some other
interface).
-
Hack the back end of f2cl to get it to replace the floating-point
numbers with MPFR numbers in the LAPACK routines. Alternately,
get some code generator scripts from the LAPACK developers (they
already generate the routines automatically for different
precisions) and adapt them to produce routines that call MPFR.
-
This may take a while but if we do some serious hacking soon, it
has a chance of being completed before the next LAPACK release.
That means we can use it for our own research. The problem is,
the code will be based on LAPACK 3.0 source, so users won't be
able to plug in a new version LAPACK binary that's been tuned by
the vendor (e.g. ATLAS includes a few LAPACK routines, like LU
factorization).
-
This option also repeats a lot of work that the LAPACK developers
are already doing. Users are already pushing for the
higher-precision routines and so they have incentive to finish
soon.
-
If we need a temporary solution we can always do some Mex coding
and provide an interface to MPFR in Matlab. (Matlab does provide
arbitrary-precision arithmetic out of the box but it's implemented
using the Symbolic Toolbox and the Maple kernel, and it's
reaaaallly slooooow (I know because I tried it for some tests, and
it was so slow that I ended up re-writing the routines in C and
calling MPFR -- much faster).) Newer versions of Matlab can do
classes and operator overloading. Actually, it seems
someone has done this already. I haven't tried their routines
yet.
System for managing temporary variables
Whenever we go from a vector expression to a function that expects a
vector or vector view, we have to generate a temporary. For example:
(solve! x A (+ w y z))
(which solves Ax = b (with b = w + y + z) and puts the result in x)
has to create a temporary to store the result of w + y + z. If this
code is called many times, e.g. in a loop, we end up creating a lot of
temporaries. That's not something we can fix with vector expression
optimization, because the solver expects a vector proper, not a series
or some kind of expression template. (One could get around this by
coding up a solver in Lisp that can take a series or expression
template, but we don't want to redo the work of the LAPACK
developers!)
It may be a good idea to have some sort of temporary vector
management system that keeps and recycles temporaries. We could
perhaps take advantage of finalization to make this system easier to
implement. A system like this is handy if all the objects are about
the same size, as for example with arbitrary-precision floating-point
objects that are all set to have the same precision. However, for a
heterogeneous collection of vectors and matrices, there would be a lot
of overhead in matching sizes and shapes of storage.
Last modified: Sun Oct 15 00:25:31 CST 2006