This page lists some design desiderata that are in tension with one
another, along with possible ways to resolve each tension.

- General tensors vs. LAPACK-style linear algebra (optimized for large
matrices) vs. fixed-size vector style (optimized for smaller matrices
and/or image processing)
- We may be able to resolve this with a multi-layer interface. There should be two lowest levels: pure BLAS / LAPACK wrappers (optimized for larger vectors and matrices) and a set of operations optimized for fixed-size vectors and matrices (where we can do things like exploit SSE2 instructions).
- Tensor operations can be built upon the BLAS wrappers.
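As a sketch of this layering, the following hypothetical dispatch chooses between a fixed-size kernel and a BLAS-style wrapper based on problem size. All names here (`SMALL_LIMIT`, `gemv_small`, `gemv_blas`, `gemv`) are illustrative assumptions, not part of any existing library, and both kernels are plain Python stand-ins for the real optimized code.

```python
# Hypothetical sketch of the two lowest layers plus a dispatching top
# layer. In a real library the small-size path would be unrolled /
# SIMD (SSE2) code and the large-size path would call a vendor BLAS.

SMALL_LIMIT = 4  # illustrative threshold: below this, use the fixed-size path

def gemv_small(a, x):
    # Stand-in for the fixed-size kernel: no error checking, and in a
    # real implementation no loop either (fully unrolled).
    n = len(x)
    return [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]

def gemv_blas(a, x):
    # Stand-in for a wrapper around a vendor BLAS xGEMV call.
    n = len(x)
    return [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]

def gemv(a, x):
    # Top layer: dispatch on problem size; a tensor layer could be
    # built on top of the BLAS-wrapper path.
    if len(x) <= SMALL_LIMIT:
        return gemv_small(a, x)
    return gemv_blas(a, x)
```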

- High-level ("anything that makes mathematical sense works")
vs. high performance (e.g. avoiding CLOS and BLAS error-checking
overhead for fixed-size vectors and matrices)
- This too can be solved by a multi-layer interface. The "works like math" layer can use CLOS as much as it wants to. The "fixed-size vectors" layer can present an interface somewhat like Cg (the GPU programming language) -- though two lowest levels (one using the BLAS, one optimized for small fixed-size problems) may be needed.
- "Works like math" means we need to be able to mix expressions
of different numerical types. There are a number of different
solutions:
- Type promotion: promote all operands to a common (widest) type; we can then call the BLAS, but we pay copy overhead
- Implement your own mixed-precision BLAS routines: slow, but no copy overhead
- Implement your own mixed-precision BLAS routines and optimize them: time-consuming, error-prone, though may be useful to others as a spin-off project. Probably won't be as fast as a typical vendor-optimized BLAS.
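The type-promotion option can be sketched as follows; `promote` and `dot_f64` are hypothetical names, with `dot_f64` standing in for a homogeneous double-precision BLAS routine (like DDOT) that cannot accept mixed-precision arguments directly.

```python
import array

def promote(x, typecode):
    # Copy x into a new array of the wider type. This copy is exactly
    # the overhead the type-promotion approach pays in exchange for
    # being able to call a single-type BLAS routine afterwards.
    return array.array(typecode, x)

def dot_f64(x, y):
    # Stand-in for a double-precision BLAS DDOT wrapper: both
    # arguments must already be double precision.
    return sum(a * b for a, b in zip(x, y))

# Mixed-precision input: promote the single-precision vector, then
# call the homogeneous routine on two double-precision vectors.
x32 = array.array('f', [1.0, 2.0, 3.0])
y64 = array.array('d', [4.0, 5.0, 6.0])
result = dot_f64(promote(x32, 'd'), y64)
```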

- Optimization of expressions: "Do everything at compile time"
(write a lot of code walkers and such; interpreted code gets no
optimization) vs. "expression objects" (suggested by Jason Riedy;
similar to the "expression templates" used in uBLAS), which
support interpreter-based optimization but incur a runtime overhead.
- Is this a false dichotomy? The code for optimizing at compile time may be much the same as the code for optimizing at run time / in the interpreter. Perhaps we can do both: optimize at compile time when we are in the compiler, and otherwise optimize at run time.
- Matlab's print semantics (a trailing semicolon suppresses printing of a result) mean that as long as the user does not ask to print a matrix, we can do expression optimization as much as we want. (Is this true?)
- Maybe we should worry about this later ("Premature optimization is the root of all evil").
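A minimal run-time version of the "expression objects" idea can be sketched as follows. The class names and the single rewrite shown (evaluating a * x + y in one fused loop, as a BLAS AXPY would) are illustrative assumptions; a compile-time version could apply the same rewrite from a compiler macro instead of inside `eval`.

```python
class Vec:
    # Leaf node: an actual vector of data.
    def __init__(self, data): self.data = list(data)
    def __add__(self, other): return Add(self, other)
    def __rmul__(self, scalar): return Scale(scalar, self)
    def eval(self): return self.data

class Scale:
    # Unevaluated scalar * vector node.
    def __init__(self, s, v): self.s, self.v = s, v
    def __add__(self, other): return Add(self, other)
    def eval(self): return [self.s * e for e in self.v.eval()]

class Add:
    # Unevaluated sum node; optimization happens at eval time.
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self):
        # Recognize the pattern s * x + y and evaluate it in one
        # fused loop (like BLAS AXPY) instead of materializing the
        # intermediate s * x.
        if isinstance(self.a, Scale) and isinstance(self.a.v, Vec) \
                and isinstance(self.b, Vec):
            s, x, y = self.a.s, self.a.v.data, self.b.data
            return [s * xi + yi for xi, yi in zip(x, y)]
        return [p + q for p, q in zip(self.a.eval(), self.b.eval())]

x = Vec([1.0, 2.0])
y = Vec([10.0, 20.0])
z = (3.0 * x + y).eval()  # nothing is computed until eval()
```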

Last modified: Wed Sep 27 12:31:40 CST 2006