This page lists some design desiderata that may (or may
not) conflict in the implementation.
- General tensors vs. LAPACK-style linear algebra (optimized for large
matrices) vs. fixed-size vector style (optimized for smaller matrices
and/or image processing)
- May be able to solve this by a multi-layer interface. There
should be two lowest levels: pure BLAS / LAPACK wrappers
(optimized for larger vectors / matrices) and a set of operations
optimized for fixed-size vectors and matrices (where we can do
things like exploit SSE2 operations).
- Tensor operations can be built upon the BLAS wrappers.
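The layering above can be sketched roughly as follows. This is a minimal Python illustration, not the proposed Lisp implementation: the names (`dot`, `small_dot`, `blas_dot`) and the size threshold are made up for the example, and the "BLAS" call is simulated with plain Python.

```python
FIXED_SIZE_LIMIT = 16  # arbitrary threshold for the small-vector path

def small_dot(x, y):
    # Fixed-size layer: a tight scalar loop. In the real design this
    # is where SSE2-style specializations for small, known sizes live.
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc

def blas_dot(x, y):
    # BLAS-wrapper layer: in the real design this would call the
    # vendor's ddot; here a Python sum stands in for it.
    return sum(a * b for a, b in zip(x, y))

def dot(x, y):
    # High-level layer: pick a lower level based on problem size.
    # A "works like math" layer would sit above this and also handle
    # mixed types, error checking, etc.
    if len(x) <= FIXED_SIZE_LIMIT:
        return small_dot(x, y)
    return blas_dot(x, y)
```

The point is only the structure: the top layer is free to use expensive dispatch, while each bottom layer is specialized for its regime.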
- High-level ("anything that makes mathematical sense works")
vs. high performance (e.g. avoiding CLOS and BLAS error-checking
overhead for fixed-size vectors and matrices)
- Can solve by a multi-layer interface. The "works like math" layer
can use CLOS all it wants to. The "fixed-size vectors" layer can
present an interface somewhat like Cg (the GPU programming
language) -- though two lowest levels (one using the BLAS and one
optimized for small fixed-size problems) may be needed.
- "Works like math" means we need to be able to mix expressions
of different numerical types. There are a number of different
- Type promotion: promote operands to a common type, then call
the BLAS; simple, but incurs copy overhead.
- Implement your own mixed-precision BLAS routines: slow, but no
copying needed.
- Implement your own mixed-precision BLAS routines and optimize
them: time-consuming, error-prone, though may be useful to
others as a spin-off project. Probably won't be as fast as a
typical vendor-optimized BLAS.
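The copy overhead in the type-promotion option can be made concrete with a small sketch. This is illustrative Python, not the library's API: `promote` and the stand-in `daxpy` are hypothetical names, with `daxpy` simulating the double-precision BLAS axpy.

```python
import array

def promote(xs):
    # Copy a single-precision vector into a freshly allocated
    # double-precision one. This allocation + copy is exactly the
    # overhead the promotion approach pays for reusing the existing
    # double-precision BLAS routine.
    return array.array("d", xs)

def daxpy(alpha, x, y):
    # Stand-in for the double-precision BLAS axpy: y := alpha*x + y.
    for i in range(len(y)):
        y[i] += alpha * x[i]

# A mixed-precision axpy-like call: promote the single-precision
# operand first, then use the double-precision routine.
x_single = array.array("f", [1.0, 2.0, 3.0])
y_double = array.array("d", [10.0, 10.0, 10.0])
daxpy(2.0, promote(x_single), y_double)
```

A true mixed-precision routine would read `x_single` directly and skip the temporary, which is what the "implement your own" options buy at the cost of development time.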
- Optimization of expressions: "Do everything at compile time"
(write a lot of code walkers and such; interpreted code gets no
optimization) vs. "expressions" (suggested by Jason Riedy; similar
to the "expression templates" used in Boost's uBLAS), which
support interpreter-based optimization but incur some runtime
overhead.
- Is this a false dichotomy? Maybe the code for doing things at
compile time is very similar / pretty much the same as the code
for doing things at runtime / in the interpreter. Maybe we can do
both: optimize at compile time if we're in the compiler, otherwise
optimize at runtime.
- Matlab's print semantics (a trailing semicolon suppresses
printing) mean that if you don't ask to print a matrix, you can
do expression optimization as much as you want. (Is this true?)
Maybe we should worry about this later ("Premature optimization is
the root of all evil").
Last modified: Wed Sep 27 12:31:40 CST 2006