TODO

Table of Contents

1 Integration into LAPACK

Snapshot of the research iterative refinement code (test drivers, non-GE versions, etc.) is at http://www.cs.berkeley.edu/~ejr/dev/lapack/itref-work/.

Note that the code still needs some changes. See the section Code Work below.

1.1 Changes to existing routines

1.1.1 DONE Longer name support in LAPACK.

Included in main branch of the LAPACK repo.

1.1.2 TEST XBLAS Fortran to C interface , make.inc flag testing

In XBLAS distribution and LAPACK trunk.

  • [X] Needs linking/calling tests in TESTING.
  • [X] Needs review and make.inc flag testing

Included in main branch of the XBLAS repo.

1.1.3 TEST Use XERBLA in XBLAS

So users can hook XERBLA rather than having to hook XERBLA, BLAS_error, and BLAS_ERROR.

Mostly done. Still needs build documentation and wider build testing.

1.1.4 DONE Remove plain BLAS Tech Forum routines from XBLAS

Having the CBLAS interface (BLAS_dgemm and crew) in the XBLAS is potentially confusing. Not generating the routines themselves is easy, but disentangling them from the tester is a bit more complicated.

1.1.5 TODO Add workspace query functions for the top-level refinement routines.

  • [X] xyySVXX : Depends on equil., fact., and refinement WS needs.
  • [X] xyyRFSX : Depends on cond. est. and refinement WS needs.
  • [X] xLA_yyRFSX_EXTENDED?
  • [X] various _NORMEST routines?

Currently, these routines are functions of N and NRHS only. We could use additional workspace to work on multiple right-hand sides at once.

1.2 Refinement routines

1.2.1 WAITING Pull in xGESVXX

Requires:

  • [X] Longer name support
  • [X] XBLAS Fortran to C interface
  • [X] Cleanups

1.2.2 Other formats

SDCZ
GEcode, testcode, test(ip)code, testcode, test(ip)
SYcode, test(ip)code(ip), test(ip)code(ip), test(ip)code(ip), test(ip)
HEn/an/a
POcode
GB
GT
PB
PT
PP
HPn/an/a
PS

Legend:

  • (ip): Someone is signed up to work on it.
  • S, D, C, Z: REAL, DOUBLE PRECISION, COMPLEX, COMPLEX*16
  • code: Have working code, needs cleaned up.
  • test: Have LAPACK tester.
  • GE: General
  • SY: Symmetric
  • HE: Hermitian, complex-only
  • PO: Hermitian, positive definite
  • GB: General, banded
  • GT: General, tridiagonal
  • PB: Hermitian, positive-definite, banded
  • PT: Hermitian, positive-definite, tridiagonal
  • PP: Hermitian, positive-definite, packed
  • HP: Hermitian, packed
  • PS: Symmetric, packed

1.3 TIMING    DEFERRED

1.3.1 DONE Non-extended timing

Code for xGESV and xGESVX is included in timing-integration.

1.3.2 DONE Extended timing

Code is in itref:master

Requires:

  • [X] Pull in xGESVXX

1.4 TESTING

1.4.1 Questions

  • If there's an error in an intermediate routine, the testing XERBLA gripes about being given the wrong name. Should we care?

    Nope.

  • Should we test the pivot growth functions? What of the other low-level routines (e.g. xLA_GEAMV)?

    Not now.

1.4.2 FEEDBACK Get single-precision code working

In itref:master:

  • [X] Calling sget07 correctly produces a good number of errors.
  • [X] Testing with multiple rhs also causes problems, possibly due to the errbnd array ordering. We don't have a "ld_errbnd", so code that uses the same errbnd array for different numbers of rhs has to index errbnd manually.
  • [X] Decide if FERR tests should be disabled. The "truth" isn't accurate enough.

    The tests were disabled "temporarily" by commit e061efc111b, but they may stay disabled. We can only re-enable them if we can produce more accurate true solutions for testing.

Deferred, so long as all the tests pass on various machines:

  • Re-equilibrate with xyyEQUB before testing xyySVXX. Yuck. The current code actually works, which means the tester isn't testing along the condition number boundaries. (Using xyyEQUB universally is deferred, Use xyyEQUB)

    Testing code doesn't require that the equilibration generated in the test routine be the same as the equilibration used internally.

1.4.3 TODO Add complex testing

  • [X] Verify condition number thresholds for our "guarantee". They're the same.
  • [X] Verify maximum number of iterations. Do we need a few more to cope with the extra constant factor in the error?

1.4.4 TODO Add double testing

  • [X] Verify condition number thresholds. Should be similar to single.

We don't have torture-testing code to check.

  • [X] Verify maximum number of iterations. Do we need more for extra precision?

1.4.5 TODO Add complex*16 testing

  • [X] Verify condition number thresholds. Should be similar to single.

We don't have torture-testing code to check.

  • [X] Verify maximum number of iterations. If either of the above need more...

1.4.6 TODO Add positive-definite testing

  • [X] Verify condition number thresholds.
  • [X] Verify maximum number of iterations

1.4.7 TODO Integrate into LAPACK

  • [X] Convert Hilbert generator into LAPACK MATGEN form.
  • [X] Convert the Hilbert tester into LAPACK form.
  • [X] Clean up tester format, and be more liberal with comments.

Requires:

  • [X] Pull in xGESVXX

1.5 Documentation

1.5.1 Questions

  • Which of the low-level routines should be documented in the LUG?
    • xyyEQUB : man page like xyyEQU
    • xLA_yyRPVGRW : internal for now
    • xLA_yyAMV : internal for now
    • xLA_LIN_BERR : internal for now
  • What promises are made (if any) when the user supplies equilibration factors and/or a factored matrix? This should be included in the routine's comments as well as LUG section 4.4.

    We promise nothing. Explained in leading comments of xyySVXX, also needs explained in the LUG.

1.5.2 TODO Add to the LUG

This list covers only xGESVXX.

A snapshot of the LUG is available at http://www.cs.berkeley.edu/~ejr/lapack-snapshots/trunk/lug-latest.tar.gz or unpacked under http://www.cs.berkeley.edu/~ejr/lapack-snapshots/trunk/lug . It's also on the EECS file system under file:/project/cs/demmel/b/LAPACK/snapshots/trunk/ .

  • [X] 2.3.1 Driver Routines - Linear Equations

    Just a table entry.

  • [X] 2.4.1 Computational Routines - Linear Equations

    xyyEQUB, xyyRFSX, and section's last sentence

  • [X] 3. Performance of LAPACK

    Add "Cost of Extra Precision"?

  • [X] 4.4 Accuracy and Stability - Error Bounds for Linear Equation Solving

    Add a 4.4.2 for extra precise refinement, possibly absorbing the last bit of 4.4.1. Caveat for user-provided factorization and equilibration.

    Symmetric routines will need a description of the equilibration somewhere.

  • [X] 5. Design and Documentation of Argument Lists

    Array for optional arguments in xGESVXX: need feedback from users if this is acceptable. See the description of PARAMS in http://www.cs.berkeley.edu/~ejr/dev/lapack/itref-work/sgesvxx/sgesvxx.f and http://www.cs.berkeley.edu/~ejr/dev/lapack/itref-work/sgesvxx/sgerfsx.f .

  • [X] 6. Installing LAPACK Routines
    • How to build the XBLAS: Fortran->C definitions in make.inc. This may be replaced by requiring an external XBLAS.
  • [X] 7. Troubleshooting - Wrong Results

    Add failure modes from tech report. Actually, this section should provide a bit more guidance for those cases where the algorithms honestly fail. Right now it just blames the user.

  • [X] Manual page for xyyEQUB. (Julie has some system to handle these.)

1.5.3 TODO Add to the installation guide (new LAWN 81)

The included LAWN 81 needs a lot of basic editing and updating. It certainly needs edited in the following ways:

  • General replacement of "tape" with web site, etc.
  • Update 5.2 "Edit the LAPACK/make.inc" to include necessary bits for linking with the XBLAS.
  • Information about the XBLAS.

2 Code Work

2.1 TODO Cleanups

  • [X] Sanitize the include files and _PARAM names.
  • [X] Rename BLAS_sge_diag_scale back to xLASCL2. The BLAS Tech Forum scaling routines are not useful for complex matrices, and we'd rather be internally consistent.
  • [X] Correct comments about INFO. INFO reports the first rhs that does not end in a converged state.
  • [X] Rearrange comments to assist the wrapper-generating script. Move the PARAMS description and replace the leading text with a forward reference. (Or fix the wrapper generator.)
  • [X] Double-check info return for tiny RCOND.
  • [X] Add "no promises" for user-provided factorization and scalings to the comments.
  • [X] Use convergence status to decide if the result is trustworthy. (Not a small change. See below.)
  • [X] Strip out debugging and tracing infrastructure and other refinement variations (Wilkinson and working-precision).
  • Final precision updates from origin SyySVXX, etc.:
    • [X] double
    • [X] complex
    • [X] double-complex

2.2 TODO Use convergence status for trustworthiness

Currently, the code is structured as follows:

  • SGESVXX : Equilibrates, factors, and then refines.
  • SGERFSX : Refines all solns, decides if they're trustworthy.
    • SLA_GERFSX_EXTENDED : Runs the refinement state machine on all rhs.
    • SLA_GERFSX_{WILK,WORKING} : (Different algs for testing)

Each level operates on all NRHS right-hand sides. There is a loop over the rhs in SLA_GERFSX_* that processes each rhs on its own. The routine currently is provided only enough workspace to process one rhs at a time.

The "trust me" policies are separated from the mechanics of refinement. So a user can apply the refinement loop in SLA_GERFSX_EXTENDED without applying the LAPACK policies in SGERFSX.

To keep this separation but apply the "only trust the answer when refinement has converged" policy, we need to return whether each rhs has converged. And we don't have space for an NRHS-long array anywhere.

Possible options:

  1. Move the per-rhs loop up to SGERFSX, making SLA_GERFSX_* apply to only one RHS at a time. The componentwise condition estimator would need run within the RHS loop so we could apply all the "trust me" policies there.

    This completely disallows working on blocks of the rhs at once using SLA_GERFSX_*. We currently don't have the workspace for blocks (or working on each rhs in a separate thread), so it probably doesn't matter.

  2. Add a NRHS to the REAL workspace and encode the naturally LOGICAL convergence state. We don't want to return the entire state, only whether the state machine considers the solution converged.

    If we start mucking with the workspace, we should add a full LWORK/LRWORK/LIWORK/etc. query interface. The query routine would take N and NRHS, possibly asking for enough workspace to process blocks at a time.

Block refinement would allow a single pass through A for each step, which may prove useful for the first few steps.

2.3 TODO If the code accepts the componentwise result, also accept the normwise result.

Componentwise convergence implies normwise convergence, so we should always accept the normwise result when we accept the componentwise result.

The normwise condition number for xyySVXX is

   norm(inv(R*A),inf)*norm(R*A,inf)
for a suitable R. This ignores the solution and right-hand side, so we do not detect correct solutions in such cases as A*x=0. But the componentwise condition number will catch these cases and accept the solution.

There are more considerations when A*x=0 occurs as a block in (possibly permuted) block-diagonal systems. We may want to include the solution / right-hand side in the normwise condition number at some point.

3 Deferred LAPACK changes

3.1 Propagate scaling-independent pivot growth to old code    DEFERRED

Won't do for this release. Re-evaluate how it might affect existing users.

xyySVXX (extended) calculates the pivot growth for each column and reports the maximum. The older xyySVX (not extended) calculates the growth across the entire equilibrated matrix. If the user disables equilibration or provides a poor column scaling factor, the xyySVX code may underestimate the pivot growth.

3.2 Change underflow protection in BERR computations    DEFERRED

A SuperLU user reported an "interesting" bug which also can occur in LAPACK: Solving Ax = 0 is accurate but reports BERR = 1. Other interesting problems occur with block-structured matrices where some block has a zero solution. To protect against underflow and mis-rounding the residual, the routines compute:

  BERR = max(0, SAFEMIN) / max(0, SAFEMIN) = 1

To fix this, xyySVXX uses a modified abs(A)*abs(x)+abs(b) calculation that still protects against underflow but does not perturb exact zero components (see xLA_yyAMV). We also encapsulated slightly more complicated 0/0 logic in xLA_LIN_BERR.

This should be evaluated for eventual inclusion in the plain xyySVX. But this shouldn't block releasing LAPACK now.

Might be worth considering an interval xyyMV for the computation, if any actually exist.

3.3 Use xyyEQUB rather than xyyEQU    DEFERRED

The xyySVXX routines equilibrate by powers of the radix to avoid the extra rounding error. This would be trivial to include in xyySVX. However, users may expect entries in the equilibrated A to be less than one in magnitude. Scaling by xyyEQUB's results only guarantee the entries to be less than the radix in magnitude.

However, users cannot require the matrix be equilibrated. If the equilibration factors are all less than some threshold (see the refinement drivers), they are not applied. So really the user cannot rely on the entries' magnitudes...

4 General / Research

This section is not up-to-date.

4.1 Least-squares

  • Add statistical tests. Generate test data sets from a random linear model. Compare the sample variance of the refined r, b-Ax in working precision, and b-Ax in extended precision.
  • Also check each residual's orthogonality to the columns of A.
  • Extend to generalized least squares (straight-forward) and constrained least squares (difficult).

4.2 Parallel / ScaLAPACK

  • PXBLAS? XPBLAS? ick?
  • Needs XBLACS as well.

4.3 Generic iterative refinement driver

itref:master-f90

  • Adapt for least-squares. Doesn't need many changes.
  • Adapt for parallel use.

4.4 Compare results and performance with interval methods

  • MKL has interval solvers
  • MKL does not have an interval gemv to compare with geamv for berr, but it does have interval scalar operations.

4.5 Compare with "DSGESV"

See also http://icl.cs.utk.edu/trac/lapack-dev/wiki/RefinementComparison .

  • Compare timings and error results.
  • Need to separate condition estimation, etc.

Variables:

  • equilibration: none, col-only, both
  • factorization precision: single, double
  • residual precision: = fact-prec, > fact-prec (80-bit), >= 2*fact-prec
  • update (x+dx) precision: = fact-prec, > fact-prec (80-bit), >= 2*fact-prec
  • termination criteria: # steps, berr, ferr-est

Results:

  • time to solution for 1 rhs, many rhs
  • normwise BERR
  • cwise BERR
  • normwise FERR
  • cwise FERR

4.6 Profiling and Optimization

  • How much faster would refinement be with HW quad? What if quad were 2x or 10x slower than double?

    Need:

    • Time in factorization
    • Time in refinement
    • Time in GEMV_X and GEMV2_X
    • speed of GEMV_X and GEMV2_X v. GEMV

    For what would be saved if we had fast intervals:

    • Time in error computation (compare with using mkl's?)

    Remaining in total:

    • Time in equilibration and scaling
    • Time in rconds
    • Time in copying overheads?

    Interesting problems?

    • compare v. size
    • compare v. condition number
  • Compare relative costs for sparse & dense.
  • What is the accuracy / performance trade-off with 80-bit accumulators. We know the error v. condition number pictures will be more difficult to describe.

4.7 Other solvers?

  • GMRES (normwise backwards-stable, so could work when A is not available); latency-amortizing GMRES
  • unstable but fast sparse triangular solves
  • block LU factorization, Strassen (not column-scaling invariant), etc.

4.8 Possible applications

  • Finite termination algorithms in linear optimization use equality-constrained least squares.

    If we can solve these accurately enough, combinatorial optimization algorithms may recognize when the solution is at a vertex rather than in a face.

  • Integration into Octave and R. Both have expressed an interest in using slightly more expensive arithmetic to get the right answer by default.

5 Current source repositories:

We mostly use git for source control.

5.1 iterative refinement experimental driver

locations:

snapshots:

dependencies:

  • Needs the matrix-generator library from LAPACK. It's the TMGLIB in the top-level make.inc.
  • Needs the latest XBLAS snapshot.

interesting branches:

  • master (both): The current main line for stable development. Has testing/ and TIMING/.
  • master-f90 (both): General refinement interface, F95. Not up-to-date relative to master.
  • TODO (ejr's): Maintains this TODO list. It's a special tree, similar to dev-tools in the LAPACK repo below.

5.2 least squares iterative refinement

locations:

snapshots:

interesting branches:

  • master: Main line for development.
  • sgesvxx/*: A copy of plain refinement's master branch for use as a library. Merged by the experimental git subtree merge strategy.

5.3 XBLAS repo

locations:

snapshots:

interesting branches:

  • dev-tools (yozo's): Tools for updating the LAPACK side.

merged, no longer interesting:

  • f2c-bridge (both): Includes an automatically generated bridge to allow Fortran->C XBLAS calls.

dead branches:

  • use-xerbla (ejr's): Work towards using XERBLA in BLAS_error. Dead. Use the in-progress xerbla-c2f changes as detailed in Use XERBLA.

5.4 LAPACK repo

locations:

snapshots:

interesting branches in local git:

  • svn-master, svn-june-release: Mirrors of the UTK Subversion repository.
  • master: Based off of svn-master. Ideally, this will not differ from svn-master for long.
  • timing-integration : Changes to integrate the timing part of the current itref code. Note: TIMING/ is dead.
  • testing-integration : Changes to integrate testing, currently may not exist.
  • autoconf-helper : Adds a XERBLA_ARRAY procedure that converts its argument from an array of CHARACTER(1) to a string and calls XERBLA. To be used by components in other libraries, particularly the XBLAS's BLAS_error. Also adds a make.inc-generating configure script.
  • dev-tools : Collection of scripts and extra git-specific gizmos like the exclude list. It's not a true branch. Use by cloning this specific branch into a dev-tools subdirectory of the lapack.git clone. Something like the following should work:

     mkdir dev-tools
     cd dev-tools
     git init-db
     git fetch `pwd`/.. dev-tools:master
     git checkout master
     git repo-config remote.origin.url `pwd`/..
     git repo-config remote.origin.fetch refs/heads/dev-tools:refs/heads/master
    

    Then you can ignore generated files by:

     cp dev-tools/git-info-exclude .git/info/exclude
    

Author: E. Jason Riedy <ejr@cs.berkeley.edu>

Date: 2007/11/08 09:20:44 PM