TODO -*- mode: org; -*- # Feel free to change the next two lines. I haven't figured out how # to make org-mode not add these fields to the exported file. #+AUTHOR: E. Jason Riedy #+EMAIL: ejr@cs.berkeley.edu * Integration into LAPACK Snapshot of the research iterative refinement code (test drivers, non-GE versions, etc.) is at [[http://www.cs.berkeley.edu/~ejr/dev/lapack/itref-work/]]. Note that the code still needs some changes. See the section Code Work below. ** Changes to existing routines *** DONE <> in LAPACK. # View at http://www.cs.berkeley.edu/~ejr/dev/lapack/wide-names/ Included in main branch of the LAPACK repo. *** TEST <>, make.inc flag testing # View at http://www.cs.berkeley.edu/~ejr/dev/lapack/xblas/ In XBLAS distribution and LAPACK trunk. - [ ] Needs linking/calling tests in TESTING. - [ ] Needs review and make.inc flag testing Included in main branch of the XBLAS repo. *** TEST <> in XBLAS So users can hook XERBLA rather than having to hook XERBLA, BLAS_error, and BLAS_ERROR. Mostly done. Still needs build documentation and wider build testing. *** DONE Remove plain BLAS Tech Forum routines from XBLAS Having the CBLAS interface (BLAS_dgemm and crew) in the XBLAS is potentially confusing. Not generating the routines themselves is easy, but disentangling them from the tester is a bit more complicated. *** TODO Add workspace query functions for the top-level refinement routines. - [ ] xyySVXX : Depends on equil., fact., and refinement WS needs. - [ ] xyyRFSX : Depends on cond. est. and refinement WS needs. - [ ] xLA_yyRFSX_EXTENDED? - [ ] various _NORMEST routines? Currently, these routines are functions of N and NRHS only. We could use additional workspace to work on multiple right-hand sides at once. ** Refinement routines *** WAITING <> Requires: - [X] Longer name support - [X] XBLAS Fortran to C interface - [ ] Cleanups *** Other formats | | S | D | C | Z | |----+----------------+--------------------+--------------------+--------------------| | GE | code, test | code, test(ip) | code, test | code, test(ip) | | SY | code, test(ip) | code(ip), test(ip) | code(ip), test(ip) | code(ip), test(ip) | | HE | n/a | n/a | | | | PO | code | | | | | GB | | | | | | GT | | | | | | PB | | | | | | PT | | | | | | PP | | | | | | HP | n/a | n/a | | | | PS | | | | | Legend: - (ip): Someone is signed up to work on it. - S, D, C, Z: REAL, DOUBLE PRECISION, COMPLEX, COMPLEX*16 - code: Have working code, needs cleaned up. - test: Have LAPACK tester. - GE: General - SY: Symmetric - HE: Hermitian, complex-only - PO: Hermitian, positive definite - GB: General, banded - GT: General, tridiagonal - PB: Hermitian, positive-definite, banded - PT: Hermitian, positive-definite, tridiagonal - PP: Hermitian, positive-definite, packed - HP: Hermitian, packed - PS: Symmetric, packed ** TIMING :DEFERRED: *** DONE Non-extended timing Code for xGESV and xGESVX is included in [[timing-integration]]. *** DONE Extended timing Code is in itref:[[master]] Requires: - [ ] Pull in xGESVXX ** TESTING *** Questions - If there's an error in an intermediate routine, the testing XERBLA gripes about being given the wrong name. Should we care? *Nope*. - Should we test the pivot growth functions? What of the other low-level routines (e.g. xLA_GEAMV)? *Not now*. *** FEEDBACK Get single-precision code working In itref:[[master]]: - [X] Calling sget07 correctly produces a good number of errors. - [X] Testing with multiple rhs also causes problems, possibly due to the errbnd array ordering. We don't have a "ld_errbnd", so code that uses the same errbnd array for different numbers of rhs has to index errbnd manually. - [X] Decide if FERR tests should be disabled. The "truth" isn't accurate enough. The tests were disabled "temporarily" by commit e061efc111b, but they may stay disabled. We can only re-enable them if we can produce more accurate true solutions for testing. Deferred, so long as all the tests pass on various machines: - Re-equilibrate with xyyEQUB before testing xyySVXX. Yuck. The current code actually works, which means the tester isn't testing along the condition number boundaries. (Using xyyEQUB universally is deferred, Use xyyEQUB) Testing code doesn't *require* that the equilibration generated in the test routine be the same as the equilibration used internally. *** TODO Add complex testing - [X] Verify condition number thresholds for our "guarantee". *They're the same.* - [ ] Verify maximum number of iterations. *Do we need a few more to cope with the extra constant factor in the error?* *** TODO Add double testing - [ ] Verify condition number thresholds. *Should be similar to single.* We don't have torture-testing code to check. - [ ] Verify maximum number of iterations. *Do we need more for extra precision?* *** TODO Add complex*16 testing - [ ] Verify condition number thresholds. *Should be similar to single.* We don't have torture-testing code to check. - [ ] Verify maximum number of iterations. *If either of the above need more...* *** TODO Add positive-definite testing - [ ] Verify condition number thresholds. - [ ] Verify maximum number of iterations *** TODO Integrate into LAPACK - [X] Convert Hilbert generator into LAPACK MATGEN form. - [ ] Convert the Hilbert tester into LAPACK form. - [ ] Clean up tester format, and be more liberal with comments. Requires: - [ ] Pull in xGESVXX ** Documentation *** Questions - Which of the low-level routines should be documented in the LUG? + xyyEQUB : *man page like xyyEQU* + xLA_yyRPVGRW : *internal for now* + xLA_yyAMV : *internal for now* + xLA_LIN_BERR : *internal for now* - What promises are made (if any) when the user supplies equilibration factors and/or a factored matrix? This should be included in the routine's comments as well as LUG section 4.4. *We promise nothing*. Explained in leading comments of xyySVXX, also needs explained in the LUG. *** TODO Add to the LUG This list covers only xGESVXX. A snapshot of the LUG is available at http://www.cs.berkeley.edu/~ejr/lapack-snapshots/trunk/lug-latest.tar.gz or unpacked under http://www.cs.berkeley.edu/~ejr/lapack-snapshots/trunk/lug . It's also on the EECS file system under file:/project/cs/demmel/b/LAPACK/snapshots/trunk/ . - [ ] 2.3.1 Driver Routines - Linear Equations Just a table entry. - [ ] 2.4.1 Computational Routines - Linear Equations xyyEQUB, xyyRFSX, and section's last sentence - [ ] 3. Performance of LAPACK Add "Cost of Extra Precision"? - [ ] 4.4 Accuracy and Stability - Error Bounds for Linear Equation Solving Add a 4.4.2 for extra precise refinement, possibly absorbing the last bit of 4.4.1. Caveat for user-provided factorization and equilibration. Symmetric routines will need a description of the equilibration somewhere. - [ ] 5. Design and Documentation of Argument Lists Array for optional arguments in xGESVXX: need feedback from users if this is acceptable. See the description of PARAMS in http://www.cs.berkeley.edu/~ejr/dev/lapack/itref-work/sgesvxx/sgesvxx.f and http://www.cs.berkeley.edu/~ejr/dev/lapack/itref-work/sgesvxx/sgerfsx.f . - [ ] 6. Installing LAPACK Routines + How to build the XBLAS: Fortran->C definitions in make.inc. *This may be replaced by requiring an external XBLAS*. - [ ] 7. Troubleshooting - Wrong Results Add failure modes from tech report. Actually, this section should provide a bit more guidance for those cases where the algorithms honestly fail. Right now it just blames the user. - [ ] Manual page for xyyEQUB. (Julie has some system to handle these.) *** TODO Add to the installation guide (new LAWN 81) The included LAWN 81 needs a lot of basic editing and updating. It certainly needs edited in the following ways: - General replacement of "tape" with web site, etc. - Update 5.2 "Edit the LAPACK/make.inc" to include necessary bits for linking with the XBLAS. - Information about the XBLAS. * <> ** TODO <> - [X] Sanitize the include files and _PARAM names. - [X] Rename BLAS_sge_diag_scale back to xLASCL2. The BLAS Tech Forum scaling routines are not useful for complex matrices, and we'd rather be internally consistent. - [X] Correct comments about INFO. INFO reports the first rhs that does not end in a converged state. - [X] Rearrange comments to assist the wrapper-generating script. Move the PARAMS description and replace the leading text with a forward reference. (Or fix the wrapper generator.) - [X] Double-check info return for tiny RCOND. - [X] Add "no promises" for user-provided factorization and scalings to the comments. - [ ] Use convergence status to decide if the result is trustworthy. (Not a small change. See below.) - [ ] Strip out debugging and tracing infrastructure and other refinement variations (Wilkinson and working-precision). - Final precision updates from origin SyySVXX, etc.: * [ ] double * [ ] complex * [ ] double-complex ** TODO Use convergence status for trustworthiness Currently, the code is structured as follows: - SGESVXX : Equilibrates, factors, and then refines. - SGERFSX : Refines all solns, decides if they're trustworthy. * SLA_GERFSX_EXTENDED : Runs the refinement state machine on all rhs. * SLA_GERFSX_{WILK,WORKING} : (Different algs for testing) Each level operates on all NRHS right-hand sides. There is a loop over the rhs in SLA_GERFSX_* that processes each rhs on its own. The routine currently is provided only enough workspace to process one rhs at a time. The "trust me" policies are separated from the mechanics of refinement. So a user can apply the refinement loop in SLA_GERFSX_EXTENDED without applying the LAPACK policies in SGERFSX. To keep this separation but apply the "only trust the answer when refinement has converged" policy, we need to return whether each rhs has converged. And we don't have space for an NRHS-long array anywhere. Possible options: 1. Move the per-rhs loop up to SGERFSX, making SLA_GERFSX_* apply to only one RHS at a time. The componentwise condition estimator would need run within the RHS loop so we could apply all the "trust me" policies there. This completely disallows working on blocks of the rhs at once using SLA_GERFSX_*. We currently don't have the workspace for blocks (or working on each rhs in a separate thread), so it probably doesn't matter. 2. Add a NRHS to the REAL workspace and encode the naturally LOGICAL convergence state. We don't want to return the entire state, only whether the state machine considers the solution converged. If we start mucking with the workspace, we should add a full LWORK/LRWORK/LIWORK/etc. query interface. The query routine would take N and NRHS, possibly asking for enough workspace to process blocks at a time. Block refinement would allow a single pass through A for each step, which may prove useful for the first few steps. ** TODO If the code accepts the componentwise result, also accept the normwise result. Componentwise convergence implies normwise convergence, so we should always accept the normwise result when we accept the componentwise result. The normwise condition number for xyySVXX is : norm(inv(R*A),inf)*norm(R*A,inf) for a suitable R. This ignores the solution and right-hand side, so we do not detect correct solutions in such cases as A*x=0. But the componentwise condition number *will* catch these cases and accept the solution. There are more considerations when A*x=0 occurs as a block in (possibly permuted) block-diagonal systems. We may want to include the solution / right-hand side in the normwise condition number at some point. * Deferred LAPACK changes ** Propagate scaling-independent pivot growth to old code :DEFERRED: Won't do for this release. Re-evaluate how it might affect existing users. xyySVXX (extended) calculates the pivot growth for each column and reports the maximum. The older xyySVX (not extended) calculates the growth across the entire equilibrated matrix. If the user disables equilibration or provides a poor column scaling factor, the xyySVX code may underestimate the pivot growth. ** Change underflow protection in BERR computations :DEFERRED: A SuperLU user reported an "interesting" bug which also can occur in LAPACK: Solving Ax = 0 is accurate but reports BERR = 1. Other interesting problems occur with block-structured matrices where some block has a zero solution. To protect against underflow and mis-rounding the residual, the routines compute: : BERR = max(0, SAFEMIN) / max(0, SAFEMIN) = 1 To fix this, xyySVXX uses a modified abs(A)*abs(x)+abs(b) calculation that still protects against underflow but does not perturb exact zero components (see xLA_yyAMV). We also encapsulated slightly more complicated 0/0 logic in xLA_LIN_BERR. This should be evaluated for eventual inclusion in the plain xyySVX. But this shouldn't block releasing LAPACK now. Might be worth considering an interval xyyMV for the computation, if any actually exist. ** <> rather than xyyEQU :DEFERRED: The xyySVXX routines equilibrate by powers of the radix to avoid the extra rounding error. This would be trivial to include in xyySVX. However, users may expect entries in the equilibrated A to be less than one in magnitude. Scaling by xyyEQUB's results only guarantee the entries to be less than the radix in magnitude. However, users cannot require the matrix be equilibrated. If the equilibration factors are all less than some threshold (see the refinement drivers), they are not applied. So really the user cannot rely on the entries' magnitudes... * General / Research *This section is not up-to-date*. ** Least-squares - Add statistical tests. Generate test data sets from a random linear model. Compare the sample variance of the refined r, b-Ax in working precision, and b-Ax in extended precision. - Also check each residual's orthogonality to the columns of A. - Extend to generalized least squares (straight-forward) and constrained least squares (difficult). ** Parallel / ScaLAPACK - PXBLAS? XPBLAS? ick? - Needs XBLACS as well. ** Generic iterative refinement driver itref:[[master-f90]] - Adapt for least-squares. Doesn't need many changes. - Adapt for parallel use. ** Compare results and performance with interval methods - MKL has interval solvers - MKL does not have an interval gemv to compare with geamv for berr, but it does have interval scalar operations. ** Compare with "DSGESV" See also http://icl.cs.utk.edu/trac/lapack-dev/wiki/RefinementComparison . - Compare timings and error results. - Need to separate condition estimation, etc. Variables: - equilibration: none, col-only, both - factorization precision: single, double - residual precision: = fact-prec, > fact-prec (80-bit), >= 2*fact-prec - update (x+dx) precision: = fact-prec, > fact-prec (80-bit), >= 2*fact-prec - termination criteria: # steps, berr, ferr-est Results: - time to solution for 1 rhs, many rhs - normwise BERR - cwise BERR - normwise FERR - cwise FERR ** Profiling and Optimization - How much faster would refinement be with HW quad? What if quad were 2x or 10x slower than double? Need: * Time in factorization * Time in refinement * Time in GEMV_X and GEMV2_X * speed of GEMV_X and GEMV2_X v. GEMV For what would be saved if we had fast intervals: * Time in error computation (compare with using mkl's?) Remaining in total: * Time in equilibration and scaling * Time in rconds * Time in copying overheads? Interesting problems? * compare v. size * compare v. condition number - Compare relative costs for sparse & dense. - What is the accuracy / performance trade-off with 80-bit accumulators. We know the error v. condition number pictures will be more difficult to describe. ** Other solvers? - GMRES (normwise backwards-stable, so could work when A is not available); latency-amortizing GMRES - unstable but fast sparse triangular solves - block LU factorization, Strassen (not column-scaling invariant), etc. ** Possible applications - Finite termination algorithms in linear optimization use equality-constrained least squares. If we can solve these accurately enough, combinatorial optimization algorithms may recognize when the solution is at a vertex rather than in a face. - Integration into Octave and R. Both have expressed an interest in using slightly more expensive arithmetic to get the right answer by default. * Current source repositories: We mostly use [[http://git.or.gz/][git]] for source control. ** iterative refinement experimental driver locations: - http://www.cs.berkeley.edu/~yozo/gits/itref.git - http://www.cs.berkeley.edu/~ejr/gits/itref-work.git snapshots: - http://www.cs.berkeley.edu/~ejr/dev/lapack/itref-work/ \\ See the sgesvxx-(date).tar.bz2 files. dependencies: - Needs the matrix-generator library from LAPACK. It's the TMGLIB in the top-level make.inc. - Needs the latest XBLAS snapshot. interesting branches: - <> (both): The current main line for stable development. Has testing/ and TIMING/. - <> (both): General refinement interface, F95. Not up-to-date relative to master. - TODO (ejr's): Maintains this TODO list. It's a special tree, similar to dev-tools in the LAPACK repo below. ** least squares iterative refinement locations: - http://www.cs.berkeley.edu/~ejr/gits/itref-lsq.git snapshots: - http://www.cs.berkeley.edu/~ejr/dev/lapack/itref-work/ \\ See the lsqxx-(date).tar.bz2 files. interesting branches: - master: Main line for development. - sgesvxx/*: A copy of plain refinement's [[master]] branch for use as a library. Merged by the experimental git [[http://git.kernel.org/?p=git/git.git;a=commit;h=68faf68938ee943fc251c702f2027e4dfda354db][subtree merge strategy]]. ** <> locations: - http://www.cs.berkeley.edu/~yozo/gits/xblas.git - http://www.cs.berkeley.edu/~ejr/gits/xblas.git snapshots: - http://www.cs.berkeley.edu/~yozo/software/ \\ Use the latest xblas-*.tar.gz . interesting branches: - dev-tools (yozo's): Tools for updating the LAPACK side. merged, no longer interesting: - f2c-bridge (both): Includes an automatically generated bridge to allow Fortran->C XBLAS calls. dead branches: - use-xerbla (ejr's): Work towards using XERBLA in BLAS_error. Dead. Use the in-progress [[xerbla-c2f]] changes as detailed in Use XERBLA. ** <> locations: - svn master: https://icl.cs.utk.edu/svn/lapack-dev/trunk/lapack - local git repo: http://www.cs.berkeley.edu/~ejr/gits/lapack/lapack.git *WARNING: I haven't updated my git repo lately.* snapshots: - from svn on the CS file system: file:/project/cs/demmel/b/LAPACK/snapshots/trunk/ \\ also through: http://www.cs.berkeley.edu/~ejr/lapack-snapshots/trunk/ interesting branches in local git: - svn-master, svn-june-release: Mirrors of the UTK Subversion repository. - master: Based off of svn-master. Ideally, this will not differ from svn-master for long. - <>: Changes to integrate the timing part of the current itref code. Note: TIMING/ is dead. - <>: Changes to integrate testing, currently may not exist. - <>: Adds a XERBLA_ARRAY procedure that converts its argument from an array of CHARACTER(1) to a string and calls XERBLA. To be used by components in other libraries, particularly the XBLAS's BLAS_error. Also adds a make.inc-generating configure script. - <>: Collection of scripts and extra git-specific gizmos like the exclude list. It's not a true branch. Use by cloning this specific branch into a dev-tools subdirectory of the lapack.git clone. Something like the following should work: : mkdir dev-tools : cd dev-tools : git init-db : git fetch `pwd`/.. dev-tools:master : git checkout master : git repo-config remote.origin.url `pwd`/.. : git repo-config remote.origin.fetch refs/heads/dev-tools:refs/heads/master Then you can ignore generated files by: : cp dev-tools/git-info-exclude .git/info/exclude #+SEQ_TODO: OPEN TODO WAITING FEEDBACK TEST DONE # Local Variables: # org-export-html-style: "" # End: #+OPTIONS: ^:nil