From 6b8cea9797540986fe80eb37f58d8dbca80c004c Mon Sep 17 00:00:00 2001
From: Jason Riedy
Date: Sat, 26 Aug 2006 13:15:04 -0700
Subject: [PATCH 3/3] Document the installation and Fortran->C bridge in the README.

Naturally, the documentation is not perfect, but it's something.
And I added myself to the helpers.

Signed-off-by: Jason Riedy
---
 README | 56 ++++++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/README b/README
index c5fba9e..5656eaf 100644
--- a/README
+++ b/README
@@ -1,3 +1,11 @@
+Table of Contents
+ 1. Introduction
+ 2. Content
+ 3. Installation
+ 4. Code Generation with M4 macro processor
+ 5. Testing
+ 6. Feedback
+
 1. Introduction
 This library of routines is part of a reference implementation for
@@ -36,9 +44,8 @@
 2. Content
 The BLAS Standard defines language bindings for Fortran 95, Fortran
- 77, and C. Here, we have only implemented the C version. However,
- our methodologies for code generation and correctness testing also
- apply to Fortran.
+ 77, and C. Here, we have only implemented the C version and
+ provided a method for binding to one Fortran 77 ABI.
 In this initial release, we provide the following 11 routines:
@@ -93,8 +100,37 @@
 testing/test-sum - SUM test code and results
 testing/...
+3. Installation
+
+ The reference XBLAS are built similarly to the reference BLAS and
+ LAPACK. You need to provide a make.inc file in the source
+ directory that defines the compiler, optimization flags, and
+ options for building a Fortran->C bridge. Some examples are
+ provided. The current build system produces a static libxblas.a.
+
+ M4 is not necessary for the distributed archive; it is only
+ necessary if you modify the generator sources under m4/. See
+ README.devel for more information.
+
+ The Fortran->C bridge uses details of a specific toolchain's binary
+ interface, in particular how the Fortran compiler mangles names.
+ See src/f2c-bridge.h for the available options.
+ Most compilers can support different name mangling schemes; be sure
+ to use the *SAME* naming options for all your Fortran code.
+
+ The Fortran->C bridge is included in libxblas.a. Each of the
+ bridge's object files matches *-f2c.o, so you can extract them with
+ ar if you need to share one libxblas.a between multiple Fortran
+ compilers. Example steps to strip the Fortran->C routines from
+ libxblas.a and place them in a separate libxblas-myfortran.a are as
+ follows:
+
+ ar t libxblas.a | fgrep -- -f2c.o | xargs ar x libxblas.a
+ ar ru libxblas-myfortran.a *-f2c.o
+ ar d libxblas.a *-f2c.o
+ rm *-f2c.o
-3. Code Generation with M4 macro processor
+4. Code Generation with M4 macro processor
 In the existing BLAS, there are usually 4 routines associated with
 each operation. All input, output, and internal variables are
@@ -166,7 +202,7 @@
 (This does not count the shared M4 macros in the file cblas.m4.h.)
-4. Testing
+5. Testing
 The goal of the testing code is to validate the underlying implementation.
 The challenges are twofold: First, we must thoroughly test routines
@@ -194,7 +230,7 @@
 order to reveal the internal precisions actually used. For details,
 see the paper in file doc/report.ps.
- 4.1 Testing DOT
+ 5.1 Testing DOT
 DOT performs the following function:
@@ -253,7 +289,7 @@ (*) |r_computed-r_acc| <= (n+2)(eps_int
 and y(1:n) judiciously in order to cause as much cancellation in r as possible.
- 4.2 Choosing input data and computing r_acc
+ 5.2 Choosing input data and computing r_acc
 The general approach is to choose some of the input values of x(i) and
 y(i) so that the exact (partial) dot product of these values has a lot of
@@ -283,7 +319,7 @@ (*) |r_computed-r_acc| <= (n+2)(eps_int
 x(1)*y(1) = -x(3)*y(3) >> x(2)*y(2)
 so that SUM_{i=1,3} x(i)*y(i) = x(2)*y(2) exactly.
- 4.3 Testing SPMV and GBMV
+ 5.3 Testing SPMV and GBMV
 SPMV, GBMV, and many other BLAS2 routines perform the following function:
 y <- beta * y + alpha * A * x
@@ -301,7 +337,7 @@ (*) |r_computed-r_acc| <= (n+2)(eps_int
 This approach can be generalized to most other Level 2 and 3 BLAS.
-5. Making comments on this design
+6. Feedback
 Please send any comments or bug reports to
 extended_blas@cs.berkeley.edu. This code was developed by
@@ -317,4 +353,4 @@ (*) |r_computed-r_acc| <= (n+2)(eps_int
 Teresa Tung,
 Daniel Yoo
- with help from Ben Wanzo, Berkat Tung and Weihua Shen.
+ with help from Ben Wanzo, Berkat Tung, Weihua Shen, and Jason Riedy.
-- 
1.4.2.g6580-dirty