[ Connecting | Storage | Compilers | Shared Interactive Use | Exclusive Batch Use | PAPI | MPI | UPC ]
Be sure to visit the CITRIS cluster site as well. Also see my slides (pdf) on using the CITRIS (and Seaborg) system.
You can log onto the front-end nodes (indicated below under Shared Interactive Use and Exclusive Batch Use) from any berkeley.edu domain and some lbl domains (as well as some others). To log on, use an SSH program (such as OpenSSH or PuTTY; others are listed here). With most command-line ssh clients, you can use ssh -l username hostname, for example,
ssh -l yozo lime.millennium.berkeley.edu
There are three storage locations available on the CITRIS cluster:
Both the GCC and the Intel compilers are installed on the CITRIS cluster. The default gcc compiler in /usr/bin is GCC version 3.3.5. If anyone is interested, I have GCC version 3.4.3 installed in /home/eecs/yozo/opt/ia64-unknown-linux-gnu/gcc-3.4.3. The Intel compilers (version 8.1, called icc for C/C++ and ifort for Fortran 77/95) are located under /usr/mill/bin. You probably want to add /usr/mill/bin to your PATH and /usr/mill/lib to your LD_LIBRARY_PATH.
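For sh-style shells, the PATH and LD_LIBRARY_PATH settings suggested above could look roughly like the sketch below. The two directories are the ones named in the text; putting them at the front of the search order (rather than the end) is my assumption, and you may prefer the opposite if you want /usr/bin/gcc to win over anything in /usr/mill/bin.

```shell
# Put the Intel compiler directory first on the command search path.
PATH="/usr/mill/bin:$PATH"
export PATH

# Add the matching library directory; the ${VAR:+...} form avoids leaving a
# trailing ":" when LD_LIBRARY_PATH was previously unset or empty.
LD_LIBRARY_PATH="/usr/mill/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```

These lines would typically go in a shell startup file such as ~/.profile. C shell users would use setenv instead of export.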
There are approximately 45 nodes (including both 900 MHz and 1300 MHz nodes) for shared interactive use. You and others can run multiple jobs on a single node simultaneously. This means that these nodes are not particularly well-suited for highly synchronized jobs or timing measurements, since the load on these nodes can vary over time while your program is executing. However, they are useful for general testing of your code as well as for running loosely synchronized parallel jobs, and they give better turnaround time (since jobs start executing immediately). Slow (900 MHz) nodes are c1-c16 and c49-c54. Fast (1300 MHz) nodes are c17-c32 and c55-c61.
The front-end login nodes are
lime.millennium.berkeley.edu
lemon.millennium.berkeley.edu
Parallel jobs can be executed with gexec. To restrict what nodes are used by gexec, you can set the GEXEC_SVRS environment variable. For example, to use c23 and c19, you can do
% env GEXEC_SVRS="c23 c19" gexec uname -n
0 c23
1 c19
To run k instances of a program prog, do
% gexec -n k prog
provided you have already set the GEXEC_SVRS environment variable.
The gstat command shows the load on each node.
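Typing a long node list by hand gets tedious, so here is a small sketch that builds GEXEC_SVRS for a contiguous range of nodes. It uses the fast nodes c17-c32 mentioned above as the example range; the variable names first, last, and nodes are just illustrative.

```shell
# Build the space-separated list "c17 c18 ... c32" for GEXEC_SVRS.
first=17
last=32
nodes=""
i=$first
while [ "$i" -le "$last" ]; do
    nodes="$nodes c$i"
    i=$((i + 1))
done

GEXEC_SVRS="${nodes# }"   # drop the leading space left by the loop
export GEXEC_SVRS
echo "$GEXEC_SVRS"

# On the cluster you could then run, for example:
#   gexec -n 4 prog
```

Before settling on a range, it is probably worth checking gstat to see which of those nodes are lightly loaded.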
There are 16 nodes (all 1300 MHz nodes) reserved for exclusive batch use. This system is designed for longer jobs, highly synchronized jobs, and jobs requiring accurate timing. You submit a job (most likely a shell script), and at some time in the future, depending on the load, the job is run with exclusive access to the assigned set of nodes.
The front-end login node is grapefruit.millennium.berkeley.edu. From grapefruit, you can launch PBS (Portable Batch System) jobs into the queue. PBS is installed in /usr/pbs. Please make sure that /usr/pbs/bin/ is in your $PATH and that /usr/pbs/man is in your $MANPATH.
The following is an example script that runs the command hostname on 7 nodes, each with 2 processors (so it runs 14 instances of hostname).
#!/bin/sh
#PBS -l nodes=7:ppn=2
#PBS -l mem=400mb
#PBS -l walltime=1:00:00
gexec -n 0 hostname
Pay particular attention to the three PBS options above. The option nodes=7:ppn=2 specifies that you want 7 nodes with 2 processors per node. Please do not reserve more nodes than you actually need. The option mem=400mb specifies that your program will require at most 400 MB of memory; if it uses more, it will be terminated. This is a safeguard against runaway memory allocation, for example. Finally, the option walltime=1:00:00 specifies that the program should not run longer than one hour. This is a safeguard against an infinite loop, for example. Please use these options to guard against unnecessarily wasting computing resources (due to bugs).
If you want to run an MPI program using the PBS system, use something like
NPROCS=`wc -l < $PBS_NODEFILE`
mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS mympiprogram
instead of the gexec -n 0 hostname line.
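To see what the NPROCS line computes, the sketch below fakes a node file and counts its lines. It assumes that PBS writes one line per allocated processor to $PBS_NODEFILE, so that a request like nodes=2:ppn=2 yields four lines; the node names and the temporary file are made up so that the snippet runs outside of a PBS job.

```shell
# Stand-in for the file PBS would provide inside a job (illustrative content:
# two nodes, each listed once per processor).
PBS_NODEFILE=$(mktemp)
printf 'c17\nc17\nc18\nc18\n' > "$PBS_NODEFILE"

# Same counting idiom as in the job script above.
NPROCS=$(wc -l < "$PBS_NODEFILE")
NPROCS=$((NPROCS))        # strip the padding some wc implementations add
echo "NPROCS=$NPROCS"

rm -f "$PBS_NODEFILE"
```

Inside a real job you would not create or remove the file yourself; PBS sets PBS_NODEFILE for you.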
Assuming that the above script is saved in the file hostname.sh, you can submit it with the command
qsub hostname.sh
When the job is done, you will see two output files (such as hostname.sh.e728 and hostname.sh.o728), one capturing the standard error and the other capturing the standard output. To query the current status of your jobs, use the qstat command. In particular, qstat -f gives very detailed information. To cancel a submitted job, use qdel. For more detailed information about PBS, see the man pages (in /usr/pbs/man) and the PBS Pro Users Guide (pdf).
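If you submit jobs from a wrapper script, it can be handy to know the output file names in advance. The sketch below derives them from a job identifier, assuming that qsub prints an identifier of the form number.servername; the value 728.grapefruit is a made-up sample so that the snippet runs without PBS.

```shell
script=hostname.sh

# In a real session this would be:  job=$(qsub "$script")
# A sample value is used here; its exact form is an assumption.
job="728.grapefruit"

seq=${job%%.*}            # keep only the part before the first dot
out="$script.o$seq"       # standard output file
err="$script.e$seq"       # standard error file

echo "stdout: $out"
echo "stderr: $err"
```

The same job identifier is what you would pass to qstat -f or qdel.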
PAPI provides a common interface to the hardware performance counters found on various platforms. On CITRIS, PAPI version 2.3.4 is installed at /usr/mill/pkg/papi. Programs using PAPI can be compiled with
gcc -o my_prog -I/usr/mill/pkg/papi/include/papi-2.3.4 my_prog.c -L/usr/mill/lib -lpapi
See the PAPI documentation for details. In particular, the User's Guide gives a nice introduction, and toward the end, in Appendix A, it has a list of events it can count on different platforms (including Itanium). Not only can it measure L1/L2/L3 cache misses, it can also measure things like "Cycles Stalled Waiting for Memory Access" and "Cycles with No Instruction Issue".
MPICH, an implementation of the MPI (Message Passing Interface) library, is installed in /usr/mill. The mpicc program is used to compile MPI programs. To run a compiled program on multiple processors, use something like
mpirun -np 4 my_program
See this tutorial on using MPI on Millennium for more details. This tutorial was written with the x86 Millennium cluster in mind, but most of it carries over to the CITRIS cluster.
On the CITRIS cluster, there are two versions of UPC, depending on which compiler (gcc or icc) is used as the backend. Add to your PATH either
/project/cs/titanium/b/temp/citris-upc/stable/ecc/runtime/inst/bin
or
/project/cs/titanium/b/temp/citris-upc/stable/gcc/runtime/inst/bin
In addition, to make running on multiple nodes a bit easier, you might want to set the following environment variables:
export GASNET_GEXEC_CMD="/usr/bin/gexec -p none"
export GASNET_SPAWNFN=C
export GASNET_CSPAWN_CMD="gexec -n %N %C"
(Use setenv and no = sign for C shell.) To run a UPC-compiled program, you can do something like
upcrun -n 8 -c 2 /path/to/myprogram
which runs with 8 processes distributed over 4 nodes (2 per node).
You can analyze the communication behaviour of UPC programs using upc_trace, at least for UPC programs running on Myrinet. You need to compile your program with a debug version of UPC provided at
upcc -g
Run your program with the -trace option, which generates a bunch of trace files like upc_trace-knap-gm-2-25567-0. You can then generate a report by running
upc_trace upc_trace-knap-gm-2-25567-*
for example. See the man page for upc_trace for details.
More information can be found at the UPC homepage; in particular, see the UPC Users' Guide and the upcc and upcrun man pages.
| [ Main CS 267 | GSI Page ] | Last updated March 16, 2005 |