Batch system in a nutshell
SLAC's batch system runs on the SCS UNIX compute farm, which is based on LSF (Load Sharing Facility).
13 Feb 2006
bsub
Introduction:
Submits a command to the batch system.
Syntax:
bsub [options] command [arguments]
Major Options:
-c <hh:mm> [CPU time limit]
-q <queue> [job queue]
Minor Options:
-J <jobname> [specify job name]
-m <host> [run job on this machine]
-R <resource> [run job on hosts that satisfy this resource requirement]
Execution Options:
-E <command> [specify pre-run command]
-L <shell> [specify a login-shell]
-nr [job is not rerunnable from the beginning or the last checkpoint]
-r [job is rerunnable from the beginning or the last checkpoint]
I/O Options:
-i <infile> [specify standard input file]
-o <outfile> [specify standard output file]
-e <errfile> [specify standard error file]
Examples of bsub:
bsub -q bldrecoq -m build02 gmake all
bsub -q bldrecoq -m build02 ls -la /u1/drjohn/bfdist/releases/nightly
bsub -q bldrecoq -m build02 'ls -la /u1/drjohn/bfdist/releases/nightly/DbiEvent/*'
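LSF's bsub can also read a job script from standard input and take its options from "#BSUB" lines at the top of that script, which keeps long option lists off the command line. The sketch below writes such a job file using the queue, host, CPU limit and command from the examples on this page; the file name build.job, the job name and the log-file names are hypothetical, and %J is LSF's placeholder for the job ID in output file names.

```shell
#!/bin/sh
# Write a job file whose #BSUB lines carry the bsub options.
# build.job, the job name and the log names are hypothetical;
# queue, host, CPU limit and command come from the examples on this page.
# The quoted 'EOF' keeps %J (and any $) from being expanded here.
cat > build.job <<'EOF'
#!/bin/sh
#BSUB -q bldrecoq
#BSUB -m build02
#BSUB -J nightly-build
#BSUB -c 00:30
#BSUB -o build.%J.out
#BSUB -e build.%J.err
gmake all
EOF

# Submit it by feeding the file to bsub on standard input:
#   bsub < build.job
```

Options given on the bsub command line override the matching #BSUB lines in the file.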
bjobs
Introduction:
Queries the status of jobs in the batch system.
Syntax:
bjobs [options]
Major Options:
-u <user> [specify user; "all" means all users]
Minor Options:
-a [all jobs, including recently finished ones]
-l [long form]
Examples:
bjobs [query my jobs in the batch queue]
bjobs -u mark [query all jobs submitted by user mark]
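To use bjobs output in a script, you can pick out a single column. The sketch below assumes the usual bjobs layout (a one-line header, then JOBID, USER, STAT, QUEUE, FROM_HOST, EXEC_HOST, JOB_NAME, SUBMIT_TIME); the function name job_status is hypothetical. STAT stays in the third column even for pending jobs whose EXEC_HOST field is empty, so a whitespace split is enough for this column.

```shell
#!/bin/sh
# job_status JOBID: read bjobs output on standard input and print the
# STAT column (PEND, RUN, USUSP, DONE, EXIT, ...) for the given job ID.
# Assumes the default bjobs layout: one header line, STAT in column 3.
job_status() {
    awk -v id="$1" 'NR > 1 && $1 == id { print $3 }'
}

# Example (hypothetical job ID):
#   bjobs -a 388999 | job_status 388999
```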
Major batch queue commands:
bkill [kill batch jobs.]
bsub [submit a job for batched execution.]
bmod [modify the parameters of a submitted job.]
Minor batch queue commands:
bacct [generate accounting information about batch jobs.]
bchkpnt [checkpoint batch jobs.]
bmig [migrate a job.]
brestart [restart a job from its checkpoint files.]
Suspend/resume and queue-control commands:
bbot [move a pending job to the bottom (end) of its queue.]
bresume [resume suspended batch jobs.]
bstop [suspend batch jobs.]
bswitch [switch pending jobs from one queue to another.]
btop [move a pending job to the top (beginning) of its queue.]
Query commands:
bjobs [display the status and other information about batch jobs.]
bqueues [display the status and other information about batch job queues.]
bhosts [display the status and other information about batch server hosts.]
bhpart [display information about batch host partitions.]
busers [display information about batch users.]
bugroup [display the user group names and their memberships.]
bmgroup [display the host group names and their memberships.]
bparams [display information about the configurable system parameters.]
bpeek [display the stdout and stderr output produced so far by a batch job.]
bhist [display the processing history of batch jobs.]
Examples of using the batch system:
General examples:
bsub -c 00:30 gmake all [build the test release with a 30-minute CPU time limit]
bjobs [list my batch jobs]
bkill 388999 [kill job 388999]
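When submitting from a script, it helps to capture the job ID so that later bjobs or bkill calls can refer to it. bsub normally confirms a submission with a line such as "Job <388999> is submitted to queue <bldrecoq>."; the sketch below extracts the number from that line with sed. The function name submit_id is hypothetical, and the pattern assumes that message format.

```shell
#!/bin/sh
# submit_id: read bsub's confirmation on standard input and print the job ID.
# Assumes the message format "Job <NNN> is submitted to queue <name>."
# (or "... to default queue <name>."); prints nothing if no line matches.
submit_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Example (CPU limit and command from the examples above):
#   jobid=$(bsub -c 00:30 gmake all | submit_id)
#   bjobs "$jobid"
#   bkill "$jobid"    # only if the job should be cancelled
```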
Using a specific host:
bqueues -m <host> [list the queues that can use this host]
bsub -q <queue> -m <host> <commands..> [run on this machine]
[Note]: this does not currently work for build10; use the following instead:
bsub -q <queue> -R sol7 <commands..> [run on build10]
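If you script submissions to particular build machines, the build10 workaround in the note can be folded into a small helper. This is only a sketch: host_opts is a hypothetical name, and it relies on the note's statement that -R sol7 reaches build10 while -m does not currently work for that host.

```shell
#!/bin/sh
# host_opts HOST: print the bsub options that place a job on HOST.
# build10 cannot currently be selected with -m (per the note on this page),
# so it is requested through the sol7 resource instead.
host_opts() {
    if [ "$1" = "build10" ]; then
        printf '%s\n' "-R sol7"
    else
        printf '%s\n' "-m $1"
    fi
}

# Example; $(host_opts ...) is left unquoted on purpose so that it
# splits into separate option words:
#   bsub -q bldrecoq $(host_opts build02) gmake all
```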
For help or more information about the LSF batch system, see the web page "High Performance Computing at SLAC" provided by SCS.
Maintained by Terry Hung. Send suggestions and additions to
terryh@slac.stanford.edu 650-926-3618