Two Minute Overview of SLAC's UNIX Batch System
4Feb2004

Here are a few pointers on getting started with UNIX batch computing using the SCS UNIX compute farm. The compute farm is under the responsibility of SCS' High Performance Computing team (aka the HPC team).

UNIX batch is controlled by a system called LSF, Load Sharing Facility, for handling job scheduling, queues, etc. All of the batch machines offered by SCS under LSF start with the name "bronco" for Sun Solaris systems and either "barb, noma or tori" for RedHat Linux systems though other groups may have machines available for their own use under LSF.

NOTE: LSF is only available on systems that are so licensed -- this includes any of the interactive compute farm machines (with the generic name of "tersk" for Sun Solaris systems and "noric" for RedHat Linux systems) and the X hosts (currently a set of machines with the generic name of "flora" for Sun Solaris systems).

Use a web browser to look at http://www.slac.stanford.edu/comp/unix/unix-hpc.html for more information. But in the meantime, since you're probably in a hurry, here are some ways to learn more.

>>>>> IF YOU READ ANYTHING AT ALL, PLEASE READ THIS PART <<<<<

To try running a batch job, simply give the command:

    bsub -c0:1 hostname

This will submit a trivial batch job (-c0:1 means I've requested a maximum of 0 hours and 1 minute of SLAC CPU time) that consists of just the UNIX command "hostname" (which just prints out the name of the host computer on which the job runs). Try it out. You'll get a message like:

    Job <277785> is submitted to default queue .
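If you'd rather script this step than type it, here is a minimal sketch. It is not part of LSF: the helper names cpu_limit and job_id are made up for illustration, and <QUEUE> in the comment is only a placeholder for whatever queue name your message shows. The first helper writes a CPU limit in the hours:minutes form used with -c above, and the second pulls the job number out of a submission message of the kind just shown.

```shell
#!/bin/sh
# Sketch only: cpu_limit and job_id are hypothetical helper names, not LSF commands.

# Turn a CPU limit given in whole minutes into the hours:minutes form
# used with -c above (for example, 1 minute becomes 0:1).
cpu_limit() {
    minutes=$1
    echo "$((minutes / 60)):$((minutes % 60))"
}

# Pull the job number out of a submission message such as
#   Job <277785> is submitted to default queue <QUEUE>.
# Prints nothing if the text does not look like a submission message.
job_id() {
    echo "$1" | sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Possible use on a machine licensed for LSF (left commented out here,
# and assuming the submission message arrives on standard output):
#   msg=$(bsub -c"$(cpu_limit 1)" hostname)
#   echo "submitted job number: $(job_id "$msg")"
```

Keeping the job number around is handy later when you want to ask the system about that particular job rather than all of your jobs.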
Then, in a short amount of time unless the batch system is terribly overloaded, you'll get output back in your mail that will look something like this:

===============================================================================
Date: Thu, 06 Jul 2000 17:16:09 -0700 (PDT)
From: LSF
Subject: Job 277785: Done
Sender: LSF System
To: randym@SLAC.Stanford.EDU
Message-id: <200007070016.RAA13255@bronco426.SLAC.Stanford.EDU>

Job was submitted from host by user .
Job was executed on host(s) , in queue , as user .
was used as the home directory.
was used as the working directory.
Started at Thu Jul 6 17:16:00 2000
Results reported at Thu Jul 6 17:16:09 2000

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
hostname
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time      : 0.77 sec.
    Max Memory    : 2 MB
    Max Swap      : 4 MB
    Max Processes : 1

The output (if any) follows:

bronco426
===============================================================================

You can use the bjobs command to ask the system about the status of any submitted jobs that are not yet finished.

>>>>> EASY, RIGHT? READ ON FOR THOSE STILL WANTING MORE INFORMATION. <<<<<

1. You can use the "man" command to learn about lsf by typing "man lsfintro". This will in turn give pointers to other commands, such as "bsub", that will be useful. Also "man lsfbatch" has useful information.

2. You can also try "man xlsbatch" to learn about a graphical user interface to much of the job submission system. The "xlsbatch" command runs in any X windowing environment that is also licensed for LSF.

3. More information can be found on the Web at http://www.slac.stanford.edu/comp/unix/farm/farm.html including a version of SLAC's "Introduction to the UNIX CPU Compute Farm".

4. The HPC team member responsible for LSF is Neal Adams, ext. 2821, neal@slac.stanford.edu.
Neal is always interested in helping people use LSF or listening to suggestions about configuration changes, wishlist features, etc.

5. The HPC team also has a news group called slac.comp.computefarm. Items are posted there that are not time-critical in nature and/or fit into an environment suitable for dialog.

I hope that should be enough to get started. After you've read a little bit, you might want to simply try the xlsbatch command with a graphical user interface and submit a trivial job or two, even just a command if you wish, to get some experience. And of course, I'm always interested in feedback.

(There is also a mechanism for dealing with large staged files on working disk space that reside on silo cartridges. But that's another story.)

If you'd prefer to talk to a colleague who has started this, here are some people who have agreed to help:

    Charlie Young, young@slac.stanford.edu
    Tom Glanzman, dragon@slac.stanford.edu
    Charlotte Hee, chee@slac.stanford.edu

-- Randy Melen, SCS/High Performance Computing Team, randym@slac.stanford.edu, x2841