Best Practices Using Batch
Writing Output from Batch Jobs
There are a number of different options available for writing output from
batch:
-
Local space provides the best performance when writing your batch
output. By local space, we mean /tmp or /scratch on the batch worker.
Not all batch workers have a /scratch. You can check an individual
batch worker with the command "bhosts -l ". Look for
lines like the following under "CURRENT LOAD USED FOR SCHEDULING:"
           lammpi_load  scratch
Total              0.0     38.0
Reserved             -      0.0
The entry 38.0 under scratch tells you how much /scratch
space is currently available, in GB.
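As an illustration, the scratch value can be pulled out of saved
"bhosts -l" output with a few lines of Python. This is a minimal sketch
that assumes the layout shown in the sample above (a header row naming
the columns, followed by a "Total" row); the function name
scratch_total_gb is hypothetical, and the real output on a given
cluster may differ.

```python
def scratch_total_gb(bhosts_output):
    """Return the 'Total' value in the 'scratch' column, or None if absent.

    Assumes the layout of the sample in this document: a header row that
    contains the word 'scratch', followed by a row starting with 'Total'.
    """
    lines = bhosts_output.splitlines()
    for i, line in enumerate(lines):
        columns = line.split()
        if "scratch" not in columns:
            continue
        # Data rows start with a label ("Total", "Reserved"), so each
        # value sits one field to the right of its header position.
        position = columns.index("scratch") + 1
        for row in lines[i + 1:]:
            fields = row.split()
            if fields and fields[0] == "Total" and len(fields) > position:
                try:
                    return float(fields[position])
                except ValueError:
                    return None  # e.g. "-" when no value is reported
    return None


sample = """CURRENT LOAD USED FOR SCHEDULING:
           lammpi_load  scratch
Total              0.0     38.0
Reserved             -      0.0
"""

print(scratch_total_gb(sample))  # prints 38.0
```

A job wrapper could use a check like this to fall back to /tmp when a
worker reports no /scratch column.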
-
NFS space can be used for writing output from batch jobs, but
writes have to go across the network, which is slower than writing
locally. Another problem with NFS is the severe impact that results
when an NFS filesystem is full or close to full. An NFS server can
become unresponsive if there are numerous batch jobs trying to write
to an already full filesystem, because the server has to spend its
time checking the filesystem and then denying each request. Since
NFS servers often export a number of filesystems, users of
filesystems that are not full will also suffer: it is the server
that is the victim, not just the full filesystem.
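One way a job can avoid adding to this problem is to check the free
space once, before it starts writing, and stop early if the filesystem
is nearly full. The following is a rough sketch only: the function
name has_room and the 5% reserve are assumptions chosen for
illustration, not a site policy.

```python
import shutil


def has_room(path, needed_bytes, reserve_fraction=0.05):
    """Return True if writing needed_bytes under path would still leave
    at least reserve_fraction of the filesystem free.

    reserve_fraction=0.05 is an illustrative default, not a site rule.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    reserve = usage.total * reserve_fraction
    return usage.free - needed_bytes >= reserve
```

The check should be made once per job rather than before every write,
since each check is itself a request to the server.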
-
AFS space is usually NOT a good place to write from batch. AFS
has a difficult time when multiple machines are all reading and
writing the same file at the same time. The same goes for changing
the status of entries in a directory. Every time the file or
directory is changed, each system that is trying to read it needs to
recontact the server for the update. This puts quite a load on the
AFS server and can slow it down, with a noticeable impact on all
users: interactive sessions will appear to hang, and in severe
circumstances users will see "Lost contact with fileserver xxxx"
messages.
If it is desirable to store results in AFS, it is best to
write locally to /tmp or /scratch space and then copy the
results into AFS, once, at the end of the job.
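The write-locally-then-copy-once pattern can be sketched in Python as
follows. This is a minimal illustration under stated assumptions: the
function name run_with_local_staging and its parameters are
hypothetical, only regular files at the top level of the work
directory are copied, and the choice between /scratch and the system
temporary directory is a simple writability test.

```python
import os
import shutil
import tempfile


def run_with_local_staging(produce, final_dir, local_root=None):
    """Run produce(work_dir) against local disk, then copy the results
    into final_dir once, at the end.

    produce    -- callable that writes its output files into work_dir
    final_dir  -- destination directory (for example, in AFS or NFS)
    local_root -- local area to stage in; defaults to /scratch when it
                  is a writable directory, else the system temp dir
    """
    if local_root is None:
        if os.path.isdir("/scratch") and os.access("/scratch", os.W_OK):
            local_root = "/scratch"
        else:
            local_root = tempfile.gettempdir()

    # A private work directory avoids clashes with other jobs on the host.
    work_dir = tempfile.mkdtemp(prefix="batchjob_", dir=local_root)
    try:
        produce(work_dir)  # all intermediate writes stay on local disk

        # Single pass over the finished results: one copy per file.
        os.makedirs(final_dir, exist_ok=True)
        for name in os.listdir(work_dir):
            src = os.path.join(work_dir, name)
            if os.path.isfile(src):
                shutil.copyfile(src, os.path.join(final_dir, name))
    finally:
        # Clean up local space even if the job or the copy failed.
        shutil.rmtree(work_dir, ignore_errors=True)
```

Because the work directory is removed in the finally clause, a job
that fails part way through does not leave files behind in /tmp or
/scratch for the next job on that worker.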
Owner: Renata Dart