
Workbook for BaBar Offline Users - Quick Tour Trouble-Shooting

The Quicktour was becoming overburdened with notes about what might go wrong and what to do about it. Those notes have been moved here. This page will be updated whenever we find other problems or "gotchas" in the workbook.


Logging into SLAC from a remote machine

If you logged into yakut using ssh yakut.slac.stanford.edu -l <username> (or noric or tersk, similarly) and you have logged into yakut in the past, you may have received a response like
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @       WARNING: HOST IDENTIFICATION HAS CHANGED!         @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
    Someone could be eavesdropping on you right now (man-in-the-middle attack)!
    It is also possible that the host key has just been changed.
    ...
In this case it should be sufficient to note the number of the machine you are currently logged into (yakut0x, where x is a number), log out, and then log in again with the command ssh yakut0x.slac.stanford.edu -l <username>. The display should then work OK.

Return to Quicktour


Shared libraries error

If you get an error message like:
bin/Linux24SL3_i386_gcc323/BetaMiniApp: error while loading shared libraries: 
libCore_pkgid_4.04-02.so: cannot open shared object file: 
No such file or directory
it probably means that you have forgotten to run the "srtpath" command.

No BOOT file

If you get an error message like:
No boot file has been set, either explicitly or using OO_FD_BOOT
it means that you have forgotten to set up the data path with the cond18boot command.
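
A quick way to see whether the boot file is configured before starting a job is to look at the OO_FD_BOOT variable named in the error message. The sketch below is written for a Bourne-style shell (sh or bash); tcsh users would need to adapt it. The function name boot_file_status is hypothetical, and the sketch assumes that cond18boot works by setting OO_FD_BOOT, which the error message suggests but this page does not confirm.

```shell
# Hypothetical helper (not part of the BaBar software):
# report whether OO_FD_BOOT is set in the current shell.
# Assumption: cond18boot sets this environment variable.
boot_file_status() {
    if [ -n "${OO_FD_BOOT:-}" ]; then
        echo "OO_FD_BOOT is set to: ${OO_FD_BOOT}"
    else
        echo "OO_FD_BOOT is not set; run cond18boot first"
        return 1
    fi
}
```

Calling boot_file_status before running the job returns a non-zero status when the variable is empty or unset.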

"Could not find datafile" messages

If you get an error message like:

Could not find datafile "BetaPid/PidDRCLike.dat"
   Path was ".:RELEASE:ONLINEPARENT:PARENT:/afs/slac.stanford.edu/g/babar"
Could not find datafile "BetaPid/PidDRCLike.dat"
   Path was ".:RELEASE:ONLINEPARENT:PARENT:/afs/slac.stanford.edu/g/babar"
Could not find datafile "BetaPid/PidDRCLike.dat"
   Path was ".:RELEASE:ONLINEPARENT:PARENT:/afs/slac.stanford.edu/g/babar"
Could not find datafile "BetaPid/PidDRCLike.dat"
check that you are actually in workdir. (Running the job from other directories is a common error.)

You might occasionally (it should only happen once) get an error message about BetaMiniApp being unknown. If so, first make sure you are actually in workdir (a common error). If that is not the problem, type

gmake setup
in workdir. This resets the workdir configuration correctly (the symbolic links in workdir sometimes get mangled if you do a gmake clean from within workdir rather than in the release directory). You should then be able to run the BetaMiniApp executable without problems.
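
Since running from the wrong directory is such a common cause of these messages, a small guard can make the check automatic. This is a minimal sketch for a Bourne-style shell; the function name in_workdir is hypothetical, and it only tests that the directory is named "workdir", not that it is a correctly configured one.

```shell
# Hypothetical helper: succeed only if the given directory
# (default: the current directory) is named "workdir".
in_workdir() {
    dir="${1:-$PWD}"
    [ "$(basename "$dir")" = "workdir" ]
}

if in_workdir; then
    echo "You are in workdir."
else
    echo "You are in $PWD, not workdir; cd to workdir before running the job."
fi
```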



Federated Database Unavailable, waiting...

When you try to put in a collection with
   > mod talk KanEventInput
   KanEventInput> input add /store/SP/R18/001237/200309/18.6.0b/SP_001237_013238
if you see output of the form:
The Federated database [/afs/slac/g/babar-ro/objy/databases/boot/physics
/V7/ana/conditions/BaBar.BOOT] is currently unavailable - waiting...
and it is between 8:00 am and 4:00 pm on Monday or between 4:00 pm and midnight on Thursday (SLAC time), you have tried to run your job during the time set aside for making data skims and general database maintenance. Press CTRL-C to exit the job and try again later. Outside these times, there may be a problem with the databases. Give it a while to be fixed, and check hypernews to see whether anyone has reported a problem.
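
The two maintenance windows can be checked with a few lines of shell. The sketch below is for a Bourne-style shell; the function name in_maintenance_window is hypothetical, and using the America/Los_Angeles time zone for "SLAC time" is an assumption based on SLAC being in California.

```shell
# Hypothetical helper. Arguments, both in SLAC local time:
#   $1 = day of week, 1 (Monday) to 7 (Sunday), as printed by `date +%u`
#   $2 = hour, 0 to 23, as printed by `date +%H`
# Succeeds if the time falls in Monday 08:00-16:00 or Thursday 16:00-24:00.
in_maintenance_window() {
    day="$1"
    hour="${2#0}"        # strip one leading zero ("08" becomes "8")
    hour="${hour:-0}"    # "0" stripped to empty means hour 0
    case "$day" in
        1) [ "$hour" -ge 8 ] && [ "$hour" -lt 16 ] ;;
        4) [ "$hour" -ge 16 ] ;;
        *) return 1 ;;
    esac
}

# Assumption: America/Los_Angeles gives SLAC local time.
if in_maintenance_window "$(TZ=America/Los_Angeles date +%u)" \
                         "$(TZ=America/Los_Angeles date +%H)"; then
    echo "Inside the scheduled maintenance window; try again later."
else
    echo "Outside the scheduled window; check hypernews for reported problems."
fi
```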



Problems Running PAW

If you try to start up a PAW session and get an error message such as:

   X connection to shire01:11.0 broken (explicit kill or server shutdown).
then PAW has been unable to open a window on your desktop. Exit PAW, check that your X Window software (the X server on your desktop) is running, and then try the pawX11 command again.

Another possible source of difficulty is confusion caused by running over ssh, which is required for enhanced security. In this case, instead of hitting <CR>, try:

   Workstation type (?=HELP) <CR>=1 :  1.my_workstation     (your workstation name)
or
   Workstation type (?=HELP) <CR>=1 :  1.aaa.bbb.ccc.ddd    (your workstation ip address) 

If the HIGZ window does appear but the PAW session never returns to the "PAW >" prompt, exit with CTRL-C and try again by typing the command:

   ana30/workdir> /cern/95a/bin/pawX11 

Finally, if the PAW session seems to start ok and you get a prompt, but no HIGZ window appears, try quitting the session, and at the start when you are asked for "workstation type", try entering "2".

If all of this fails, you have a major problem and should consult someone who works on machines similar to yours, or your department system manager.



Where did my binary go?

There are two possible causes for an executable file to "go missing":
  1. you haven't used the file in a week
  2. you accidentally (or intentionally) issued a "gmake clean" command from within your workdir
In the first case the binary is indeed gone. The Quicktour setup includes a command to put all potentially large temporary files in the BaBar "scratch" area. Files like binaries and libraries can easily be regenerated, and people tend to hoard files "just in case" that they will never actually use again, so BaBar saves file space by automatically deleting files that have been in the scratch area for more than a week. For binaries, the solution is simply to re-compile and re-link.

In the second case, your binary is almost certainly still there. You issued the gmake clean command from the workdir rather than from the release directory, where one would normally issue it. As a result, the symbolic link from your workdir to the area on the scratch disk where your binary is actually stored has been deleted. This is easily solved: simply issue a
gmake setup
command from your workdir (or gmake workdir.setup from your release directory).



Problems writing to NFS disks

When writing output with the -o option of the bsub command, you should never refer to NFS disks by their "automount" names, namely:

   /a/...
but by the full NFS path:
   /nfs/...
The automount name only works if a job previously submitted to the same batch machine mounted the disk via the NFS path and the automounter has not yet unmounted it.
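
One way to avoid this mistake is to check the output path before handing it to bsub -o. The following is a minimal sketch for a Bourne-style shell; the function name check_output_path is hypothetical. It only flags paths that begin with /a/ and does not try to translate them, since the mapping from an automount name to its /nfs/ path is not given here.

```shell
# Hypothetical helper: reject output paths that use the
# automount name (/a/...) and accept anything else.
check_output_path() {
    case "$1" in
        /a/*)
            echo "automount path, use the full /nfs/... path instead: $1"
            return 1
            ;;
        *)
            echo "ok: $1"
            return 0
            ;;
    esac
}
```

The function returns a non-zero status for automount paths, so it can be used as a guard in a submission script before the bsub command is run.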

Why did my job die?

Generally, when a job dies (e.g. the ntuples are not completely filled, or are not even made), an error message with an error code is written to the output (which you will presumably have written to a log file so that you can check for just such a problem).

One of the most common exit codes is exit code 130. This usually means that the job has exceeded the CPU time allowed for the particular queue it is running in. The solution is to use a queue with a larger CPU limit, or to run your job on a smaller number of events.

More information about exit codes can be found on the exit codes webpage.
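
To pull the exit code out of a log file without reading through it by hand, something like the sketch below may help. It is written for a Bourne-style shell; the function name job_exit_code is hypothetical, and it assumes the batch system writes a line of the form "Exited with exit code N" into the log, which should be checked against a real log file.

```shell
# Hypothetical helper: print the first exit code recorded in a log file.
# Assumption: the log contains a line like "Exited with exit code 130."
# Prints nothing if no such line is found.
job_exit_code() {
    sed -n 's/.*[Ee]xited with exit code \([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}
```

For a log containing the assumed line with code 130, job_exit_code prints 130, which can then be looked up on the exit codes webpage.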



Back to Workbook Quicktour

Author: Jenny Williams

Last updated: 13 February 2006
Last significant update: 3 June 2005