Valgrind for BaBar
Summary
Valgrind is an open-source tool for finding memory management problems (leaks
and corruption) on linux/x86 systems. The main web page for the tool is the
valgrind website. It is simple to use and requires no recompilation of your
program.
In short, to obtain information on your program, all you have to do is run:
valgrind <valgrind-options> <your-executable> <your-exe-arguments>
and the 'valgrind' output will appear on stderr by default, mixed with your
program output when both go to the terminal. See 'valgrind -h' for a number
of useful options.
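Since the valgrind report and the program output are easier to read apart, one option is to send the report to a log file. The following is a minimal sketch, assuming valgrind writes its report to stderr (its default); the helper name run_under_valgrind, the VALGRIND variable and the file names are hypothetical and not part of valgrind itself.

```shell
#!/bin/sh
# Sketch: run a program under valgrind and keep the valgrind report out of
# the program's normal output. VALGRIND may be overridden, e.g. to point at
# a private build. run_under_valgrind is a hypothetical helper name.
VALGRIND=${VALGRIND:-valgrind}

run_under_valgrind() {
    log=$1
    shift
    # "$@" is <your-executable> followed by <your-exe-arguments>.
    # The report goes to stderr by default, so stderr is redirected to the
    # log; anything the program itself writes to stderr lands there too.
    "$VALGRIND" "$@" 2> "$log"
}

# Usage (myapp and its arguments are placeholders):
#   run_under_valgrind valgrind.log ./myapp arg1 arg2 > myapp.out
```

The program's stdout is left untouched, so it can still be redirected or piped as usual.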
Notes and suggestions on how to use valgrind
- When you run your application with valgrind you will find that it:
- takes more memory than it would when run normally
- runs significantly more slowly
In practice this means that you should run over a limited number of events
(e.g. in a Framework based application). Fortunately, most problems can be
found by running over a small number of events.
- If you are looking at a crash which happens on the Nth event in some
event collection (where N is large), usually you can just skip ahead and start
processing just before that event when you run with valgrind.
- The simplest way to run valgrind just checks for memory corruption or
misuse, but not leaks. Useful valgrind options to use in this case have been
found to be:
--num-callers=15 --error-limit=no
From version 2.2.0 onwards you can also specify the tool to use with something like:
--tool=memcheck
- In addition valgrind can be used to look for memory leaks. (This slows it
down a bit and requires somewhat more memory.) Useful valgrind options to use
in this case have been found to be:
--leak-check=full --show-reachable=yes --num-callers=15 --error-limit=no
- When chasing memory leaks, you will have to differentiate between
once-per-job leaks and leaks which occur in the event loop. (The latter are
typically those which cause more problems.) When running over a small
number of events it can sometimes be difficult to tell the two apart.
Many of us find it useful to run two valgrind jobs when looking at leaks,
one on M events and one on 2*M events. Once-per-job leaks are the same size
in both jobs, so by looking at the difference, the per-event leaks are more
readily identified.
- When examining a valgrind leak report, it is usually useful to begin
by focusing on those flagged as "definitely lost", then look at those
flagged as "possibly lost" and then finally at the "still reachable"
category. Entries in the later categories are sometimes simply knock-on
effects of those in the earlier ones (in particular "definitely lost").
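The M versus 2*M comparison described above can be sketched as a pair of shell helpers that read the leak summary from two valgrind logs. This is only an illustration: the helper names (lost_bytes, per_event_leak) and the log file names are hypothetical, and the code assumes leak summary lines of the form "==1234==    definitely lost: 1,024 bytes in 2 blocks", which may differ between valgrind versions.

```shell
#!/bin/sh
# Option sets quoted from the notes above.
MEMCHECK_OPTS="--num-callers=15 --error-limit=no"
LEAK_OPTS="--leak-check=full --show-reachable=yes $MEMCHECK_OPTS"

# Print the byte count for one leak category from a valgrind log.
# Assumes summary lines of the form
#   ==1234==    definitely lost: 1,024 bytes in 2 blocks
# (assumed layout; check it against the logs from your valgrind version).
lost_bytes() {
    category=$1
    log=$2
    sed -n "s/^.*$category: *\([0-9,]*\) bytes.*\$/\1/p" "$log" \
        | tail -n 1 | tr -d ','
}

# Estimate the per-event leak, in bytes, from a job over M events and a job
# over 2*M events. Once-per-job leaks appear in both logs and cancel, so the
# difference is roughly M times the per-event leak.
per_event_leak() {
    m=$1
    small=$(lost_bytes "definitely lost" "$2")
    large=$(lost_bytes "definitely lost" "$3")
    echo $(( ($large - $small) / $m ))
}

# Usage sketch (myapp and its arguments are placeholders):
#   valgrind $LEAK_OPTS ./myapp <args for M events>   2> leak_M.log
#   valgrind $LEAK_OPTS ./myapp <args for 2*M events> 2> leak_2M.log
#   per_event_leak <M> leak_M.log leak_2M.log
```

Calling lost_bytes with "possibly lost" or "still reachable" gives the totals for the other categories, in the order suggested above for working through a report.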
Installation and use at SLAC
John Bartelt has kindly installed version 3.1.0 of 'valgrind' in
/usr/local/bin on the linux machines, so you should find it automatically
in your path.
At SLAC a special batch queue (the "valgrindq") is provided which has a small
number of machines set up so that jobs can use up to 3GB of real
memory. Please run your application with valgrind in this queue rather than
interactively. (And run only valgrind jobs in the valgrindq.)
Installation and use away from SLAC
Download the valgrind source from the valgrind website and
follow the build instructions in the "INSTALL" file included with the
source. It is very straightforward and requires nothing particular in terms
of other installed software (except for the compiler and other things you
would have on any standard linux system).
At SLAC we noticed that it was necessary to compile valgrind explicitly
for different RH versions (e.g. RH7.2 and RH9) because they use
different glibc versions. It was not possible to run a valgrind binary
compiled on RH7.2 on a RH9 machine. (AFAIK, this has not been checked for
RHEL3.)
As we note for SLAC above, valgrind is very memory intensive, so you
will need to run valgrind on a machine with the appropriate amount of memory
for the application you are testing and arrange things such that one user
running valgrind on a machine doesn't cause problems for other users on the
same machine.
Last modified 27-Jan-2006,
Peter.Elmer@cern.ch