Debugging
A debugger is a utility to deal with run-time errors -
errors that you encounter when you try to run your executable.
This section will show you how to compile your code in debug mode,
and how to use debuggers to trace problems in a program at the
source code level.
A debugger enables you to control a program's execution,
symbolically monitoring program control flow, variables, and memory
locations. You can also use the debugger to trace the logic and flow
of control to acquaint yourself with a program written by someone
else.
Different machines have different debuggers. The debugger
for Linux machines like yakut and noric is called gdb. The
debugger for Sun machines like shire is called dbx. This section
will give a brief overview of both debuggers.
This section also includes examples of how to use gdb.
The examples assume that you have already checked out analysis-42 and followed the instructions in Example 2
to include a momentum histogram in the QExample module.
Generally, when you compile and link your code for an analysis
job, it is optimized to run faster. The BaBar packages
that you did not check out (but that your code still uses) have
also been compiled with optimization. When you need to debug, the bugs
will typically be in your own code, since most BaBar releases
(particularly analysis releases) have been tested by experts. The best
way to track them down is to compile and link your code with the flags
-noOptimize-Debug. This generally enables the debugger to pinpoint the
exact line in your code where the problem occurred.
There are two ways to compile and link in debug mode:
- When you issue the
srtpath command, select
the -noOptimize-Debug option instead of the default. For
example, if you are logged into yakut and have checked out
the release analysis-42, srtpath offers a list of architectures such as:
Select/enter BFARCH (CR=1):
1) Linux24SL3_i386_gcc323 [prod][test][active][default]
2) Linux24SL3_i386_gcc323-noOptimize-Debug [prod]
3) Linux24RHEL3_i386_gcc323 [default2]
The default option is option 1, but if you select 2, all of your gmake
commands will run in -noOptimize-Debug mode.
- Alternatively, you can select the plain
Linux24SL3_i386_gcc323
architecture. Then, when you wish to debug your code, issue the
gmake commands with ROPT=-noOptimize-Debug, as follows:
ana42> bsub -q bldrecoq -o all.log gmake all ROPT=-noOptimize-Debug
Note that if you have been compiling and linking in Optimized mode, and
you want to recompile for debugging, you will need to issue a gmake clean
or gmake cleanarch command to flush out the Optimized library and binary files.
Information on debugging BaBar analysis jobs is available
in the HOWTO file HOWTO-Basic-Debugging.
This very useful HOWTO is written for beginners, and contains information
about:
- How to report problems to get help from others
- Descriptions of common types of problems
- Summary of how to use the debuggers
- Other useful tips, tricks and sources of information
The debugger for Linux machines is called gdb. gdb allows you to see what is
going on inside a program while it executes, or what the program was doing
at the moment it crashed. gdb can do four main kinds of things
(plus other things in support of these) to help you catch bugs in the act:
- Start your program, specifying anything that might affect its behavior.
- Make your program stop on specified conditions.
- Examine what has happened, when your program has stopped.
- Change things in your program, so you can experiment with
correcting the effects of one bug and go on to learn about another.
The basic syntax for gdb is one of the following:
gdb program - To debug program.
gdb program core - To debug program using core, the core file produced
when program dumped core.
gdb program PID - To debug a running process with process ID number PID.
For more information about gdb, you can look at the man page:
man gdb
or the info page
info gdb
The info page in particular contains a lot of information and
even a sample gdb session. The info page looks like a text document, but in
fact it has links that you can follow to other pages. To navigate
the info page, put the cursor on the menu item that you are interested
in, and press enter. To exit, press q for quit.
Link: Online GDB manual
gdb commands
Here are some of the most frequently needed gdb commands:
| Command | Description |
| print expr | Display the value of an expression expr (for example, an object x). |
| break [file:]function | Set a breakpoint at function (in file). |
| run [arglist] | Start your program (with arglist, if specified). |
| bt | Backtrace: display the program stack (where is a synonym). |
| c | Continue running your program (after stopping, e.g. at a breakpoint). |
| next | Execute the next program line (after stopping), stepping over any function calls in the line. |
| edit [file:]function | Open the source of function (in file) in an editor. |
| list [file:]function | Print the source lines around function (in file); with no argument, print the lines around where the program is stopped. |
| step | Execute the next program line (after stopping), stepping into any function calls in the line. |
| help [name] | Show information about gdb command name, or general information about using gdb. |
| quit | Exit from gdb. |
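If you want to try these commands on something much smaller than BetaMiniApp first, a minimal sketch like the following will do. The function sumMomenta is a hypothetical example written only for practice; it is not part of any BaBar package.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy function for practising the gdb commands listed above
// (break, run, next, step, print, bt) on a small program.
double sumMomenta(const std::vector<double>& p) {
  double total = 0.;
  for (std::size_t i = 0; i < p.size(); ++i) {
    // A momentum magnitude should never be negative; if it is, assert()
    // calls abort(), and gdb stops on the resulting SIGABRT.
    assert(p[i] >= 0.);
    total += p[i];  // a good place to try "print i" and "print total"
  }
  return total;
}
```

To use it, call sumMomenta from a short main() in a file such as toy.cc (a hypothetical name), compile with g++ -g -O0 -o toy toy.cc so that debug information is present and optimization is off, then run gdb ./toy and try break sumMomenta, run, next, print total and bt. Passing a negative value makes the assert fail, so bt shows what a backtrace of a failed assertion looks like. Note that assert is compiled out when NDEBUG is defined.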
Here is a very quick example of standard use of the debugger on
a Linux machine demonstrating the minimal procedure that you are
likely to use frequently. The particular responses in this section are
from running on analysis-42 on a yakut machine.
To begin, you will deliberately introduce an error into your code.
Open the QExample.cc file that you used in the last
WorkBook section, and comment out the line where the momentum
histogram is initialized:
// _pHisto = manager->histogram("Momentum", 25, 0., 1. );
Now try to recompile and link (with the Debug flag set to maximise the
information we can get when things go wrong):
ana42> gmake cleanarch
ana42> bsub -q bldrecoq -o all-Linux.log gmake all ROPT=-noOptimize-Debug
Since _pHisto is declared
in the header file, the code will compile and link with no
problems. But when you try to run the job, something goes wrong:
workdir> BetaMiniApp snippet.tcl
> mod talk KanEventInput
KanEventInput> input add /store/SP/R18/001237/200309/18.6.0b/SP_001237_013238
KanEventInput> exit
> ev beg -nev 10
The job doesn't even get past the first event! It dies with a message
like:
EmcTrackMatch::EmcPocaMatchMethod.cc(877):Use track match constants from ASCII file.
*** Break *** segmentation violation
*** Break *** segmentation violation
QExample:DrcDetector: number of sets to destroy: 51
The program crashed with a segmentation violation.
A segmentation violation generally means that the program tried to access
memory that it is not allowed to use, very often by following a null or
uninitialized pointer. In this case we deliberately created a common problem:
a new histogram is declared in the header and filled in the event() function
of the implementation file, but it is never actually instantiated. You have
to make the histogram before you can fill it.
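To make this failure mode concrete, here is a minimal, self-contained C++ sketch of the same pattern. Histogram, HistManager and QExampleSketch are hypothetical stand-ins written for this illustration, not the real HepHistogram, histogram manager or QExample classes. As in QExample, the pointer is a member (as if declared in the header), it is only assigned when the histogram is booked (assumed here to happen in beginJob()), and event() dereferences it without checking.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for HepHistogram: it only records its title
// and how many times it has been filled.
class Histogram {
public:
  explicit Histogram(std::string title) : title_(std::move(title)) {}
  void accumulate(double /*x*/) { ++entries_; }
  const std::string& title() const { return title_; }
  std::size_t entries() const { return entries_; }
private:
  std::string title_;
  std::size_t entries_ = 0;
};

// Hypothetical stand-in for the histogram manager: it owns the histograms
// it books and hands back plain pointers to them.
class HistManager {
public:
  Histogram* histogram(const std::string& title, int /*nbins*/,
                       double /*low*/, double /*high*/) {
    owned_.push_back(std::make_unique<Histogram>(title));
    return owned_.back().get();
  }
private:
  std::vector<std::unique_ptr<Histogram>> owned_;
};

// Hypothetical module with the same shape as QExample.
class QExampleSketch {
public:
  // bookHisto == false plays the role of the commented-out line.
  void beginJob(HistManager* manager, bool bookHisto) {
    if (bookHisto) {
      _pHisto = manager->histogram("Momentum", 25, 0., 1.);
    }
  }
  // Nothing checks _pHisto here, so if it was never booked a null pointer
  // is dereferenced (typically a segmentation violation).
  void event(double p) { _pHisto->accumulate(p); }
  Histogram* histo() const { return _pHisto; }
private:
  Histogram* _pHisto = nullptr;  // "declared in the header", null until booked
};
```

Calling beginJob(&manager, false) and then event(...) reproduces the kind of crash shown above: _pHisto is still a null pointer when event() uses it.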
In most cases, however, you will not have introduced an error deliberately,
and you will have made many small changes to the code before testing it. The
message "segmentation violation" on its own is then not much help in
deciding which of those changes caused the error.
Therefore, we rerun the executable under the debugger gdb to find
out exactly where it crashes:
workdir> gdb bin/$BFARCH/BetaMiniApp
(gdb) run snippet.tcl
These two commands start gdb on BetaMiniApp and then run it with
snippet.tcl as its argument, just as if you had typed "BetaMiniApp snippet.tcl".
(Recall that $BFARCH is set up for you when you
type srtpath at the start of a session, and that
workdir/bin is a link to the bin directory in your test release.)
At the framework prompt, input your collection as usual:
> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
Again we get a crash, but this time with a more helpful message
about where it went wrong:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218669216 (LWP 17231)]
0x0809fd55 in QExample::event (this=0x107e8d10, anEvent=0x12339100)
at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
89 _pHisto->accumulate( trk->p() );
Current language: auto; currently c++
So the segmentation fault occurred at line 89 of QExample.cc. You might
have guessed that the QExample module was responsible
(even if you had not put in the error yourself), since
this is the code that you added, rather than part of the
standard BaBar code that you haven't touched.
To get a bit more information, you can ask the debugger for a
backtrace, the chain of function calls that were active when the crash
occurred:
(gdb) where
Predictably, the innermost frame of the backtrace is:
#0 0x0809fd55 in QExample::event (this=0x107e8d10, anEvent=0x12339100)
at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
confirming what we already knew about where the error occurred. To look more
closely, you can enter:
(gdb) frame 0
to select that frame and display the line where it went wrong:
#0 0x0809fd55 in QExample::event (this=0x107e8d10, anEvent=0x12339100)
at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
89 _pHisto->accumulate( trk->p() );
You still don't know for sure that it was the non-instantiation of the
_pHisto histogram that caused the problem, but you have really
narrowed down the suspects. In this frame, you can also try to
interrogate the objects listed to see if you can get a few more hints:
(gdb) print trk
gives output:
$1 = (struct BtaCandidate *) 0x1380fce0
This says that trk is a pointer to a BtaCandidate
with a sensible memory address, which is good. (Your pointer address is
probably different from mine, but as long as it's not 0x0, a null pointer,
you're OK.)
So finally we have a look at our histogram:
(gdb) print _pHisto
This tells us what's wrong:
$3 = (struct HepHistogram *) 0x0
The debugger knows that _pHisto is a pointer to a HepHistogram object, but
it holds a null address (0x0): the histogram was never created.
So now we know where it went wrong, and the task of fixing things is made much simpler.
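Here the fix is simply to restore the commented-out line that creates the histogram. If you would also like a module to fail more gracefully when a histogram pointer is unexpectedly null, a minimal sketch of a null check before filling might look like this. Histogram and safeFill are hypothetical names used only for this illustration; they are not part of the BaBar code.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Hypothetical minimal stand-in for HepHistogram: it only counts fills.
struct Histogram {
  std::size_t entries = 0;
  void accumulate(double /*x*/) { ++entries; }
};

// Hypothetical helper: fill h only if it was actually booked.
// On a null pointer it prints a warning (only the first time, shared by
// all callers in this sketch) and returns false instead of crashing.
bool safeFill(Histogram* h, double x, const char* name) {
  static bool warned = false;
  if (h == nullptr) {
    if (!warned) {
      std::cerr << "Histogram " << name
                << " was never booked; skipping fills\n";
      warned = true;
    }
    return false;
  }
  h->accumulate(x);
  return true;
}
```

In an event() function this would replace the direct call with something like safeFill(_pHisto, trk->p(), "Momentum"). Whether silently skipping fills is acceptable is a judgement call: during development a loud failure may be preferable, because it makes the mistake impossible to miss.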
Now that you know what is wrong, you can quit:
(gdb) quit
The system responds,
The program is running. Exit anyway? (y or n)
Answer "y", and you're out.
A core file is written when a program terminates abnormally and
dumps core, as it may on a segmentation violation. The file is called
core.XXXX (where XXXX is a number, typically the process ID) and appears
in your workdir directory. It contains a snapshot of the job's memory
and state at the moment it crashed.
When the above example is run in my analysis-42 test release, no core file is produced.
The example below is therefore taken from analysis-31, which does produce a core file for the same error.
gdb can debug a core file instead of a running job.
For example, here is the debugging session above repeated using a core file.
The analysis-31 core file is called core.7670 (your number will probably be different), so gdb is started with:
gdb BetaMiniApp core.7670
You can then use (almost) all the same commands as before; you cannot step or continue, since there is no running process.
To find out where the error occurred:
(gdb) where
Again, you find the error in QExample, although this time it is frame 7 instead of frame 0:
#7 0x0809142d in QExample::event (this=0xeb47530, anEvent=0x10d5a480)
at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
So you look at frame 7:
(gdb) frame 7
#7 0x0809142d in QExample::event (this=0xeb47530, anEvent=0x10d5a480)
at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
89 _pHisto->accumulate( trk->p() );
Current language: auto; currently c++
As before, you investigate the object "trk":
(gdb) print trk
$1 = (struct BtaCandidate *) 0x11ec36d8
Finally, you check the histogram and find your problem, as before:
(gdb) print _pHisto
$2 = (struct HepHistogram *) 0x0
Then exit gdb:
(gdb) quit
The debugger for Sun machines is called dbx.
The syntax of dbx is:
> dbx [object_file [corefile]]
The object_file is the name of the executable object file
that you want to debug. It provides the code that dbx
executes.
The commands and syntax for dbx are similar, but not identical, to those
used for gdb. Here are some of the most common (and platform-independent)
commands:
| Command | Description |
| help | Display general help (paged through more) |
| help [command] | Display help for the command command |
| run [args] | Start the program with argument list args |
| pathmap [path] | Add path to the list of paths in which dbx will look for code |
| file [filename] | Tell the debugger to look in file filename for code |
| list | List lines of source code |
| print [x] | Print the object x |
| stop in [foo] | Set a breakpoint at the beginning of function foo |
| stop at [line] | Set a breakpoint at line line |
| assign [x]=[y] | Set variable x to y (another variable or a number) |
| next | Step to the next line (stepping over function calls) |
| step | Step to the next line (stepping into functions) |
| cont | Continue to the next stop (e.g. a breakpoint) |
| where | Print the current activation levels (the call stack) of the program |
| quit | Quit the debugging session |
For more information, use "man dbx". (Sadly, there does not
appear to be an info page for dbx.)
Note: During a dbx session, the backspace and
delete keys do not work. If you mistype a command,
use CTRL-H instead of backspace.
Page maintained by Adam Edwards
Last modified: January 2008