
Debugging




Introduction

A debugger is a utility to deal with run-time errors - errors that you encounter when you try to run your executable. This section will show you how to compile your code in debug mode, and how to use debuggers to trace problems in a program at the source code level.

A debugger enables you to control a program's execution, symbolically monitoring program control flow, variables, and memory locations. You can also use the debugger to trace the logic and flow of control to acquaint yourself with a program written by someone else.

Different machines have different debuggers. The debugger for Linux machines like yakut and noric is called gdb. The debugger for Sun machines like shire is called dbx. This section will give a brief overview of both debuggers.

This section also includes examples of how to use gdb and dbx. They assume that you have already checked out analysis-41 and followed the instructions in Example 2 to include a momentum histogram in the QExample module.


Compiling your Code for Debugging

Generally when you compile and link your code for running an analysis job, the code is optimized to run faster. The BaBar packages that you did not check out (but that your code still uses) will also have been compiled with optimization. When you need to debug, the bugs will typically be in your own code, since most BaBar releases (particularly analysis releases) have been tested by experts. Optimization can reorder or eliminate code and variables, which makes it hard for a debugger to relate the running program to your source, so the best approach is to compile and link your code with the flags -noOptimize-Debug. This will generally enable the debuggers to pinpoint the exact line in your code where the problem occurred. There are two ways to compile and link in debug mode:
  1. When you issue the srtpath command, select the -noOptimize-Debug option instead of the default. For example, if you are logged into yakut and have checked out the release analysis-41, srtpath will offer a list of architectures such as:
    Select/enter BFARCH (CR=1):
    1) Linux24SL3_i386_gcc323                     [prod][test][active][default]
    2) Linux24SL3_i386_gcc323-noOptimize-Debug    [prod]
    3) Linux24RHEL3_i386_gcc323                   [default2]
    
    The default option is option 1, but if you select 2, all of your gmake commands will run in -noOptimize-Debug mode.
  2. Alternatively, you can select the default Linux24SL3_i386_gcc323 architecture. Then, when you wish to debug your code, issue the gmake commands with ROPT=-noOptimize-Debug, as follows:
    ana41> bsub -q bldrecoq -o all.log gmake all ROPT=-noOptimize-Debug
    
Note that if you have been compiling and linking in optimized mode and want to recompile for debugging, you will need to issue a gmake clean or gmake cleanarch command first to flush out the optimized libraries and binaries.

HOWTO-Basic-Debugging

Information on debugging BaBar analysis jobs is available in the HOWTO file HOWTO-Basic-Debugging. This very useful HOWTO is written for beginners.

Debugging on Linux: gdb

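The debugger for Linux machines is called gdb. gdb allows you to see what is going on inside a program while it executes, or what the program was doing at the moment it crashed. gdb can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:
  1. Start your program, specifying anything that might affect its behavior.
  2. Make your program stop on specified conditions.
  3. Examine what has happened, when your program has stopped.
  4. Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

The basic syntax for gdb is one of the following:
gdb program          (debug program)
gdb program core     (debug program using a core file that it produced)
gdb program pid      (attach to the running process with ID pid)

For more information about gdb, you can look at the man page: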
man gdb
or the info page
info gdb
The info page in particular contains a lot of information and even a sample gdb session. The info page looks like a text document, but in fact it has links that you can follow to other pages. To navigate the info page, put the cursor on the menu item that you are interested in, and press enter. To exit, press q for quit.

Link: Online GDB manual

gdb commands

Here are some of the most frequently needed gdb commands:
Command                 Description
break [file:]function   Set a breakpoint at function (in file).
run [arglist]           Start your program (with arglist, if specified).
bt (or where)           Backtrace: display the program stack.
frame n                 Select stack frame n and show where it is stopped.
print expr              Display the value of an expression, e.g. the object x.
c                       Continue running your program (after stopping, e.g. at a breakpoint).
next                    Execute next program line (after stopping); step over any function calls in the line.
step                    Execute next program line (after stopping); step into any function calls in the line.
edit [file:]function    Look at the program line where it is presently stopped.
list [file:]function    Type the text of the program in the vicinity of where it is presently stopped.
help [name]             Show information about gdb command name, or general information about using gdb.
quit                    Exit from gdb.

Running a quick debug session on yakut to find a segmentation violation

Here is a very quick example of standard use of the debugger on a Linux machine demonstrating the minimal procedure that you are likely to use frequently. The particular responses in this section are from running on analysis-41 on a yakut machine.

To begin, you will deliberately introduce an error into your code. Open the QExample.cc file that you used in the last WorkBook section, and comment out the line where the momentum histogram is initialized:

//  _pHisto = manager->histogram("Momentum",  25,  0.,  1. ); 
Now try to recompile and link (with the Debug flag set to maximise the information we can get when things go wrong):
ana41> gmake cleanarch
ana41> bsub -q bldrecoq -o all-Linux.log gmake all ROPT=-noOptimize-Debug
Since _pHisto is declared in the header file, the code will compile and link with no problems. But when you try to run the job, something goes wrong:
workdir> BetaMiniApp snippet.tcl
> mod talk KanEventInput
KanEventInput> input add /store/SP/R18/001237/200309/18.6.0b/SP_001237_013238
KanEventInput> exit
> ev beg -nev 10
The job doesn't even get past the first event! It dies with a message like:
EmcTrackMatch::EmcPocaMatchMethod.cc(877):Use track match constants from ASCII file.

 *** Break *** segmentation violation

 *** Break *** segmentation violation
QExample:DrcDetector: number of sets to destroy: 51

The program crashed with a segmentation violation. A segmentation violation generally means that the program tried to access memory that it is not allowed to access, most often by following a null or uninitialized pointer. In this case you deliberately created a common problem: a new histogram is declared in the header and filled in the event() function of the implementation file, but it is never actually created. You have to make the histogram before you can fill it.
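The same bug can be reproduced outside the framework. Below is a minimal C++ sketch, assuming hypothetical stand-in classes (Histogram, HistManager, ExampleModule) that only loosely mirror QExample; they are NOT the real BaBar or CLHEP API. The module holds a non-owning pointer that stays null until beginJob() creates the histogram, and the null check in event() turns what would be a segmentation violation into a warning.

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a histogram (NOT the real HepHistogram API).
struct Histogram {
    std::string name;
    int entries = 0;
    void accumulate(double /*value*/) { ++entries; }
};

// Hypothetical manager: owns every histogram and hands out non-owning
// pointers, loosely like manager->histogram(...) in QExample.cc.
class HistManager {
public:
    Histogram* histogram(const std::string& name) {
        hists_.push_back(std::make_unique<Histogram>());
        hists_.back()->name = name;
        return hists_.back().get();
    }
private:
    std::vector<std::unique_ptr<Histogram>> hists_;
};

// Hypothetical module with the same shape as QExample.
class ExampleModule {
public:
    // createHisto = false mimics commenting out the histogram line.
    void beginJob(HistManager& manager, bool createHisto) {
        if (createHisto) _pHisto = manager.histogram("Momentum");
    }

    // Fill with one track momentum; refuse (instead of crashing) if the
    // histogram was never created.
    bool event(double p) {
        if (_pHisto == nullptr) {
            std::cerr << "ExampleModule::event: _pHisto is null, "
                         "was it created in beginJob()?\n";
            return false;
        }
        _pHisto->accumulate(p);
        return true;
    }

    const Histogram* histo() const { return _pHisto; }

private:
    Histogram* _pHisto = nullptr;  // declared, but not created until beginJob()
};
```

In this sketch, a module whose beginJob() skipped the histogram returns false from event() and prints a warning, while a module that created its histogram fills it normally. Whether such a guard belongs in real analysis code is a judgment call; letting the job crash and inspecting it in the debugger, as in the rest of this section, is often just as effective.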

However, in most cases you will not have put in an error deliberately, and you will have made many small changes to your code before testing it. The message "segmentation violation" on its own does not tell you which of those changes caused the error.

Therefore, we rerun the executable under the debugger gdb to find out exactly where it crashes:

workdir> gdb bin/$BFARCH/BetaMiniApp
(gdb) run snippet.tcl

These first two lines run gdb on the job "BetaMiniApp snippet.tcl".

(Recall that $BFARCH is set up for you when you type srtpath at the start of a session, and that workdir/bin is a link to the bin directory in your test release.)

At the framework prompt, input your collection as usual:

> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
Again we get a crash, but this time with a more helpful message about where it went wrong:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218669216 (LWP 17231)]
0x0809fd55 in QExample::event (this=0x107e8d10, anEvent=0x12339100)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
89          _pHisto->accumulate( trk->p() );
Current language:  auto; currently c++
So the segmentation fault occurred at line 89 of QExample.cc. You would have guessed that the QExample module was responsible (even if you had not put in the error yourself), since this is the code that you added, rather than part of the standard BaBar code that you haven't touched.

To get a bit more information, you can ask the debugger for a backtrace, the list of function calls (stack frames) that were active when the crash occurred:

(gdb) where
Predictably, one of the frames in the backtrace is:

#0  0x0809fd55 in QExample::event (this=0x107e8d10, anEvent=0x12339100)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
confirming what you already knew about where the error occurred. To look more closely, you can enter:
(gdb) frame 0
to look at the particular region where it went wrong:
#0  0x0809fd55 in QExample::event (this=0x107e8d10, anEvent=0x12339100)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
89          _pHisto->accumulate( trk->p() );
You still don't know for sure that it was the non-instantiation of the _pHisto histogram that caused the problem, but you have really narrowed down the suspects. In this frame, you can also try to interrogate the objects listed to see if you can get a few more hints:
(gdb) print trk
gives output:
$1 = (struct BtaCandidate *) 0x1380fce0
This says that trk is a pointer to a BtaCandidate and has a sensible-looking memory address, which is good. (Your pointer address is probably different from mine. A value of 0x0, a null pointer, would be a clear sign of trouble; any other value is no guarantee that the pointer is valid, but it is a good start.) You can also use the command
(gdb) print trk->p()
to print the magnitude of the 3-momentum of the track:
$2 = 0.73818585099287926
But that doesn't help much in this case.

(Note: Sometimes the above command doesn't work, and instead returns the message:

(gdb) print trk->p()
Couldn't find method (null)p
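One possible cause: to evaluate trk->p(), gdb has to call p() inside the program, which fails if there is no running process (as in the core-file example below) or if the compiler inlined p() and left no separate copy to call.)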

So finally we have a look at our histogram:

(gdb) print _pHisto
Which tells us what's wrong:
$3 = (struct HepHistogram *) 0x0
gdb knows that _pHisto is a pointer to a HepHistogram object, but its value is 0x0: a null pointer, because the histogram was never created.

So now we know where it went wrong, and the task of fixing things is made much simpler.

Now that you know what is wrong, you can quit:

(gdb) quit
The system responds,
The program is running.  Exit anyway? (y or n)

Answer "y", and you're out.
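The where command in the session above prints the call stack: frame 0 is the innermost function and each higher-numbered frame is its caller. As an illustration only, glibc on Linux can capture a similar list from inside a program with backtrace() and backtrace_symbols() from <execinfo.h>. The sketch below assumes a glibc system, and the function names captureStack, inner and outer are hypothetical. Its output is much less informative than gdb's: there are no source files or line numbers, and function names appear only if the executable is linked with -rdynamic.

```cpp
#include <cstddef>     // std::size_t
#include <cstdlib>     // std::free
#include <execinfo.h>  // glibc: backtrace(), backtrace_symbols()
#include <string>
#include <vector>

// Return a text description of the current call stack, innermost frame first,
// roughly the list that gdb's "where" prints (without source line numbers).
std::vector<std::string> captureStack(int maxFrames = 64) {
    std::vector<void*> addrs(static_cast<std::size_t>(maxFrames));
    int n = backtrace(addrs.data(), maxFrames);  // number of frames captured
    std::vector<std::string> frames;
    char** symbols = backtrace_symbols(addrs.data(), n);
    if (symbols != nullptr) {
        for (int i = 0; i < n; ++i) frames.emplace_back(symbols[i]);
        std::free(symbols);  // one malloc'd block; the strings are not freed individually
    }
    return frames;
}

// Two nested calls, so the captured stack has something to show.
std::vector<std::string> inner() { return captureStack(); }
std::vector<std::string> outer() { return inner(); }
```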

Debugging with a core file

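A core file is written when a program terminates abnormally (for example, on a segmentation violation) and the system dumps the program's memory to disk. When you are core dumped, you get a file called core.XXXX in your workdir directory, where XXXX is typically the process ID. This core file is a snapshot of your job's memory at the moment it crashed, so a debugger can still examine its variables and call stack afterwards.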

When I ran the above example in my analysis-41 test release, I did not get a core file. However, in a previous version of the Workbook based on analysis-31, I was core dumped. The example below is therefore from analysis-31.

gdb can debug a core file instead of a running job. For example, let's rerun the above debugging session, but this time using the core file. My core file is called core.7670 (your number is probably different), so I enter:

gdb BetaMiniApp core.7670
Then you can use (almost) all the same commands you used before. To find out where the error occurred:
(gdb) where
Again, you find the error in QExample, although this time it is frame 7 instead of frame 0:
#7  0x0809142d in QExample::event (this=0xeb47530, anEvent=0x10d5a480)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
So you look at frame 7:
(gdb) frame 7

#7  0x0809142d in QExample::event (this=0xeb47530, anEvent=0x10d5a480)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
89          _pHisto->accumulate( trk->p() );
Current language:  auto; currently c++
As before, you investigate the object "trk":
(gdb) print trk
$1 = (struct BtaCandidate *) 0x11ec36d8
But this time you can't print the track's momentum, because the job is not running:
(gdb) print trk->p()
Couldn't find method (null)p
Finally, you check the histogram and find your problem, as before:
(gdb) print _pHisto
$2 = (struct HepHistogram *) 0x0
Then exit gdb:
(gdb) quit
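If a crash leaves no core file at all, as happened in the analysis-41 run described above, a common reason is that the core-file size limit for your process is zero. In the shell you can raise it before running the job with ulimit -c unlimited (sh/bash) or limit coredumpsize unlimited (csh/tcsh). The C++ sketch below shows the same idea from inside a program, using the POSIX getrlimit()/setrlimit() calls; the function name enableCoreDumps is hypothetical. It can only raise the soft limit as far as the hard limit that the system allows, and where the core file is written and how it is named still depend on system settings.

```cpp
#include <sys/resource.h>  // POSIX getrlimit(), setrlimit(), RLIMIT_CORE

// Hypothetical helper: raise this process's soft core-file size limit (which
// is also inherited by children started afterwards) to the hard limit.
// A soft limit of 0 means no core file is written.
// Returns true if the soft limit equals the hard limit afterwards.
bool enableCoreDumps() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) return false;
    rl.rlim_cur = rl.rlim_max;  // an unprivileged process may raise soft up to hard
    if (setrlimit(RLIMIT_CORE, &rl) != 0) return false;

    struct rlimit check;
    if (getrlimit(RLIMIT_CORE, &check) != 0) return false;
    return check.rlim_cur == check.rlim_max;
}
```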

Debugging on Sun: dbx

The debugger for Sun machines is called dbx.

The syntax of dbx is:

   > dbx [object_file [corefile]]

The object_file is the name of the executable object file that you want to debug. The optional corefile is a core file produced by that executable, which you can examine in the same way as with gdb.

dbx commands

The commands and syntax for dbx are similar, but not identical, to those used for gdb. Here are some of the most common (and platform-independent) commands:
Command            Description
help               Display general help (uses more)
help [command]     Display help for command command
run [args]         Start the program with argument list args
pathmap [path]     Add path to the list of paths in which dbx will look for code
file [filename]    Tell the debugger to look in file filename for code
list               List lines of source code
print [x]          Print the object x
stop in [foo]      Set a breakpoint at the beginning of function foo
stop at [line]     Set a breakpoint at line line
assign [x]=[y]     Set variable x to be y (another variable or a number)
next               Step to the next line (stepping over function calls)
step               Step to the next line (stepping into functions)
cont               Continue to the next stop (e.g. a breakpoint)
where              Print the current activation levels of a program
quit               Quit debugging session

For more information, use "man dbx". (Sadly, there does not appear to be an info page for dbx.)

Note: During a dbx session, the backspace and delete keys do not work. If you mis-type a command you have to use CTRL-H instead of backspace.


Debugging with dbx

Note: This example is temporarily out of order. It worked in analysis-31 but I have run into problems trying to reproduce the result in analysis-41. I will try to fix it asap.

If you followed the example in the Linux-gdb section, then you have already introduced an error into your QExample code. If not, then you need to do so now: in QExample.cc, comment out the line that creates the _pHisto histogram:

//  _pHisto = manager->histogram("Momentum",  25,  0.,  1. ); 

If you have been following the Workbook, then you have probably done all of your work so far on yakut, which is a Scientific Linux machine. So before you can debug on Sun, you will need to compile and link on Sun. Log in to shire (a Sun machine), and then do:

ana41> srtpath <enter> <enter>
ana41> gmake cleanarch
ana41> bsub -q bldrecoq -o all-Sun.log gmake all ROPT=-noOptimize-Debug

The cleanarch command removes older Sun libraries and binaries, if there are any. Note, however, that it does not clean out any Linux files. It cleans out only files for the current architecture (as set by srtpath).

   workdir> dbx ../bin/$BFARCH/BetaMiniApp
This gives you an information screen, which you get rid of by typing "q":
   q 
More information scrolls by, until finally you get your dbx prompt:
(/opt/SUNWspro/bin/../WS6U1/bin/sparcv9/dbx)
From now on I will abbreviate the prompt to (dbx). Now you are ready to start:
   (dbx) run snippet.tcl
Then, after some initial output, you get your usual framework prompt, and talk to KanEventInput:
> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
After some more output, the job stops with:
t@1 (l@1) signal SEGV (no mapping at the fault address) in QExample::event at line 89 in file "QExample.cc"
   89       _pHisto->accumulate( trk->p() );

As in gdb, now you ask for more information:

(dbx) where

Sure enough, one of the frames listed is:

=>[1] QExample::event(this = 0xa021fc0, anEvent = 0x11aa56c8), line 89 in "QExample.cc"

The frame command in dbx works a bit differently from the one in gdb. To see which frame you are in right now:

(dbx) frame
1

You are in frame 1, which is where you want to be. If you wanted to go to frame 4 (say), you would have to take three steps forward:

(dbx) frame +3
0x02cd9b74: continueHandler+0x028c:     jmpl    %l0, %o7

But right now you want to be in frame 1, so take 3 steps backward:

(dbx) frame -3
Current function is QExample::event
   89       _pHisto->accumulate( trk->p() );

The other commands are pretty much the same as in gdb:

(dbx) print trk
trk = 0x14409910

(dbx) print trk->p()
trk->p() = 0.73818585099288

(dbx) print _pHisto
_pHisto = (nil)
Once again, you discover that _pHisto is a null pointer. Having found the problem, you can end your dbx session:
(dbx) quit

Running with dbx

In this example, you will use dbx to investigate what's going on at run-time in an application that does not have errors. The following is an example of how to start a dbx session, set a couple of breakpoints and print the value of a variable.

First, remove the error that you put in QExample.cc. That is, uncomment the line that creates _pHisto so that you once again have:

  _pHisto = manager->histogram("Momentum",  25,  0.,  1. ); 
in the beginJob function. Then, of course, you need to recompile and relink your code:
ana41> gmake cleanarch
ana41> rm all-Sun.log
ana41> bsub -q bldrecoq -o all-Sun.log gmake all ROPT=-noOptimize-Debug

Once you have your new, bug-free BetaMiniApp, begin your dbx session:

   workdir> dbx ../bin/$BFARCH/BetaMiniApp
and type "q" to get rid of the information screen, as before.

This time, before you run the program, set a break point at the function event of QExample:

   (dbx) pathmap ../BetaMiniUser/
   (dbx) file QExample.cc
   (dbx) stop in QExample::event
   (2) stop in QExample::event(AbsEvent*)
Now when you run the program with dbx, it will stop at the break point.

Now run the program:

   (dbx) run snippet.tcl
and talk to KanEventInput as usual:
> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
After some more output, as instructed, dbx stops the job at the beginning of the event function:
t@1 (l@1) stopped in QExample::event at line 79 in file "QExample.cc"
   79     HepAList<BtaCandidate>* trkList  =
Now you can explore:
(dbx) list +15

   79     HepAList<BtaCandidate>* trkList  =
   80      Ifd<HepAList< BtaCandidate > >::get(anEvent, "ChargedTracks");
   81
   82     //histogram number of tracks in event
   83     _numTrkHisto->accumulate( trkList->length() );
   84
   85     // Loop over track candidates to plot momentum 
   86     HepAListIterator<BtaCandidate> iterTrk(*trkList);
   87     BtaCandidate* trk(0);
   88     while ( trk = iterTrk()) {
   89      _pHisto->accumulate( trk->p() );
   90     }
   91
   92    // done
   93    return AppResult::OK;

(dbx) next

t@1 (l@1) stopped in QExample::event at line 83 in file "QExample.cc"
   83     _numTrkHisto->accumulate( trkList->length() );

(dbx) print trkList
trkList = 0x15bf1290

(dbx) print *trkList
*trkList = {
/* try using "print -r" to see any inherited members */
 }

(dbx) print -r *trkList
*trkList = {
    HepAList<BtaCandidate>::HepAListBase::p = 0x15bf7258
    HepAList<BtaCandidate>::HepAListBase::n = 10
    HepAList<BtaCandidate>::HepAListBase::s = 14
}

(dbx) stop at 88
(3) stop at "QExample.cc":88

(dbx) cont
t@1 (l@1) stopped in QExample::event at line 88 in file "QExample.cc"
   88      while ( trk = iterTrk()) {

(dbx) status
 (2) stop in QExample::event(AbsEvent*)
*(3) stop at "QExample.cc":88

(dbx) next
t@1 (l@1) stopped in QExample::event at line 89 in file "QExample.cc"
   89       _pHisto->accumulate( trk->p() );

(dbx) print trk->p()
trk->p() = 0.73818585099288

(dbx) delete 2

(dbx) status
(3) stop at "QExample.cc":88

(dbx) quit
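The loop on lines 86-90 of the listing relies on an iterator that returns a pointer to the next candidate each time it is called and a null pointer once the list is exhausted, so the assignment inside the while condition also ends the loop. The C++ sketch below imitates that pattern with a hypothetical PtrListIterator class and Track struct; it is an illustration only, not the real HepAListIterator or BtaCandidate.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical iterator imitating the HepAListIterator usage in QExample.cc
// (NOT the real CLHEP class): each call returns the next element, or nullptr
// once the list is exhausted.
template <typename T>
class PtrListIterator {
public:
    explicit PtrListIterator(const std::vector<T*>& list) : list_(list) {}
    T* operator()() { return index_ < list_.size() ? list_[index_++] : nullptr; }
private:
    const std::vector<T*>& list_;
    std::size_t index_ = 0;
};

// Hypothetical stand-in for BtaCandidate.
struct Track {
    double momentum;
    double p() const { return momentum; }
};

// Same loop shape as the listing: the assignment in the condition yields the
// pointer, and a null pointer ends the loop.
double sumMomenta(const std::vector<Track*>& tracks) {
    PtrListIterator<Track> iterTrk(tracks);
    Track* trk = nullptr;
    double sum = 0.0;
    while ((trk = iterTrk())) {  // extra parentheses: the assignment is intended
        sum += trk->p();
    }
    return sum;
}
```

The extra parentheses around the assignment are a common convention to show that the assignment is intentional; compilers such as gcc otherwise warn about an assignment used as a truth value.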
