SLAC PEP-II
BABAR
SLACRAL
Debugging




Introduction

A debugger is a tool for tracking down run-time errors: errors that appear when you run your executable, rather than when you compile or link it. This section shows you how to compile your code in debug mode, and how to use a debugger to trace problems in a program at the source-code level.

A debugger enables you to control a program's execution, symbolically monitoring program control flow, variables, and memory locations. You can also use the debugger to trace the logic and flow of control to acquaint yourself with a program written by someone else.

Different machines have different debuggers. The debugger for Linux machines like yakut and noric is called gdb. The debugger for Sun machines like shire is called dbx. This section will give a brief overview of both debuggers.

This section also includes examples of how to use gdb. They assume that you have already checked out analysis-42 and followed the instructions in Example 2 to include a momentum histogram in the QExample module.


Compiling your Code for Debugging

Generally, when you compile and link your code to run an analysis job, the code is optimized to run faster. The BaBar packages that you did not check out (but that your code still uses) have been compiled with optimization as well. When you debug, it will typically be your own code that contains the bugs, since most BaBar releases (particularly analysis releases) have been tested by experts. The best approach is therefore to compile and link your own code with the flags -noOptimize-Debug, which turn off optimization and keep debugging information. This generally lets the debugger pinpoint the exact line in your code where the problem occurred. There are two ways to compile and link in debug mode:
  1. When you issue the srtpath command, select the -noOptimize-Debug option instead of the default. For example, if you are logged into yakut and have checked out the release analysis-42, srtpath offers options such as:
    Select/enter BFARCH (CR=1):
    1) Linux24SL3_i386_gcc323                     [prod][test][active][default]
    2) Linux24SL3_i386_gcc323-noOptimize-Debug    [prod]
    3) Linux24RHEL3_i386_gcc323                   [default2]
    
    The default option is option 1, but if you select 2, all of your gmake commands will run in -noOptimize-Debug mode.
  2. Alternatively, you can select the plain Linux24SL3_i386_gcc323 architecture. Then, when you want to debug your code, add ROPT=-noOptimize-Debug to your gmake commands, as follows:
    ana42> bsub -q bldrecoq -o all.log gmake all ROPT=-noOptimize-Debug
    
Note that if you have been compiling and linking in Optimized mode, and you want to recompile for debugging, you will need to issue a gmake clean or gmake cleanarch command to flush out the Optimized library and binary files.
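Other packages stay optimized, so it can be hard to tell whether the code you are stepping through really was rebuilt without optimization. The sketch below assumes that the BaBar build ultimately calls gcc (the gcc323 in the architecture name suggests it does). GCC defines the macro __OPTIMIZE__ whenever optimization is enabled. A helper compiled into your own module can therefore report its own build mode, for example in a start-up message. The function name buildModeDescription is hypothetical and is not part of any BaBar package.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports how *this* source file was compiled.
// GCC (and Clang) define __OPTIMIZE__ whenever optimization is enabled
// (any -O level other than -O0), so a -noOptimize-Debug style build
// should take the #else branch.
std::string buildModeDescription() {
#ifdef __OPTIMIZE__
  return "optimized (line numbers and variables in gdb may be unreliable)";
#else
  return "not optimized (suitable for debugging)";
#endif
}
```

The result describes only the file the helper is compiled into. Put it in your own .cc file rather than relying on it to describe the whole executable.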

HOWTO-Basic-Debugging

Information on debugging BaBar analysis jobs is available in the HOWTO file HOWTO-Basic-Debugging. This very useful HOWTO is written for beginners, and contains information about:
  • How to report problems to get help from others
  • Descriptions of common types of problems
  • Summary of how to use the debuggers
  • Other useful tips, tricks and sources of information

Debugging on Linux: gdb

The debugger for Linux machines is called gdb. gdb lets you see what is going on inside a program while it executes, or what the program was doing at the moment it crashed. gdb can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:
  • Start your program, specifying anything that might affect its behavior.
  • Make your program stop on specified conditions.
  • Examine what has happened, when your program has stopped.
  • Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.
The basic syntax for gdb is one of the following:
  • gdb program - To debug program.
  • gdb program core - To debug using the core file, produced when program was core dumped.
  • gdb program PID - To debug a running process with process ID number PID.
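The third form is handy for a job that hangs, or one that you want to inspect part-way through. In the sketch below, a program prints the matching gdb command and then waits until a debugger attaches and releases it. The helpers and the WAIT_FOR_GDB environment variable are hypothetical and are not part of BaBar.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <unistd.h>

// Hypothetical helper: builds the "gdb program PID" command for this process.
std::string gdbAttachCommand(const std::string& program) {
  std::ostringstream cmd;
  cmd << "gdb " << program << " " << getpid();
  return cmd.str();
}

// Hypothetical helper: if the environment variable WAIT_FOR_GDB is set,
// print the attach command and spin until a debugger clears keepWaiting.
void maybeWaitForDebugger(const std::string& program) {
  if (std::getenv("WAIT_FOR_GDB") == nullptr) return;
  std::cerr << "Waiting for debugger: " << gdbAttachCommand(program)
            << std::endl;
  volatile int keepWaiting = 1;  // volatile: re-read on every loop iteration
  while (keepWaiting) {
    sleep(1);                    // an attached gdb usually stops in here
  }
}
```

After you attach with the printed command, gdb usually stops inside the C library's sleep routine. Type up until you reach maybeWaitForDebugger, then enter set var keepWaiting = 0 and c to let the program continue.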
For more information about gdb, you can look at the man page:
man gdb
or the info page
info gdb
The info page in particular contains a lot of information and even a sample gdb session. The info page looks like a text document, but in fact it has links that you can follow to other pages. To navigate the info page, put the cursor on the menu item that you are interested in, and press enter. To exit, press q for quit.

See also: the online GDB manual.

gdb commands

Here are some of the most frequently needed gdb commands:

Command                 Description
break [file:]function   Set a breakpoint at function (in file).
run [arglist]           Start your program (with arglist, if specified).
bt                      Backtrace: display the program stack (where does the same).
frame n                 Select stack frame n (0 is the innermost) and show where it is.
print expr              Display the value of an expression, e.g. print x for the object x.
c                       Continue running your program (after stopping, e.g. at a breakpoint).
next                    Execute the next program line (after stopping), stepping over any function calls in it.
step                    Execute the next program line (after stopping), stepping into any function calls in it.
list [file:]function    Print the source text around function, or around where the program is stopped.
edit [file:]function    Open the source at function (or where the program is stopped) in an editor.
help [name]             Show information about gdb command name, or general information about using gdb.
quit                    Exit from gdb.
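If you want to try these commands before tackling a real BaBar job, a tiny stand-alone program is enough. The function below is a hypothetical practice example, not BaBar code. Compile it with g++ -g -O0 together with a short main of your own that calls it, then work through the session listed in the comments.

```cpp
#include <cassert>
#include <vector>

// Hypothetical practice function for trying gdb commands.
// Suggested session (after building with g++ -g -O0):
//   (gdb) break sumMomenta   stop on entry to this function
//   (gdb) run                start the program
//   (gdb) next               execute one line, stepping over calls
//   (gdb) print total        show the running sum
//   (gdb) bt                 show which function called sumMomenta
//   (gdb) c                  continue to the end (or the next breakpoint)
double sumMomenta(const std::vector<double>& momenta) {
  double total = 0.0;
  for (double p : momenta) {
    total += p;
  }
  return total;
}
```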

Running a quick debug session on yakut to find a segmentation violation

Here is a quick example of the minimal debugging procedure on a Linux machine, one that you are likely to use often. The responses shown in this section come from running analysis-42 on a yakut machine.

To begin, you will deliberately introduce an error into your code. Open the QExample.cc file that you used in the last WorkBook section, and comment out the line where the momentum histogram is initialized:

//  _pHisto = manager->histogram("Momentum",  25,  0.,  1. ); 
Now try to recompile and link (with the Debug flag set to maximise the information we can get when things go wrong):
ana42> gmake cleanarch
ana42> bsub -q bldrecoq -o all-Linux.log gmake all ROPT=-noOptimize-Debug
Since _pHisto is declared in the header file, the code will compile and link with no problems. But when you try to run the job, something goes wrong:
workdir> BetaMiniApp snippet.tcl
> mod talk KanEventInput
KanEventInput> input add /store/SP/R18/001237/200309/18.6.0b/SP_001237_013238
KanEventInput> exit
> ev beg -nev 10
The job doesn't even get past the first event! It dies with a message like:
EmcTrackMatch::EmcPocaMatchMethod.cc(877):Use track match constants from ASCII file.

 *** Break *** segmentation violation

 *** Break *** segmentation violation
QExample:DrcDetector: number of sets to destroy: 51

The program crashed with a segmentation violation. A segmentation violation generally means that the program tried to access memory it is not allowed to access, most often by following a null or uninitialized pointer. In this case we deliberately created a common problem. A new histogram was added to the code, declared in the header, and filled in the event() function of the implementation file, but it was never actually created. You have to make the histogram before you can fill it.

However, in most cases you will not have put in the error deliberately, and you may have made many small changes to the code before testing it. The message "segmentation violation" on its own does not tell you which of those changes caused the error.

Therefore, we rerun the executable under the gdb debugger to find out exactly where it crashes:

workdir> gdb bin/$BFARCH/BetaMiniApp
(gdb) run snippet.tcl

These first two lines run gdb on the job "BetaMiniApp snippet.tcl".

(Recall that $BFARCH is set up for you when you type srtpath at the start of a session, and that workdir/bin is a link to the bin directory in your test release.)

At the framework prompt, input your collection as usual:

> mod talk KanEventInput
KanEventInput> input add /store/SP/R14/001237/200309/14.3.1c/SP_001237_000533
KanEventInput> exit
> ev beg -nev 10
Again we get a crash, but this time with a more helpful message about where it went wrong:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1218669216 (LWP 17231)]
0x0809fd55 in QExample::event (this=0x107e8d10, anEvent=0x12339100)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
89          _pHisto->accumulate( trk->p() );
Current language:  auto; currently c++
So the segmentation fault occurred at line 89 of QExample.cc. You might have guessed that the QExample module was responsible even if you had not put in the error yourself. This is the code that you added, rather than standard BaBar code that you haven't touched.

To get a bit more information, you can ask the debugger for a backtrace. This is the chain of function calls (stack frames) that were active when the crash occurred:

(gdb) where
Predictably, the innermost frame, #0, is:

#0  0x0809fd55 in QExample::event (this=0x107e8d10, anEvent=0x12339100)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
This confirms what we already knew about where the error occurred. To look more closely, you can enter:
(gdb) frame 0
to select that frame. gdb then prints its location and the offending source line:
#0  0x0809fd55 in QExample::event (this=0x107e8d10, anEvent=0x12339100)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
89          _pHisto->accumulate( trk->p() );
You still don't know for sure that the missing _pHisto histogram caused the problem, but you have really narrowed down the suspects. In this frame, you can also print the objects on that line to get a few more hints:
(gdb) print trk
gives output:
$1 = (struct BtaCandidate *) 0x1380fce0
This says that trk is a pointer to a BtaCandidate with a plausible memory address, which is good. Your address is probably different from mine. A non-null value (anything other than 0x0) is a good sign, although it does not by itself guarantee that the pointer is valid.

So finally we have a look at our histogram:

(gdb) print _pHisto
This tells us what's wrong:
$3 = (struct HepHistogram *) 0x0
The debugger knows that _pHisto is a pointer to a HepHistogram object, but its value is 0x0. It is a null pointer that was never set to point at a real histogram.

So now we know where it went wrong, and the task of fixing things is made much simpler.

Now that you know what is wrong, you can quit:

(gdb) quit
The system responds,
The program is running.  Exit anyway? (y or n)

Answer "y", and you're out.
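The fix here is simply to restore the line you commented out. More generally, a defensive pattern makes this class of bug easier to spot:

- Initialize pointer members to null in the constructor.
- Create the object in the module's set-up code.
- Check the pointer before use.

The sketch below uses hypothetical stand-ins (Histogram, HistManager, MomentumModule, beginJob) rather than the real HepHistogram and framework classes, whose interfaces are not reproduced here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a histogram: just records the values filled.
struct Histogram {
  std::string name;
  std::vector<double> entries;
  void accumulate(double x) { entries.push_back(x); }
};

// Hypothetical stand-in for a histogram manager that owns its histograms.
class HistManager {
 public:
  Histogram* histogram(const std::string& name) {
    owned_.push_back(std::unique_ptr<Histogram>(new Histogram{name, {}}));
    return owned_.back().get();
  }
 private:
  std::vector<std::unique_ptr<Histogram>> owned_;
};

// Hypothetical module mirroring the QExample pattern.
class MomentumModule {
 public:
  MomentumModule() : _pHisto(nullptr) {}  // null until set-up runs

  void beginJob(HistManager& manager) {
    _pHisto = manager.histogram("Momentum");  // the line that was commented out
  }

  // Returns false instead of crashing if set-up never created the histogram.
  bool event(double p) {
    if (_pHisto == nullptr) return false;
    _pHisto->accumulate(p);
    return true;
  }

  const Histogram* histo() const { return _pHisto; }

 private:
  Histogram* _pHisto;  // owned by HistManager, not by this module
};
```

In a real module you would probably print an error message rather than silently skip the fill. The point is that the null check turns a segmentation violation into a clear diagnosis.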

Debugging with a core file

A core file is written when a program terminates abnormally (a "core dump"). It appears in your workdir directory with a name like core.XXXX, where the number is typically the process ID. It contains a snapshot of the job's memory, including its call stack, at the moment it crashed.
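Whether a crash leaves a core file at all depends partly on the process's core-file size limit. If the soft limit is 0, no core file is written. In tcsh you can raise it with limit coredumpsize unlimited (ulimit -c unlimited in bash), up to the hard limit set by the system. The sketch below does the same check from C++ with the POSIX getrlimit and setrlimit calls. The helper names are hypothetical, and a zero limit is only one of several possible reasons for a missing core file.

```cpp
#include <cassert>
#include <sys/resource.h>

// Returns true if the soft core-file size limit allows a nonzero core file.
// A soft limit of 0 (a common default) means a crash leaves no core file.
bool coreDumpsAllowed() {
  rlimit rl;
  if (getrlimit(RLIMIT_CORE, &rl) != 0) return false;
  return rl.rlim_cur != 0;
}

// Raises the soft limit to the hard limit for this process (children inherit
// it). An unprivileged process may do this but cannot exceed the hard limit.
bool enableCoreDumps() {
  rlimit rl;
  if (getrlimit(RLIMIT_CORE, &rl) != 0) return false;
  rl.rlim_cur = rl.rlim_max;
  if (setrlimit(RLIMIT_CORE, &rl) != 0) return false;
  return rl.rlim_cur != 0;
}
```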

When you run the above example in my analysis-42 test release, you do not get a core file. The example below is therefore from analysis-31, which does produce a core file for the same error.

gdb can debug a core file instead of a running job. Here the debugging session above is repeated, but this time using the core file. The analysis-31 core file is called core.7670 (your number will probably be different):

gdb BetaMiniApp core.7670
Then you can use (almost) all the same commands you used before. To find out where the error occurred:
(gdb) where
Again, you find the error in QExample, although this time it is frame 7 instead of frame 0:
#7  0x0809142d in QExample::event (this=0xeb47530, anEvent=0x10d5a480)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
So you look at frame 7:
(gdb) frame 7

#7  0x0809142d in QExample::event (this=0xeb47530, anEvent=0x10d5a480)
    at /afs/slac.stanford.edu/u/br/penguin/ana41/BetaMiniUser/QExample.cc:89
89          _pHisto->accumulate( trk->p() );
Current language:  auto; currently c++
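Frame numbers count outward from the innermost call. Frame #0 is the function that was executing when the program stopped, #1 is its caller, and so on. The same faulting line can therefore appear as #0 in one session and #7 in another if other calls sit above it on the stack. The hypothetical three-level call chain below is a way to see this for yourself. Set break innerStep, call outerStep from a main of your own, then try bt, frame 1, up and down.

```cpp
#include <cassert>

// Hypothetical call chain for exploring stack frames in gdb.
// With a breakpoint in innerStep, "bt" lists:
//   #0 innerStep   (where execution stopped)
//   #1 middleStep  (its caller)
//   #2 outerStep   (the caller's caller), followed by main.
int innerStep(int x) {
  return x * 2;              // "print x" here shows innerStep's x (frame 0)
}

int middleStep(int x) {
  return innerStep(x + 1);   // after "frame 1", "print x" shows this x
}

int outerStep(int x) {
  return middleStep(x + 1);
}
```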
As before, you investigate the object "trk":
(gdb) print trk
$1 = (struct BtaCandidate *) 0x11ec36d8
Finally, you check the histogram and find your problem, as before:
(gdb) print _pHisto
$2 = (struct HepHistogram *) 0x0
Then exit gdb:
(gdb) quit

Debugging on Sun: dbx

The debugger for Sun machines is called dbx.

The syntax of dbx is:

   > dbx [object_file [corefile]]

The object_file is the name of the executable that you want to debug; it provides the code that dbx executes. The optional corefile is a core file from a crashed run of that executable, just as with gdb program core.

dbx commands

The commands and syntax for dbx are similar, but not identical, to those used for gdb. Here are some of the most common (and platform-independent) commands:
Command            Description
help               Display general help (piped through more)
help [command]     Display help for command command
run [args]         Start the program with argument list args
pathmap [path]     Add path to the list of paths in which dbx will look for code
file [filename]    Tell the debugger to look in file filename for code
list               List lines of source code
print [x]          Print the object x
stop in [foo]      Set a breakpoint at the beginning of function foo
stop at [line]     Set a breakpoint at line number line
assign [x]=[y]     Set variable x to y (another variable or a number)
next               Step to the next line (stepping over function calls)
step               Step to the next line (stepping into functions)
cont               Continue to the next stop (e.g. a breakpoint)
where              Print the current activation levels (the call stack) of the program
quit               Quit the debugging session

For more information, use "man dbx". (Sadly, there does not appear to be an info page for dbx.)

Note: during a dbx session, the backspace and delete keys do not work. If you mistype a command, use CTRL-H instead of backspace.



Page maintained by Adam Edwards

Last modified: January 2008