
How To Profile Your Code

Profiling your code is a useful way to find bottlenecks in your application. In many cases, BaBar applications can be made to run significantly faster just by finding where the slowdowns occur and optimizing that code. This page describes how to generate, view, and interpret a profile.

Generating a Profile

You can generate a profile of your code with callgrind, a profiling tool included in the valgrind tool suite.

Some profiling tools require the code to be compiled with special flags; callgrind does not. You can use callgrind to profile any executable on a Linux machine, whatever language it was written in. To generate a profile, execute the following command:

valgrind --tool=callgrind ProgramName [program args]
where ProgramName is the executable you want to profile, and [program args] are the arguments you pass to your program. For more callgrind options, see the callgrind user's manual.

Note that callgrind is very slow: a program typically takes 10 to 40 times longer to run under callgrind. There is a special queue at SLAC for running valgrind (and therefore callgrind) jobs, called valgrindq, which all users have permission to use.

Viewing and Analyzing the Profile

Once your program finishes running, there will be a file called callgrind.out.<pid>, where pid is the process id of the valgrind run. You can open this file for interactive viewing with a program called kcachegrind. Below is a profile of SkimMiniApp in release 24.4.0.

kcachegrind offers various options for analyzing the profile. You can click on the various tabs to view the information in different ways. This is what some of the tabs do:

  • Incl.: Sort the functions by inclusive cost. The inclusive cost of a function is the total number of CPU operations that occur between entering and exiting the function, so the cost of every function it calls is included. For a generic C++ program, the function main should have an inclusive cost of ~100%.
  • Self: Sort the functions by self cost. This is the number of CPU operations that the function itself performs, without counting the cost of the functions it calls. Sorting by self cost is a very useful way to find bottlenecks.
  • Called: The number of times a function is called by the program.
  • Function: The name of the function.
  • Location: The library that contains the function.
  • Call Graph: Produce a callgraph of a specific function, showing all of the functions that call the function in question, and all of the functions that are called by the highlighted function.
  • Callees: Show all of the functions that are called by the highlighted function, as well as their inclusive costs and number of times called.
  • Callers: Show all of the functions that call the highlighted function, as well as their inclusive costs and number of times called.

For example, clicking on the Self tab of the example profile shown above, and then clicking on TClonesArray::Delete in the left pane gives:

We can see that TClonesArray::Delete calls many other functions and is called only by KanClonesVector::resetSelf. Since it has a fairly high self cost, this could be a good place to start optimizing.

Profiling Sections of the Code

In many situations, profiling only a certain part of the program gives more useful information, since the profile won't contain any of the functions associated with startup. It's also possible to get a reasonable profile from a smaller number of events. For example, the SkimMiniApp profile shown above contains only information from events 100-200.

Callgrind provides C++ macros that can be compiled into the code to switch the instrumentation on or off. When the instrumentation is off, the program runs under callgrind without collecting any profiling information; this makes it run much faster. A list of all of the macros can be found in the callgrind user's manual. A small example of how to use them follows.

Profiling a Section of Code: Example

Imagine a program in which the user calls minuit->migrad() to minimize a function, and this call takes a long time to run. We want to find out which functions in minuit are causing the slowdown, and we don't want any of the program's initialization in the profile.

First, we have to include the file where the callgrind macros are defined:

#include <BbrValgrind/callgrind.h>
Then, in the section of the code that calls minuit, we can bracket the minuit->migrad() call with macros that turn callgrind's instrumentation on before the function is called and off after it has finished. The code must then be recompiled.
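  CALLGRIND_START_INSTRUMENTATION;  // start collecting profile data
  minuit->migrad();                 // the call we want to profile
  CALLGRIND_STOP_INSTRUMENTATION;   // stop collecting profile data
  CALLGRIND_DUMP_STATS;             // write the counters collected so far to a profile file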
Once you've recompiled the code, you can run it under callgrind. This time, however, you want to make sure to pass the --instr-atstart=no flag to callgrind to ensure that profiling data is not collected until callgrind encounters the CALLGRIND_START_INSTRUMENTATION macro that you've compiled in:
valgrind --tool=callgrind --instr-atstart=no ProgramName [program args]
The callgrind macros are very lightweight, and have no effect on the execution of your program when the code is not being run under callgrind. When the program has finished running, you can analyze the resulting profile in kcachegrind, as before.

Things That Do Not Show Up in the Profiles

There are a couple of things to note when analyzing a profile:
  1. If the compiler inlines a function, it will not appear as a separate entry in the profile. Instead, all of the cost associated with the inlined function is included in the self cost of the calling function. Note that in C++, the keyword inline is only a hint to the compiler: not all functions marked inline will be inlined, and the compiler can choose to inline functions that are not marked with the inline keyword.
  2. Callgrind records the number of CPU instructions executed, not the actual wall-clock time spent in a function. If you have a program where the main bottleneck is file I/O, the costs associated with reading and writing files won't show up in the profile, as those are not CPU-intensive tasks.

Page author(s): Kyle Fransham
Last significant update: Jun-10-2009