Editing Code
This section is intended to teach the user just enough
about C++ to be able to make minor changes to existing
code. A more detailed discussion of writing C++ is reserved for
the later section, Writing Code.
The links and references at the bottom of this page are useful resources
for learning C++ more thoroughly.
Some of the conceptual information and terminology defined
in the previous
Object Orientation section might
be useful background reading for this section.
You can edit C++ and tcl code in any text editor. Popular choices
are xemacs and emacs.
emacs "knows" about different types of code, so if you name your
file "foo.cc" or "foo.hh" then it will "know" that it is C++, and if
you call it "foo.tcl" then it will "know" that it is tcl. emacs will
then adopt a mode which applies a colour coding to different aspects
of the file, a process called "fontifying". It can also be used to
spot errors like failure to close brackets or to appropriately indent
code (as in Fortran mode). For example, on my emacs editor, the
first part of the Quicktour's NTrkExample.cc looks like this:
// in general, a module constructor should not do much. The begin(job) or
// begin(run) members are better places to put initialization
NTrkExample::NTrkExample( const char* const theName,
const char* const theDescription )
: AppModule( theName, theDescription )
, _btaChargedList("trackCandidates", this, "ChargedTracks")
{
}
Other text editors you could try include pico (the
underlying editor for the pine email program),
kwrite, nedit, and kate, all of which are
pretty straightforward. Less intuitive to use is vi.
The xemacs OO-Browser is a source code browser for developers.
It allows easy navigation between the header files and
implementation files. In addition, it can display the inheritance
trees and with simple mouse-clicks, it can edit the corresponding
header files. Note that it can be a bit problematic to run on some
networks, particularly SunOS.
See the detailed page, The xemacs OO-Browser.
(This page hasn't been updated for CM2, but the results are
independent of computing model.)
Basic Syntax
C++ code is case sensitive: two variables named Test and test
are different variables. However, the use of variable or
function names that differ only by case is strongly discouraged.
Variable and function names may consist of letters (upper and
lower case), digits, and the underscore character '_'. A name must
begin with either a letter or an underscore. Most of the C++
language consists of lower case key words and
punctuation.
Comments are indicated by a double slash '//'. All text from
the double slash to the end of the line is a comment and is
ignored by the compiler. An alternate syntax uses '/*' to
begin a comment and '*/' to end a comment. This style of
commenting is supported for compatibility with C code. The
double slash is the preferred convention in BaBar C++ code.
A declaration is a statement that introduces a
name into a program. It consists of four parts, two
optional and two mandatory. The general structure is:
[specifier] <base type> <declarator> [initializer];
When the optional initializer is included, the statement both
declares the variable and gives it an initial value; this section
calls such a statement a declaration and a definition. With the
exception of function definitions, declarations are ended
with a semicolon.
int x; //declaration
int x = 5; //declaration and definition
The '= 5' in the last statement is the
initializer. The equals sign '=' is the
assignment operator.
A function declaration has the format:
<return type> ClassName::FunctionName(<type> <name> , <type>
<name> , .....);
The type and name pairs in parentheses are the arguments of the
function. When a function has more than one argument they
are separated by commas. If a function has no arguments
then the parentheses are still included but with nothing in
between. When a function does not return any values or objects
the return type is void. This declaration syntax
may also be referred to as the function prototype.
The definition of a function typically consists of a series
of statements. The code that comprises the function's
definition is demarcated by curly braces. There is no
semicolon at the end of a function's definition.
For example:
int ExampleFunction(int i){
    int j;      //declaration of j
    j = 2 * i;  //statement one: a calculation using the argument i
    j = j + 1;  //statement two: a calculation using j
    return j;
}
ExampleFunction takes one argument, an integer. Within the
body of the function an integer named j is declared. Some
calculations using the argument integer i and the function's
integer j are performed. The final value of j is returned
by ExampleFunction.
Typically a statement is one line of code, in
either the body of a program or the definition of a function.
Statements are a base unit of code and in C++, as in C, they
must be terminated by a semicolon. A statement may fulfill
one of many roles: declaration, definition, call to a
function, allocation of memory, assignment, calculation,
and so forth.
There are two ways to use your new function, the main one being to
make an object (an "instance") of the class:
ClassName myClassObject;
Then call that function through your new instance of the class:
int returnedinteger = myClassObject.ExampleFunction(3);
where in this example, the integer 3 is passed to the example
function, and the new integer "returnedinteger" is assigned the
returned value of the function. (If you hold a pointer to the object
rather than the object itself, use '->' in place of '.'.)
Data Types, Pointers, and Arrays
When a declaration is made, memory is set aside for the
declared variable. The compiler needs a type
associated with all memory so that it can properly interpret
the stored data. The most commonly used built-in types of C++ are: int
(integer), double (floating point), bool (boolean), and char
(character).
In addition to memory being one of these types, memory can be
defined as a pointer to a type. A variable declared as a pointer
is interpreted as an address of memory. The syntax of this
declaration is to append an asterisk '*' to the type. For example:
int* x;
The variable x can have the address of an integer
stored in its associated memory. It is important to recognize
that this declaration sets aside (allocates) memory only for
the pointer itself. The memory for the integer pointed to must be
allocated separately, and the pointer must be given its address
before it is used. This is called initialization.
To read or write the value a pointer points to, dereference
the pointer with the asterisk '*'. To obtain the address of an
existing variable, for example to initialize a pointer, use the
ampersand '&'. For example:
int y; //declare an int named y
x = &y; //assign x the address of y
*x = 5; //assign 5 to the memory x points to (y is now 5)
int z = *x; //assign z the value x points to (both ints)
A built in data structure of C++ is the array. An array
is a block of memory set aside for the contiguous storage of
like elements. The declaration of an array must include the
type and number of elements to be stored. An array is indicated
by appending a set of square brackets '[ ]' to the variable name.
For example:
int myarray[10]; //declare array of 10 integers
To define any element of the array, one may use the assignment
operator and place the element index between the square brackets.
Array indexing begins with zero, 0. Thus, for an array of ten
integers valid indices are 0 through 9.
myarray[4] = 6; //assign 6 to the 5th element
Class Syntax
User-defined data types are central to the design of C++. The most
common unit of user-defined data in C++ and BaBar code specifically
is the class.
To create a class a user must define it to the compiler. A class
is identified by the name determined in the declaration and
consists of data members and member functions
(also called methods).
The syntax of this declaration is:
class ClassName{
data members and member functions
};
This defines a new type and is thus referred to as a class
definition. However, historically and analogous to general
declarations, it is also called a class declaration.
To add to the linguistic confusion, the implementation of the
class (the code that defines each member function), is also
called the class definition. For consistency I will refer to
this initial step as the declaration and the implementation as
the definition.
The data members and member functions of a class can be placed in
one of three categories:
- public
- accessible to all code
- protected
- accessible only to this class and its friends, and to classes
that inherit from this one and their friends
- private
- accessible only to this class and its friends
Typically data members that store the state of the object should
be private. This protects the implementation of the class. Though
users can read the declaration of a class, client code is not
allowed to access its private data, so the data is protected from being
altered. The functions supporting the class, those that make it
a useful entity for applications, should be made public.
Data members of a class are declared with the same syntax as
variables. Similarly, member functions are declared with the
same syntax as general functions. However, the name of a member
function is ClassName::FunctionName. The '::'
character is called the scope resolution operator. It
indicates that FunctionName is a function of the
ClassName class. Multiple classes with the same function
names do not give rise to conflicts or ambiguity. When a member
function is called on an object, it is accessed using the member
access operator '.' (or '->' if you have a pointer to the object). For example:
ClassName example; //declare an object named example, type ClassName
example.FunctionName(); // call the function FunctionName associated
// with ClassName to act on example
Two special member functions are the constructor, same name
as the class, and the destructor , class name prepended with
a tilde '~' character. The constructor is called whenever a
variable of its class type is declared. Similarly the destructor
is called whenever an object of the class type needs to be deleted.
When the class is declared, its member functions and data members
are declared as well. The body of the class declaration is
within curly braces and is terminated with a semicolon. The
class definition provides each member function's definition.
The definition of each member function is contained within
curly braces and is not terminated with a semicolon.
Here is an example class declaration and
class definition.
Loops and Conditional Statements
C++ offers a set of operators for performing comparison. If the
relationship is satisfied then a boolean (bool) true is returned,
else a bool false is returned.
| Operator | Definition |
| == | equal to |
| != | not equal |
| > | greater than |
| >= | greater or equal |
| < | less than |
| <= | less or equal |
Some built in C++ statements take a boolean value as their argument.
If the conditional argument is true, the rest of the statement is
executed; if the conditional argument is false, it is not.
When the body to be executed consists of only one statement, that
statement is terminated with a semicolon. When there are multiple
statements to be executed, they are bounded by curly braces. Each
statement in the statement body is terminated
by a semicolon, whereas the body itself is not (no semicolon after
the closing curly brace). Perhaps the most common conditional
statements for analysis are the if and the while statements.
If statements are useful for execution of a block of code once
after a test has been satisfied. For example, only if a particle
is within a mass range should analysis be continued. The conditional
test is performed, if it evaluates to true the statement body is
executed. In the if/else block if the condition evaluates to false
then the else body of statements is executed.
if (condition) statement;
if (condition) {
statements....
}
if (condition) {
statements...
} else {
statements...
}
While statements are used when a block of code should be executed as
long as the test/conditional argument is true. If the condition
evaluates to true then the statements in the body are executed.
Execution of code then returns to the condition and evaluates it
again. This sequence will continue until the condition evaluates
to false. While loops are very useful for running quick checks
and making plots.
while (condition) statement;
while (condition) {
statements...
}
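A common source of errors (bugs) is accidentally writing the
assignment operator '=' where the 'is equal' comparison '==' was
intended, and vice versa. For example, 'if (x = 5)' assigns 5 to x
and is always true, whereas 'if (x == 5)' tests the value of x.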
General Code Structure
While loops and if statements can be used on their own or as
part of a function, and can be nested within
themselves and each other. The body of code associated with
each statement is marked by curly braces. Each open curly
brace must be matched with a closed curly brace. When
there is nesting the entire nested statement must be
within the body of the outer statement. In a multiply nested
sequence the closed braces will be associated in reverse order
from the open braces. That is the first open curly brace will
be matched by the last closed curly brace.
When a variable is declared, memory is allocated for it (in
an area of memory called the stack). The
scope of a variable begins at its declaration
and persists until the body of code (the pair of curly braces)
in which it was declared has finished executing.
For example, a variable declared within a member function is
allocated at the time of its declaration. When execution of
the code moves past the closing curly brace of the function
the variable is said to have gone out of scope. When this
happens the memory that was allocated to the variable is no
longer reserved. That memory can now be reused by the
program, and the variable that has gone out of scope can no longer
be accessed.
It is important to keep track of the body delimiters of these
statements for compile purposes, for illuminating the scope
of variables, and for making code readable. A standard
convention, also used in BaBar code, is the use of indentation
when nesting occurs. All code in the body of a statement should
be indented systematically. The amount of indentation should
correspond to the level of nesting. The curly brace closing the
body of a statement should be placed on its own line at the
same level of indentation as the opening part of the statement.
Look
here for an example.
Iterators
Sequential storage of data is a common occurrence. Often general
packages of code or libraries make available typical structures
such as arrays, lists, and vectors. For these containers
to be functional, a user must be able to traverse the elements,
often in a systematic or all inclusive manner. At the same time
the encapsulation of implementation details must be preserved.
The notion of an iterator satisfies both of these
requirements.
An iterator is an abstraction of a pointer to an element. It is
typically implemented as a class or function associated with a
given container class. The iterator points to one element in the
sequence and has access to the information needed to move to the
next element of the sequence. It also has access to information
that will determine the end of the data sequence. Concepts supported
by a general iterator are the idea of the current element, next
element/incrementation, and equality/comparison.
In BaBar each reconstructed event contains many lists of like data,
for example lists of pions, charged tracks, and so forth. Iterators
used in conjunction with loops facilitate execution of a segment of
analysis code on each element in a list. For example, an iterator is
used to access a charged track in an event's list, and then the
momentum is plotted in a histogram. This sequence continues to loop
until each track of the list has been plotted.
Program Organization
For large programs it is not reasonable for all of the code to exist
in one file. This is due to readability, maintenance, and primarily
compile time. If all of the code were in one unit, even the smallest
change would require re-compilation of all code.
To avoid this very costly dependence, code is partitioned into a set
of coherent modules. The physical structure, the system of
code files, is likely to reflect the logical structure of
the program.
The many units of source code in a large program must be
mutually consistent. For one, types in declarations must be
uniform throughout all units of code. A primary method of
accomplishing this is to gather all declarations and interface
information into one place, a header file, while placing
the definition code into an implementation file.
Header File
Header files contain the declarations an implementation file
wants to make available to other units of code. A header file
should typically contain type definitions, function
declarations, and name declarations. By BaBar convention header
files have names with the suffix '.hh'.
Units of code, files, access the code declared in a header file
by using a preprocessor include command. The syntax is:
#include "<header file name>"
Before code is compiled, the preprocessor inserts a copy of
the header file into every file that includes it, at the point of the
include command. A header file may be included in a given source file
more than once, for example indirectly through other header files, but
its contents should be processed only once per source file.
To prevent repeated processing of header file code the
following macro syntax is used.
#ifndef <definevalue>
#define <definevalue>
...header file contents
#endif
The first time the preprocessor sees the header file in a given source
file, definevalue is not yet defined, so the contents are processed and
definevalue becomes defined. When the same header file is encountered
again, definevalue is already defined, so everything between the ifndef and
the endif is ignored. BaBar convention sets
definevalue to the name of the header file in
all capital letters ended by _HH. For example, the
NTrkExample.hh file is defined NTRKEXAMPLE_HH.
Implementation File
All of the source code for the implementation and definition
of a header file's declarations is placed in an implementation
file. Complete function definitions should be placed in the
implementation file. By BaBar convention implementation files
have names with the suffix '.cc'.
The implementation must have access to the declarations and
types that it defines, so it must include its own header file.
Standard Libraries
Standard libraries are included with the C++ language to provide
commonly used and needed functions and types. Accessing the code
of a standard library is analogous to using user-written source
code. Any code making use of a standard library must include
it. The include syntax is the same as for including header files
except the standard library name is enclosed by angle brackets
instead of double quotes.
#include <<library name>>
Classes
In BaBar analysis software each analysis module is implemented
as a class. Each module/class has an associated header and
implementation file dedicated to its definition and implementation.
The important member functions of
a module/class for analysis work are the constructor and the
beginJob(), endJob(), and event() functions. The role of these
functions has been covered in a previous chapter:
Framework: the
Environment for Physics Event Reconstruction
Packages
BaBar (reconstruction and simulation) software is organized into
packages. A package is a self-contained piece of software intended
to perform a well defined task, e.g. to find calorimeter clusters or to
simulate the drift chamber response. Each package has a unique name
and its own library and include files. Some packages may not be
usable on their own, requiring integration with others, for example
the individual subsystem simulation packages which together form the
Geant simulation of BaBar.
The BaBar environment offers a facility to book (that is, create)
histograms in C++. The package which allows one to perform this task
is HepTuple. In particular, the HepTuple package includes
the histogram class HepHistogram.
The analysis module NTrkExample class (from the sample
analysis job) books a histogram of the number of tracks per event.
If an analysis module is to use classes from the HepTuple histogramming package,
its header file must declare the collaborating classes. In NTrkExample.hh,
you have:
//------------------------------------//
// Collaborating Class Declarations //
//------------------------------------//
class HepHistogram;
An analysis module that will book a histogram needs to
have a (preferably private) histogram data member.
HepHistogram* _numTrkHisto;
The beginJob method of the analysis module (in the .cc
file) contains the code that calls the histogram manager to define the
histogram data member. To make use of the C++ HBOOK code the source
file of the analysis module needs to include the defining header
files:
#include "HepTuple/TupleManager.h"
#include "HepTuple/Histogram.h"
From within the beginJob function a histogram manager
needs to be declared via:
HepTupleManager* manager = gblEnv->getGen()->ntupleManager();
and then used to book a histogram, which initializes the
histogram data member of the class.
_numTrkHisto = manager->histogram("Tracks per Event", 20, 0., 20. );
The histogram is declared with four arguments:
- the title ("Tracks per Event"),
- the number of bins (20),
- the lower limit of the histogram (0.), and
- the upper limit of the histogram (20.).
This completes the declaration and the definition of the histogram.
The histogram can be filled for each event (from within the
event function of the analysis module)
with a call to the accumulate member function
of the histogram object.
_numTrkHisto->accumulate( trkList->length() );
The default name for the output file from MyMiniAnalysis.tcl is
MyMiniAnalysis.root.
If you want a different name you will need to make changes
from the framework. For example, in the Quicktour you overrode
this default with:
set histFileName myHistogram.root
in your snippet.tcl file.
Annotated Quick Tour Analysis Code
As a first step in becoming familiar with some of the analysis code,
I have annotated the C++ header and source files for the NTrkExample
analysis module. This module is appended to the MyAnalysis sequence in
the quick tour analysis. This module is used to generate the number
of tracks per event histogram. The comments inserted for these purposes
are in blue. Everything else is as these files will be found in their
respective directories (circa Jan 2006).
Begin with the NTrkExample.hh
file and note the use of the HepTuple histogram package.
Using Loops and Lists to Plot a Histogram
With the future intention of modifying the NTrkExample module, let's
compose a segment of code that would histogram the momentum of charged
tracks in an event. To begin with we need a list of Beta candidates.
// get list of input track candidates
HepAList<BtaCandidate>* trkList =
Ifd<HepAList< BtaCandidate > >::get(anEvent, _btaChargedList.value());
This declares such a list, named trkList, which
is a pointer to a HepAList. It then calls the
function Ifd<HepAList<BtaCandidate> >::get and passes it
the event and a key word.
A BaBar strategy for objects with multiple data members of a type
such as an event with many lists of Beta candidates is to use key words to
differentiate. This function call will initialize the
trkList variable to the event's list associated with the
key word returned by _btaChargedList.value( ).
Now for each event, we add the number of tracks in
that event to the histogram:
_numTrkHisto->accumulate( trkList->length() );
Booking Another Histogram
At this point it is a relatively straightforward task to
add another histogram to the quick tour analysis. You have
two options, to modify the existing NTrkExample module
class or to create a new module class. Creating a new
module class involves a few more steps but is likely to
be useful information. To create a new module class
involves the following steps: create a header and
implementation file, add the histogram code, and load the new
module into the framework.
It is simplest to create a header and implementation file
by copying a template. To start you can copy the
NTrkExample .cc and .hh to files called PExample
(.cc and .hh respectively), or any other name you wish.
You will need to replace all instances of NTrkExample
with PExample (or your chosen name). Most importantly,
this will include the preprocessor definition name in
the header file, the #included (self) header file name in
the implementation, and all instances in member functions
in both the header and implementation file.
Once the name has been consistently modified, add the new
code. For the data to persist over multiple events the
histogram needs to be of a greater scope than the event( )
function. The logical and conventional place to add this
histogram is as a data member of the PExample class.
The PExample class is declared in the header file and
this is where the modification should be made. To do this
you need only add the line:
HepHistogram* _pHisto;
as a private data member.
A PExample object will now have pointers to two
HepHistograms at the time of instantiation. Only the
pointer memory has been allocated. Before the histogram
is used the pointer must be initialized. This should be
done by adding the following lines to the definition of
the beginJob( ) member function of the PExample class
(in the PExample.cc file):
HepTupleManager* manager = gblEnv->getGen()->ntupleManager();
assert(manager != 0);
In this case these lines already exist in the beginJob( )
member function from the NTrkExample class that we copied.
They do not need to be added again. Once a
HepTupleManager exists, you can ask it to book a new
histogram on your behalf.
_pHisto = manager->histogram("Momentum", 25, 0., 1. );
The first argument (in quotes) is the title of the
histogram, the second argument (an integer) is the number
of bins, the third and fourth arguments (doubles) are the
low and high values of the x-axis.
The block of code developed in the previous section will histogram the
momentum per track given an event. Now you will add a code segment to the
definition of the PExample::event( ) member function in PExample.cc
to ensure that it is executed on every event in the analysis job.
In your number-of-tracks histogram, all you needed was the length of
the tracks list, which is a property of the tracks list as a whole. But
momentum is a property of a single track. So next we need to declare an
iterator associated with the list of Beta candidates, and a pointer to a
Beta candidate:
// Loop over track candidates to plot momentum
HepAListIterator<BtaCandidate> iterTrk(*trkList);
BtaCandidate* trk;
while ( 0 != ( trk = iterTrk()) ) {
_pHisto->accumulate( trk->p() );
}
This loops over each member of *trkList, and
adds its momentum to the histogram _pHisto.
The block of code added to the PExample::event() function makes use of
some functions that were not used by the NTrkExample class. Whenever you make
use of new code, you need to verify that the defining header file has been
included by the current .cc file. In this example a HepAListIterator has been
introduced into the PExample module. For the PExample module to compile the
following line must be added to the top of the .cc file (along with the many
other included files).
#include "CLHEP/Alist/AIterator.h"
CLHEP, Class Library for High Energy Physics, is a package that contains
general utility classes. If you are looking for the home of a class or
function named Hepsomething (such as HepAList), a good place to start is in this package.
(analysis-30 packages are all located in $BFROOT/dist/releases/analysis-30.)
In addition to writing the module class code you will also
need to modify AppUserBuildBase.cc and MyMiniAnalysis.tcl
so that the new PExample module is available to the framework.
To do this, once again you need to replace the lines containing
"NTrkExample" with the corresponding "PExample" lines.
First, in AppUserBuildBase.cc you need to #include the header
file for the module:
#include "BetaMiniUser/PExample.hh"
Then, to create and load the module in the framework include the
following line in the constructor of the AppUserBuildBase class:
theBuild->add(new PExample("PExample", "Workbook example module"));
Don't forget to append the module to your analysis path,
which is defined in the MyMiniAnalysis.tcl file. A line
similar to the following should accomplish this.
path append Everything PExample
Once a module is available to the framework
(i.e. it has been written, compiled, and loaded via the AppUserBuildBase
class) it can be easily included or excluded in your analysis by
making changes to the .tcl file.
Working examples of the PExample .hh, .cc, and
AppUserBuildBase.cc files are in the
WorkBook's PExample directory, at
$BFROOT/www/doc/workbook/NewExamples/PExample/
The C++ code must be re-compiled and re-linked (gmake all)
before the changes will be incorporated into the executable. You
will do this in the next section of the workbook:
Compile, Link, and Run.
General Related Documents:
Author:
Tracey Marsh
Contributors:
Joseph Perl
James Weatherall
Last updated: Jenny Williams
Last modification: 1 July 2005
Last significant update: 7 June 2005