Offline Programming Recommendations: OO/C++
On this page:
[Kinds of Classes]
[Design Issues]
[Documentation Issues]
[Implementation Issues]
Other pages:
[Background Info]
Recent Changes
- Jan 27, 1999 -- Created by merging older pages into new web form (Bob)
A class should be of at most one of the following types:
-
Modules (subclasses of AppModule): These are the unit of execution and
processing. A module gets objects from the event, maniplates them, and
puts new objects back to the event. Its essentially pure algorithm, with
only enough internal state to handle configuration and control
-
Event data classes (subclasses of AbsEvtObj): All transient objects
that can be in the event; the stuff that is manipulated by modules. This
includes tracks, clusters, etc.
-
Classes for event persistance: The persistant classes that store
representations of (2), plus the various Convertor classes that manipulate
them
-
Conditions data classes (subclasses of AbsEnvObj): All transient
objects that represent "conditions" data, such as calibrations, alignments,
etc.
-
Classes for conditions data persistance: The persistant classes that
store representations of (4), plus the various proxies, Cal* classes, etc
that animate them.
Of course, simple classes for internal use may not fit in any
of the above categories.
- Include files in C++ have the extension ".hh". Similarly,
include files in C have the extension ".h".
- Implementation files in C++ have the extension ".cc".
In C they have the extension ".c".
- Dont use inline definition files (e.g. files with
the extension ".icc"
Such files contain inline definitions for simple function members)
(e.g. data member accessors) and are themselves included
into the class header (.hh) file.
We have tools for navigating C++ source that work best when
everything is found in .cc and .hh files.
Remember that the inline keyword is only a hint to the
compiler and there is no guarantee that a complex function will
actually be expanded inline.
- Every include file should contain a mechanism to prevent multiple
inclusions. For the file XxxMyFile.hh, the BaBar convention is:
#ifndef XXXMYFILE_HH
#define XXXMYFILE_HH
[....]
#endif
This should be put at the top of the file, before any opening comments, to
allow the compiler to be smarter about how much of the .h file to read.
- Use a separate .cc file, and corresponding .hh file, for each
C++ class. The filename should be identical to the class name.
- Do not create class names (and therefore filenames) that differ
only by case.
- Include directives should not contain any directory
path names apart from those corresponding to BABAR
packages. In this case, the package name should be used as a directory
prefix:
#include "package/filename.hh"
- The identifier of every globally visible class, function or
variable should begin with
the package "TLA" (Three Letter Acronym) prefix from the package
to which it belongs. This implies that the implementation
(.cc) and interface (.hh) files for C++ classes should also begin
with the same prefix.
- The first line of code in each .cc file (i.e. following the header
comments), should be the following:
#include "BaBar/BaBar.hh"
- Encapsulate global variables and constants, enumerated types,
and typedefs in a class. This greatly reduces the probability of
your names colliding with those from some other package.
class XxxMyGlobals
{
public:
static void myFunc( );
static int myData( );
static void setMyData( int value );
private:
int _myData;
};
[....]
MyGlobals::myFunc( );
- Remember to add comments to both the interface (.hh) file
as well as the implementation (.cc) file for each class.
People will often read just the .hh file to figure out
how to use a class. Interface
comments should stress what the class does whereas implementation
comments should stress how. Using KDOC format for the comments is
greatly preferred, as it allows us to automatically make a
package reference using the LIGHT system.
- Avoid overloading functions and operators unless there is
a clear improvement in the clarity of the resulting code. For
read and write access to data members, use:
int myData( ) const;
void setMyData( const int value );
rather than
int myData( ) const;
void myData( const int value );
The former makes the intention clear whereas the latter is
an example of using a facility because it exists, not because
it has any advantages.
- Data members should have names that begin with a leading underscore
("_") character in order to avoid name conflicts with
any accessor functions.
- As part of defensive coding, use the assert macro to check
that specified conditions are true.
#include <assert.h>
[....]
void
MyClass::myFunction( char* name )
{
assert ( 0 != name ); // Check that the name is not NULL
The assertion can be removed from the code, once it has been
tested, by compiling with the symbol NDEBUG defined.
- Don't implicitly compare pointers to nonzero (i.e. do not
treat them as having a boolean value). Use
if ( 0 != ptr ) ...
instead of
if ( ptr ) ...
- If you are doing an assignment in a comparison expression,
make the comparison explicit:
while ( 0 != (ptr=iterator() ) ) ...
prevents the next person from coming along, seeing
while ( ptr=iterator() ) ...
and saying "That idiot doesn't know the difference
between = and ==, I'll just fix this..." and messing up your code.
-
Unfortunately, the compilers we are currently using don't
support namespaces. Don't use them.
Use KDOC and LIGHT!
When using C++, the preferred form for short comments is:
// This is a one-line comment.
i.e. use the "//" comment format. If the comment
extends over multiple lines, each line must have // at the beginning:
// This is a long and boring comment.
// I need to put // at the start of each line.
// Note that the comment starts at the // and
// extends to the end of line. These comments
// can therefore appear on the same line as code,
// following on from the code.
-
All .cc files
should have
#include "BaBar/BaBar.hh"
as the first non-comment line, before any other include.
This gives us a tool for introducting typedefs, #define's and
other ways of handling compiler differences and class migrations.
- It often useful to choose whether a given pointer
"owns" the object its pointing to or not. If it does, the object
containing the pointer is responsible for deleting the pointed-to owned
object when the owning object has finished using the owned object.
-
A class that contains "owned"
pointers to other objects must have a non-trivial destructor
to delete them.
-
If a class contains a non-trivial destructor, it must be
virtual, unless positive steps have been taken to prevent
inheritance from the class. This is to prevent the non-trivial
destructor from being omitted if an object of a subclass
is being destructed.
-
Arguments passed by reference should never be retained. Use a pointer
argument if its possible that the address will be retained for later use.
Reason: To clue the user of the routine that an address is being used.
-
Member functions which merely return a value (calculated or from a member variable)
should be named for the value, e.g. theta(), x(), length(). Member functions
which store a value into internal state should start with "set", e.g.
setTheta(double ), setX(double ), setLength(int ).
-
Classes should only provide constructors
that build objects that can do what the class is supposed to do.
It sounds obvious, but it means
classes must provide only constructors that make sense for creating
fully functional objects, and
should not provide constructors that do not receive enough
information to build a fully functional object.
Document how these should be
used, including what the null constructor produces. A
null constructor and a copy constructor are desirable iff
they are consistent with what the class is intended to do.
If this means that the
null constructor is not appropriate, that should be documented in the
header file.
-
We have a convention for print methods. "print(ostream & )" prints a
single line summary of an object, without a trailing
endl. "printAll(ostream & )" prints all the
internal state of an object. See AbsEvtObj.hh for
more details. If any print methods are defined, both of
these must also be.
-
Always consider the use of
"standard objects"
within a new class.
If not appropriate for design
reasons, document this with a short statement in the header comments.
If not appropriate because of missing, though consistent, functionality
in the standard object, everybody will
gain if you take the time to add it.
-
Static variables and public data members in module classes are forbidden
because of too many chances for unexpected side-effects.
There should in any case be little need for them,
as there is rarely more than one object of a module class, and they
should not communicate with each other. If inter-module communication
is needed, put it in AbsEvent
or AbsEnv instead.
-
No "random numbers" may be used in reconstruction code. If a sequence of numbers
are needed (e.g. for simulated annealling or doing an integral), they must be
- contained within and used by a single module, i.e. not in a independent class,
- have their seed reset on every event, and
- based on the standard G4Random classes.
-
The size guarantees for various datatypes should be kept in mind. For integer
types, these are:
- short: 8 bits (-127 to +127)
- int: 16 bits (-32767 to +32767)
- long: 32 bits
In particular, use of "short" and "int" should be rare and well documented.
The floating point representations are harder to quantify, but
the net effect is that only "double" should be used in classes
that are likely to be reused or written out.
Optimization to "float", where needed and acceptable, can be done later.
-
Functions should return their value in their return type instead of
via arguments. If more than one
value must be returned, document in the header file why
a new type (class) should not be defined to return the values as a whole.
-
Provide assignment operators and copy ctor for any
class with owned pointers. These can either do the necessary work, or
just be declared "private" and call abort(). They are needed because
the compiler provides them unless they are defined, and the
default ones do not handle owned pointers well.
Be careful of self-assignment; you need to carefully consider
what special things need to happen if somebody writes "a = a;"
or the equivalent.
-
Modules should only change their behavior based on parameters and Framework
commands (including custom ones, if needed). Use of environment
variables directly must be replaced by reading them from
parameters (which can be set from environment variables in the .tcl scripts)
Putting values of cuts, controls, etc, in
"configuration files" is only acceptable as a temporary measure until they can
be loaded from the database; this should be minimized.
-
A class should provide arithmetical operators only if they will
work very much like the underlying operators from high-school math.
Its important that users get what their prior experience leads them
to expect when they use operator + or similar.
-
Because several of the C++ compilers we depend on cannot reliably handle them,
do not use "template specialization" in your designs. This is when you
provide more than one definition of the code for a template, with different ones
for general use and for use with certain templated types.
-
C++ classes that "wrap" FORTRAN (e.g. provide a C++ interface) must
take into account that FORTRAN code is generally not reentrant, and
may not correctly save state across multiple "wrappings". Acceptable
solutions include:
- Keeping the state in the C++ objects, and passing it
as needed to the FORTRAN code (HepTuple works effectively this
way)
- Using a singleton pattern to make sure there is really only
one C++ object
- Using C++ wrappers around FORTRAN code is not an acceptable
way to write a module. If significant FORTRAN code is called
from a module, the module should inherit from AppNoCloneModule
instead of AppModule.
In any case, make sure the restrictions on multiple uses are clearly documented.
-
Classes that logically cannot have more than one instantiation (singleton pattern,
or use of static members for major behavior) must document why its assumed
there will be only one unique instantiation. This must
be an argument about how the
class is to be used when doing physics, not a discussion of how its implemented.
In particular, explain why multiple copies will not be needed for "before and after"
comparisons, "MC vs. data" checks, etc.
-
Ownership of the object that a pointer references should be clearly
thought out and documented. Every pointer in a class definition must be
labeled as either "owned" or "unowned". Every list should be labeled
similarly.
-
Never overlay a fortran name with C++ code. This includes using f2c to
merely translate an existing routine.
Reason: The linker can pick either the new C++ or original F77, and its
impossible to guarantee that the two are equivalent.
-
Routines that do not have a void return type should always
return something (i.e. no "return;" statements or running off the
end of the procedure body).
Null bodies (in member functions that have not yet been written, for
example) often trigger this problem.
-
Preconditions and postconditions are checked with assert( "condition" ).
These may later be compiled out. Whereever possible, these should be
at the beginning and end of a routine, respectively, and commented.
-
If checks for internal logic errors are outside an assert, they
should call abort(), not assert(0) when they fail. This way the
action and check will have the same behavior in test and production versions.
-
Be careful about self-assignment, i.e. you will probably have to have
special purpose code to handle the case "aObject = aObject;", as
in that case memory management has to be careful not to delete things
that its going to use later.
-
Use of "inline" requires judgement.
"inline" makes accessor functions a very good idea. In particular,
only accessing member data via accessor functions, even within the class member functions,
is required for all data and computational classes to ease migration and maintenance.
Performance improvements may not be used as a reason to avoid this; all our compilers
will inline accessors at a high enough optimization level, and most accessors are
not a performance problem in any case.
On the other hand, long, complex in-line member functions should be avoided. They reduce
clarity, make it take longer to compile and harder to debug,
and the performance gain from making
them inline is small for large routines.
It is bad design to make a function inline if that requires additional header
files to be referenced by #include.
In particular, one or more compilers have restrictions that make it
unacceptable to inline:
- functions with multiple returns (except the
simplest 'if...return x; else...return y;',
and the 'else' *must* be present).
- functions with 'while' or 'for' statements
- functions with aggregate initialization
('array = {a1,a2,...an};')
- value returning functions which call non-value-
returning functions.
To avoid future problems, do not inline:
- functions calling member functions of other classes
- functions calling virtual member functions of this class
- functions calling templated functions
In effect, don't inline anything that calls something else, or involves significant
control structures.
-
Only the *GHit and other Geant3 classes should currently
be based on dbio.
-
Use "bool", not "int", for true/false data items.
-
Use enums for small named constants
within classes. Do not use #define for this because
it can result in name collisions in other people code. Unfortunately,
use of "const int nlay=15;" inside a class is not supported by some
compiler(s), so don't use this either. This is the recommended form:
enum { nlay=15 };
-
cpp macros that generate code are generally not acceptable. One exception is "assert"
and similar system-provided macros.
-
Use size_t as the type for array indices instead of int or unsigned int.
(See the prior item about enums for small constants, including array dimensions).
-
Use classes, not arrays when passing arguments and return
values. You need a really
good reason, better than "performance", to pass an array
of objects instead of a list.
You have to have extremely
good reasons to pass "double v[3]" instead of a Hep3Vector object.
-
Use "const" whenever possible. In particular, it should be always used with input
arguments to functions that are passed as pointers or via references. This allows
the compiler to optimize, and clues the user that there will be no side-effects.
Also, if you nest calls without "const" on the arguments:
A& inside ();
void outside(A& a);
outside(inside());
the compiler will be forced to create an extra temporary variable so it can pass a
reference to a copy of the object returned from "inside".
-
Never use #include when a simple forward declaration ("class foo;") will do.
Most header files should not need to include many others.
You only need the full declaration if creating
or destroying an object of that class (e.g.
using as a member variable), inheriting from the class, or
actually invoking a member function. Pointers, references and
use as arguments in declarations do not require the header file
be present.
If header files from another package must be #included to make a member
function "inline",
do not make the function inline.
-
Declaring a member function both "inline" and "virtual" is problematic; there
are very few places where a compiler can actually do the inlining.
-
Due to compiler limitations,
we must be careful with the scoping of variable declarations in for statements.
In particular,
for (int i=0; ... ) { ... }
...
for (int i=0; ... ) { ... }
must be rewritten as
int i;
for (i=0; ... ) { ... }
...
for (i=0; ... ) { ... }
Note that the conditional
expression (the thing in parentheses) in a while statement may not include
declarations, so we expect to
keep the restriction on while statements that code like
while (Foo* p = bar() ) { ... }
is not permitted, and must be instead written as
Foo* p;
while (p=bar() ) { ... }
-
There should be no architecture-specific if's in the GNUmakefiles. Include
an arch_spec_*.mk file, or if need be create a new one.
-
Class static variables of classes that have non-default constructors and/or
destructors must not be used.
These cause too many problems at link time.
Static ints, doubles, and pointers are OK.
Modules
should instead use member variables that point to objects that are allocated
and destroyed in the begin() and end() member functions.
Other classes can use a static member function that wraps a file-static variable, where the
member function definition is in a .cc file:
class Foo {
static Bar& refBar();
}
Foo::refBar () {
static Bar b;
return b;
}
-
If a list of some type of objects must be in a particular order,
for example hits on a track which might have to be in ascending
order by pathlength, the documentation for the list should
clearly specify this, and the member function used to add hits to the
list should check it as a precondition. Also consider making access to the
list "const" to prevent it from being reordered or changed.
-
Indexes (offsets) within an AbsEvent list must never be stored in an object. Use
a pointer instead.
dbio I/O is an exception to this for historical reasons.
Reason:
the lists within an AbsEvent are not guaranteed to
stay in the same order, particularly when written out and read back in.
-
Reconstruction objects in memory should not carry index numbers
of objects within lists
internally instead of pointers.
They must not contain index numbers for lists that they have not created
locally.
Reason: There's no way to force these to stay consistent
with the actual object's index in the list.
Further, the conditions database and event store systems
will only ensure the target objects
are available in memory when they are referenced via pointers.
Note that various objects
moved from GEANT3
via dbio do carry index numbers; these are a historical exception.
-
Some compilers generate unavoidable warnings about "extraneous semicolons". They
seem particular common after function bodies, where they are not needed. Remember
that a semicolon in C and C++ is a statement separator, not the end of a statement.
You rarely need them after a closing brace except at the end of a class definition.
-
On 64-bit machines, you get a warning when you convert from pointer to int. In particular,
this will happen when iPtr is a pointer in
assert(iPtr);
which is better written as
assert(iPtr != 0);
-
Some systems do not have a header file called "ostream.h", so its better to
'#include <iostream.h>' regardless of what types of streams you are referencing.
-
Current vendor compilers do not support the declaration of a data member to
be 'mutable', which allows it to change even inside of a 'const' function.
The 'mutable' declaration is very useful when the data member represents a
a volatile cache of information which might be updated inside a 'const'
function
without changing the state of the core object. Another use is when RogueWave
template classes are used to build object data members. Since many of the RW
functions are not const (including the iterator constructor), access to
RW contained objects not declared 'mutable' in 'const' member functions is
severely restricted.
To temporarily get around this lack, use const inline
functions for every data member which would have been declared mutable, that
'casts off the const' for that variable. Const functions which need to alter
the cache variables then access through this inline. This approach
is somewhat safer than simply casting off the const of the entire object
(*this)
at the begining of these functions. An example is as follows:
class MyClass{
void doSomethingNice() const;
private:
double _cachevalue; // should be declared mutable
double& cacheValue() const { return (double&) _cachevalue;}
RWTPtrSList<double> _doublelist; // should be declared mutable
RWTPtrSList<double>& doubleList() const { return
(RWTPtrSList<double>&) _doublelist; }
};
MyClass::doSomethingNice() const {
// loop over the list
RWTPtrSListIterator<double> iter(doubleList());
...
// update the cache
cacheValue() = newvalue;
}
-
When including a non-system header file, its important that it be written as
#include "packagename/filename.hh"
If you omit the packagename part, there will be trouble with template
instantiations.
-
If you use "assert", you need to define it by
#include "assert.h"
-
The HP iostreams header files are in terrible shape. This implies the order
of their inclusion is important. In particular, iomanip.h must have iostream.h
included _before_ it. Luckily, we're not now using HP, but this is
still a good idea.
- Both the Sun and OSF compilers will not allow you to change the
return type of a virtual function when redefining it in a subclass. This
is very annoying, because it prevents you from returning (for example) a
subclass of what was being returned in the original function. But its a
limitation we have to live with.
-
An "accessor" member function returns access (e.g. a reference or pointer) to a
contained object. If the contained object is always guaranteed to be
present, i.e. the function can always return access to a real object,
then a reference should be returned. If not, i.e. the function must
sometimes indicate that it can't return access to a real object,
it should return a pointer value, and the value of 0 (zero) should be
used to indicate that no object is available. This is a standard C and
C++ idiom.
To avoid accidentally dereferencing a zero pointer, all pointer values
that are returned from functions should be checked for zero values. If
the program cannot cope with an non-existant value, an assert() can be
used. Otherwise, use an explicit comparison to zero in an if statement.
-
C programmers often use "NULL" in place of "0" (zero) when testing a
pointer value to see if no object is pointed to. This can be
problematic in C++. Please don't use NULL, but instead use the literal 0.
(See this
Hypernews thread for discussion)
Return to
BABAR reconstruction software page,
BABAR Computing Home Page,
or
BABAR Home Page.
|