|
|
Reconstruction Design Checklist
This page is an informal and growing checklist for reconstruction code. More structure needs to be added, and an index provided; these things are coming.
On this page: [Kinds of Classes] [Design Issues] [Documentation Issues] [Implementation Issues]
Other pages: [ Tracking Discussion]
Recent Changes
- Jul 27, 1998 -- Removed limitations dues to AIX compiler (Bob)
- Apr 20, 1998 -- Added singleton rule for FORTRAN wrappers (Bob)
- Mar 12, 1998 -- Clarify while loop scope restriction (Bob)
- Mar 10, 1998 -- Module public data, like module static data, forbidden (Bob)
- Feb 16, 1998 -- Made class statics of complex objects forbidden (I-11, III-19) (Bob)
- Feb 12, 1998 -- Added NULL vs. 0 discussion (0 is preferred) (Bob)
A class should be of at most one of the following types:
- Modules (subclasses of AppModule): These are the unit of execution and processing. A module gets objects from the event, maniplates them, and puts new objects back to the event. Its essentially pure algorithm, with only enough internal state to handle
configuration and control
- Event data classes (subclasses of AbsEvtObj): All transient objects that can be in the event; the stuff that is manipulated by modules. This includes tracks, clusters, etc.
- Classes for event persistance: The persistant classes that store representations of (2), plus the various Convertor classes that manipulate them
- Conditions data classes (subclasses of AbsEnvObj): All transient objects that represent "conditions" data, such as calibrations, alignments, etc.
- Classes for conditions data persistance: The persistant classes that store representations of (4), plus the various proxies, Cal* classes, etc that animate them.
- All .cc files should have
#include "BaBar/BaBar.hh"
as the first non-comment line, before any other include. This gives us a tool for introducting typedefs, #define's and other ways of handling compiler differences and class migrations.
- Most complex classes should inherit from exactly one of AppModule, AbsEvtObj or AbsEnvObj. In particular, only classes that inherit from AbsEvtObj or AbsEnvObj will be capable of being stored for later use. No class should inherit from more than one
of these.
- A class that contains "owned" pointers to other objects must have a non-trivial destructor to delete them.
- If a class contains a non-trivial destructor, it must be virtual, unless positive steps have been taken to prevent inheritance from the class. This is to prevent the non-trivial destructor from being omitted if an object of a subclass is being
destructed.
- Arguments passed by reference should never be retained. Use a pointer argument if its possible that the address will be retained for later use. Reason: To clue the user of the routine that an address is being used.
- Member functions which merely return a value (calculated or from a member variable) should be named for the value, e.g. theta(), x(), length(). Member functions which store a value into internal state should start with "set", e.g. setTheta(double ),
setX(double ), setLength(int ).
- Classes should only provide constructors that build objects that can do what the class is supposed to do. It sounds obvious, but it means classes must provide only constructors that make sense for creating fully functional objects, and should not
provide constructors that do not receive enough information to build a fully functional object. Document how these should be used, including what the null constructor produces. A null constructor and a copy constructor are desirable iff they are
consistent with what the class is intended to do. If this means that the null constructor is not appropriate, that should be documented in the header file.
- We have a convention for print methods. "print(ostream & )" prints a single line summary of an object, without a trailing endl. "printAll(ostream & )" prints all the internal state of an object. See AbsEvtObj.hh for more details. If any print
methods are defined, both of these must also be.
- Pay attention to dependencies. In particular, modules go in *Reco packages and classes that are contained in the Event go in *Data packages to avoid problems with link order.
- Always consider the use of "standard objects" within a new class. If not appropriate for design reasons, document this with a short statement in the header comments. If not appropriate because of missing, though consistent, functionality in the
standard object, take the time to add it.
Static variables and public data members in module classes are forbidden because of too many chances for unexpected side-effects. There should in any case be little need for them, as there is rarely more than one
object of a module class, and they should not communicate with each other. If inter-module communication is needed, put it in AbsEvent or AbsEnv instead.
- No "random numbers" may be used in reconstruction code. If a sequence of numbers are needed (e.g. for simulated annealling or doing an integral), they must be
- contained within and used by a single module, i.e. not in a independent class,
- have their seed reset on every event, and
- based on the standard G4Random classes.
- The size guarantees for various datatypes should be kept in mind. For integer types, these are:
- short: 8 bits (-127 to +127)
- int: 16 bits (-32767 to +32767)
- long: 32 bits
In particular, use of "short" and "int" should be rare and well documented.
The floating point representations are harder to quantify, but the net effect is that only "double" should be used in classes that are likely to be reused or written out. Optimization to "float", where needed and acceptable, can be done later.
- Functions should return their value in their return type instead of via arguments. If more than one value must be returned, document in the header file why a new type (class) should not be defined to return the values as a whole.
- Provide assignment operators and copy ctor for any class with owned pointers. These can either do the necessary work, or just be declared "private" and call abort(). They are needed because the compiler provides them unless they are defined, and the
default ones do not handle owned pointers well. Be careful of self-assignment; you need to carefully consider what special things need to happen if somebody writes "a = a;" or the equivalent.
- Modules should only change their behavior based on parameters and Framework commands (including custom ones, if needed). Use of environment variables directly must be replaced by reading them from parameters (which can be set from environment
variables in the .tcl scripts) Putting values of cuts, controls, etc, in "configuration files" is only acceptable as a temporary measure until they can be loaded from the database; this should be minimized.
- A class should provide arithmetical operators only if they will work very much like the underlying operators from high-school math. Its important that users get what their prior experience leads them to expect when they use operator + or similar.
- Because several of the C++ compilers we depend on cannot reliably handle them, do not use "template specialization" in your designs. This is when you provide more than one definition of the code for a template, with different ones for general use and
for use with certain templated types.
C++ classes that "wrap" FORTRAN (e.g. provide a C++ interface) must take into account that FORTRAN code is generally not reentrant, and may not correctly save state across multiple "wrappings". Acceptable
solutions include:
- Keeping the state in the C++ objects, and passing it as needed to the FORTRAN code (HepTuple works effectively this way)
- Using a singleton pattern to make sure there is really only one C++ object
- Using C++ wrappers around FORTRAN code is not an acceptable way to write a module. If significant FORTRAN code is called from a module, the module should inherit from AppNoCloneModule instead of AppModule.
In any case, make sure the restrictions on multiple uses are clearly documented.
- Every package must have a README or README.html file. It must contain at least the name of the package coordinator, a brief description of the purpose of the package, a brief list of classes and their functionality, and a description of the external
dependencies.
- Classes that logically cannot have more than one instantiation (singleton pattern, or use of static members for major behavior) must document why its assumed there will be only one unique instantiation. This must be an argument about how the class is
to be used when doing physics, not a discussion of how its implemented. In particular, explain why multiple copies will not be needed for "before and after" comparisons, "MC vs. data" checks, etc.
- Ownership of the object that a pointer references should be clearly thought out and documented. Every pointer in a class definition must be labeled as either "owned" or "unowned". Every list should be labeled similarly.
- Never overlay a fortran name with C++ code. This includes using f2c to merely translate an existing routine. Reason: The linker can pick either the new C++ or original F77, and its impossible to guarantee that the two are equivalent.
- Use the standard link support, including the standard link command line and PackageList/link_all_reco.mk. This will permit compiler changes, needed to use of automated leaker checkers, etc. See PackageList/README and the header of
PackageList/link_all_reco.mk for more information.
- Routines that do not have a void return type should always return something (i.e. no "return;" statements or running off the end of the procedure body). Null bodies (in member functions that have not yet been written, for example) often trigger this
problem.
- Preconditions and postconditions are checked with assert( "condition" ). These may later be compiled out. Whereever possible, these should be at the beginning and end of a routine, respectively, and commented.
- If checks for internal logic errors are outside an assert, they should call abort(), not assert(0) when they fail. This way the action and check will have the same behavior in test and production versions.
- Be careful about self-assignment, i.e. you will probably have to have special purpose code to handle the case "aObject = aObject;", as in that case memory management has to be careful not to delete things that its going to use later.
- Use of "inline" requires judgement.
"inline" makes accessor functions a very good idea. In particular, only accessing member data via accessor functions, even within the class member functions, is required for all data and computational classes to ease migration and maintenance.
Performance improvements may not be used as a reason to avoid this; all our compilers will inline accessors at a high enough optimization level, and most accessors are not a performance problem in any case.
On the other hand, long, complex in-line member functions should be avoided. They reduce clarity, make it take longer to compile and harder to debug, and the performance gain from making them inline is small for large routines. It is bad design to make
a function inline if that requires additional header files to be referenced by #include. In particular, one or more compilers have restrictions that make it unacceptable to inline:
- functions with multiple returns (except the simplest 'if...return x; else...return y;', and the 'else' *must* be present).
- functions with 'while' or 'for' statements
- functions with aggregate initialization ('array = {a1,a2,...an};')
- value returning functions which call non-value- returning functions.
To avoid future problems, do not inline:
- functions calling member functions of other classes
- functions call virtual member functions of this class
- functions calling templated functions
In effect, don't inline anything that calls something else, or involves significant control structures.
- Only the *GHit and other Geant3 classes should currently be based on dbio.
- Use "bool", not "int", for true/false data items.
- Use enums for small named constants within classes. Do not use #define for this because it can result in name collisions in other people code. Unfortunately, use of "const int nlay=15;" inside a class is not supported by some compiler(s), so don't use
this either. This is the recommended form:
enum { nlay=15 };
- cpp macros that generate code are generally not acceptable. One exception is "assert" and similar system-provided macros.
- Use size_t as the type for array indices instead of int or unsigned int. (See the prior item about enums for small constants, including array dimensions).
- Use classes, not arrays when passing arguments and return values. You need a really good reason, better than "performance", to pass an array of objects instead of a list. You have to have extremely good reasons to pass "double v[3]"
instead of a Hep3Vector object.
- Use "const" whenever possible. In particular, it should be always used with input arguments to functions that are passed as pointers or via references. This allows the compiler to optimize, and clues the user that there will be no side-effects. Also,
if you nest calls without "const" on the arguments:
A& inside ();
void outside(A& a);
outside(inside());
the compiler will be forced to create an extra temporary variable so it can pass a reference to a copy of the object returned from "inside".
- Never use #include when a simple forward declaration ("class foo;") will do. Most header files should not need to include many others. You only need the full declaration if creating or destroying an object of that class (e.g. using as a member
variable), inheriting from the class, or actually invoking a member function. Pointers, references and use as arguments in declarations do not require the header file be present. If header files from another package must be #included to make a member
function "inline", do not make the function inline.
- Declaring a member function both "inline" and "virtual" is problematic; there are very few places where a compiler can actually do the inlining.
- Due to compiler limitations, we must be careful with the scoping of variable declarations in for statements. In particular,
for (int i=0; ... ) { ... }
...
for (int i=0; ... ) { ... }
must be rewritten as
int i;
for (i=0; ... ) { ... }
...
for (i=0; ... ) { ... }
Note that the conditional expression (the thing in parentheses) in a while statement may not include declarations, so we expect to keep the restriction on while statements that code like
while (Foo* p = bar() ) { ... }
is not permitted, and must be instead written as
Foo* p;
while (p=bar() ) { ... }
- There should be no architecture-specific if's in the GNUmakefiles. Include an arch_spec_*.mk file, or if need be create a new one.
Class static variables of classes that have non-default constructors and/or destructors must not be used. These cause too many problems at link time. Static ints, doubles, and pointers are OK. Modules should
instead use member variables that point to objects that are allocated and destroyed in the begin() and end() member functions. Other classes can use a static member function that wraps a file-static variable, where the member function definition is in a
.cc file:
class Foo {
static Bar& refBar();
}
Foo::refBar () {
static Bar b;
return b;
}
- If a list of some type of objects must be in a particular order, for example hits on a track which might have to be in ascending order by pathlength, the documentation for the list should clearly specify this, and the member function used to add hits
to the list should check it as a precondition. Also consider making access to the list "const" to prevent it from being reordered or changed.
- Indexes (offsets) within an AbsEvent list must never be stored in an object. Use a pointer instead. dbio I/O is an exception to this for historical reasons. Reason: the lists within an AbsEvent are not guaranteed to stay in the same order,
particularly when written out and read back in.
- Reconstruction objects in memory should not carry index numbers of objects within lists internally instead of pointers. They must not contain index numbers for lists that they have not created locally. Reason: There's no way to force these to stay
consistent with the actual object's index in the list. Further, the eventual database system will only ensure the target objects are available in memory when they are referenced via pointers. Note that various objects moved from GEANT3 via dbio do carry
index numbers; these are a historical exception.
- Some compilers generate unavoidable warnings about "extraneous semicolons". They seem particular common after function bodies, where they are not needed. Remember that a semicolon in C and C++ is a statement separator, not the end of a statement. You
rarely need them after a closing brace except at the end of a class definition.
- On 64-bit machines, you get a warning when you convert from pointer to int. In particular, this will happen when iPtr is a pointer in
assert(iPtr);
which is better written as
assert(iPtr != 0);
- Some systems do not have a header file called "ostream.h", so its better to '#include <iostream.h>' regardless of what types of streams you are referencing.
- Current vendor compilers do not support the declaration of a data member to be 'mutable', which allows it to change even inside of a 'const' function. The 'mutable' declaration is very useful when the data member represents a a volatile cache of
information which might be updated inside a 'const' function without changing the state of the core object. Another use is when RogueWave template classes are used to build object data members. Since many of the RW functions are not const (including the
iterator constructor), access to RW contained objects not declared 'mutable' in 'const' member functions is severely restricted.
To temporarily get around this lack, use const inline functions for every data member which would have been declared mutable, that 'casts off the const' for that variable. Const functions which need to alter the cache variables then access through this
inline. This approach is somewhat safer than simply casting off the const of the entire object (*this) at the begining of these functions. An example is as follows:
class MyClass{
void doSomethingNice() const;
private:
double _cachevalue; // should be declared mutable
double& cacheValue() const { return (double&) _cachevalue;}
RWTPtrSList<double> _doublelist; // should be declared mutable
RWTPtrSList<double>& doubleList() const { return
(RWTPtrSList<double>&) _doublelist; }
};
MyClass::doSomethingNice() const {
// loop over the list
RWTPtrSListIterator<double> iter(doubleList());
...
// update the cache
cacheValue() = newvalue;
}
- When including a non-system header file, its important that it be written as
#include "packagename/filename.hh"
If you omit the packagename part, there will be trouble with template instantiations.
- If you use "assert", you need to define it by
#include "assert.h"
- The HP iostreams header files are in terrible shape. This implies the order of their inclusion is important. In particular, iomanip.h must have iostream.h included _before_ it.
- Both the Sun and OSF compilers will not allow you to change the return type of a virtual function when redefining it in a subclass. This is very annoying, because it prevents you from returning (for example) a subclass of what was being returned in
the original function. But its a limitation we have to live with.
An "accessor" member function returns access (e.g. a reference or pointer) to a contained object. If the contained object is always guaranteed to be present, i.e. the function can always return access to a real
object, then a reference should be returned. If not, i.e. the function must sometimes indicate that it can't return access to a real object, it should return a pointer value, and the value of 0 (zero) should be used to indicate that no object is
available. This is a standard C and C++ idiom.
To avoid accidentally dereferencing a zero pointer, all pointer values that are returned from functions should be checked for zero values. If the program cannot cope with an non-existant value, an assert() can be used. Otherwise, use an explicit
comparison to zero in an if statement.
C programmers often use "NULL" in place of "0" (zero) when testing a pointer value to see if no object is pointed to. This can be problematic in C++. Please don't use NULL, but instead use the literal 0. (See this
Hypernews thread for discussion)
Return to BABAR reconstruction software page, BABAR Computing Home Page, or BABAR
Home Page.
Maintained by Bob Jacobsen,
Bob_Jacobsen@lbl.gov 510-486-7355
|
|