Introduction to C++
This section is intended to teach the user just enough
about C++ to be able to make minor changes to existing code.
You can edit C++ and tcl code in any text editor. Popular choices
are xemacs and emacs.
emacs "knows" about different types of code, so if you name your
file "foo.cc" or "foo.hh" then it will "know" that it is C++, and if
you call it "foo.tcl" then it will "know" that it is tcl. emacs will
then adopt a mode which applies a colour coding to different aspects
of the file, a process called "fontifying". It can also be used to
spot errors like failure to close brackets or to appropriately indent
code (as in Fortran mode). For example, on my emacs editor, the
first part of the Quicktour's QExample.cc looks like this:
// in general, a module constructor should not do much. The beginJob
// begin(run) members are better places to put initialization
QExample::QExample( const char* const theName,
const char* const theDescription )
: AppModule( theName, theDescription )
{
}
Other text editors you could try include kwrite,
nedit, or kate.
C++ code is case sensitive: two variables named Test and test
are treated as different variables. However, the use of variable or
function names that differ only by case is strongly discouraged.
Variable and function names may consist of letters (upper and
lower case), digits, and the underscore character '_'. A name must
begin with either a letter or the underscore character. Most of the C++
language consists of lower case letters comprising key words and
punctuation.
Comments are indicated by a double slash '//'. All text from
the double slash to the end of the line is a comment and is
ignored by the compiler. An alternate syntax uses '/*' to
begin a comment and '*/' to end a comment. This style of
commenting is supported to be compatible with C code. The
double slash is a preferred convention in BaBar C++ code.
A declaration is a statement that introduces a
name into a program. It consists of four parts, two
optional and two mandatory. The general structure is:
[specifier] <base type> <declarator> [initializer];
When the optional initializer is included the statement both
declares the variable and defines its initial value. With the
exception of function definitions, declarations are ended
with a semicolon.
int x; //declaration
int x = 5; //declaration and definition
The '= 5' in the last statement is the
initializer. Outside of a declaration, in a statement such as
x = 5; the equals sign '=' is the assignment operator.
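As a small sketch of the two forms (the function and variable names here are made up for illustration and do not come from BaBar code), the first function declares a variable and assigns to it afterwards, while the second supplies an initializer in the declaration itself:

```cpp
// Hypothetical helper: declare first, assign afterwards.
int declareThenAssign() {
    int count;      // declaration without an initializer; no defined value yet
    count = 5;      // assignment gives count its value
    return count;
}

// Hypothetical helper: declaration with an initializer.
int declareWithInitializer() {
    int count = 5;  // '= 5' is the initializer
    return count;
}
```

Both functions return the same value; the second form is generally safer because the variable never exists without a defined value.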
A function declaration has the format:
<return type> ClassName::FunctionName(<type> <name>, <type> <name>, ...);
The type and name pairs in parentheses are the arguments of the
function. When a function has more than one argument they
are separated by commas. If a function has no arguments
then the parentheses are still included but with nothing in
between. When a function does not return any values or objects
the return type is void. This declaration syntax
may also be referred to as the function prototype.
The definition of a function typically consists of a series
of statements. The code that comprises the function's
definition is demarcated by curly braces. There is no
semicolon at the end of a function's definition.
For example:
int ExampleFunction(int i){
    int j; //declaration of j
    statement one;
    statement two;
    ...
    return j;
}
ExampleFunction takes one argument, an integer. Within the
body of the function an integer named j is declared. Some
calculations using the argument integer i and the function's
integer j are performed. The final value of j is returned
by ExampleFunction.
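To make the outline concrete, here is one possible filled-in version. The two calculations are invented purely for illustration; any statements using i and j would do.

```cpp
// One possible body for ExampleFunction; the arithmetic is illustrative only.
int ExampleFunction(int i) {
    int j;          // declaration of j
    j = i + 1;      // statement one: a calculation using the argument i
    j = j * 2;      // statement two: a further calculation on j
    return j;       // the final value of j is returned
}
```

With these particular statements, ExampleFunction(3) returns 8.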
Typically a statement is one line of code, in
either the body of a program or the definition of a function.
Statements are a base unit of code and in C++, as in C, they
must be terminated by a semicolon. A statement may fulfill
one of many roles: declaration, definition, call to a
function, allocation of memory, assignment, calculation,
and so forth.
There are two ways to use your new function, the main one being to
make an object (an "instance") of the class:
ClassName myClassObject;
Then call that function through your new instance of the class:
int returnedinteger = myClassObject.ExampleFunction(3);
where in this example, the integer 3 is passed to the example
function, and the new integer "returnedinteger" is assigned the
returned value of the function. The dot '.' is used because
myClassObject is an object; the arrow '->' is used only when the
call is made through a pointer to an object.
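The following sketch puts these pieces together. ClassName and ExampleFunction mirror the placeholder names used in the text, and the calculation inside the function is an assumption made only so that the example is complete.

```cpp
// Hypothetical class mirroring the placeholder names used in the text.
class ClassName {
public:
    int ExampleFunction(int i);
};

// Illustrative definition; the calculation is an assumption.
int ClassName::ExampleFunction(int i) {
    return i * 2;
}

// Calls the member function through an object, using the dot.
int callThroughObject() {
    ClassName myClassObject;
    int returnedinteger = myClassObject.ExampleFunction(3);
    return returnedinteger;
}

// Calls the same function through a pointer to an object, using the arrow.
int callThroughPointer() {
    ClassName myClassObject;
    ClassName* objectPointer = &myClassObject;
    return objectPointer->ExampleFunction(3);
}
```

Both helper functions return the same value, since they call the same member function on the same kind of object.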
When a declaration is made, memory is set aside for the
declared variable. The compiler needs a type
associated with all memory so that it can properly interpret
the stored data. The most commonly used built-in types of C++ are: int
(integer), double (floating point), bool (boolean), and char
(character).
In addition to memory being one of these types, memory can be
defined as a pointer to a type. A variable declared as a pointer
is interpreted as an address of memory. The syntax of this
declaration is to append an asterisk '*' to the type. For example:
int* x;
The variable x can have the address of an integer
stored in its associated memory. It is important to recognize
that a declaration will set aside (allocate) memory only for
the pointer itself. The memory for the data pointed to must be
allocated separately, and the pointer must be given its address
before it is used. This is called initialization.
To access the value that a pointer points to, one must dereference
the pointer with the asterisk '*'. To obtain the address of a
variable, one applies the ampersand '&' to it. For example:
int y; //declare an int named y
x = &y; //store the address of y in x
*x = 5; //assign 5 to the memory x points to (y is now 5)
int z = *x; //assign z the value x points to (both ints)
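A minimal sketch of a pointer being initialized and then dereferenced is given below. The function name pointerDemo is hypothetical; it exists only to wrap the statements so they can be compiled.

```cpp
// Returns the value of y after it has been changed through the pointer x.
int pointerDemo() {
    int y = 0;      // an ordinary int; memory for it exists
    int* x = &y;    // x is initialized with the address of y
    *x = 5;         // dereference x: writes 5 into the memory x points to
    return y;       // y was modified through the pointer
}
```

Note that x is given a valid address before it is dereferenced. Dereferencing a pointer that has never been initialized is an error that the compiler will usually not catch.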
A built in data structure of C++ is the array. An array
is a block of memory set aside for the contiguous storage of
like elements. The declaration of an array must include the
type and number of elements to be stored. An array is indicated
by appending a set of square brackets '[ ]' to the variable name.
For example:
int myarray[10]; //declare array of 10 integers
To define any element of the array, one may use the assignment
operator and place the element index between the square brackets.
Array indexing begins with zero, 0. Thus, for an array of ten
integers valid indices are 0 through 9.
myarray[4] = 6; //assign 6 to the 5th element
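As an illustration, the sketch below fills the array, assigns one element, and adds up the contents. The function name sumOfArray is an assumption, and a while loop is used to step through the indices.

```cpp
// Fills a ten-element array, sets one element, and returns the sum.
int sumOfArray() {
    int myarray[10];            // declare array of 10 integers (indices 0 through 9)
    int i = 0;
    while (i < 10) {
        myarray[i] = 0;         // give every element a defined value
        i = i + 1;
    }
    myarray[4] = 6;             // assign 6 to the 5th element
    int sum = 0;
    i = 0;
    while (i < 10) {
        sum = sum + myarray[i]; // read each element by its index
        i = i + 1;
    }
    return sum;
}
```

The loop condition i < 10 keeps the index inside the valid range. C++ does not check array bounds for you, so an index of 10 would silently read or write memory outside the array.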
User-defined data types are central to the design of C++. The most
common unit of user-defined data in C++, and in BaBar code specifically,
is the class.
To create a class a user must define it to the compiler. A class
is identified by the name determined in the declaration and
consists of data members and member functions
(also called methods).
The syntax of this declaration is:
class ClassName{
    data members and member functions
};
This defines a new type and is thus referred to as a class
definition. However, historically and analogous to general
declarations, it is also called a class declaration.
To add to the linguistic confusion, the implementation of the
class (the code that defines each member function), is also
called the class definition. For consistency I will refer to
this initial step as the declaration and the implementation as
the definition.
The data members and member functions of a class can be placed in
one of three categories:
- public
  - accessible to all code
- protected
  - accessible only to members and friends of this class, and to
    members and friends of classes that inherit from this one
- private
  - accessible only to members and friends of this class
Typically data members that store the state of the object should
be private. This protects the implementation of the class. Though
users can read the declaration of a class, client code is not
allowed to access its private members, so the data is protected from being
altered. The functions supporting the class, those that make it
a useful entity for applications, should be made public.
Data members of a class are declared with the same syntax as
variables. Similarly, member functions are declared with the
same syntax as general functions. However, outside the class
declaration the full name of a member function is
ClassName::FunctionName. The '::'
operator is called the scope resolution operator. It
indicates that FunctionName is a function of the
ClassName class. Multiple classes with the same function
names do not give rise to conflicts or ambiguity. When a member
function is called on an object, it is accessed using the member
access operator '.' (a dot). For example:
ClassName example; //declare an object named example, type ClassName
example.FunctionName(); // call the function FunctionName associated
// with ClassName to act on example
Two special member functions are the constructor, which has the same name
as the class, and the destructor, whose name is the class name prepended with
a tilde '~' character. The constructor is called whenever an
object of its class type is created. Similarly the destructor
is called whenever an object of the class type is destroyed, for
example when it goes out of scope or is explicitly deleted.
When the class is declared, its member functions and data members
are declared as well. The body of the class declaration is
within curly braces and is terminated with a semicolon. The
class definition provides each member function's definition.
The definition of each member function is contained within
curly braces and is not terminated with a semicolon.
Here is an example class declaration and
class definition.
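The linked example is not reproduced in this text. As a stand-in, the sketch below shows a small class declaration followed by its definition. The class Counter and all of its members are hypothetical, and the leading underscore on the data member is simply the naming style assumed here.

```cpp
// Class declaration (this part would normally live in a header file).
class Counter {
public:
    Counter();          // constructor: same name as the class
    ~Counter();         // destructor: class name prepended with '~'
    void increment();   // member function declarations
    int value();
private:
    int _count;         // data member holding the object's state
};

// Class definition (this part would normally live in an implementation file).
Counter::Counter()
    : _count(0)         // start counting from zero
{
}

Counter::~Counter()
{
}

void Counter::increment()
{
    _count = _count + 1;
}

int Counter::value()
{
    return _count;
}
```

Client code can create a Counter and call increment() and value() on it, but cannot read or change _count directly because it is private.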
C++ offers a set of operators for performing comparison. If the
relationship is satisfied then a boolean (bool) true is returned,
else a bool false is returned.
| Operator | Definition |
| == | equal to |
| != | not equal |
| > | greater than |
| >= | greater or equal |
| < | less than |
| <= | less or equal |
Some built-in C++ statements take boolean values as their argument.
If the conditional argument is true the rest of the statement will
be executed; if the conditional argument is false it will not.
When the body to be executed consists of only one statement, that
statement is terminated with a semicolon. When there are multiple
statements to be executed, they are bounded by curly braces. Each
statement in the statement body is terminated
by a semicolon, whereas the body itself is not (no semicolon after
the closing curly brace). Perhaps the most common conditional
statements for analysis are the if and the while statements.
If statements are useful for execution of a block of code once
after a test has been satisfied. For example, only if a particle
is within a mass range should analysis be continued. The conditional
test is performed, if it evaluates to true the statement body is
executed. In the if/else block if the condition evaluates to false
then the else body of statements is executed.
if (condition) statement;
if (condition) {
    statements....
}
if (condition) {
    statements...
} else {
    statements...
}
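As a sketch of the mass range test mentioned above, the function below sorts a mass value into one of three cases. The function name and the window limits of 1.0 and 2.0 are invented for illustration and have no physical meaning.

```cpp
// Returns -1 if the mass is below the window, 1 if above, 0 if inside.
// The window limits are made-up values for illustration only.
int classifyMass(double mass) {
    const double lowerLimit = 1.0;
    const double upperLimit = 2.0;
    if (mass < lowerLimit) {
        return -1;              // executed only when the first test is true
    } else {
        if (mass > upperLimit) {
            return 1;           // nested if inside the else body
        }
    }
    return 0;                   // reached only when both tests were false
}
```

Analysis code would typically continue only for the case where the function returns 0.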
While statements are used when a block of code should be executed as
long as the test/conditional argument is true. If the condition
evaluates to true then the statements in the body are executed.
Execution of code then returns to the condition and evaluates it
again. This sequence will continue until the condition evaluates
to false. While loops are very useful for running quick checks
and making plots.
while (condition) statement;
while (condition) {
    statements...
}
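A minimal sketch of a while loop is shown below; the function name sumUpTo is hypothetical. The condition is tested before every pass through the body, so the body never runs if the condition is false at the start.

```cpp
// Adds the integers 1, 2, ..., n and returns the total.
int sumUpTo(int n) {
    int total = 0;
    int i = 1;
    while (i <= n) {        // tested before each pass through the body
        total = total + i;
        i = i + 1;          // without this line the loop would never end
    }
    return total;
}
```

For example, sumUpTo(4) returns 10, and sumUpTo(0) returns 0 because the body is never executed.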
A common source of errors (bugs) is accidentally writing
the assignment operator '=' where the boolean 'is equal' comparator
'==' was intended, and vice versa.
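The sketch below shows the correct comparison, with the buggy form described in a comment. The function name isZero is made up for this illustration.

```cpp
// Returns true only when value is zero.
bool isZero(int value) {
    // Buggy variant (do not use): writing  if (value = 0)  would assign 0
    // to value, and the condition would then always be false.
    if (value == 0) {       // '==' compares and leaves value unchanged
        return true;
    }
    return false;
}
```

Many compilers will print a warning when an assignment appears directly inside a condition, so it is worth reading warnings carefully.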
While loops and if statements can be used on their own, as
part of a function, and be nested within
themselves and each other. The body of code associated with
each statement is marked by curly braces. Each open curly
brace must be matched with a closed curly brace. When
there is nesting the entire nested statement must be
within the body of the outer statement. In a multiply nested
sequence the closed braces will be associated in reverse order
from the open braces. That is, the first open curly brace will
be matched by the last closed curly brace.
When a local variable is declared, memory is allocated for it (in
an area of memory called the stack). The
scope of a variable begins at its declaration
and persists until the
body of code in which it was declared has finished executing.
For example, a variable declared within a member function is
allocated at the time of its declaration. When execution of
the code moves past the closing curly brace of the function
the variable is said to have gone out of scope. When this
happens the memory that was allocated to the variable is no
longer reserved. That memory can now be reused by the
program and the variable that has gone out of scope should not
be accessed.
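A short sketch of scope in practice follows. The function and variable names are hypothetical.

```cpp
// Demonstrates where two variables are in scope.
int scopeDemo() {
    int outer = 1;              // in scope until the closing brace of scopeDemo
    if (outer == 1) {
        int inner = 2;          // in scope only inside the body of this if
        outer = outer + inner;
    }
    // inner has gone out of scope here; referring to it by name
    // would be rejected by the compiler.
    return outer;
}
```

The variable outer is still in scope after the if body ends, so the change made to it inside the body is visible in the return statement.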
It is important to keep track of the body delimiters of these
statements for compile purposes, for illuminating the scope
of variables, and for making code readable. A standard
convention, also used in BaBar code, is the use of indentation
when nesting occurs. All code in the body of a statement should
be indented systematically. The amount of indentation should
correspond to the level of nesting. The curly brace closing the
body of a statement should be placed on its own line at the
same level of indentation as the opening part of the statement.
Look
here for an example.
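The linked example is not reproduced in this text. As a stand-in, here is a small sketch of an if statement nested inside a while loop inside a function, indented by nesting level. The function name countEven is made up for illustration.

```cpp
// Counts the even numbers from 1 to n.
int countEven(int n) {
    int count = 0;                  // nesting level 1: function body
    int i = 1;
    while (i <= n) {
        if (i % 2 == 0) {           // nesting level 2: while body
            count = count + 1;      // nesting level 3: if body
        }
        i = i + 1;
    }
    return count;
}
```

Each closing brace sits on its own line, directly below the start of the statement it closes, which makes it easy to see which brace matches which.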
Sequential storage of data is a common occurrence. Often general
packages of code or libraries make available typical structures
such as arrays, lists, and vectors. For these containers
to be functional, a user must be able to traverse the elements,
often in a systematic or all inclusive manner. At the same time
the encapsulation of implementation details must be preserved.
The notion of an iterator satisfies both of these
requirements.
An iterator is an abstraction of a pointer to an element. It is
typically implemented as a class or function associated with a
given container class. The iterator points to one element in the
sequence and has access to the information needed to move to the
next element of the sequence. It also has access to information
that will determine the end of the data sequence. Concepts supported
by a general iterator are the idea of the current element, next
element/incrementation, and equality/comparison.
In BaBar each reconstructed event contains many lists of like data,
for example lists of pions, charged tracks, and so forth. Iterators
used in conjunction with loops facilitate execution of a segment of
analysis code on each element in a list. For example, an iterator is
used to access a charged track in an event's list, and then the
momentum is plotted in a histogram. This sequence continues to loop
until each track of the list has been plotted.
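The sketch below shows the general pattern using the standard library's std::vector and its iterator. This is an assumption made for illustration: BaBar lists and their iterators are different classes with their own interfaces, and summing the momenta stands in for filling a histogram.

```cpp
#include <vector>

// Visits every element of the list and adds up the values.
// Summing is a stand-in for plotting each momentum in a histogram.
double sumMomenta(const std::vector<double>& momenta) {
    double total = 0.0;
    std::vector<double>::const_iterator iter = momenta.begin(); // first element
    while (iter != momenta.end()) {     // comparison: has the end been reached?
        total = total + *iter;          // current element, accessed like a pointer
        ++iter;                         // move to the next element
    }
    return total;
}
```

The loop uses the three concepts listed above: the current element (*iter), incrementation (++iter), and comparison (iter != momenta.end()). The code never needs to know how the container stores its elements.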
For large programs it is not reasonable for all of the code to exist
in one file. This is due to readability, maintenance, and primarily
compile time. If all of the code were in one unit, even the smallest
change would require re-compilation of all code.
To avoid this very costly dependence, code is partitioned into a set
of coherent modules. The physical structure, the system of
code files, is likely to reflect the logical structure of
the program.
The many units of source code in a large program must be
mutually consistent. For one, types in declarations must be
uniform throughout all units of code. A primary method of
accomplishing this is to gather all declarations and interface
information into one place, a header file, while placing
the definition code into an implementation file.
Header files contain the declarations an implementation file
wants to make available to other units of code. The standard
contents of a header file are type definitions, function
declarations, and name declarations. By BaBar convention header
files have names with the suffix '.hh'.
Units of code, files, access the code declared in a header file
by using a preprocessor include command. The syntax is:
#include "<header file name>"
Before code is compiled the preprocessor inserts a copy of
the header file into any file that has included it, at the
position of the include command. Each source file needs to see
the contents of a header file only once, even though the same
header file may be pulled in several times, directly or through
other header files.
To prevent repeated inclusion of header file code the
following macro syntax is used.
#ifndef <definevalue>
#define <definevalue>
...header file contents
#endif
The first time the preprocessor sees the header file, definevalue
is not yet defined, so the contents are processed and definevalue
becomes defined. When the preprocessor comes to the header
file again while processing the same source file, definevalue is
already defined so everything between the ifndef and
the endif is ignored. BaBar convention sets
definevalue to the name of the header file in
all capital letters ended by _HH. For example, the
QExample.hh file uses QEXAMPLE_HH.
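As a sketch of how a guarded header might look, the block below uses the QEXAMPLE_HH name from the text. The class inside it is a made-up placeholder and is not the real contents of QExample.hh.

```cpp
#ifndef QEXAMPLE_HH
#define QEXAMPLE_HH

// Placeholder declaration for illustration only; the real QExample.hh
// declares the QExample module class instead.
class QExamplePlaceholder {
public:
    int eventCount;
};

#endif
```

If this file is included twice in the same source file, the second copy is skipped, so the class is not declared twice.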
All of the source code for the implementation and definition
of a header file's declarations is placed in an implementation
file. Complete function definitions should be placed in the
implementation file. By BaBar convention implementation files
have names with the suffix '.cc'.
The implementation must have access to the declarations and
types that it defines, so it must include its own header file.
Standard libraries are included with the C++ language to provide
commonly used and needed functions and types. Accessing the code
of a standard library is analogous to using user-written source
code. Any code making use of a standard library must include
it. The include syntax is the same as for including header files
except the standard library name is enclosed by angle brackets
instead of double quotes.
#include <<library name>>
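For instance, the sketch below includes the standard math library header cmath and uses its square root function. The function name hypotenuse is hypothetical.

```cpp
#include <cmath>    // standard library header: angle brackets, no quotes

// Returns the length of the hypotenuse of a right triangle with sides a and b.
double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);    // std::sqrt is declared in <cmath>
}
```

A user-written header such as QExample.hh would instead be included with double quotes, as described earlier.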
Page maintained by Adam Edwards
Last modified: January 2008