Igor A. Gaponenko
April 7, 2004
2.1 Restrictions
2.1.1 Limitations of the transient model
2.1.2 Limitations of persistent implementations
2.1.2.1 CdbBdbTable
3 Transient representation of generic N-Tuple-s
3.1 Dealing with a hierarchy of N-Tuple classes
3.1.1 Creating instances of transient n-tuples
3.1.1.1 CdbNTupleFactory
3.1.1.2 CdbNTupleFactoryFE
3.2 Metadata
3.3 Columns
3.4 Rows
3.4.1 Changing the number of rows
3.4.2 Adding/inserting a row
3.4.3 Replacing an existing row
3.4.4 Reordering (sorting) rows
3.6 Modifying elements in N-Tuples
3.7 Algorithms
3.7.1 Merge rows of two N-Tuples
3.7.2 Append all rows of one N-Tuple by the end of another one
3.7.3 Make a copy of N-Tuple
3.7.4 Glue columns of two N-Tuples
3.8 Printing the contents of N-Tuples
3.8.1 Printing using a custom conversion policy
3.9 Reading N-Tuple-s from flat text files into a transient memory representation
4 Persistent representations and converters to/from transient ones
4.1 CdbBdbTable
4.1.1 Transient-to-persistent conversion
4.1.2 Persistent-to-transient conversion
4.1.2.1 CdbBdbNTupleProxy
4.1.2.2 Writing custom proxies
5 Tools
This document introduces an extended CDB API for storing table-like data structures ("n-tuples") in CDB. Potential users of this API are those whose data modeling needs can be fully met by these relatively simple data structures. The main benefits of using the proposed API instead of the core CDB API are:
for users: there is no need to develop technology-specific persistent representations for transient n-tuples, because each (technology-specific) implementation of the CDB API automatically provides a persistent "back-end" and a bidirectional conversion facility between the transient and persistent representations of n-tuples. The user's code deals mostly with a purely transient API. Only a tiny fraction of the user's code has to depend on the technology-specific conversion facilities in order to trigger them at the right time. That fraction of the code can easily be encapsulated in special modules.
for those who maintain the code and CDB installations: it simplifies migrating the clients' code and database contents from one persistent technology to another, should this be needed. Configuring an application (choosing a source of n-tuples) also becomes simpler when two or more technology-specific implementations of CDB are present.
The new API and its implementation include the following components:
transient classes representing a model of n-tuples
algorithms dealing with the transient n-tuples (merge, split, sort, print, etc.)
persistent (technology specific) classes representing transient n-tuples for each persistent technology supported by CDB
conversion facilities (persistent technology specific) performing transient-to-persistent (and vice versa) transformation of n-tuples
In addition there is:
a set of interactive tools for browsing and printing the contents of existing n-tuples from CDB, loading new tuples into CDB from flat files, etc.
The new API will be available as of 14.5.1/analysis-20 software release. The related code is found in the following package:
CdbTable
Currently the persistent "back-end" support and conversion facility are only available for the "Bdb" (Objectivity/DB based) technology of CDB. The related package name is:
CdbBdbTable
New packages for other persistent technologies may show up in the future.
Here is how the "generic n-tuple" is defined in this API:
The n-tuple is a random access table of a fixed number of columns (also: width) and arbitrary number of rows (also: length) whose elements contain data of the same type called "element type". These are three user-controlled parameters of a particular n-tuple.
The number of columns and the type of elements are statically defined at the compilation time as a part of the n-tuple class template. These parameters can't be changed during the lifetime of a tuple.
The number of rows can be dynamically changed by adding new rows, removing existing rows or resizing a tuple.
The columns are addressed either by their numbers (starting from 0) or by their names (specified by a user at the tuple creation time).
The rows are addressed by their numbers (starting from 0).
The order of data stored in a tuple is preserved by the API and the CDB database unless a user changes it. One practical consequence of this rule is that a user may rely on the order of elements in a tuple, for instance, to speed up associative search operations in tuples.
An n-tuple also has two strings of arbitrary length representing metadata: the n-tuple's name and its description. The metadata is also preserved in a persistent database.
At the transient level this model is expressed by means of an interface (abstract class), which may have multiple implementations: both "out-of-box" ones, provided by the extended CDB API itself, and those implemented by users themselves. It's up to a user which particular implementation to use. For the "out-of-box" implementation there is a special factory class serving instances of n-tuples upon a user's request. Code developers may also create their own implementations of n-tuples by deriving from the above-mentioned abstract class. See the "Transient representation of generic N-Tuple-s" section for details.
In full accordance with the mainstream approach of the CDB API, the polymorphic components of the API must be used via counted smart pointers. This approach also helps to handle memory management when n-tuple instances must be destroyed. The above-mentioned "out-of-box" factory automatically delivers smart pointers to newly created transient n-tuples. Since the rest of the proposed API also deals with smart pointers, it's up to the developer of a custom implementation of the base n-tuple interface to construct the corresponding smart pointer.
Another source of tuples is CDB database itself, where they (n-tuples) were previously stored by some other application process, and from where they can be obtained through the extended CDB API. To get this information from CDB a user code will have to interact with the CDB API to find a required persistent n-tuple in the database and convert it into the transient form using the provided (technology-specific) conversion facility.
Note that a user must provide the conversion facility with the static parameters of a tuple (the number of columns and the type of elements), so that the facility can try its best to match these expected parameters against the actual type of the persistent tuple it finds.
Once created, a transient tuple can be used either directly via its public API or indirectly via one of the supplied algorithms. All algorithms will deal with transient n-tuples only.
N-tuples can also be stored in CDB using a provided (technology specific) conversion facility and the core CDB API.
Restrictions on parameters and use of n-tuples come from two sources:
The limitations of the transient model itself
The limitations of a particular persistent technology-specific implementation of the persistent "back-end" and the corresponding conversion facility
Here is what to expect for the transient model:
number of columns: is virtually unlimited (at least 32-bit range)
number of rows: is virtually unlimited (at least 32-bit range)
type of elements: can be any, as long as this type has the following in its public interface:
default and copy constructors
destructor
assignment and optionally (see section "Algorithms") "strictly-less-than" operators
max. size of a tuple: is limited by the amount of the virtual memory available to a process
max. length of metadata strings: is limited by the amount of the virtual memory available to a process
The biggest limitation of this Objectivity/DB based "back-end" implementation is its limited support for types of elements. Only the following primitive C++ types are allowed to be used in transient n-tuples if these tuples must be stored at or retrieved back from CDB:
short types : 16-bit integer types
int types : 32-bit integer types
long types : since the number of bits in this type varies (32 or 64 bits) across platforms and compilers, the "back-end" conversion facility always maps this type internally onto the 64-bit long long type
long long types : 64-bit integer
float : 32-bit float type
double : 64-bit float type
char : char type
unsigned char : 8-bit integer type
signed char : 8-bit integer type
In addition the API allows the following string type:
At the moment, the current implementation of the "back-end" does not have a mechanism for extending the above mentioned types with arbitrary user defined types of elements. There is a type mapping mechanism though, which can be used by a client's code to map internal transient types into any of the above mentioned predefined types when doing persistent-to-transient or transient-to-persistent conversion. The mechanism is described in the corresponding section later in the document.
The other restrictions are:
number of columns: has exactly 32-bit range
number of rows: has exactly 32-bit range
max. size of a tuple: is limited by the maximum size of a persistent container in Objectivity/DB
a theoretical limit for the container size is the product of the page size (currently: 16 KB) and the number of pages per container (currently: 64 K), which gives 1 GB
a practical limit depends on how many containers are stored in the same database image, which itself is limited by 2 GB on some platforms
max. length of metadata strings: the theoretical limit is 4 GB. The practical limit is the above mentioned container size.
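The theoretical container limit quoted above is plain arithmetic, and the following sketch merely reproduces it. The constant and function names are illustrative only and are not part of CDB or Objectivity/DB; the sketch assumes a C++11 (or later) compiler.

```cpp
#include <cstdint>

// Illustrative constants taken from the figures quoted in the text.
constexpr std::uint64_t kPageSizeBytes     = 16ULL * 1024;  // 16 KB per page
constexpr std::uint64_t kPagesPerContainer = 64ULL * 1024;  // 64 K pages per container

// Theoretical maximum size of one persistent container, in bytes.
constexpr std::uint64_t containerLimitBytes()
{
    return kPageSizeBytes * kPagesPerContainer;
}

// 2^14 bytes per page times 2^16 pages is 2^30 bytes, i.e. 1 GB.
static_assert( containerLimitBytes() == ( 1ULL << 30 ), "16 KB x 64 K pages is 1 GB" );
```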
Transient N-Tuples are represented with the following abstract template class and its base class:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple : public CdbNTupleBase {
...
};
It's important to note that besides the usual template parameter specifying the type of elements stored in an object, there is a second parameter specifying the number of columns in the N-Tuple. As a result, these two classes:
CdbNTuple<float,10>
CdbNTuple<float,8>
are completely different types (from a C++ compiler's point of view). Specifying the number of columns in this way makes the client's code more robust: any attempt to use an improper type is stopped by the C++ compiler at compilation time rather than detected at run time. It also follows that in this model the number of columns cannot change over the lifetime of an N-Tuple object. Should there be a need to convert between objects of incompatible types, the library of algorithms supplied with N-Tuples provides the corresponding facilities to do so explicitly.
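The compile-time distinction between tuples of different width can be illustrated with a small stand-alone sketch. NTupleSketch is a hypothetical stand-in that copies only the template parameters and the ncol enumerator of CdbNTuple; it is not the real class, and the sketch assumes C++11 for static_assert.

```cpp
#include <type_traits>

// Hypothetical stand-in for CdbNTuple: only the template parameters matter here.
template< class T, unsigned int NCOL >
struct NTupleSketch {
    enum { ncol = NCOL };
};

// Different column counts give unrelated types...
static_assert( !std::is_same< NTupleSketch<float,10>, NTupleSketch<float,8> >::value,
               "the width is part of the type" );

// ...and the width is available as a compile-time constant.
static_assert( NTupleSketch<float,10>::ncol == 10, "ncol mirrors NCOL" );

// A function written for 10 columns cannot be called with an 8-column tuple:
// such a call is rejected by the compiler instead of failing at run time.
inline unsigned int widthOfTenColumnTuple( const NTupleSketch<float,10>& )
{
    return NTupleSketch<float,10>::ncol;
}
```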
See restrictions on allowed types of elements in the "Restrictions" section.
Before exploring the extended API of the N-Tuple facility further, it makes sense to comment briefly on how the API copes with a hierarchy of transient N-Tuple classes. As already mentioned in the "An overview of the new API" section, the base interface modeling the transient n-tuples can be implemented in various ways, which can be tuned for the specific needs of users' applications. That's why, as will be seen later, CdbNTuple is defined as an abstract class, some of whose methods are pure virtual, meaning that it's up to particular implementations to provide appropriate solutions. Unfortunately this design changes the rules of the game, because it forces users to deal with pointers to the abstract base class rather than with values of a concrete class. That means bearing the responsibility for managing the life cycle of N-Tuple objects. On the other hand, dealing with concrete values may not be an optimal approach from the performance point of view for big tables. A similar problem applies to smallish objects if they're intensively passed around an application. The CDB API has a unified (within the API itself) solution to this dilemma: so-called "counted smart pointers". The pointers are defined through the following template class:
// File: CdbBase/CdbCPtr.hh
template< class T, ... >
class CdbCPtr ... {
...
};
The idea is that users now deal with small smart pointer values, which provide proper memory management for the pointed-to objects when needed.
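As an analogy for the reference-counting behaviour described here, the sketch below uses std::shared_ptr from the C++11 standard library. This is a substitution made for illustration: the internals of CdbCPtr are not described in this document, and TableSketch, ownersSeenByCallee and lifetimeDemo are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload that counts live instances, so destruction is observable.
struct TableSketch {
    static int& alive() { static int n = 0; return n; }
    TableSketch()  { ++alive(); }
    ~TableSketch() { --alive(); }
    TableSketch( const TableSketch& ) = delete;
    TableSketch& operator=( const TableSketch& ) = delete;
};

// Takes the pointer by value: the reference count goes up, the table is not copied.
inline long ownersSeenByCallee( std::shared_ptr<TableSketch> ptr )
{
    return ptr.use_count();
}

// Returns the number of live tables once every owning pointer is gone.
inline int lifetimeDemo()
{
    {
        std::shared_ptr<TableSketch> first  = std::make_shared<TableSketch>();
        std::shared_ptr<TableSketch> second = first;      // two owners, one table
        assert( TableSketch::alive() == 1 );
        assert( ownersSeenByCallee( first ) == 3 );       // first, second, argument copy
    }   // the last owner goes out of scope here and the table is destroyed
    return TableSketch::alive();
}
```

Calling lifetimeDemo() from any function exercises the whole life cycle: only small pointer values are copied, and no explicit delete appears anywhere.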
To facilitate the creation of new n-tuples the API provides a special factory class. The class creates new instances of a predefined implementation of n-tuples and returns smart pointers to the above-mentioned abstract base class. Here is what the factory class looks like:
// File: CdbTable/CdbNTupleFactory.hh
template< class T, unsigned int NCOL >
class CdbNTupleFactory {
    static CdbCPtr< CdbNTuple< T, NCOL > > createSimple( const std::string name = "",
                                                         const std::string description = "" );
    static CdbCPtr< CdbNTuple< T, NCOL > > createSimple( const std::vector<std::string>& columns,
                                                         const std::string name = "",
                                                         const std::string description = "" );
};
These small smart pointer objects can be used in the same way as regular C++ pointers, with one important exception: there is no need to delete them. The pointed-to object is destroyed automatically when the last smart pointer pointing to it is destroyed. Here is an example:
typedef CdbCPtr< CdbNTuple< float, 10 > > MyNTupleOf10FloatsPtr;
typedef CdbNTupleFactory< float, 10 >     MyFactoryOfNTupleOf10Floats;

void foo( MyNTupleOf10FloatsPtr ptr )
{
    cout << "The Current Number of Rows is: " << ptr->rows( ) << endl;
}

void bar( )
{
    MyNTupleOf10FloatsPtr ntPtr = MyFactoryOfNTupleOf10Floats::createSimple( );
    ntPtr->resize( 123 );
    foo( ntPtr );    // Pass a copy of the smart pointer down to a method
    ...
}   // At this point the local value of "ntPtr" object will be destroyed, and if
    // it's the only pointer pointing onto an object produced by the factory then
    // the pointed object itself would be destroyed.
This example also suggests using typedef-s where appropriate to avoid complex and hard-to-read template specializations.
The biggest problem with the standard factory class shown above is that the configuration parameters of the tuple to be created are passed twice: first when specifying the template parameters of the factory itself, and a second time when defining a smart pointer for the tuple. This makes the factory unnecessarily cumbersome to use. Besides, a mistaken attempt to use a factory of one type to create a tuple of a different type results in unintelligible complaints from the C++ compiler.
To overcome this problem and thereby simplify the factory API, the CdbTable package also provides a "front-end" version of the factory, which can derive the parameters of the tuple from its pointer. Here is a simple example showing how this factory can be used:
#include "CdbTable/CdbNTupleFactoryFE.hh"
void bar( )
{
CdbCPtr< CdbNTuple< float, 10 > > ptr; // ptr points to 0
CdbNTupleFactoryFE::createSimple( ptr ); // after this line ptr would point onto a new object
...
}
The "front-end" factory has the same methods as the standard one; it only adds an extra parameter to these methods, a smart pointer to be initialized:
// File: CdbTable/CdbNTupleFactoryFE.hh
class CdbNTupleFactoryFE {
    template< class T, unsigned int NCOL >
    static void createSimple( CdbCPtr< CdbNTuple< T, NCOL > >& ptr,
                              const std::string name = "",
                              const std::string description = "" );
    template< class T, unsigned int NCOL >
    static void createSimple( CdbCPtr< CdbNTuple< T, NCOL > >& ptr,
                              const std::vector<std::string>& columns,
                              const std::string name = "",
                              const std::string description = "" );
};
The N-Tuple will allow storing both data and metadata to describe the data. Metadata is represented by two strings called the name and the description. The initial values can be specified at the class's construction time or be set later as it's shown below:
// File: "CdbTable/CdbNTupleBase.hh"
class CdbNTupleBase {
protected:
CdbNTupleBase( ...,
const std::string& initialName = "",
const std::string& initialDescription = "",
... );
public:
// get methods
std::string name( ) const;
std::string description( ) const;
// set methods
void set_name( const std::string& newName );
void set_description( const std::string& newDescription );
};
Note that metadata is handled by the very base class of the transient n-tuple, CdbNTupleBase, rather than by CdbNTuple. This is done to reduce the amount of code at the level of the template class.
Also remember that even though Standard C++ strings can store null characters as part of a string, the CDB API treats the first null character as the end of the string.
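The effect of this rule on a metadata string can be pictured with the sketch below. upToFirstNull is a hypothetical helper written for this illustration, not a CDB function; it only shows which part of a std::string would survive under the stated rule.

```cpp
#include <cassert>
#include <string>

// Returns the part of `s` that precedes the first null character, if any.
// If `s` contains no null character, the whole string is returned.
inline std::string upToFirstNull( const std::string& s )
{
    return s.substr( 0, s.find( '\0' ) );
}

inline void upToFirstNullDemo()
{
    const std::string withNull( "abc\0def", 7 );   // 7 characters, null in the middle
    assert( withNull.size() == 7 );
    assert( upToFirstNull( withNull ) == "abc" );  // everything after the null is lost
}
```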
The data stored in the N-Tuple are logically (not necessarily physically - that's why it's not a 2-D array) organized into columns and rows. The number of columns is fixed by the corresponding template parameter. This number can be obtained through the n-tuple API using either a method from the CdbNTupleBase class:
// File: CdbTable/CdbNTupleBase.hh
class CdbNTupleBase {
unsigned int columns( );
};
or an enumerator in the CdbNTuple class:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple ... {
enum { ncol = NCOL };
};
The column numbers are used when accessing or modifying data stored in the N-Tuple. In addition, the N-Tuple supports names for columns. These names can be used along with the column numbers when accessing or modifying data. The names can be specified by a client at construction time or changed later. If a client does not provide these names, the N-Tuple constructor generates default names of the form "#<column_number>". For example:
#0 #1 #2 #3
Users are not allowed to use the '#' symbol in custom names. Any attempt to assign these names will be ignored by the constructor (it will use default names instead) or be turned down by the corresponding “set” method. Here is the corresponding API:
// File: CdbTable/CdbNTupleBase.hh
class CdbNTupleBase {
protected:
CdbNTupleBase( ...,
const std::vector<std::string>& columnNames,
... );
public:
// get methods
void column_names( std::vector<std::string>& vectorOfNames ) const;
CdbStatus column_number( const std::string& aName,
unsigned int& itsNumber ) const;
CdbStatus column_name( unsigned int aNumber,
std::string& itsName ) const;
// set methods
CdbStatus set_column_name( const std::string& newName,
unsigned int toNumber );
public:
// helper methods
static bool is_system_name( const std::string& aName );
static std::string default_column_name( unsigned int aNumber );
};
The last two methods in the interface shown above can be used to verify whether a name is a "system" generated default name (the CdbNTupleBase::is_system_name() method) and to produce the default name for a specified column number.
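To make the naming rules concrete, here is a minimal sketch of what the two helper methods could do. The function names and bodies are assumptions based only on the rules stated above (default names look like "#<column_number>" and the '#' symbol is reserved); the real CdbNTupleBase implementation may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Assumed rule: a default name is '#' followed by the decimal column number.
inline std::string defaultColumnNameSketch( unsigned int aNumber )
{
    std::ostringstream os;
    os << '#' << aNumber;
    return os.str();
}

// Assumed rule: any name containing '#' is reserved for system-generated names.
inline bool isSystemNameSketch( const std::string& aName )
{
    return aName.find( '#' ) != std::string::npos;
}
```

Under these assumptions a "set" method would turn down any candidate name for which isSystemNameSketch() returns true.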
The number of rows is implicitly modified by a user when adding new data into an object, or this (modification) can also be done explicitly by resizing the object. Here is a relevant part of the class's interface:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple ... {
virtual unsigned int rows( ) const = 0;
virtual CdbStatus set_rows( unsigned int newSize ) = 0;
virtual CdbStatus set_rows( unsigned int newSize,
const T& prototypeObject ) = 0;
};
Note that when resizing an N-Tuple the following rules apply to its data:
If the new size is greater than the old one, the new elements are initialized using the default constructor of the element type unless a prototype object is given. In the latter case the prototype object is copied in place of the new elements.
If the new size is less than the old one, the removed elements are destroyed using the destructor of the element type.
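The two resizing rules map naturally onto std::vector::resize. The sketch below assumes, purely for illustration, that rows are stored as a vector of vectors; setRowsSketch is a hypothetical function that folds both set_rows overloads into one by using a default argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage: each row is a vector of NCOL elements.
template< class T, unsigned int NCOL >
void setRowsSketch( std::vector< std::vector<T> >& rows,
                    std::size_t newSize,
                    const T& prototype = T() )
{
    // Growing appends rows filled with copies of `prototype`
    // (a default-constructed T when no prototype is given);
    // shrinking destroys the rows past `newSize`.
    rows.resize( newSize, std::vector<T>( NCOL, prototype ) );
}
```

Existing rows are left untouched when the table grows, which matches the statement that the order of stored data is preserved.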
New data can be added to an N-Tuple as whole rows. A row can either be appended at the end of the N-Tuple or inserted at a specified position. In both cases the user has to supply the row as a vector of elements whose type matches that of the N-Tuple. The total number of rows is incremented by 1 as a result of either operation.
The length of the vector does not have to be the same as the number of columns the N-Tuple is defined with. If the number of elements in the vector is less than the number of columns in the N-Tuple, the rest of the new row is filled with elements created using the default constructor of the element type. If the vector is longer, the surplus elements of the input vector are simply ignored.
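The padding and truncation rule can be sketched as a single normalisation step applied to the input vector before it is stored. fitRowSketch is a hypothetical helper, not part of the CdbTable package.

```cpp
#include <cassert>
#include <vector>

// Returns a copy of `input` adjusted to exactly NCOL elements:
// missing elements are default-constructed, surplus elements are dropped.
template< class T, unsigned int NCOL >
std::vector<T> fitRowSketch( const std::vector<T>& input )
{
    std::vector<T> row( input );
    row.resize( NCOL );
    return row;
}
```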
Here is the relevant part of the N-Tuple interface:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple ... {
virtual CdbStatus append_row( const std::vector<T>& newRow ) = 0;
virtual CdbStatus insert_row( const std::vector<T>& newRow,
unsigned int atRowNumber ) = 0;
};
The API of N-Tuple will allow replacing rows at any position:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple ... {
virtual CdbStatus replace_row( const std::vector<T>& newRow,
unsigned int atRowNumber ) = 0;
};
As mentioned earlier (see Overview), the order of rows in a tuple (in both its transient and persistent forms) won't change over the lifetime of the tuple unless a user explicitly requests it. However, certain applications may benefit from knowing that the rows in a tuple are in some order that is desirable for that application.
WARNING: Use this feature at your discretion and only if you can guarantee that your application can handle and use the order consistently. Otherwise the results may not be what you expect. Also keep in mind that the "order" is not a replacement for built-in consistent indexing, which may appear in the future as a possible extension to the present API.
The following API will facilitate this operation:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple ... {
virtual CdbStatus sort( const CdbNTupleIsLessComparator<T,NCOL>* theComparatorPtr ) = 0;
};
The method reorders rows according to a user-specified "isLess" relationship between two rows. The method accepts a pointer to a comparator interface (a template class), which is meant to be implemented by a concrete comparator. The CdbTable package has a simple implementation of the comparator interface, which may be sufficient for some applications:
// File: CdbTable/CdbNTupleIsLessComparatorDefaultImpl.hh
template< class T, unsigned int NCOL >
class CdbNTupleIsLessComparatorDefaultImpl : public CdbNTupleIsLessComparator<T,NCOL> {
virtual bool isLess( const std::vector<T>& theLeftRow,
const std::vector<T>& theRightRow ) const;
};
This implementation compares the elements of two rows in the "left-to-right" direction, starting from column number 0 and using "operator<", which is supposed to be defined for the type of n-tuple elements. The operation works much like comparing two bit strings of the same length, in which the left-most column (number 0) is treated as the most significant bit.
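The comparison described here is a lexicographical one, which the standard library already offers. The sketch below uses std::lexicographical_compare and std::sort as stand-ins for the default comparator and for CdbNTuple::sort; isLessSketch and sortRowsSketch are hypothetical names, and rows are assumed to be plain vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// "Left-to-right" row comparison relying only on operator< of the element type.
// Column 0 is the most significant position.
template< class T >
bool isLessSketch( const std::vector<T>& left, const std::vector<T>& right )
{
    return std::lexicographical_compare( left.begin(), left.end(),
                                         right.begin(), right.end() );
}

// Sorts rows with the comparison above.
template< class T >
void sortRowsSketch( std::vector< std::vector<T> >& rows )
{
    std::sort( rows.begin(), rows.end(), isLessSketch<T> );
}
```

With rows (2,1), (1,9) and (1,3) the sorted order is (1,3), (1,9), (2,1): column 0 decides first, and column 1 only breaks the tie between the two rows starting with 1.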
The only way to find a row in an N-Tuple is to use its number:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple ... {
virtual CdbStatus get_row( std::vector<T>& theRow,
unsigned int atRowNumber ) const = 0;
};
If an application is only interested in a single element at a specified row and column then the following method can be used:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple ... {
virtual CdbStatus get_element( T& theValue,
unsigned int atColumnNumber,
unsigned int atRowNumber ) const = 0;
virtual CdbStatus get_element( T& theValue,
const std::string& atColumnName,
unsigned int atRowNumber ) const = 0;
};
Certain applications may benefit from extracting the whole column from N-Tuple:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple ... {
virtual CdbStatus get_column( std::vector<T>& theColumn,
unsigned int atColumnNumber ) const = 0;
virtual CdbStatus get_column( std::vector<T>& theColumn,
const std::string& atColumnName ) const = 0;
};
Other, more efficient ways of finding information in N-Tuple may be added in the future. One possible way would be to build indices for columns to speed up associative data mining operations, like finding a row in which (values of) elements satisfy certain criteria.
The n-tuple API also allows modifying the values of elements in an existing n-tuple without changing the number of rows:
// File: CdbTable/CdbNTuple.hh
template< class T, unsigned int NCOL >
class CdbNTuple ... {
virtual CdbStatus set_element( const T& theValue,
unsigned int atColumnNumber,
unsigned int atRowNumber ) = 0;
virtual CdbStatus set_element( const T& theValue,
const std::string& atColumnName,
unsigned int atRowNumber ) = 0;
};
On its input, this algorithm expects two tuples of the same interface type, and it produces a new "output" tuple with rows copied from both input tuples. The metadata of the "output" tuple is copied from the "prototype" one. The relative order of rows won't change. All rows from the "prototype" tuple are copied first. Then all rows found in the "extra" tuple are appended at the end of the "output" tuple.
Here is a formal definition of the algorithm:
// File: CdbTable/CdbNTupleAlgorithms.hh
template < class NTUPLE >
CdbStatus
CdbNTupleMerge( CdbCPtr<NTUPLE>& theOutputPtr,
const CdbCPtr<NTUPLE>& thePrototypePtr,
const CdbCPtr<NTUPLE>& theExtraPtr );
Where NTUPLE is a model of the CdbNTuple interface. It means that the algorithm will work with any class which has a similar interface or which is derived from the CdbNTuple interface.
Since the algorithm deals with counted smart pointers, the following memory considerations should be taken into account when using it:
The "output" pointer will always be initialized with a newly created tuple. Thus any tuple previously pointed to by the "output" pointer won't be affected by the algorithm.
The actual (implementation) type of the "output" tuple will be the same as that of the "prototype" tuple.
The algorithm won't attempt to check whether the "prototype" and "extra" pointers point to the same tuple. If they do, the resulting tuple will have a double set of rows.
The "output" pointer does not have to be valid (it may point to no existing tuple) when calling this algorithm.
Both "prototype" and "extra" pointers must be valid pointers.
If the algorithm will fail for some reason and return a status value which would differ from CdbStatus::Success then the "output" pointer won't change.
This algorithm appends all rows from the "input" tuple to the "output" one. The metadata of the "output" tuple won't be affected. The relative order of rows won't change. All the original rows of the "output" tuple won't be affected. All rows found in the "input" tuple are appended at the end of the "output" tuple.
Here is a formal definition of the algorithm:
// File: CdbTable/CdbNTupleAlgorithms.hh
template < class NTUPLE >
CdbStatus
CdbNTupleAppend( CdbCPtr<NTUPLE>& theOutputPtr,
const CdbCPtr<NTUPLE>& theInputPtr );
Where NTUPLE is a model of the CdbNTuple interface. It means that the algorithm will work with any class which has a similar interface or which is derived from the CdbNTuple interface.
Since the algorithm deals with counted smart pointers, the following memory considerations should be taken into account when using it:
Upon the completion of the algorithm, the "output" pointer will always be pointing onto the same tuple. And that tuple will be modified.
If both "output" and "input" pointers point onto the same tuple then the resulting tuple will contain a double set of rows after the completion of the algorithm.
Both "output" and "input" pointers must be valid pointers.
WARNING: If the algorithm fails for some reason, the "output" tuple may be left in an unpredictable state.
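The "double set of rows" that results when both pointers refer to the same tuple can be reproduced with a sketch that works on plain row vectors. appendRowsSketch is a hypothetical function, not the real CdbNTupleAppend, and it is written so that the aliased case is well defined in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends every row of `input` to the end of `output`, modifying `output` in place.
template< class T >
void appendRowsSketch( std::vector< std::vector<T> >& output,
                       const std::vector< std::vector<T> >& input )
{
    // Take the row count first, so the loop terminates even if
    // `output` and `input` are the same object.
    const std::size_t n = input.size();
    // Reserve up front so that push_back never reallocates while a row
    // of the (possibly identical) input is being copied.
    output.reserve( output.size() + n );
    for ( std::size_t i = 0; i < n; ++i ) {
        output.push_back( input[i] );
    }
}
```

Because the target is modified row by row, an exception thrown halfway through (for example by a copy constructor) would leave only some of the rows appended, which is one way a partially updated "output" can arise.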
This algorithm will produce a new "output" tuple which will be an exact copy of the "input" one.
Here is a formal definition of the algorithm:
// File: CdbTable/CdbNTupleAlgorithms.hh
template < class NTUPLE >
CdbStatus
CdbNTupleCopy( CdbCPtr<NTUPLE>& theOutputPtr,
const CdbCPtr<NTUPLE>& theInputPtr );
Where NTUPLE is a model of the CdbNTuple interface. It means that the algorithm will work with any class which has a similar interface or which is derived from the CdbNTuple interface.
Since the algorithm deals with counted smart pointers, the following memory considerations should be taken into account when using it:
The "output" pointer will always be initialized with a newly created tuple. Thus any initial tuple pointed by the "output" pointer won't be affected by the algorithm. The actual (implementation) type of the "output" tuple will be the same as the one of the "input" tuple.
The "output" pointer does not have to be valid (it may point to no tuple) when calling this algorithm.
The "input" pointer must be a valid pointer.
If the algorithm will fail for some reason then the "output" pointer won't change.
The algorithm expects both of its input (known to the algorithm as "left" and "right") tuples to have the same type of elements. The number of columns in the input tuples can be any. This algorithm will produce a new "output" tuple, with a sum of columns from both input tuples. The columns from the "left" tuple will be copied first. The relative order of columns will be preserved in the "output" tuple. The data from rows will be copied accordingly. The metadata of the "output" tuple will be copied from the "left" one.
Here is a formal definition of the algorithm:
// File: CdbTable/CdbNTupleAlgorithms.hh
template < class T,
unsigned int NCOL_LEFT,
unsigned int NCOL_RIGHT >
CdbStatus
CdbNTupleGlue( CdbCPtr< CdbNTuple< T, NCOL_LEFT + NCOL_RIGHT > >& theOutputPtr,
const CdbCPtr< CdbNTuple< T, NCOL_LEFT > >& theLeftPtr,
const CdbCPtr< CdbNTuple< T, NCOL_RIGHT > >& theRightPtr,
const bool allowRenamingOfColumnsFlag = true );
Where T, NCOL_LEFT and NCOL_RIGHT are the parameters of the tuples to be glued. Note that unlike the other algorithms described in the Algorithms section, this one only deals with tuples of the CdbNTuple class.
The algorithm will also make a best attempt to preserve the original names of columns. An optional parameter "allowRenamingOfColumnsFlag" of the algorithm would control how the algorithm would resolve potential naming conflicts. Here are the rules:
if there is no conflict then all names from both tuples will be copied into the corresponding positions at the "output" tuple
in case of a conflict and if the flag is set to "false" then the algorithm will fail and return the "CdbStatus::ConflictOfParameters" error status.
in case of a conflict and if the flag is set to "true" then the algorithm will preserve original names from the "left" tuple, it will also preserve all non-conflicting names from the "right" tuple, and it will replace all conflicting names from the "right" tuple with default names (see the description of the CdbNTupleBase class for details).
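The three naming rules can be sketched as follows. This is an interpretation rather than the package's code: glueNamesSketch and defaultNameSketch are hypothetical, a boolean result stands in for CdbStatus, and the sketch assumes that a replaced name becomes the default name of its position in the output tuple. It does not handle the corner case where such a generated name collides with an existing one.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Assumed form of a default name: '#' followed by the column number.
inline std::string defaultNameSketch( std::size_t column )
{
    std::ostringstream os;
    os << '#' << column;
    return os.str();
}

// Builds the column names of the glued tuple. Returns false on a conflict
// when renaming is not allowed, leaving `output` untouched.
inline bool glueNamesSketch( std::vector<std::string>& output,
                             const std::vector<std::string>& left,
                             const std::vector<std::string>& right,
                             bool allowRenaming = true )
{
    std::vector<std::string> names( left );              // left names are always kept
    std::set<std::string> used( left.begin(), left.end() );
    for ( std::size_t i = 0; i < right.size(); ++i ) {
        if ( used.insert( right[i] ).second ) {
            names.push_back( right[i] );                 // no conflict: keep the name
        } else if ( allowRenaming ) {
            names.push_back( defaultNameSketch( names.size() ) );  // conflict: default name
        } else {
            return false;                                // conflict, renaming forbidden
        }
    }
    output.swap( names );
    return true;
}
```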
The rules followed by the algorithm when copying data from rows:
the order of rows won't change
each two rows with the same number from both tuples will be "glued" into the corresponding row of the "output" tuple. The relative order of elements in glued rows will be preserved in the resulting row.
missing elements of the shorter (in terms of the number of rows) tuple will be filled in using the default constructor of the element type
the number of rows in the "output" tuple will be equal to the length (number of rows) of the longest of the input tuples
Since the algorithm deals with counted smart pointers, the following memory considerations should be taken into account when using it:
the "output" pointer will always be initialized with a newly created tuple. Thus any original tuple pointed to by the "output" pointer won't be affected by the algorithm. The actual type of the "output" tuple is specific to the implementation of the algorithm.
the algorithm won't attempt to check whether the "left" and "right" pointers point to the same tuple. If they do, the resulting tuple will have a double set of columns
the "output" pointer does not have to be valid (it may point to no tuple) when calling this algorithm.
both "left" and "right" pointers must be valid pointers
if the algorithm will fail for some reason then the "output" pointer won't change
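The pointer rules above can be demonstrated with std::shared_ptr as a stand-in for CdbCPtr. Everything in this sketch is hypothetical: the Table type carries only a column count, glueSketch is not a CDB function, and reporting invalid inputs as a failure is an assumption (the text only states that both inputs must be valid).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct Table { std::size_t nColumns; };          // toy tuple: column count only
typedef std::shared_ptr<const Table> TablePtr;   // stand-in for CdbCPtr

// Returns true on success. On failure theOutput is left exactly as it was.
inline bool glueSketch(const TablePtr& theLeft,
                       const TablePtr& theRight,
                       TablePtr& theOutput) {
    if (!theLeft || !theRight) {
        return false;  // ASSUMPTION: invalid inputs are reported as a failure
    }
    // No aliasing check: if theLeft == theRight the columns are doubled.
    std::shared_ptr<Table> result = std::make_shared<Table>();
    result->nColumns = theLeft->nColumns + theRight->nColumns;
    assert(result->nColumns >= theLeft->nColumns);
    // Rebind the output pointer to the new tuple. The tuple it previously
    // pointed to (if any) is not modified.
    theOutput = result;
    return true;
}
```

A caller that still holds another smart pointer to the previous "output" tuple keeps seeing that tuple unchanged after the call.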
The CdbTable package also provides a facility for printing transient tuples into an output stream specified by a user. If the element type of the N-Tuple to be printed has "operator<<" defined and visible in the scope of the client's code, then the following simplified specialized version of the printing facility can be used:
// File: CdbTable/CdbNTuplePrint.hh
template< >
class CdbNTuplePrint< CdbNTuplePrintConverter_UseDefault > {
template < class NTUPLE >
static CdbStatus print( const CdbCPtr<NTUPLE>& thePtr,
const unsigned int theColumnWidth,
const char* theIndent,
const CdbNTuplePrintTypes::FrameType theFrameType,
const CdbNTuplePrintTypes::FrameBorders theFrameBorders );
template < class NTUPLE >
static CdbStatus print( ostream& theOutputStream,
const CdbCPtr<NTUPLE>& thePtr,
const unsigned int theColumnWidth,
const char* theIndent,
const CdbNTuplePrintTypes::FrameType theFrameType,
const CdbNTuplePrintTypes::FrameBorders theFrameBorders );
};
The first method prints a tuple to the standard output stream ("cout" or "std::cout"). The second one lets a user specify any other output stream. As the interface above shows, the facility also provides a limited level of control over the appearance of tuples. With this interface a user may control the following parameters:
the theColumnWidth is the width of columns. That will be the actual number of characters between any possible borders or column separators. This number applies to all columns of a tuple.
the theIndent is the padding string inserted before each line of a tuple. A non-zero pointer must be used as the value of this parameter.
the theFrameType is a choice for the type of a tuple's frame (see its explanation below)
the theFrameBorders is a choice for the type of a tuple's borders (see its explanation below)
The available frame type and the frame border parameters are defined by the following class:
// File: CdbTable/CdbNTuplePrintTypes.hh
struct CdbNTuplePrintTypes {
typedef enum { OpenFrame, ClosedFrame } FrameType;
typedef enum { DoubleHashes, SingleXes } FrameBorders;
};
More options can be added in the future.
To simplify printing when a user does not need fine-grained control over the appearance of tuples, there are two simpler methods with fewer parameters. These methods assume default values for most of the appearance parameters:
// File: CdbTable/CdbNTuplePrint.hh
template< >
class CdbNTuplePrint< CdbNTuplePrintConverter_UseDefault > {
template < class NTUPLE >
static CdbStatus print1( const CdbCPtr<NTUPLE>& thePtr );
template < class NTUPLE >
static CdbStatus print1( ostream& theOutputStream,
const CdbCPtr<NTUPLE>& thePtr );
template < class NTUPLE >
static CdbStatus print2( const CdbCPtr<NTUPLE>& thePtr );
template < class NTUPLE >
static CdbStatus print2( ostream& theOutputStream,
const CdbCPtr<NTUPLE>& thePtr );
};
Here is a trivial example illustrating how the simplest form of the printing facility can be used:
CdbCPtr< CdbNTuple< float, 4 > > tuplePtr = ...;
if( CdbStatus::Success != CdbNTuplePrint<>::print1( tuplePtr )) {
cerr << "failed to print the tuple" << endl;
::exit( 1 );
}
There are two cases in which the printing facility in the previously described form may not work:
the type of the tuple's elements does not have the streaming operator defined, or
an existing streaming operator produces unwanted output for the element type
To resolve these cases the following extended form of the printing facility can be used:
// File: CdbTable/CdbNTuplePrint.hh
template< class CONVERTER >
class CdbNTuplePrint {
// The rest of the class's interface is the same as in case of default
// conversion policy.
...
};
Note that, unlike the default version of the facility described earlier, this one accepts a special CONVERTER policy, which is used to print the elements of tuples to an output stream. The policy class is expected to exhibit the following public interface:
// File: MyPackage/MyCustomConverter.hh
struct MyCustomConverter {
static void convertAndPrint( ostream& theOutputStream,
unsigned int theWidth,
const MyType& theValue );
};
Where:
The actual class (shown as MyCustomConverter) may have any name. It could also be a template.
The name of the printing method (convertAndPrint) and its signature are predefined; they must look exactly as shown above (except perhaps the reference to the value of an element, which can be passed by value instead). It must also be a static method.
The type of elements (shown as MyType) must be an exact match for the element type of the tuple to be printed
The result is expected to be printed into one line only(!) and take exactly(!) the specified (by the theWidth parameter) number of characters. Otherwise the output will not be well formatted.
The previously described simplified printing forms provided by the print1 and print2 methods are also available.
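The one-line, exact-width requirement is the part of the policy contract that is easiest to get wrong. The following sketch shows one way to satisfy it. It is an illustration only: StdComplexConverter is a hypothetical name, std::complex<double> stands in for a user-defined element type, and truncating over-long text is a design choice of this sketch rather than something the facility prescribes.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

struct StdComplexConverter {
    // Same shape as the required policy method: static, void, and taking
    // (stream, width, value) in that order.
    static void convertAndPrint(std::ostream& theOutputStream,
                                unsigned int theWidth,
                                const std::complex<double>& theValue) {
        // Format into a temporary buffer first so the length can be fixed up
        // and the caller's stream flags are not altered.
        std::ostringstream text;
        text << std::setprecision(3) << theValue.real()
             << (theValue.imag() < 0 ? "-" : "+")
             << std::abs(theValue.imag()) << "i";
        std::string cell = text.str();
        // Truncate or pad with spaces so that exactly theWidth characters
        // are written, all on one line.
        cell.resize(theWidth, ' ');
        assert(cell.size() == theWidth);
        theOutputStream << cell;
    }
};
```

Formatting into a buffer and then resizing it guarantees the width regardless of the value, at the cost of silently cutting off text that does not fit.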
Here is an example, assuming that there is a class called Complex, and this class has the corresponding conversion policy called ComplexConverter:
#include "MyPackage/Complex.hh"
#include "MyPackage/ComplexConverter.hh"
CdbCPtr< CdbNTuple< Complex, 4 > > tuplePtr = ...;
if( CdbStatus::Success != CdbNTuplePrint< ComplexConverter >::print1( tuplePtr )) {
cerr << "failed to print the tuple" << endl;
::exit( 1 );
}
Sorry, this feature is not currently implemented.
As it has already been stated in the Overview section, each persistent technology specific implementation of CDB API is supposed to provide:
a persistent "back-end" for N-Tuples
the corresponding bi-directional conversion mechanism/facility (converters) to convert between transient and persistent forms of N-Tuples
Normally a persistent back-end implementation is supposed to be hidden behind the corresponding conversion facility to avoid any direct dependency of clients' code on a particular implementation of the back-end. According to the current design "philosophy" of converters, they should use the parameters of a tuple (the type of its elements and its width) to do proper type matching between the transient and persistent forms of the tuple. For that reason, the rest of this section discusses neither the interfaces nor the implementations of persistent back-ends.
At the time this document was written there was just one persistent implementation of N-Tuples, described below.
The CdbBdbTable package provides an Objectivity/DB-based implementation of N-Tuples. This persistent technology is also known in CDB API as "Bdb". In that technology all persistent objects stored in CDB must derive (directly or indirectly) from the following base class:
BdbCond/BdbObject.hh
The CdbBdbTable package contains all the infrastructure needed to store tuples in CDB in the form of specially designed persistent classes, which won't be discussed here. See the contents of the package if you're interested in specific implementation details. This implementation also imposes certain restrictions on both the configuration and the size of tuples to be stored in CDB.
At the level of the public API available to users, the conversion facility has an asymmetric interface. This is caused by the corresponding asymmetry in the base CDB API used to store persistent objects in CDB and retrieve them from it. Both conversion paths are described in the subsequent sections.
This form of the conversion is implemented by means of a so-called "factory" (a specific term used in CDB API) class, which is specialized to deal with N-Tuples. A pointer to a factory object is passed down (along with the validity interval of the object to be stored) to CDB API when it's time to store a tuple in CDB.
Here is the public interface of the factory:
// File: CdbBdbTable/CdbBdbNTupleFactory.hh
template< class T, unsigned int NCOL >
class CdbBdbNTupleFactory : public CdbBdbObjectFactory {
CdbBdbNTupleFactory( const CdbCPtr< CdbNTuple<T,NCOL> >& thePtr );
};
As this interface shows, a user has to create a factory object by passing it a smart pointer to a tuple. Here is a simple example illustrating the use of the factory to store a tuple:
#include "CdbTable/CdbNTuple.hh"
#include "CdbBdbTable/CdbBdbNTupleFactory.hh"
#include "CdbBase/CdbCondition.hh"
// Have the transient tuple created
CdbCPtr< CdbNTuple< float, 4 > > tuplePtr = ...;
// Create a factory object
CdbBdbNTupleFactory< float, 4 > factory( tuplePtr );
// Find the condition where the object will be stored
CdbConditionPtr cPtr;
if( CdbStatus::Success != CdbCondition::instance( cPtr, "/emc/EmcFooClassP" )) {
cerr << "failed to find the condition" << endl;
::exit( 1 );
}
// Store the tuple in CDB. This step will also trigger internal conversion
// inside the factory object.
if( CdbStatus::Success != cPtr->storeObject( &factory, ... )) {
cerr << "failed to convert and store the tuple" << endl;
::exit( 1 );
}
The opposite conversion operation is provided via the following utility class:
// File: CdbBdbTable/CdbBdbNTupleP2T.hh
template< >
class CdbBdbNTupleP2T< CdbBdbNTupleConversionRules_UseDefault > {
template< class T, unsigned int NCOL >
static CdbStatus convert( CdbCPtr< CdbNTuple< T, NCOL > >& thePtr,
const ooRef(BdbObject)& thePersRef );
template < class T, unsigned int NCOL >
static CdbStatus convert( CdbCPtr< CdbNTuple< T, NCOL > >& thePtr,
const CdbObjectPtr& theObjectPtr );
};
There are two methods in this class: one accepts a persistent reference to the base class required by the "Bdb" persistent technology, and the other accepts a smart pointer to a transient metadata object representing a found persistent object. Here is a simple example illustrating how to use the conversion facility:
#include "CdbBase/CdbCondition.hh"
#include "CdbTable/CdbNTuple.hh"
#include "CdbTable/CdbNTuplePrint.hh"
#include "CdbBdbTable/CdbBdbNTupleP2T.hh"
// Find the condition where the object is supposed to be located
CdbConditionPtr cPtr;
if( CdbStatus::Success != CdbCondition::instance( cPtr, "/emc/EmcFooClassP" )) {
cerr << "failed to find the condition" << endl;
::exit( 1 );
}
// Find a metadata object describing persistent tuple in CDB
CdbObjectPtr oPtr;
if( CdbStatus::Success != cPtr->findObject( oPtr, ... )) {
cerr << "failed to find the persistent tuple" << endl;
::exit( 1 );
}
// Convert the found persistent tuple into its transient form, then print
// its contents.
CdbCPtr< CdbNTuple< float, 4 > > tuplePtr;
if( CdbStatus::Success != CdbBdbNTupleP2T<>::convert( tuplePtr, oPtr )) {
cerr << "failed to convert the found persistent tuple into its transient form" << endl;
::exit( 1 );
}
CdbNTuplePrint<>::print1( tuplePtr );
Some Framework applications may benefit from fetching transient N-Tuples directly via the following proxy:
// File: CdbBdbTable/CdbBdbNTupleProxy.hh
template< class T, unsigned int NCOL >
class CdbBdbNTupleProxy : public CdbBdbProxyBase< CdbCPtr< CdbNTuple< T, NCOL > > > {
CdbBdbNTupleProxy( const char* theConditionPathName,
BdbCondDefStrategy* theStrategy = 0 );
};
This proxy, when used, returns (read the following carefully!) "pointers to counted smart pointers". That's how the current proxies in the BaBar Framework work. Here is an example illustrating how to create the proxy in a simulated Framework environment and use it:
#include "CdbTable/CdbNTuple.hh"
#include "CdbTable/CdbNTuplePrint.hh"
#include "CdbBdbTable/CdbBdbNTupleProxy.hh"
#include "ProxyDict/IfdSimpleProxyDict.hh"
#include "ProxyDict/Ifd.hh"
#include "ProxyDict/IfdStrKey.hh"
#include "AbsEnv/AbsEnv.hh"
#include "GenEnv/GenEnv.hh"
// Simulate global environment as if we were processing an event whose
// event time was equal to the current wall clock time of our running test.
gblEnv = new AbsEnv( );
gblEnv->setGen( new GenEnv( ));
BdbTime validityTime( BdbTime::now( ));
EidCondKeyTriplet primaryTriplet ( 0, 0, validityTime );
EidCondKeyTriplet backgroundTriplet( 0, 0, validityTime );
gblEnv->getGen( )->setConditionsKeys( &validityTime,
&primaryTriplet,
&backgroundTriplet );
gblPEnv = new IfdSimpleProxyDict;
// Register a simple proxy
const char* theConditionName = "/emc/EmcFooClassP";
bool result =
Ifd< CdbCPtr< CdbNTuple<long,3> > >::put( gblPEnv,
new CdbBdbNTupleProxy< long, 3 >( theConditionName ),
IfdStrKey( theConditionName ));
assert( result );
// Try to use the proxy
const CdbCPtr< CdbNTuple<long,3> >* tuplePtrPtr =
Ifd< CdbCPtr< CdbNTuple<long,3> > >::get( gblPEnv,
IfdStrKey( theConditionName ));
if( 0 == tuplePtrPtr ) {
cout << "Ifd<T>::get() failed and returned 0 pointer onto a transient object." << endl;
::exit( 1 );
}
// Turn a "pointer to smart pointer" into a usual "smart pointer". That would simplify
// our communication with the rest of CDB API.
CdbCPtr< CdbNTuple<long,3> > tuplePtr = *tuplePtrPtr;
// Print the tuple
CdbNTuplePrint<>::print1( tuplePtr );
A more detailed example can be found in the "cmd_proxy" function of the following test application:
CdbBdbTable/CdbBdbNTupleTest.cc
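The "pointer to a counted smart pointer" convention used in the example above can be explored without the Framework. In this sketch std::shared_ptr stands in for CdbCPtr, and ToyProxy, invalidate() and toSmartPointer() are hypothetical names; whether and when a real proxy drops its cached object is an assumption made only to show why copying the smart pointer is the safe pattern.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

typedef std::vector<long> Tuple;                 // toy transient tuple
typedef std::shared_ptr<const Tuple> TuplePtr;   // stand-in for CdbCPtr

// Toy proxy: it owns a counted smart pointer and hands out its address,
// or a null pointer when it has nothing to offer.
class ToyProxy {
public:
    explicit ToyProxy(TuplePtr thePtr) : _cached(std::move(thePtr)) {}
    const TuplePtr* get() const { return _cached ? &_cached : nullptr; }
    void invalidate() { _cached.reset(); }       // HYPOTHETICAL cache drop
private:
    TuplePtr _cached;
};

// Turn the "pointer to smart pointer" into an ordinary smart pointer.
// The copy shares ownership, so the tuple stays alive for as long as the
// caller holds the result.
inline TuplePtr toSmartPointer(const TuplePtr* thePtrPtr) {
    assert(thePtrPtr != nullptr);  // callers must check for a failed lookup first
    return *thePtrPtr;
}
```

Keeping only the raw pointer returned by the proxy would tie the caller to the lifetime of the proxy's internal smart pointer, whereas the copy made by toSmartPointer() does not.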
Look at the implementation of the above explained proxy class for an example on how to write your own proxy.
Sorry, no tools are currently implemented.