Igor Gaponenko
May 3, 2005
1 A justification for translators
1.1 Persistent technology dependencies in the current Framework applications
1.2.1 Subsets of the full production CDB
1.2.2 Migrating CDB to the ROOT I/O
2 Technology-neutral transaction management
2.2 Migrating existing clients
2.2.1 Transaction management in proxies
2.2.2 Transaction management in loaders and standalone applications
3.1 Technology specific translators
3.1.1 Translators for the "Roo" technology
3.1.1.1 Trivial translation: the CdbRooObjectTranslatorR2T class
3.1.2 Translators for the "Bdb" technology
3.1.2.1 Trivial translation: the CdbBdbObjectTranslatorP2T class
4.1 Environment proxy: the CdbEnvProxy class
4.2 Proxies with a non-trivial communication with its translator(-s)
4.4 Many-to-one proxies with a non-trivial communication with its translator(-s)
5 Sample code snippets used throughout the document
An updated version of the original document was published. All changes relate to renaming "converters" to "translators", because the word "converters" is heavily overused in the present CDB API. The relevant CDB API tags are those shown below or newer:
| CdbBase | V01-07-00 |
| CdbBdb | V01-05-00 |
| CdbRoo | V00-01-00 |
| CdbRooTests | V00-00-04 |
The original version of the document, titled "Payload Converters and Technology-Neutral Proxies in CDB", was published.
For well-known historical reasons, a large amount of the CDB client code in present Framework-based applications still depends, directly or indirectly, on the Objectivity/DB API [OBJECTIVITY/C++ API]. In the past, when Objectivity/DB was essentially the only persistent technology adopted by the BaBar Experiment, this dependency wasn't considered a big problem (even though some might say that it should have been foreseen long ago, or that the way the client code is designed and implemented might not be well justified from an architectural point of view). However, in view of the recent dramatic changes in the landscape of the BaBar software [CM2], as well as emerging requirements for CDB, the dependency has become a serious limiting factor for improving Framework applications and making them ready for other persistent technologies.
This document discusses various aspects of the problem stated above and proposes the solution considered most suitable in the current context.
ATTENTION: Readers should know that all relevant CDB API support classes and interfaces mentioned below have already been implemented as described in the document. However, the design and implementation of this part of the CDB API are yet to be finalized, and they may change as we gain a better understanding of how to proceed with this project.
Also keep in mind that the document only discusses read-only clients of the CDB implemented using the BaBar Framework.
One of the cornerstone concepts in the foundation of CDB is a separation of metadata from payload:
DEFINITIONS: We define the payload as (1) the user-defined schema of the database and (2) the contents of the database generated by user applications. The metadata is additional information describing the payload. It includes so-called intervals of validity, revisions, conditions, views, origins, etc. The metadata, implemented in the form of persistent data structures, provides a kind of framework in which the user-defined and user-generated payload is stored.
This separation is also reflected in the current design of the CDB API. Most of the API dealing with the metadata is fully technology neutral (at least in terms of direct dependencies). However, when it comes to storing or retrieving the user-defined payload, there is a direct and controllable (through specially designed "technological" CDB API extensions) breach into a persistent technology specific API. This decision was made quite consciously in order to cope with the variety of user-defined classes (as well as over 1 million stored objects) inherited by CDB from the previous generation Condition/DB. This dependency is unavoidable - somehow the CDB API has to provide a way to deal with user-defined persistent classes (of whatever persistent technology is being used) and payload objects.
Primary sources of the persistent technology specific dependency for Objectivity/DB:
transaction management
persistent classes (the schema) and objects
Direct clients affected by the dependency are:
proxies: in the context of the CDB, a proxy is a software component performing the following operations:
starting a technology-specific transaction when it needs to interact with the CDB API,
using the technology-neutral CDB API to locate objects (in the context of the relevant conditions) whose validity interval covers the timestamp of the currently processed event,
translating the found objects from a persistent into a transient form, and
caching the translated objects in their transient form, along with their intervals of validity, in a local cache. All proxies in CDB derive from a special persistent technology specific base class.
Therefore all proxies, regardless of what they do, expose the Objectivity/DB API both in their implementation and in their interface.
environment modules: this is a part of BaBar Framework applications responsible for instantiating proxies and putting them into Proxy Dictionary. Usually modules
A schematic representation of the resulting architecture is shown in the figure below.
[FIGURE ]
During the past year we've seen two major developments directly affecting the use of CDB and potential trends in its evolution (as software and as a persistent store). The significant "resistance" of the existing client code to these changes has triggered an analysis of the current model of CDB proxies.
In an attempt to reduce the size of a database installation for a specific use of the CDB, the so-called "beta-mini" subset [HOW-TO-SUBSET] has been successfully developed and deployed. The amount of data in the subset is nearly 20 times smaller than the 32 GB found in today's full production CDB. The subset includes enough conditions (a specific CDB term for detector alignments, constants, calibrations, etc.) to run a properly configured BetaMiniApp analysis application for the full range of events acquired or modeled in the context of the Experiment. The initial motivation behind this development was a desire to eliminate the difficulties of managing and importing the full-size CDB. Another driving force, not to be ignored, was the desire expressed by many physicists to be able to run their analysis jobs on notebooks. Clearly, a 32 GB database is "too much" even for the hard drives of contemporary notebook computers (which typically have from 40 to 80 GB). Besides, the database keeps growing as new calibrations (both ONLINE and OFFLINE) are put into it.
A natural way to reduce the size of the subset was to "strip down" its contents, leaving only those conditions that are really needed to run the BetaMiniApp. The application itself also had to be configured not to use conditions beyond the subset. This goal required a special investigation to figure out which conditions are relevant in this context and which aren't. In the course of this development it was "(re-)discovered" that we have no fine-grained control over which conditions are really needed by a given application. That's because there is a thick layer of code, in the form of so-called "proxies" and "environment modules", lying between the CDB API and the actual recipients/users of the data fetched from CDB.
Another issue relevant in the context of this document is related to the persistent schema. Even though the behavior of the BetaMiniApp has been "stripped down" (in terms of the conditions used), its code has practically not changed. It still depends on the code (and persistent classes) of all conditions. This dependency is not a big problem on its own; what can be more serious is the questionable maintainability of this code when another persistent technology is deployed in production.
In addition, we seem to face a strong push toward producing a subset of CDB to run Simulation Production jobs. Without well-defined, fine-grained control over which conditions and which persistent classes are used by an application, it will be hard to take advantage of the "subsets".
A far more serious limitation of the current architecture of the (CDB) client code was realized when we started moving CDB away from Objectivity/DB to ROOT I/O. A major complication of this migration is that both persistent technologies will co-exist in production applications for at least 1 year (a personal estimate of the author of the document). That's because (due to known restrictions of ROOT I/O, which isn't really a "database" technology) the migrated CDB will initially be deployed in a read-only form. In the meantime, all critical production CDB installations, into which new data will be put, will still be using CDB in the Objectivity/DB format. As explained in the "dependency" section 1.1, performing the migration of the client code in the "straightforward" way is going to create an enormous amount of duplicated code. For example, for each condition there will be two proxies, two versions of the corresponding module instantiating the proxy, and two versions of a sequence making sure that the related module is linked and engaged by an application. Maintaining this code even in the mid-term is going to be a non-trivial and definitely an error-prone task.
A general idea is to make proxies technology neutral by generalizing or eliminating the following persistent technology specific operations:
starting a technology specific transaction when it's needed to interact with CDB API
translating the found objects from a persistent into a transient form
For transaction management, the problem has been solved with the new CdbTransaction class recently added to the core CDB API. See section 2 of the current document for more details.
The "persistent-to-transient" translation function is moved out of proxies into a separate translation facility (including a dictionary of translators) managed via the extended core CDB API. The API extension is designed so that the (technology-specific) user-defined translators can be invoked through technology-neutral interfaces. It's also used to register the translators in the facility. Proxies are still responsible for triggering the translation in order to produce a valid transient product. In order to get to the right translator, proxies communicate with the facility by means of two keys:
a type name of the actual persistent object to be translated into a transient form. This information is encoded in a technology-neutral metadata object pointed to via the CdbObjectPtr. This pointer is obtained from the CDB API as a result of database lookup operations. The metadata object is used by the translator that matches the type of the persistent object associated with it.
a type identifier of the transient object to be produced by the translator. The translation from a transient type into its identifier is performed at compile time.
Find more details on translators in section 3 of the document.
The proposed architecture of new proxies and translators is shown in the following picture:
[FIGURE]
This class acts as a front-end to the transaction management performed in the context of either the current or an explicitly specified CDB API implementation. The transaction services provided by this class are based on the resource-acquisition-is-initialization paradigm. The class ensures a proper transaction upon instantiation (when any of its public constructors is called). A new transaction is started if none existed before, and it is committed when the class's destructor is executed, provided the transaction was started by the constructor. The constructor has an optional parameter specifying the desired mode of the transaction. By default it is a read-only transaction. In addition to the regular start/commit operations, the transaction management class can also be used to invoke the commit-and-hold operation (see the corresponding method in the example below), which can be called to flush results (modified/created persistent objects) out of a process's cache (the actual semantics of this operation depend on the underlying persistent technology and CDB API implementation).
Here is an example of the class's use:
#include "CdbBase/CdbTransaction.hh"... |
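The start-if-none, commit-if-owner behaviour described for the class can be illustrated with a small self-contained mock. This is only a sketch of the resource-acquisition-is-initialization idea, not the real CdbTransaction: the names MockContext and ScopedTransaction are hypothetical, and the mock simply records events in a log instead of talking to a database.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one CDB API implementation (a "context").
struct MockContext {
    bool active = false;            // is a transaction currently open?
    std::vector<std::string> log;   // records start/commit events
};

// Hypothetical RAII guard mimicking the behaviour described in the text.
class ScopedTransaction {
public:
    enum Mode { Read, Update };

    explicit ScopedTransaction(MockContext& ctx, Mode mode = Read)
        : ctx_(ctx), owner_(!ctx.active) {
        if (owner_) {               // start only if none existed before
            ctx_.active = true;
            ctx_.log.push_back(mode == Read ? "start(read)" : "start(update)");
        }
    }

    ~ScopedTransaction() {
        if (owner_) {               // commit only what this guard started
            ctx_.active = false;
            ctx_.log.push_back("commit");
        }
    }

    ScopedTransaction(const ScopedTransaction&) = delete;
    ScopedTransaction& operator=(const ScopedTransaction&) = delete;

private:
    MockContext& ctx_;
    bool owner_;
};

// Nested guards: only the outer one starts and commits the transaction.
std::vector<std::string> demoNestedTransactions() {
    MockContext ctx;
    {
        ScopedTransaction outer(ctx);   // none active: starts a read-only one
        ScopedTransaction inner(ctx);   // already active: joins, starts nothing
        assert(ctx.active);
    }                                   // inner does nothing, outer commits
    assert(!ctx.active);
    return ctx.log;
}
```

In this mock a proxy-like client can always create a guard without knowing whether its caller has already opened a transaction; only the guard that actually started the transaction commits it.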
The basic model of the class is very similar to the one found in the Objectivity/DB specific implementations of the CDB API:
CdbBdb/CdbBdbTransaction.hh
The main advantage of the new class is that it doesn't expose any persistent technology-specific APIs at compile time. That extra flexibility comes with a price - a developer of a proxy now has to be concerned about the context in which the class is going to be used. By default (exactly as shown in the example above) the transaction is started in the context of the current CDB API implementation. This implementation is obtained with the following simple operation:
#include "CdbBase/Cdb.hh"
...
// Get a smart pointer onto the top-level CDB API object. This object
// opens a path to the rest of services provided by this implementation
// in a technology-neutral way.
CdbPtr topLevelApiPtr = Cdb::instance( );
...
In order to cope with multiple CDB API implementations, the CdbTransaction class has two special forms of its constructor. They're both shown in the excerpt from the class's interface (see the header file of the class for more details):
// File: CdbBase/CdbTransaction.hh |
Here is another example illustrating a use of the transaction management class in a specific context:
#include "CdbBase/CdbTransaction.hh"... |
It's also obvious that the interface of the CdbTransaction class described above allows managing two or more independent transactions (one transaction per context) simultaneously.
For most CDB clients, including proxies, the migration is really trivial.
The way proxies are used in Framework applications doesn't assume the presence of two (or more) persistent technologies at a time. An application is normally built with one technology or another, and there is always a concept of the default implementation of the CDB API for that application. This is how all existing proxies are written. This observation reduces the migration of proxies (and similar applications, which do not want or need to deal with a specific persistent technology) to the following simple rules:
replace all includes of CdbBdb/CdbBdbTransaction.hh with CdbBase/CdbTransaction.hh
replace all instances of the CdbBdbTransaction with CdbTransaction
replace all instances of the BdbcRead and BdbcUpdate constants with CdbTransaction::Read and CdbTransaction::Update respectively
rerun the make-linkfiles script against the package to update the link dependency file
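The three renaming rules above are purely textual, so they can be sketched as a small rewriting helper. This is an illustration only, not an existing CDB tool: the function names replaceAll and migrateSource are hypothetical, and plain substring replacement would also touch any longer identifier that happens to contain one of the old names, so the result still needs a review. Rerunning the make-linkfiles script remains a separate manual step.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Replace every occurrence of `from` with `to` in `text`.
std::string replaceAll(std::string text, const std::string& from, const std::string& to) {
    if (from.empty()) return text;
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();   // continue after the inserted text
    }
    return text;
}

// Apply the renaming rules in order. The include path is rewritten first,
// because it contains the class name that the second rule also matches.
std::string migrateSource(std::string source) {
    const std::vector<std::pair<std::string, std::string>> rules = {
        {"CdbBdb/CdbBdbTransaction.hh", "CdbBase/CdbTransaction.hh"},
        {"CdbBdbTransaction",           "CdbTransaction"},
        {"BdbcRead",                    "CdbTransaction::Read"},
        {"BdbcUpdate",                  "CdbTransaction::Update"},
    };
    for (const auto& rule : rules) {
        source = replaceAll(std::move(source), rule.first, rule.second);
    }
    return source;
}
```

The order of the rules matters: applying the class rename before the include rule would produce the path CdbBdb/CdbTransaction.hh, which is not the header named in the rules.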
If, for some reason, the context in which a transaction is supposed to be used doesn't match the one actually in use, the problem will show up as a missing (or improper) transaction, followed by self-explanatory complaints from the CDB API and (quite possibly) an application crash. This situation has never been considered dangerous (critical in terms of data loss) when it comes to reading conditions from the database. Therefore no special reinforcement (at the level of CDB API implementations) beyond what already exists is needed.
The migration rules for this class of applications are in general the same ones as for the Framework proxies. The only obvious exception would be a case of dual-technology applications. Here are possible candidates:
applications meant to exchange data between two implementations of the CDB API. These applications would require two simultaneous transactions to be started in the corresponding contexts. Examples: data converters from Objectivity/DB based CDB installations into the ROOT ones.
critical (updating CDB) applications for which it's not possible to guarantee a stable "default" implementation of the CDB API (like in case of proxies). Examples: ONLINE applications updating calibrations in the database.
Should this be considered a problem, it's suggested to use one of the additional constructors of the CdbTransaction class to tell it which CDB API implementation is meant to be used. The corresponding example of the client's code can be found in section 2.1.
Translators are special classes extending CDB API and serving as placeholders for the actual code performing the persistent-to-transient transformation for technology-specific persistent objects stored in CDB. Translators are supposed to be registered with CDB API implementations during an application configuration stage (which is inevitably technology specific) and are used in the rest of the application, which now can be made technology-neutral.
The present design of translators is based on the following assumptions/requirements:
each translator is associated with a unique combination of a persistent (input) and a transient (output product) types
translators are registered with CDB API implementations by their technology-neutral interfaces
various CDB API implementations may have independent sets of translators
there should be just one translator per unique combination of persistent and transient types per CDB API implementation
the only way to communicate between a translator and its client is to pass a smart pointer to the found metadata object to be translated (an object of the CdbObjectPtr type)
Translators are registered with CDB API using the following interface:
// File: CdbBase/Cdb.hh |
The CdbObjectTranslator class is the root base class of all translators. In addition to this class there is an infrastructure of derived translator classes providing a foundation for technology-specific translators. At the time of writing, CDB is implemented in two technologies: "Bdb" and "Roo". Each translator object also "knows" which persistent technology it's associated with. Any attempt to register it against an improper CDB API implementation is stopped by the Cdb::registerTranslator method shown above and signaled by the corresponding error status returned by the method. An object of the CdbObjectTranslator class also carries information about the persistent and transient types it is associated with. These two types are translated into the following keys:
a persistent type name
a transient type identifier
The keys are used to resolve the right translator when the persistent-to-transient translation is requested for a found metadata object (CdbObjectPtr). These keys are also used by the Cdb::registerTranslator method to prevent attempts to register more than one translator for a unique combination of keys. See more details in the class's interface:
CdbBase/CdbObjectTranslator.hh
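The registration rules described above (one translator per pair of keys, rejection of translators built for another technology, and ownership passing to the API only on success) can be sketched with a small mock dictionary. All names here (Status, Translator, TranslatorRegistry) are hypothetical, and the transient type identifier is reduced to a plain integer; the real interfaces are those found in the CDB headers.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

enum class Status { Success, Rejected };

// Hypothetical technology-neutral description of a translator.
struct Translator {
    std::string technology;       // e.g. "Bdb" or "Roo"
    std::string persistentName;   // key 1: persistent type name
    int transientId;              // key 2: transient type identifier
};

// Hypothetical dictionary owned by one CDB API implementation.
class TranslatorRegistry {
public:
    explicit TranslatorRegistry(std::string technology)
        : technology_(std::move(technology)) {}

    // On Success the registry takes ownership; on Rejected the caller keeps it.
    Status registerTranslator(Translator* t) {
        if (t == nullptr || t->technology != technology_) return Status::Rejected;
        const Key key(t->persistentName, t->transientId);
        if (table_.count(key) != 0) return Status::Rejected;   // duplicate keys
        table_.emplace(key, std::unique_ptr<Translator>(t));
        return Status::Success;
    }

    const Translator* find(const std::string& persistentName, int transientId) const {
        auto it = table_.find(Key(persistentName, transientId));
        return it == table_.end() ? nullptr : it->second.get();
    }

private:
    using Key = std::pair<std::string, int>;
    std::string technology_;
    std::map<Key, std::unique_ptr<Translator>> table_;
};

// Returns the number of rejected registrations in a small scenario.
int demoRegistration() {
    TranslatorRegistry roo("Roo");
    int rejected = 0;
    auto tryRegister = [&](Translator* t) {
        if (roo.registerTranslator(t) != Status::Success) {
            delete t;             // rejected: the caller still owns the object
            ++rejected;
        }
    };
    tryRegister(new Translator{"Roo", "MyPersistentClassR", 1});   // accepted
    tryRegister(new Translator{"Roo", "MyPersistentClassR", 1});   // duplicate keys
    tryRegister(new Translator{"Bdb", "MyPersistentClassP", 1});   // wrong technology
    assert(roo.find("MyPersistentClassR", 1) != nullptr);
    return rejected;
}
```

Keeping a separate registry object per implementation is what lets different CDB API implementations hold independent sets of translators.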
The interface of this class on its own is of no particular interest either to developers of new translators or to clients of translators. However, this class is an essential part of the extended CDB API, placing translators in the right position in the API's class hierarchy. What developers are supposed to be interested in are the technology-specific base classes (or complete solutions) for translators. These are discussed in subsequent sections.
To check if the desired translator already exists the following method can be used:
// File: CdbBase/Cdb.hh |
Note that a value of the transient type identifier is obtained using a special CdbType2Id class template. Here is an example:
#include "CdbBase/Cdb.hh"
#include "CdbBase/CdbType2Id.hh"
#include "MyPackage/MyTransientClass.hh"
// Try to find the translator
CdbCPtr<CdbObjectTranslator> translatorPtr;
CdbStatus status = Cdb::instance( )->findTranslator( translatorPtr,
                                                     CdbType2Id< MyTransientClass >::id( ),
                                                     "MyPersistentClassR" );
if( CdbStatus::Success == status ) cout << "Found" << endl;
else if( CdbStatus::NotFound == status ) cout << "Not found" << endl;
else cout << "Error" << endl;
...
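How CdbType2Id is implemented is not shown in this document. As an illustration of how a class template can map a type to an identifier that is fixed at build time, here is one common C++ technique: every instantiation of the template owns its own static object, and the address of that object serves as the identifier. The names Type2Id and TypeId below are hypothetical and are not taken from the CDB sources.

```cpp
#include <cassert>

// Hypothetical identifier type: an address that is unique per transient type.
using TypeId = const void*;

template <class T>
struct Type2Id {
    static TypeId id() {
        static char tag = 0;   // one distinct `tag` object per instantiation
        return &tag;
    }
};

struct MyTransientClass {};
struct OtherTransientClass {};

// The same type always yields the same id; different types yield different ids.
bool demoTypeIds() {
    const bool sameTypeSameId =
        Type2Id<MyTransientClass>::id() == Type2Id<MyTransientClass>::id();
    const bool differentTypesDiffer =
        Type2Id<MyTransientClass>::id() != Type2Id<OtherTransientClass>::id();
    return sameTypeSameId && differentTypesDiffer;
}
```

With this approach no central list of types has to be maintained: any type used as a template argument gets its identifier automatically.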
The next section discusses the persistent technology specific translator classes. They all derive from the root base class described above. It is followed by a section on using translators for the persistent-to-transient transformation.
All translators for the "Roo" technology should derive from the following abstract base class:
CdbRoo/CdbRooObjectTranslatorRT.hh
It's a class template defined as follows (only relevant parts of the class's contract are shown):
|
The template has two parameters:
R - an input persistent type. Note that all persistent types in the "Roo" technology must derive from the CdbRooObjectR class. The type compatibility is checked at compile time.
T - a transient type of the (output) product. Translators impose no restrictions on this type.
An actual user-defined translator should implement the CdbRooObjectTranslatorRT::toTransientRT() method, which should return CdbStatus::Success if the translation was successful and set the pointer to point to the newly created transient object.
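The contract just described (a compile-time check of the persistent type plus a translation hook that returns a status and sets a pointer) can be sketched as follows. This is a simplified mock under stated assumptions: the class names, the Status enum and in particular the signature of toTransientRT() shown here are hypothetical, since the real class contract is not reproduced in this document.

```cpp
#include <cassert>
#include <type_traits>

enum class Status { Success, Error };

// Hypothetical stand-ins for the persistent root class and a user schema.
struct RooObjectBase { virtual ~RooObjectBase() = default; };
struct MyPersistentClassR : RooObjectBase { int rawCounts = 0; };
struct MyTransientClass { double calibrated; };

// Hypothetical base template: checks the persistent type at compile time
// and leaves the actual translation to a pure virtual hook.
template <class R, class T>
class RooTranslatorRT {
    static_assert(std::is_base_of<RooObjectBase, R>::value,
                  "R must derive from the persistent root class");
public:
    virtual ~RooTranslatorRT() = default;
    virtual Status toTransientRT(const R& persistent, T*& transient) const = 0;
};

// User-defined translator: a non-trivial conversion with an input check.
class MyTranslator : public RooTranslatorRT<MyPersistentClassR, MyTransientClass> {
public:
    Status toTransientRT(const MyPersistentClassR& p, MyTransientClass*& t) const override {
        if (p.rawCounts < 0) return Status::Error;     // refuse bad input
        t = new MyTransientClass{p.rawCounts * 0.5};   // the caller owns the result
        return Status::Success;
    }
};

// Returns the translated value, or -1.0 if the translator refused the input.
double demoTranslate(int rawCounts) {
    MyPersistentClassR persistent;
    persistent.rawCounts = rawCounts;
    MyTransientClass* transient = nullptr;
    MyTranslator translator;
    if (translator.toTransientRT(persistent, transient) != Status::Success) return -1.0;
    const double value = transient->calibrated;
    delete transient;                                  // dispose of the owned object
    return value;
}
```

Instantiating the base template with a type that does not derive from the root class fails to compile, which mirrors the compile-time compatibility check mentioned for the R parameter.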
For cases where a trivial translation between the persistent and the transient types exists, as shown below:
class TRANSIENT; |
there is a special class template:
CdbRoo/CdbRooObjectTranslatorR2T.hh
Here is an illustration of how this class should be used:
#include "CdbBase/Cdb.hh"
#include "CdbRoo/CdbRooObjectTranslatorR2T.hh"
#include "MyTransientPackage/MyTransientClass.hh"
#include "MyPersistentPackage/MyPersistentClassR.hh"
// Register the translator
//
// MEMORY MANAGEMENT NOTES:
//
// - if the translator is accepted then its ownership is also passed
//   from the client's code down to the CDB API
//
// - if the translator is turned down then it's up to the client's code
//   to destroy the object.
CdbObjectTranslator* translatorPtr =
    new CdbRooObjectTranslatorR2T< MyPersistentClassR, MyTransientClass >( );
Note that if the translator is turned down by the CDB API then it must be properly disposed of to avoid memory leaks.
All translators for the "Bdb" technology should derive from the following abstract base class:
CdbBdb/CdbBdbObjectTranslatorPT.hh
It's a class template defined as follows (only relevant parts of the class's contract are shown):
|
The template has two parameters:
P - an input persistent type. Note that all persistent types in the "Bdb" technology must derive from the BdbObject class. The type compatibility will be checked at compilation time.
T - a transient type of the (output) product. Translators impose no restrictions on this type.
An actual user-defined translator should implement the CdbBdbObjectTranslatorPT::toTransientPT() method, which should return CdbStatus::Success if the translation was successful and set the pointer to point to the newly created transient object.
For cases where a trivial translation between the persistent and the transient types exists, as shown below:
class TRANSIENT; |
there is a special class template:
CdbBdb/CdbBdbObjectTranslatorP2T.hh
Here is an illustration of how this class should be used:
#include "CdbBase/Cdb.hh"
#include "CdbBdb/CdbBdbObjectTranslatorP2T.hh"
#include "MyTransientPackage/MyTransientClass.hh"
#include "MyPersistentPackage/MyPersistentClassP.hh"
// Register the translator
//
// MEMORY MANAGEMENT NOTES:
//
// - if the translator is accepted then its ownership is also passed
//   from the client's code down to the CDB API
//
// - if the translator is turned down then it's up to the client's code
//   to destroy the object.
CdbObjectTranslator* translatorPtr =
    new CdbBdbObjectTranslatorP2T< MyPersistentClassP, MyTransientClass >( );
Note that if the translator is turned down by the CDB API then it must be properly disposed of to avoid memory leaks.
In order to use a translator, a user should have the following three things ready:
a type of a transient object to be created
a smart pointer (CdbObjectPtr) onto a valid metadata object representing a successfully found persistent object
a proper transaction context (CdbTransaction)
#include "CdbBase/CdbTransaction.hh"
#include "CdbBase/CdbObject.hh"
#include "MyPackage/MyTransientClass.hh"
...
// STEP I: Make sure there is a proper transaction
CdbTransaction readOnlyTransaction;
// STEP II: Find a persistent object (details of how it's done are omitted)
CdbObjectPtr objectPtr = ...;
// STEP III: Turn the above found persistent object into a transient one.
//
// In case of the successful completion of the operation the transient
// pointer will be initialized to point onto the resulting object.
//
// NOTE: The ownership will also be returned with the pointer!
MyTransientClass* transientPtr = 0;
if( CdbStatus::Success != CdbObject::transient( transientPtr,
Note that for proxies, finding the metadata objects and providing the transaction is done by the proxy base class, CdbProxyBase. For them the only relevant part of the example shown above is the last (third) step. See more details on proxies in section 4 of the document.
Here is what's happening when the "CdbObject::transient()" method gets called:
a type identifier is derived from the type of the transient object pointer (MyTransientClass in the example shown above). The actual mapping between transient types and their identifiers is done at compile time. Also note that the pointer type should be exactly the one the translator has been designed with.
a type name for the persistent object is obtained from the passed metadata pointer (CdbObjectPtr).
a translator matching both of the above-mentioned types is found (if any) in a dictionary owned by a CDB API implementation. The metadata pointer "knows" which CDB API implementation to look in.
the translator is asked to perform the persistent-to-transient translation. If for some reason the translator decides that the translation is not possible, it indicates this through the corresponding signaling mechanism (see details in the section of this document where translators are described).
if the translation turns out to be successful, the transient pointer supplied by the client is initialized to point to the newly created transient object. Note that the ownership is also returned with the pointer. So, don't forget to dispose of the object to avoid memory leaks!
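The lookup steps listed above can be sketched end to end with a small mock. Everything here is an assumption made for illustration: ObjectHandle stands in for the metadata pointer, the dictionary is a plain std::map, and the standard std::type_index plays the role of the transient type identifier. It is not the actual implementation of CdbObject::transient().

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

enum class Status { Success, NotFound };

// Hypothetical metadata handle: knows the persistent type name and a payload.
struct ObjectHandle {
    std::string persistentTypeName;
    int payload;
};

// Type-erased translator interface stored in the dictionary.
struct TranslatorBase {
    virtual ~TranslatorBase() = default;
    virtual Status translate(const ObjectHandle& object, void*& product) const = 0;
};

using Key = std::pair<std::string, std::type_index>;
using Dictionary = std::map<Key, std::unique_ptr<TranslatorBase>>;

struct MyTransientClass { int value; };
struct UnrelatedClass {};

struct MyTranslator : TranslatorBase {
    Status translate(const ObjectHandle& object, void*& product) const override {
        product = new MyTransientClass{object.payload};
        return Status::Success;
    }
};

// Derive the id from T, take the name from the handle, look up, translate.
template <class T>
Status toTransient(const Dictionary& dict, const ObjectHandle& object, T*& result) {
    auto it = dict.find(Key(object.persistentTypeName, std::type_index(typeid(T))));
    if (it == dict.end()) return Status::NotFound;
    void* product = nullptr;
    const Status status = it->second->translate(object, product);
    if (status == Status::Success) result = static_cast<T*>(product);   // caller owns it
    return status;
}

// Returns the translated value, or -1 if no matching translator was found.
int demoLookup() {
    Dictionary dict;
    dict.emplace(Key("MyPersistentClassR", std::type_index(typeid(MyTransientClass))),
                 std::unique_ptr<TranslatorBase>(new MyTranslator));
    const ObjectHandle found{"MyPersistentClassR", 42};

    UnrelatedClass* wrong = nullptr;    // no translator designed for this type
    assert(toTransient(dict, found, wrong) == Status::NotFound);

    MyTransientClass* transient = nullptr;
    if (toTransient(dict, found, transient) != Status::Success) return -1;
    std::unique_ptr<MyTransientClass> owner(transient);   // dispose to avoid a leak
    return owner->value;
}
```

Because the transient type is part of the key, a request made with a pointer of any other type simply finds no translator, which is why the pointer type has to match the one the translator was designed with.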
One obvious limitation of the current model of translators is the limited communication between clients and translators. In a few words, this communication is the following: a client defines a transient pointer of the expected transient type and passes a reference to that pointer, along with a metadata object pointer, to the CDB API for the translation. This model leaves behind two potentially interesting use cases (as well as a combination of both):
a client may need to communicate with a translator to control the translation algorithm. Technically it means passing a parameter (more generally - parameters) to the translator.
a client may need to produce a transient object from more than one persistent object. It means that a list of those persistent objects (CdbObjectPtr-s) should be passed to the translator (now it's just one).
The work on designing translators to satisfy these use cases is not complete yet. However, when (if) this is done, the current interface of translators won't change.
All technology-neutral proxies derive from the following base class:
CdbBase/CdbProxyBase.hh
This class replaces the similar technology-specific classes:
CdbBdb/CdbBdbProxyBase.hh
CdbRoo/CdbRooProxyBase.hh
The way this new class is designed and implemented covers "almost everything" that CDB proxies in the BaBar Framework exist for. The first missing part, which is up to specific proxy developers, is to establish an appropriate translation path between the found persistent objects (available in the form of technology-neutral CdbObjectPtr objects) and the resulting transient product of the proxy. It's the developer's responsibility to trigger this persistent-to-transient translation.
NOTE: For many (if not to say - for most) existing cases even this remaining task has an out-of-box solution. Read more on this in the subsection on "Trivial proxies".
Technically speaking, a proxy developer is supposed to derive from the proxy base class and implement the following method:
|
When this method gets called it receives a list of persistent objects which are supposed to be used to produce a product object of the transient type T (the only parameter of the class template). The list is implemented as a vector of objects of the CdbProxyElement class. For most existing proxies (the one-to-one proxies with a trivial relation between persistent and transient objects) there will be just one element in the vector. The most essential parts of the element class's contract are shown below:
|
The usefulness of the information delivered by these methods depends on the particular proxy. In fact, there is not much difference between the existing technology-specific proxies and the technology-neutral ones in this respect. The most important change in the current context (of translators) concerns the way persistent objects are made available through this class. For the technology-neutral proxies it's a metadata object pointed to via a smart pointer of the CdbObjectPtr type. A developer of a custom proxy is supposed to trigger translators for the found metadata objects and use the resulting transient objects to produce the transient product of the proxy. In most cases (the above-mentioned one-to-one proxies) that transient product is exactly what's returned from the translator.
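The division of labour described above, where the base class finds the objects and caches the product while the derived proxy only builds the product from the elements it receives, can be sketched with a simplified mock. The names ProxyElement, ProxyBase and makeProduct are hypothetical, the "database" is an in-memory vector, and an integer payload stands in for the metadata pointer; the real contracts are those of CdbProxyBase and CdbProxyElement.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical element: a found object plus its validity interval [begin, end).
struct ProxyElement {
    int payload;   // stands in for the metadata pointer of the found object
    long begin;
    long end;
};

// Hypothetical technology-neutral proxy base with a one-entry cache.
template <class T>
class ProxyBase {
public:
    explicit ProxyBase(std::vector<ProxyElement> database)
        : database_(std::move(database)) {}
    virtual ~ProxyBase() = default;

    // Serves the cached product while `time` stays inside its validity interval.
    const T* get(long time) {
        if (product_ && time >= begin_ && time < end_) return product_.get();
        for (const ProxyElement& e : database_) {
            if (time >= e.begin && time < e.end) {
                product_.reset(makeProduct(std::vector<ProxyElement>(1, e)));
                begin_ = e.begin;
                end_ = e.end;
                ++rebuilds_;
                return product_.get();
            }
        }
        return nullptr;   // nothing valid at this time
    }

    int rebuilds() const { return rebuilds_; }

protected:
    // The only method a concrete proxy has to provide in this sketch.
    virtual T* makeProduct(const std::vector<ProxyElement>& elements) = 0;

private:
    std::vector<ProxyElement> database_;
    std::unique_ptr<T> product_;
    long begin_ = 0;
    long end_ = 0;
    int rebuilds_ = 0;
};

struct MyTransientClass { int value; };

// One-to-one proxy: the product is built from the single element received.
class MyProxy : public ProxyBase<MyTransientClass> {
public:
    using ProxyBase<MyTransientClass>::ProxyBase;
protected:
    MyTransientClass* makeProduct(const std::vector<ProxyElement>& elements) override {
        return new MyTransientClass{elements.front().payload};
    }
};

// Returns how many times the product had to be rebuilt for three requests.
int demoProxyCache() {
    const std::vector<ProxyElement> database = {{10, 0, 100}, {20, 100, 200}};
    MyProxy proxy(database);
    proxy.get(5);                                     // first request: product built
    proxy.get(50);                                    // same interval: cache hit
    const MyTransientClass* last = proxy.get(150);    // new interval: rebuilt
    assert(last != nullptr && last->value == 20);
    return proxy.rebuilds();
}
```

In this mock a many-to-one proxy would differ only in that the base class passes more than one element to makeProduct.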
The second thing to be taken care of by proxy developers is a configuration of proxies. This configuration is essentially done in the same way as for existing technology-specific proxies - through (and by) constructors of proxies. There is one little change though - all so called "strategy" objects are now technology-neutral. Here is a list of those (it shouldn't be too difficult for a reader of the document to find a right one resembling the corresponding technology-specific strategy):
CdbBase/CdbDefBkgFirst.hh
CdbBase/CdbDefEventKey.hh
CdbBase/CdbDefFixTimeStrategy.hh
CdbBase/CdbDefStrategy.hh
A good example of how to build a custom proxy can be found in the out-of-box environment proxy of the one-to-one type discussed in the subsection below.
For those one-to-one proxies for which the transient object returned from a translator is exactly the one expected from the proxy, there is a trivial out-of-box solution:
CdbBase/CdbEnvProxy.hh
Here is an example of its use:
| < |