
Database Issues

David R. Quarrie

Common Reconstruction Meeting - 25th June 1996

Draft: 24 June, 1996 V0.2

 

Names

Can we come up with a better name than Calibration and Geometry/Alignment Database? It's awfully long-winded and doesn't cover all aspects that have to be recorded as a function of time (e.g. Trigger settings, Slow Controls settings, etc.). How about:

  1. Conditions Database
  2. History Database
  3. Something else?

Let's choose one. I must admit to preferring Conditions Database. I'll use it for the following discussion.


General Information

  • Conditions information can change as often as every 30 minutes. Thus the primary access key is the date and time interval over which the information is valid.
  • Conditions classes should be designed to contain information that varies at the same time-scale. Thus a class should not contain data members for pedestals, geometry and alignments if these have different time intervals over which they are valid. The database would have to deal with the worst case (the most frequently changing), even though much of the information was invariant. Instead, different classes should be designed for each of these. If a single calibration run allows multiple quantities per channel to be calibrated (e.g. pedestals, thresholds, gains), a single class is appropriate to represent these, otherwise separate classes should be used. In general each class will correspond to a set of channels and so will be a collection object referencing other objects representing the per-channel information.
  • The above does not imply that other classes cannot exist that have accessors for information that varies at different time-scales as opposed to data members.
  • The conditions classes will be stored in a time-ordered collection for each time interval where the information was stable. Even if a single data member varies within this class structure, new objects will have to be created, perhaps sharing the lower level objects that are unchanged, in order to represent the information for the new time interval.
  • The conditions information for a particular event will be accessed on the basis of the date & time of the event (as located via the AbsEvent object). One such tag will be applied to each event as it flows down the data acquisition pipeline. Another tag will presumably be applied as the event is reconstructed to allow the appropriate conditions information for that processing phase to be accessed later. Reprocessing should cause further time tags to be applied to the event. The conditions information object hierarchy appropriate for an event is accessed by querying the appropriate time-sequenced collection for the specified time tag. The conditions information for an individual channel within this hierarchy is accessed by pointer or reference (see later) following within this time-stamped object hierarchy.
  • In general electronics calibrations should attempt to update all channels of the same type within a detector subsystem (DCH, EMC, DRC, etc.). The default granularity for calibrations is the readout controller (since this is the smallest partitionable entity). If partial calibrations are performed for a subset of the channels within a detector, these will be merged with the most recent calibrations for the remaining channels so as to present a complete set to the reconstruction application.
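
The time-keyed access described above can be illustrated with a small transient sketch. It is not the proposed database design: the names ConditionsHistory and PedestalSet are hypothetical, time is assumed to be an integer number of seconds, and each stored object is assumed to stay valid until the next one begins.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical conditions class: one pedestal value per channel.
struct PedestalSet {
    std::vector<float> pedestals;
};

// Time-ordered collection of conditions objects. Each entry is valid from
// its key (assumed: seconds since some epoch) until the next entry's key.
class ConditionsHistory {
public:
    // Record that `payload` becomes valid at time `validFrom`.
    void store(std::int64_t validFrom, std::shared_ptr<const PedestalSet> payload) {
        history_[validFrom] = std::move(payload);
    }

    // Return the object valid at `eventTime`, or nullptr if the event
    // predates every stored interval.
    std::shared_ptr<const PedestalSet> find(std::int64_t eventTime) const {
        auto it = history_.upper_bound(eventTime);  // first entry starting after eventTime
        if (it == history_.begin()) return nullptr;
        --it;                                       // last entry starting at or before eventTime
        return it->second;
    }

private:
    std::map<std::int64_t, std::shared_ptr<const PedestalSet>> history_;
};
```

An event's time tag is passed to find(); two events whose tags fall in the same interval receive the same object, so nothing is copied when the conditions have not changed.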

There are three natural time-scales and hierarchies involved in reconstruction information:

  • Information within an event, such as Hits & Digis. Such information is only valid for a single event. The AbsEvent object is used to access this information.
  • Information within a job such as that created by the begin(Job) entrypoints for the active modules. Such information is valid for a single job or may be altered as a result of the user interface. A navigational hierarchy (crystals within the calorimeter etc.) is an example of this.
  • Information from the Conditions database. This is valid for a range of events, but different information is valid for a different time range. Pedestals within a crystal are an example of this.

The AbsEnv object is used to access the latter two types of information.

There is a desire to be able to establish cross-linkages between these hierarchies and time-scales. Thus a Crystal object should be allowed to have accessors for its Digi information as well as its geometry and calibration information. Establishing such linkages is feasible, but must be restricted to the appropriate validity range. Thus a new event should cause the existing linkages to information within the AbsEvent hierarchy to be broken and re-established if necessary. The same applies to the conditions information if a new event has a time-stamp that corresponds to a different time interval than the previous one.

One possibility is that there is no local caching of these cross-linkages and that the Crystal object always has to access the AbsEnv object for conditions information and ask it for the appropriate geometry and calibration information. It would ask the AbsEvent object in a similar manner for Digi information etc. This might be inefficient.

Conversely, direct references may be allowed, but an invalidate() method may be invoked to nullify such references. Once nullified, the next access can locate the information by interrogating the AbsEnv or AbsEvent object as before, but caching the reference to allow faster accessing thereafter until the reference is invalidated again.

Which of the two approaches should be used will depend on the effectiveness of caching and the complexity of the invalidate() methods.
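
A minimal sketch of the second approach may help in judging that complexity. It is transient code only; Env stands in for the AbsEnv lookup, and Env, CrystalCalib and calibFor() are hypothetical names, not the actual interfaces.

```cpp
#include <cassert>

// Hypothetical calibration information for one crystal.
struct CrystalCalib {
    float pedestal;
};

// Stand-in for the environment lookup; counts how often it is interrogated.
class Env {
public:
    explicit Env(const CrystalCalib* current) : current_(current) {}
    const CrystalCalib* calibFor(int /*crystalId*/) { ++lookups_; return current_; }
    void setCurrent(const CrystalCalib* c) { current_ = c; }
    int lookups() const { return lookups_; }
private:
    const CrystalCalib* current_;
    int lookups_ = 0;
};

class Crystal {
public:
    Crystal(int id, Env& env) : id_(id), env_(env) {}

    // Return the calibration, asking the environment only when there is
    // no valid cached reference.
    const CrystalCalib& calib() {
        if (cached_ == nullptr) cached_ = env_.calibFor(id_);
        assert(cached_ != nullptr);
        return *cached_;
    }

    // To be invoked when a new event falls in a different validity interval.
    void invalidate() { cached_ = nullptr; }

private:
    int id_;
    Env& env_;
    const CrystalCalib* cached_ = nullptr;
};
```

The sketch also shows the risk: if invalidate() is not called when the interval changes, calib() silently keeps returning the stale object.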


Approach

In general, the approach will be to hide the database implementation from the application programmer as much as possible. However, there are technical limitations to this that will impact how software will have to be written. In particular, a persistent-capable class will have some constraints put upon it.

In order to minimize the impact of a particular database vendor, the Object Database Management Group (ODMG) has specified a standard API: The Object Database Standard (ODMG-93). All OODBMS vendors are gradually conforming to this standard. Within the limitations imposed by our choice of database vendor, we will adopt ODMG-93 as our database API, whilst attempting to minimize its impact.

The results of the OODBMS evaluations indicate that Objectivity/DB is a suitable product for both our Conditions database and the Event Store. Objectivity does not yet fully conform to ODMG-93, but is moving in that direction. The next version (4.0) will extend the conformance to most of the API apart from the Object Query Language (OQL).


Technical Constraints

  • Persistent capable classes must inherit from class d_Object. This inheritance may occur anywhere in the inheritance tree (e.g. at the top of the tree or as a mix-in at each leaf). There are some limitations that are vendor-specific. [Objectivity currently uses ooObj instead of d_Object. Fixed in V4.0.]
  • Persistent capable classes cannot contain C++ pointers or references to other objects. Instead a smart reference d_Ref<T> must be used (where T is the type of the other object). This allows use of the arrow operator (->) in the same way as a conventional C++ pointer. The dot (.) operator allows information about the smart reference itself to be accessed. [Objectivity currently uses ooRef(T) instead of d_Ref<T>. Fixed in V4.0]
  • A d_Ref<T> reference may span database files and will implicitly open the target database file if appropriate.
  • ODMG-93 supports heterogeneous architectures and specifies that conventional primitive C++ types (int, float, double, etc.) are supported. However, it also defines several fixed-length types that are guaranteed to have the same size on all platforms. These are d_Short, d_Long, d_UShort, d_ULong, d_Float, d_Double, d_Char, d_Octet and d_Boolean. [Objectivity defines int16, int32, etc. Fixed in V4.0. In V3.8 it doesn't support 64-bit integers. I don't know about V4.0]
  • Unions and bit-fields are not supported by ODMG-93 or Objectivity.
  • Static data members are allowed but not stored in the database. [True for ODMG-93. I haven't been able to find this in the Objectivity documentation.]
  • Objectivity has the additional limitations that persistent-capable classes cannot inherit from a virtual base class and that a persistent-capable class cannot be embedded in another persistent capable class. However, a non-persistent capable class can be embedded.
  • A persistent object may contain a d_Ref<T> to a transient object. On read from the database, such references will be NULL. In addition, the d_Object class defines function members d_activate() and d_deactivate() that allow user-hooks to be written to respond to the object entering and leaving the application cache. [Objectivity will support activators & deactivators in V4.0.]
  • Persistent objects are created using an overloaded version of the new operator that specifies the database or other clustering information. [Objectivity specifies the default database - not a transient object - if the normal new operator is used. A NULL database corresponds to a transient object. Don't know whether fixed in V4.0]
  • Persistent objects are deleted by the d_Ref::delete_object() function or by the delete (T*) operator (note use of C++ pointer). Deletion only occurs if the current transaction (see next section) is committed rather than aborted and the current database is open for update access. [Objectivity uses the ooDelete() macro instead of d_Ref::delete_object(). Not known whether fixed in V4.0.]
  • Objectivity claims that their persistent new and delete are faster than the transient ones if the database is not updated. Thus, in an exploratory style of analysis, persistent objects should be created and then deleted if discovered not to be of interest, rather than creating transient objects and later causing interesting ones to become persistent (by copying).
  • Associations are an alternative to object references. An association may be uni-directional or bi-directional, and may be one-to-many, many-to-one or many-to-many.
  • An OODBMS uses the concept of transactions to manage concurrent access to the database and to ensure the integrity of the data. A transaction is begun and then either committed (in which case the persistent store is updated to reflect any changes) or aborted (in which case any changes are ignored). Whether a d_Ref<T> remains valid across a transaction boundary is implementation-specific. [Objectivity does maintain validity across transaction boundaries.]
  • ODMG-93 defines a set of collection and container classes. These will be a superset of STL. [The standard is evolving. Objectivity supports Tools.h++. I don't know whether this is currently v6 or v7. In either case it's a special version from Objectivity using modified RogueWave source code.]
  • An object in the database can be located either by name (which has to be globally unique), by making a query against a collection, by navigation via d_Ref<T> objects or a combination of the above. [Objectivity supports all of these, although the query language is not OQL. OQL is itself evolving.]
  • The .hh file for persistent capable classes is no longer the primary definition of the class interface. The Object Definition Language (ODL) file is the primary definition and the .hh file is derived from this. This has implications for code generation from CASE Tools. [Objectivity currently uses a Data Definition Language (DDL) instead of ODL. An ODL compiler is available from an Objectivity distributor. ODL back-ends for Rational Rose and other CASE Tools are available from 3rd party vendors.]
  • Most vendors now support Schema Evolution or Class Versioning. The schema is said to have changed if the class interface (corresponding to the ODL file) has changed in a manner that makes existing persistent objects inaccessible. This includes adding or deleting a data member or changing the class hierarchy. Schema evolution takes existing persistent objects and automatically updates them to reflect these changes. This change is normally irreversible, but may be applied to all objects within the database or incrementally. This should be thought of as an expensive process and should generally be avoided unless absolutely necessary. Class versioning allows multiple versions of schema to exist simultaneously. [Objectivity 4.0 will allow both automatic and non-automatic schema evolution I believe. Evolution can be either for all objects of a particular type, for all objects within a database (or container, which is a subset of a database), or on demand (i.e. dynamically as an application accesses the old objects).]
  • Object Versioning allows a genealogy of objects to be created, specific ones being denoted as the default or being locatable by version number. This is identical in concept to CVS.
  • Objectivity has an intrinsic 14 byte overhead per object stored in the database. The use of associations will increase this. A design that uses many very small objects may adversely impact the overall database size.
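
The commit/abort behaviour of transactions described in the list above can be shown with a toy in-memory store. This is only an illustration of the semantics under simple assumptions (a single client, integer values); ToyStore and its members are hypothetical and are not part of ODMG-93 or Objectivity.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy in-memory store, not a real OODBMS.
class ToyStore {
public:
    // Start a transaction working on a private copy of the committed state.
    void begin() { assert(!active_); working_ = committed_; active_ = true; }

    // Change a value inside the current transaction only.
    void put(const std::string& name, int value) { assert(active_); working_[name] = value; }

    // Make the changes of the current transaction permanent.
    void commit() { assert(active_); committed_ = working_; active_ = false; }

    // Discard the changes of the current transaction.
    void abort() { assert(active_); working_.clear(); active_ = false; }

    // Committed value of `name`, or `fallback` if it was never committed.
    int get(const std::string& name, int fallback) const {
        auto it = committed_.find(name);
        return it == committed_.end() ? fallback : it->second;
    }

private:
    std::map<std::string, int> committed_;
    std::map<std::string, int> working_;
    bool active_ = false;
};
```

A value written with put() becomes visible through get() only after commit(); after abort() the store is exactly as it was before begin().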

A header file that includes versions of the ODMG-93 primitive data types and an implementation of the d_Ref<T> class has been written by the RD45 collaboration.
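
To give a feel for what such a header involves, here is a rough sketch. It is not the RD45 file. The names live in a hypothetical sketch namespace rather than using the d_ prefix, the integer widths (16 and 32 bits) are my assumption about the fixed-length types, and Ref<T> merely wraps a transient C++ pointer where a real smart reference would hold an object identifier resolved by the database.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

namespace sketch {

// Fixed-length stand-ins for the primitive types (widths assumed).
typedef std::int16_t  Short;
typedef std::int32_t  Long;
typedef std::uint16_t UShort;
typedef std::uint32_t ULong;
typedef float         Float;
typedef double        Double;
typedef char          Char;
typedef std::uint8_t  Octet;
typedef bool          Boolean;

static_assert(sizeof(Short) * CHAR_BIT == 16, "Short must be 16 bits");
static_assert(sizeof(Long) * CHAR_BIT == 32, "Long must be 32 bits");

// Toy smart reference: usable with -> like a pointer, while the dot
// operator reaches information about the reference itself.
template <typename T>
class Ref {
public:
    Ref() : target_(nullptr) {}
    explicit Ref(T* target) : target_(target) {}

    T* operator->() const { assert(target_ != nullptr); return target_; }
    T& operator*() const { assert(target_ != nullptr); return *target_; }

    bool is_null() const { return target_ == nullptr; }
    void clear() { target_ = nullptr; }

private:
    T* target_;  // a real implementation would store an object identifier
};

// Hypothetical per-channel class used to exercise the reference.
struct Channel {
    Float pedestal;
};

}  // namespace sketch
```

With this, ref->pedestal reads through to the Channel, while ref.is_null() asks about the reference itself.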

Objectivity V4.0 is due this summer. I've been waiting until I knew more about its implications before publicizing the RD45 header file (or one that Chris Day put together).


Reprocessing

Reprocessing of events has two consequences as far as the database is concerned:

  • Multiple copies of reconstructed information may be associated with an event. This is an Event Store issue and will not be discussed further here.
  • Multiple copies of calibration and conditions information, resulting from the reprocessing, may become associated with the event. Furthermore, these might have different validity ranges than the original versions. Object versioning is probably not the appropriate method for dealing with these overlapping sets of conditions information because of the different validity ranges.

Database Updates

The API for updating the database has not yet been defined. Questions that have to be addressed are how to create the objects and how to clean up references. Using the new operator directly to create persistent objects is probably a bad idea since the database location (or clustering information) needs to be specified. This is better managed by either a factory object or a static function member.
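
As a sketch of the static function member option (the API itself is still undefined, so everything here is hypothetical): the class keeps its constructor private and decides the placement itself. A plain string stands in for the database location or clustering information that a real implementation would hand to the persistent new operator.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical conditions class whose instances can only be made via create().
class Pedestals {
public:
    // Static function member: callers supply the data, never the placement.
    static std::unique_ptr<Pedestals> create(std::vector<float> values) {
        return std::unique_ptr<Pedestals>(
            new Pedestals(defaultLocation(), std::move(values)));
    }

    const std::string& location() const { return location_; }
    const std::vector<float>& values() const { return values_; }

private:
    Pedestals(std::string location, std::vector<float> values)
        : location_(std::move(location)), values_(std::move(values)) {}

    // Placement policy kept in one place (the path is an invented example).
    static std::string defaultLocation() { return "conditions/emc/pedestals"; }

    std::string location_;
    std::vector<float> values_;
};
```

Application code then writes Pedestals::create({...}) and cannot accidentally create an object in the wrong place, since the constructor is inaccessible.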

Beginning and terminating transactions will be managed behind the scenes as far as is possible. For read-only access to both event data and conditions data, transactions may span the processing of many events. For update access (where the event store is being updated with reconstruction information, or the reconstruction is being used to generate calibration or alignment updates to the conditions database), transactions will probably have a shorter duration. The overheads involved are still being investigated, although some numbers are already available from the initial evaluations.


Status

A document that summarizes the results of the evaluations is being drafted and a report will be presented at the Dresden Collaboration Meeting in July 1996. A detailed design of the underlying database organization is underway, with the goal being a prototype implementation for some representative conditions classes by November 1996, which is when a database review is scheduled.

 


e-mail DRQuarrie@LBL.Gov