
ROOT's Scribes and Modules

A. Salnikov

September 26, 1999


I. What it's all about

In this brief report I'm going to describe the ideas and some implementation details of two packages, RooScribes and RooModules, as I understand them. These will probably evolve as new requirements or details become clearer, so this description may soon become outdated. In such cases, as everywhere else in BaBar, the best source of information is the source code itself.

For those encountering these packages for the first time: they implement a common set of tools for I/O to and from the ROOT event store. They deal with event data, as opposed to conditions data, although some of the conditions data may migrate into the event data in the ROOT-based data store. For conditions data, see the excellent material provided by David Kirkby [1].

II. Basic ideas

As the time scale for the whole project is seriously limited, it seemed natural, and was generally accepted, that we should emphasize reuse of the ideas and implementations of the existing event store and conditions database. My own list of features we have to implement in the ROOT-based event store looks like this:

  • Schema evolution. Support for this must be included at the design level. ROOT's own support for schema evolution is not quite satisfactory for a project as big as ours, so we must provide something ourselves. Fortunately the Objectivity team has already solved this problem in a quite general way, and we can reuse their approach, which has already proved itself.
  • Persistent references. Although many basic classes used in analysis do not require persistent references, it is useful to think about possible future extensions, and this leads directly to the need to support references in the design. Even our short-term plans will probably need this feature if we are going to implement the MC truth information store in the same way as it was done for Objectivity.
  • Modularity and extensibility. As in any other area in BaBar, the tools we produce should integrate easily into the existing BaBar SRT. One of the key words here, I believe, is interdependencies, which should be reduced to a minimum. The I/O part of the Framework responsible for data exchange with the ROOT event store should not depend on any particular data type in the event store, but should at the same time allow new persistent data types to be added as they appear.
  • Incremental event processing and event selection. The idea of loading parts of the event data incrementally is not new; it is expected to allow much faster event processing. It is realized in the Objectivity event store with so-called filtering modules and a number of update modules. The approach can be extended to include preselection directly in the input module, removing the need to go through the Framework event() cycle at all, with the possibility of gaining even more speed.

It seems that many of the approaches used in the Objectivity event store can also be used for ROOT. As for the code itself, the possibility of reuse is seriously limited by the dependency of practically all the code on the lower-level technology, i.e. Objectivity data types.

III. RooUtils package

Another package worth mentioning here, as it contains some classes which are central to the persistence implementation, is RooUtils. It was created to hold tools and utilities which depend on nothing except ROOT itself. It now contains a few class definitions which are of particular interest for this discussion:

RooRef Implementation of the persistent reference. The reference is basically the object ID (OID) of the persistent object it points to. The method "UInt_t id() const;" returns this OID (the example below treats a zero ID as "no object").
RooPersObj This is the base class for all persistent classes to be created in the ROOT event store in BaBar. It inherits from ROOT's TObject. The functionality it currently provides is the creation of unique object IDs for all persistent objects. Uniqueness is guaranteed only among objects in the same session, i.e. there is no guarantee that ROOT objects will have different IDs in two different runs. The IDs are used in the implementation of persistent references. The method "RooRef refToMe() const;" returns a persistent reference to the object. The protected method registerThis(...) is used to register persistent-transient relations (see below).
RooEvtObj<T> This is the next-layer base class for persistent objects, inheriting from RooPersObj. It defines an interface for all persistent classes which can be created from, or converted to, a transient class of type T. The interface includes the following methods:
  • T* transient( RooEvtObjLocReg& reg ) const = 0
  • bool fillRefs( const T* trans, const RooEvtObjLocReg& aRegistry )
  • bool fillPointers( T* trans, const RooEvtObjLocReg& reg ) const

In addition to these methods, each persistent class should implement a constructor from the transient object of the following form (assuming you have declared "class XxxDataR : public RooEvtObj<XxxData> {...};"):

  • XxxDataR( const XxxData* trans, RooEvtObjLocReg& reg )
RooEvtObjLocReg This is a registry of relations between transient and persistent objects, used in the implementation of persistent references. It provides a bi-directional mapping between persistent references (RooRef) and pointers to transient objects. The map should be filled in the constructors of the persistent objects and in their transient(...) methods; it is then used to reconstruct transient-to-transient relations in the fillPointers(...) method and persistent-to-persistent relations in the fillRefs(...) method.
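To make the division of labour between the RooPersObj IDs and RooEvtObjLocReg concrete, here is a minimal standalone C++ sketch. It is not the RooUtils code: the names SessionIdSource, LocRegistry, TransientObj, idOf and objectOf are hypothetical, a plain std::uint32_t stands in for RooRef, and zero is assumed to mean "no object".

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for a transient event object (AbsEvtObj in BaBar).
struct TransientObj {
  virtual ~TransientObj() = default;
};

// Hypothetical session-unique ID source, mimicking what RooPersObj is described
// to do. The counter restarts in every new process, so IDs are NOT unique across runs.
class SessionIdSource {
public:
  static std::uint32_t next() {
    static std::uint32_t counter = 0;
    return ++counter;  // assumption: 0 is reserved to mean "null reference"
  }
};

// Hypothetical bi-directional registry between persistent IDs and transient pointers.
class LocRegistry {
public:
  // Called where registerThis() would be: in the persistent ctor and in transient().
  void add(std::uint32_t id, const TransientObj* obj) {
    idToObj_[id] = obj;
    objToId_[obj] = id;
  }
  // Transient -> persistent lookup (used on output, in fillRefs()).
  // Returns 0 if the transient object was never registered.
  std::uint32_t idOf(const TransientObj* obj) const {
    auto it = objToId_.find(obj);
    return it == objToId_.end() ? 0 : it->second;
  }
  // Persistent -> transient lookup (used on input, in fillPointers()).
  // Returns nullptr if nothing is registered under this ID.
  const TransientObj* objectOf(std::uint32_t id) const {
    auto it = idToObj_.find(id);
    return it == idToObj_.end() ? nullptr : it->second;
  }

private:
  std::map<std::uint32_t, const TransientObj*> idToObj_;
  std::map<const TransientObj*, std::uint32_t> objToId_;
};
```

The two lookups are deliberately given different names here; a single overloaded find() taking either an ID or a pointer would make a call such as find(0) ambiguous in plain C++.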

Here is an example (not a working one) of a class which uses this approach to save references between objects in the persistent store:

// 
//  Version 001 of the persistent class XxxDataR
//
class XxxDataR_001 : public RooEvtObj<XxxData> {

public:

  // ====== Constructors ======

  // def.ctor must be provided if you are saving collection of objs.
  XxxDataR_001() : RooEvtObj<XxxData>() {} 

  // ==> ctor from the trans obj and a registry must be there
  XxxDataR_001( const XxxData* trans, RooEvtObjLocReg& reg ) 
    : RooEvtObj<XxxData>() 
  {
    // if you care about storing references, do this:
    registerThis ( trans, reg ) ;

    // all other stuff relevant to constructing pers.obj.
  }


  // ==> must provide transient object "ctor"
  XxxData* transient( RooEvtObjLocReg& reg ) const
  {
    // simply create transient from all info you have
    XxxData* trans = .... ;

    // if you care about storing references, do this:
    registerThis ( trans, reg ) ;

    return trans ;
  }

  // fillRefs() method must be there too if you really care about
  // storing references, otherwise default implementation is OK
  bool fillRefs ( const XxxData* trans, const RooEvtObjLocReg& reg )
  {
    // this is a basic idea how you can make a persistent ref. to other object
    AbsEvtObj* transPointer = trans->getSomePointer() ; // just an example

    if ( transPointer ) {
      // this will work only if registerThis() was called for other object
      RooRef persRef = reg.find( transPointer ) ;
      if ( ! persRef.id() ) return false;
      refToOther = persRef ; // store the reference
    }

    return true ;
  }

  // fillPointers() method must be there too if you really care about
  // storing references, otherwise default implementation is OK
  bool fillPointers ( XxxData* trans, const RooEvtObjLocReg& reg ) const
  {
    // this will work only if registerThis() was called for other object

    if ( refToOther.id() ) {
      AbsEvtObj* transPointer = reg.find( refToOther ) ;
      if ( ! transPointer ) return false ;

      trans->setSomePointer( transPointer ) ; // and set transient reference
    }

    return true ;
  }

  // ...... whatever you want to be here .......

private:

  // ...... some stuff I don't care about

  RooRef refToOther; // persistent reference

  // ROOT specific declarations
  ClassDef(XxxDataR_001,1)

};

For simple classes which do not need references, however, there is no need to override the default implementations of fillRefs() and fillPointers(), or to call registerThis() in the constructor and the transient() method; the constructor must still have the same form, and transient() must still be supplied.

IV. RooScribes package

RooScribes contains a set of classes which transfer data between the transient event store (AbsEvent) and the persistent store (ROOT trees). Data exchange proceeds through "streams", a stream being just a ROOT tree. Two or more streams can share the same file; in this case there will be more than one tree in the file. Each stream is identified by its name, which is the same as the tree name. The actual transient-persistent exchange is performed by "scribes". A scribe is an object which knows which objects it should convert and how to do it. Each scribe corresponds to a single object branch (TBranchObject) in the ROOT tree.

A reduced class diagram for this package is available [3]. The "chief" class in the diagram is RooConversionManager, and the "central" one is RooGenericScribe. RooConversionManager controls the conversion of the data (in either direction), and it does this through the list of scribes registered for the job.

IV.1. Making persistent objects with scribes

Each output stream calls the conversion manager's convertToPersistent() method. For each such call the manager executes the following sequence for all scribes "valid" for this stream:

  1. scribe->attemptTransient(), which creates persistent representation of transient objects in memory,

  2. scribe->fillRefs(), which validates persistent references between persistent objects in the stream,

  3. scribe->store(), which moves persistent data to the external store.

This sequence differs from the one used for Objy because of some conceptual differences: Objy's persistent objects are created "already in the store" (actually they are moved there at the end of the transaction, but this does not happen at the scribe level). As a consequence, there is no store() method in the Objy scribes. It would be possible to live without it in ROOT too, because the main part of its functionality is executed by the stream itself (the TTree::Fill method), but in terms of OO abstractions this method appeared quite naturally, so I prefer to keep it.

IV.2. Making transient objects with scribes

Each input stream in turn calls the manager's convertToTransient() method. For each call the manager executes the following sequence for all scribes "valid" for this stream:

  1. scribe->attemptTransient(), which fetches data from persistent store and converts them to transient object,

  2. scribe->fillPointers(), which validates transient references (pointers) between transient objects in the stream.
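The two sequences can be summarized in a small sketch of a conversion manager. The Scribe interface and its method names (makePersistent, makeTransient, etc.) are hypothetical placeholders rather than the actual RooGenericScribe API, and the sketch assumes that each step runs over all valid scribes before the next step begins; that ordering is what lets a reference point at an object converted by a different scribe.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical scribe interface; method names are placeholders, not RooGenericScribe's.
class Scribe {
public:
  virtual ~Scribe() = default;
  virtual const std::string& stream() const = 0;  // stream (tree) this scribe belongs to
  // output side
  virtual bool makePersistent() = 0;  // build persistent objects in memory
  virtual bool fillRefs() = 0;        // resolve persistent references
  virtual bool store() = 0;           // hand data to the external store
  // input side
  virtual bool makeTransient() = 0;   // fetch and convert to transient objects
  virtual bool fillPointers() = 0;    // resolve transient pointers
};

class ConversionManager {
public:
  void add(std::shared_ptr<Scribe> s) { scribes_.push_back(std::move(s)); }

  // Output: every phase runs over all valid scribes before the next phase starts.
  bool convertToPersistent(const std::string& stream) {
    auto mine = scribesFor(stream);
    for (Scribe* s : mine) if (!s->makePersistent()) return false;
    for (Scribe* s : mine) if (!s->fillRefs()) return false;
    for (Scribe* s : mine) if (!s->store()) return false;
    return true;
  }

  // Input: convert everything first, then fix up pointers between objects.
  bool convertToTransient(const std::string& stream) {
    auto mine = scribesFor(stream);
    for (Scribe* s : mine) if (!s->makeTransient()) return false;
    for (Scribe* s : mine) if (!s->fillPointers()) return false;
    return true;
  }

private:
  // Scribes "valid" for a stream are, in this sketch, those registered with its name.
  std::vector<Scribe*> scribesFor(const std::string& stream) const {
    std::vector<Scribe*> out;
    for (const auto& s : scribes_)
      if (s->stream() == stream) out.push_back(s.get());
    return out;
  }
  std::vector<std::shared_ptr<Scribe>> scribes_;
};
```

If instead each scribe ran all of its steps before the next scribe started, a fillRefs() call could look for an object that no other scribe has converted yet, and the lookup would fail.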

IV.3. Abstract and concrete scribes

The conversion manager works with objects of class RooGenericScribe, an abstract class providing only the interface for the operations described above. It is the job of the client code to create real scribe objects and pass them to the manager. This is usually done in so-called "loader modules" (see the following sections). To simplify the job of the loader modules, a set of concrete scribe classes was implemented which should cover most of the required functionality. The following concrete scribe classes exist:

  • RooDefScribe<T,P> - provides conversion of the single transient object of class T to the single persistent object of class P and back.

  • RooAListScribe<T,P> - provides conversion of the transient collection (HepAList<T>) of transient objects of type T to the persistent collection (TObjArray) of persistent objects of class P and back.

  • RooAListClonesScribe<T,P,I> - the same as above, but uses TClonesArray as the persistent collection. This class has a third template parameter, the interface type of the persistent object; usually this is RooEvtObj<T>.

  • RooCompositeScribe<T,P,I> - provides conversion of the transient collection (HepAList<T>) of transient objects of type T to the composite persistent object of type P, having an interface I. Interface usually will be RooEvtObj<HepAList<T> >.

  • RooAListRCVScribe<T,P> - provides conversion of the transient collection (HepAList<T>) of transient objects of type T to the persistent collection (RooClonesVector<P> from RooUtils package) of persistent objects of class P and back.

  • RooDefSyncScribe<T,P> - a special version of the RooDefScribe class which provides sync'ing scribes. See below for details.

IV.4. Interfaces and schema evolution

The interfaces to the persistent classes, which appeared in the previous section, are the central part of the schema evolution implementation. The idea is that we can read data back from the event store through one of the object's base classes, without knowing all the details of the object layout. If all versions of a persistent class (such as XxxDataR_001, XxxDataR_002, etc.) have the same base class, schema evolution can be realized very easily. This idea (stolen from the Objy event store, as usual) is implemented for the ROOT event store as well. The interface class of a persistent class is basically the minimal set of methods which allows a transient representation to be created from the data read through this interface, i.e. it defines a "convertible to transient" type. Two methods are sufficient for this, both defined in RooEvtObj<T>: "T* transient()" and "bool fillPointers()". RooEvtObj<T> also defines the fillRefs() method, which, strictly speaking, is not necessary for interface types. This method, like the constructor from the transient object, must be implemented by any persistent class used with scribes, so it was included for convenience.
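A minimal illustration of reading through an interface, in plain C++ without ROOT I/O. EvtObjInterface stands in for RooEvtObj<T>; the field layouts of the two versions are invented for the example, and std::unique_ptr replaces the raw pointer returned by transient(). In the real system ROOT decides which concrete version to instantiate from the stored class name; here the objects are simply constructed directly.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical transient class (the current in-memory representation).
struct XxxData {
  double energy = 0;
  std::string label;
};

// Hypothetical interface: the minimal "convertible to transient" contract.
template <class T>
class EvtObjInterface {
public:
  virtual ~EvtObjInterface() = default;
  virtual std::unique_ptr<T> transient() const = 0;
};

// Invented version 001: stored only the energy, in single precision.
class XxxDataR_001 : public EvtObjInterface<XxxData> {
public:
  explicit XxxDataR_001(float e) : energy_(e) {}
  std::unique_ptr<XxxData> transient() const override {
    auto t = std::make_unique<XxxData>();
    t->energy = energy_;
    t->label = "unknown";  // field did not exist in this version: use a default
    return t;
  }
private:
  float energy_;
};

// Invented version 002: added a label and switched to double precision.
class XxxDataR_002 : public EvtObjInterface<XxxData> {
public:
  XxxDataR_002(double e, std::string l) : energy_(e), label_(std::move(l)) {}
  std::unique_ptr<XxxData> transient() const override {
    auto t = std::make_unique<XxxData>();
    t->energy = energy_;
    t->label = label_;
    return t;
  }
private:
  double energy_;
  std::string label_;
};

// Reader code depends only on the interface, never on a concrete version.
inline std::unique_ptr<XxxData> readBack(const EvtObjInterface<XxxData>& stored) {
  return stored.transient();
}
```

Adding a version 003 would mean writing one more class derived from the same interface; readBack() and everything that calls it stay unchanged.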

One important note. ROOT plays all possible low-level tricks with pointers, which makes it completely non-OO. One thing particularly important to us is that it passes pointers to objects as "void*". For our schema evolution to work, this implies that the pointers to the persistent object and to its interface must be physically the same. If they differ, the result will be memory overwrites and all the problems that follow from them. So never, ever use multiple inheritance in persistent classes. Or, if you really mean to use it, make sure that the interface is physically the first thing in the class hierarchy. (As far as I know, the standard does not guarantee the memory layout of base-class subobjects for such classes, but on most platforms you can achieve this by placing the interface first in the inheritance list: "class XxxDataR_001 : public RooEvtObj<XxxData>, public Whatever {...}".)
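The pointer-identity requirement can be checked directly. The sketch below measures the byte offset of the interface subobject for both inheritance orders. The class names are hypothetical, and the expected offsets (zero when the interface comes first, non-zero otherwise) reflect common ABIs such as the Itanium C++ ABI used by GCC and Clang, not a guarantee of the C++ standard. What the language does guarantee is that two distinct non-empty base subobjects cannot both start at the object's address, so at most one base can be "the first thing" in the object.

```cpp
#include <cstddef>

// Hypothetical interface and an unrelated second base; both are non-empty and
// polymorphic, like TObject-derived ROOT classes.
struct Interface {
  virtual ~Interface() = default;
  int version = 1;
};
struct Whatever {
  virtual ~Whatever() = default;
  int payload = 0;
};

// Interface listed first: on common ABIs it is placed at offset 0.
struct GoodOrder : public Interface, public Whatever {};

// Interface listed second: Whatever takes the start of the object on common
// ABIs, so the Interface subobject ends up at a non-zero offset.
struct BadOrder : public Whatever, public Interface {};

// Byte offset of the Interface subobject inside an object of type D.
template <class D>
std::ptrdiff_t interfaceOffset(D& obj) {
  const char* whole = reinterpret_cast<const char*>(&obj);
  const char* iface = reinterpret_cast<const char*>(static_cast<Interface*>(&obj));
  return iface - whole;
}
```

When a void* has to be passed around, taking it from the Interface* (for example `void* p = static_cast<Interface*>(&obj);`) makes the conversion back to Interface* valid whatever the layout; treating the address of the full object as an Interface* is exactly the bug described above.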

IV.5. Sync'ing scribes

As the persistent data will be spread across several ROOT trees and files, which can be produced and distributed separately, we need a means of guaranteeing the consistency of the information in different locations. The idea is to put a separate branch in every tree holding some data which are unique for each event. At the read stage these data are read from all trees and checked for consistency. The data can be anything that provides uniqueness, but to avoid unnecessary data in the file, something which is useful for other purposes can be used, for example the event ID. The behavior of the I/O system is somewhat different for these sync'ing data, so a special type of scribe was introduced for this job: RooDefSyncScribe. On output, a scribe of this type writes the same persistent data to a separate branch in all output streams. On input, the scribe is executed for each input stream and checks that the objects from the different streams are equal. If the sync'ing object in some stream differs from the one in the first stream read, all the data from that stream are discarded and a warning message is printed.
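The consistency check performed on input by the sync'ing scribes can be sketched as follows. This is an illustration under stated assumptions, not RooDefSyncScribe itself: the sync'ing object is taken to be a hypothetical run/event number pair, and the per-stream values are passed in as a vector in the order the streams were read.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical sync'ing object: a run/event number pair, as the text suggests.
struct SyncId {
  std::uint32_t run = 0;
  std::uint32_t event = 0;
  bool operator==(const SyncId& o) const { return run == o.run && event == o.event; }
};

// Sync'ing value read from one input stream.
struct StreamSync {
  std::string stream;
  SyncId id;
};

// Returns the names of the streams whose data may be used for this event.
// The first stream read is the reference; any stream that disagrees with it
// is dropped and a warning is printed.
std::vector<std::string> consistentStreams(const std::vector<StreamSync>& read) {
  std::vector<std::string> ok;
  if (read.empty()) return ok;
  const SyncId& ref = read.front().id;
  for (const auto& s : read) {
    if (s.id == ref) {
      ok.push_back(s.stream);
    } else {
      std::cerr << "warning: stream " << s.stream
                << " is out of sync, its data are discarded\n";
    }
  }
  return ok;
}
```

Using the first stream as the reference, rather than a majority vote, follows the description above; it also means that a damaged first stream would cause all the others to be rejected.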

V. RooModules package

This package contains a number of Framework modules and their helper classes which organize input and output in Framework jobs. The organization of input and output closely follows that of the Objy event store; similar classes and modules can be found in the BdbModules package. There are four basic modules in the package, described in detail below, and also a base class for "loader modules".

One point worth mentioning here is that all modules are independent of the specific format of transient or persistent data. I think the best approach is to keep it this way, as this keeps package dependencies manageable. But this may conflict with the idea of fast event selection based on the content of the data; I will return to this point in the discussion below.

V.1. Output module

RooEventOutput is the standard output module for producing ROOT persistent data. It inherits from AppStreamsOutputModule and adds practically no new functionality. All the real work is done in the output stream objects (RooOutputStream class), which are created with the module's "output" command (RooOutputCommand class). The stream's responsibility is to open the output file, create a TTree with the stream's name and call RooConversionManager for each event. The conversion manager object is extracted from the transient event. An output stream is created from a Tcl script like this:

# communicate with the output module
module talkTo RooEventOutput

  # create new output stream and give a destination
  output stream "Tag" $env(EVENT_STORE)/runXXXX-Tag.root

  # associate framework path with this stream. stream will be executed 
  # only when given path is "passed". Path must be already defined.
  output paths  "Tag" "Everything"

  exit

Two commands are necessary to produce output: "output stream" and "output paths". The first creates an output stream with the given name and assigns it an output file name (file naming is discussed below). The second associates a framework path (which must be created before this command is issued) with the output stream. The stream is executed only when the path's state is "passed", which allows filtering applications to write only selected events.

There can be more than one stream with the same destination file. The following script shows an example:

module talkTo RooEventOutput

  # create two output streams with the same destination
  output stream "Tag" $env(EVENT_STORE)/runXXXX-Tag+AOD.root
  output stream "AOD" $env(EVENT_STORE)/runXXXX-Tag+AOD.root

  # associate framework path with these streams.
  output paths  "Tag" "Everything"
  output paths  "AOD" "Everything"

  exit

In this case the destination file name you give must be literally the same for both streams, character for character; otherwise the result will be a complete disaster.
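The text does not say how streams come to share a file; one plausible mechanism, and a simple way to see why the spelling matters, is a pool of open files keyed on the literal destination string. OutputFile, OutputFilePool and attachStream are hypothetical (a real implementation would hold a TFile), and the sketch deliberately performs no path normalization.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an open output file (a TFile in the real system).
struct OutputFile {
  explicit OutputFile(std::string n) : name(std::move(n)) {}
  std::string name;
  int trees = 0;  // number of stream trees attached to this file
};

// Hypothetical pool that lets several streams share one file. The key is the
// destination string exactly as given: "a.root" and "./a.root" are different keys.
class OutputFilePool {
public:
  std::shared_ptr<OutputFile> open(const std::string& destination) {
    auto it = files_.find(destination);
    if (it != files_.end()) return it->second;  // reuse the already open file
    auto f = std::make_shared<OutputFile>(destination);
    files_.emplace(destination, f);
    return f;
  }
  std::size_t size() const { return files_.size(); }

private:
  std::map<std::string, std::shared_ptr<OutputFile>> files_;
};

// Attach one stream's tree to its destination file.
inline std::shared_ptr<OutputFile> attachStream(OutputFilePool& pool,
                                                const std::string& destination) {
  auto f = pool.open(destination);
  ++f->trees;
  return f;
}
```

With two spellings of the same path, a scheme like this would end up with two independent handles writing to one physical file, which is one way the "complete disaster" can happen.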

A few remarks about data production in OPR. Our Framework does not allow two output modules in one job, so if we are going to produce ROOT data directly in OPR, we will need an output module which is not an APPOutputModule and which can be executed as the last module in a standard path. This can be achieved, for example, by putting all the functionality of AppStreamsOutputModule and RooEventOutput into a separate class inheriting from APPModule. Another issue is that OPR runs on many machines. To get one file per run (or probably one file per few runs), we will need a special merging application which gathers the output from tens or hundreds of files into one at the end of the run.

V.2. Input modules

And of course there is an input module which can read the data produced by the output module. The approach to input is the same as Objy's: there is a RooEventInput module and a RooEventUpdate module, with RooCreateCM and a number of loader modules between them in the path. Briefly, the responsibilities of these modules are:

  • RooEventInput - to locate next event and prepare input streams for reading,
  • RooCreateCM - to create an instance of the RooConversionManager and put it in the transient event,
  • loader modules - to create scribes for the objects they want to fetch (or save as well) and pass these scribes to the conversion manager,
  • RooEventUpdate - to execute all input streams using the data prepared by all previous stages.

More details about all of these are given below.

V.3. RooEventInput module

The responsibility of this module is to locate the next event for processing and to pass information about this event to the input streams. Despite the name, no real input occurs in this module; it happens further down the framework path, in the RooEventUpdate module. The streams, owned by RooEventInput, are created by the "input stream" command (RooInputCommand class). To pass the streams to the RooEventUpdate module, this module puts them into the transient event.

To decide which event to process next, the input module uses "collection" and "selector" abstractions. A collection (RooInputCollection) is just a set of input files to open. A selector is an object responsible for finding the address (file name and event index) of the next available event.
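The selector abstraction can be sketched as a small interface with a single next() call. InputSelector, EventAddress and SelectAll are hypothetical analogues of RooInputSelector and the "all events" selector; the method name and the way event counts are supplied are assumptions made so that the sketch runs without opening any files.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Address of an event: which collection entry (file) and which entry in it.
struct EventAddress {
  std::string collectionEntry;
  long index = -1;
};

// Hypothetical selector interface.
class InputSelector {
public:
  virtual ~InputSelector() = default;
  // Fills 'addr' with the next event to process; returns false when exhausted.
  virtual bool next(EventAddress& addr) = 0;
};

// "Select all": visit every event of every collection entry in order.
// Event counts are passed in because this sketch has no file access.
class SelectAll : public InputSelector {
public:
  SelectAll(std::vector<std::string> entries, std::vector<long> counts)
    : entries_(std::move(entries)), counts_(std::move(counts)) {}

  bool next(EventAddress& addr) override {
    while (file_ < entries_.size()) {
      if (event_ < counts_[file_]) {
        addr.collectionEntry = entries_[file_];
        addr.index = event_++;
        return true;
      }
      ++file_;     // current entry exhausted (or empty): move to the next one
      event_ = 0;
    }
    return false;
  }

private:
  std::vector<std::string> entries_;
  std::vector<long> counts_;
  std::size_t file_ = 0;
  long event_ = 0;
};
```

A tag-based selector would implement the same next() but skip events that fail a cut, which is exactly where the dependency on the tag format mentioned later in this section would creep in.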

As we can write data to many output destinations, we must also be able to read them back at once. To do this, the input module must be able to read from different input files. Hence the full path name of an input file is determined both by its name in the collection and by the stream. Presently the path name is constructed by concatenating the collection's name and a suffix defined for the stream, with a dash between them and a ".root" extension. Here is an example of how all this works from Tcl:

module talkTo RooEventInput

  # create two input streams to read from different files
  input stream "Tag" tag
  input stream "AOD" aod

  # add the runs to the input collection
  collection add $env(EVENT_STORE)/run10001
  collection add $env(EVENT_STORE)/run10002
  collection add $env(EVENT_STORE)/run10003
  collection add $env(EVENT_STORE)/run10004
  collection add $env(EVENT_STORE)/run10005

  select all

  exit

In this case the stream "Tag" will read data from the following files:

    $EVENT_STORE/run10001-tag.root,
    $EVENT_STORE/run10002-tag.root,
    $EVENT_STORE/run10003-tag.root,
    $EVENT_STORE/run10004-tag.root,
    $EVENT_STORE/run10005-tag.root,

and the stream "AOD" will read the files

    $EVENT_STORE/run10001-aod.root,
    $EVENT_STORE/run10002-aod.root,
    $EVENT_STORE/run10003-aod.root,
    $EVENT_STORE/run10004-aod.root,
    $EVENT_STORE/run10005-aod.root.

The logic of constructing the full path name from the collection name and the stream suffix is contained in a separate class, RooDirectorySvc. In principle this class can be modified to use more sophisticated schemes, e.g. the run database, log-books, etc.
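The default naming rule is simple enough to state in code. DirectorySvc and ConcatDirectorySvc are hypothetical names standing in for RooDirectorySvc; making the rule a virtual method is one possible way to let the more sophisticated schemes be substituted without touching the streams.

```cpp
#include <string>

// Hypothetical interface: maps (collection entry, stream suffix) to a file path.
class DirectorySvc {
public:
  virtual ~DirectorySvc() = default;
  virtual std::string path(const std::string& entry, const std::string& suffix) const = 0;
};

// Default rule described in the text: "<entry>-<suffix>.root".
class ConcatDirectorySvc : public DirectorySvc {
public:
  std::string path(const std::string& entry, const std::string& suffix) const override {
    return entry + "-" + suffix + ".root";
  }
};
```

With this rule, the collection entry $EVENT_STORE/run10001 and the stream suffix tag give $EVENT_STORE/run10001-tag.root, as in the listing above; a run-database implementation would override path() instead.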

Just as with output streams, input streams can share the same file; in this case the suffix parameter of the "input stream" command should be the same for those streams.

The "select all" command in the above example creates an "all events" selector. This command can be omitted, in which case the input module creates this selector itself. For now this is the only existing selector for input events (RooInputSelectAll class). Other selectors, e.g. a selector based on the tag data, can easily be added by implementing the RooInputSelector interface. (The problem of dependencies comes into play here. A tag-based selector will inevitably depend on the tag format, which I want to keep RooModules away from. I can imagine a solution in which a separate package implements a framework module that creates such a selector in beginJob() and gives it to the input module via the setSelector() method. The problem here is that this module must know about the input module object, but this can be resolved by careful implementation of the AppUserBuild stuff.)

V.4. RooCreateCM module

The sole purpose of this module is to create a conversion manager object and to make it accessible to all downstream modules by placing it in the transient event. The conversion manager is created for every event and its ownership is transferred to AbsEvent, so it is deleted together with the transient event.
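The per-event lifetime of the conversion manager can be illustrated with std::unique_ptr. EventStub, ConversionManagerStub and createManager are hypothetical stand-ins for AbsEvent, RooConversionManager and the body of RooCreateCM; the real code stores the manager in AbsEvent through its own mechanism, which the sketch does not model.

```cpp
#include <memory>

// Hypothetical manager that counts live instances, to make ownership visible.
struct ConversionManagerStub {
  static int alive;  // number of managers currently alive
  ConversionManagerStub() { ++alive; }
  ~ConversionManagerStub() { --alive; }
};
int ConversionManagerStub::alive = 0;

// Hypothetical transient event that owns its conversion manager.
struct EventStub {
  std::unique_ptr<ConversionManagerStub> manager;
};

// What a RooCreateCM-like module does once per event: create a fresh manager
// and hand ownership to the event, so it dies together with the event.
inline void createManager(EventStub& event) {
  event.manager = std::make_unique<ConversionManagerStub>();
}
```

Because the event owns the manager, no module downstream has to remember to delete it, and a manager can never outlive the event whose scribes it holds.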

V.5. Loader modules

The loader modules represent the client part of the ROOT persistence task. Their responsibility is to create scribes for the particular objects to be saved to or fetched from the event store, and to register these scribes with the conversion manager. The base class RooAbsLoader implements the part of this functionality which registers scribes; instantiating the needed scribes must be implemented in the subclass. With RooAbsLoader, scribes are created once per job, in the beginJob() method. A typical beginJob() method of such a module could look like this:

AppResult
XxxRooLoad::beginJob( AbsEvent* anEvent )
{
  // Add all the Scribes to this load that we want
  if ( _readXxx.value() || _writeXxx.value() ) {
    // Create single persistent object scribe:
    RooGenericScribe* scribe = 
        new RooDefScribe< XxxData , XxxDataR_001 >
    ( &_key.value(),        // IfdKey for transient object in AbsEvent
      _stream.value(),      // stream name - string
      _branch.value(),      // branch name - string
      _bufferSize.value(),  // buffer size - integer value
      _splitMode.value() ); // split mode - bool
    
    if ( _readXxx.value() ) {
      addScribeForInput( scribe ) ;
    }
    if ( _writeXxx.value() ) {
      addScribeForOutput( scribe ) ;
    }
  }
  return AppResult::OK ;
}

In general, the design of the ROOT loader modules is much the same as for Objy, except that ROOT loader modules work only with scribes and not with converters, so creating ROOT loaders should be more or less trivial if one takes existing Objy loaders as a starting point.

V.6. RooEventUpdate module

This is the final point of the ROOT input sequence for event data: all real data loading is performed in this module. But, just like the output module, this module does not do any real work itself; instead it fetches the list of input streams stored in the transient event and executes the input() method for each stream in the list. This method uses the conversion manager to execute all scribes registered for input and associated with this stream (and the sync'ing scribes too).

Just as in the Objy case, incremental loading of event data is possible if one instantiates multiple copies of the RooEventUpdate module and places filtering modules and loader modules at the right places. But as everyone today is concerned with speed, providing the selectors mentioned above may be the wiser thing to do, I think.

VI. Test packages for scribes and modules

There are also two packages created to test ROOT scribes and modules, RooScribesTest and RooModulesTest. Here is a brief description of what they do.

VI.1. RooScribesTest package

This package tests the scribes from RooScribes and depends only on it. RooScribesTest introduces the RooScribesChkClass class representing transient data, and two versions of persistent data, RooScribesChkClass_001 and RooScribesChkClassR_002. Two applications are built, testRooScribesOutput and testRooScribesInput, to test output and input with the scribes, respectively. The first application produces a tree containing one event with two branches: a single object and a collection of 10 objects of the latest version of the persistent data. The objects in the collection have references to each other. The second application reads these data, converts them back to transient form and prints them. A regression test script, executed for the RooScribesTest.test target, compares the output of the two programs with the expected output.

VI.2. RooModulesTest package

This package uses the transient and persistent classes from the RooScribesTest package to check how the module machinery works. It implements three framework modules:

  • RooScribeChkLoad - loader module for RooScribesChkClass. Can work with both versions of persistent data,

  • RooFakeScribeChk - framework module which produces transient objects of RooScribesChkClass and fills transient event,

  • RooFakeReadChk - framework module which reads the transient objects of RooScribesChkClass from transient event.

There are two applications built in this package, one for writing and one for reading, with Tcl files providing a number of parameters to play with. My experience with them shows that schema evolution basically works: one can read any version of the data, provided that the class for this version is defined in the application.

VII. Conclusion

All the basic functionality related to scribes and input/output modules is implemented and tested in nightly builds. More work is needed to implement selectors which work with tag data or event collections. No speed tests have been performed yet (although some numbers were obtained in simple tests), as in my opinion realistic conditions for such tests cannot be achieved with a single application running on a single host.

References

  1. D. Kirkby, Framework Access to Conditions Data via ROOT, http://www.slac.stanford.edu/~davidk/RooCond/
  2. See also README files in RooUtils, RooScribes, and RooModules packages.
  3. RooScribes reduced class diagram, http://www.slac.stanford.edu/~salnikov/Root/RooScribesDiag.ps.gz