
Event Reprocessing:  Overall Scheme

16 March 1999

Ray F. Cowan


There are two possibilities at present regarding event reprocessing:  event "extension" and event "cloning".  A third case is under consideration, which simply adds a pointer to an existing event to the output collection.

Case 1: "Input collection == Output collection".

If the input and output collections are the same collection, then any new, unique persistent objects produced during processing will be added to the collection. This is called event extension. It happens whenever the collectionName values specified to the input and output modules are identical.

These new objects are added to the existing treeheaders in the event (if an object is the first to be placed in a given treeheader, the treeheader itself is created too). This is subject to the requirement that each object be unique within its treeheader. Object identity is determined by the Objectivity ooTypeNumber of the class, combined with the value of an optional "secondary key" character string. If an attempt is made to put an object into a treeheader where an object of the same class and with the same secondary key string already exists, an assertion failure will occur. In other words, if you run a processing stage and write to the input collection, then run the same job again, the second pass will fail: the newly recreated objects already exist in the treeheaders.
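The uniqueness rule can be illustrated with a small Python sketch. This is a toy model, not the BaBar database code: the `TreeHeader` class, its `put` method, and `CollisionError` are hypothetical names, a class-name string stands in for the Objectivity ooTypeNumber, an exception stands in for the assertion failure, and the default key "Default" is assumed from the error message quoted later on this page.

```python
class CollisionError(Exception):
    """Stand-in for the assertion failure raised on a duplicate object."""


class TreeHeader:
    """Toy treeheader: at most one object per (class, secondary key) pair."""

    def __init__(self, domain):
        self.domain = domain
        self._objects = {}

    def put(self, type_name, payload, key="Default"):
        # Identity = class (a name string here) plus the secondary key.
        identity = (type_name, key)
        if identity in self._objects:
            raise CollisionError(
                f'object type "{type_name}" with key "{key}" '
                f"already exists in {self.domain} event header"
            )
        self._objects[identity] = payload

    def __len__(self):
        return len(self._objects)


dch = TreeHeader("dch")
dch.put("DchHitListP", ["hit1", "hit2"])        # first pass: succeeds
dch.put("DchHitListP", ["hit3"], key="Refit")   # same class, other key: succeeds

try:
    dch.put("DchHitListP", ["hit1", "hit2"])    # same job run again
    second_pass_failed = False
except CollisionError:
    second_pass_failed = True
```

In this model the second identical `put` fails for the same reason that rerunning a job against its own input collection fails: the (class, key) slot is already occupied.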

Case 2:  "Input collection != Output collection".

If the input and output collections are different collections, then any persistent objects to be stored will be "written" to the output collection, without changing the input collection. This is called event cloning. It happens whenever the collectionName values specified to the input and output modules are not identical.

A brief digression regarding event structure as seen by the persistent database is necessary to explain cloning. An "event collection" is a set of persistent pointers to persistent "event objects" in the database. An "event object" is a tree structure whose top node is a BdbEvent object. This top node contains a list of persistent pointers to treeheaders. There is one treeheader for each data domain within the event (e.g., tru, raw, sim, drc, ifr, bta, etc.). A treeheader object contains a list of pointers to persistent data objects, which contain the actual data for the event. See the figure.
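The layering just described can be sketched as plain Python data classes. All class and field names below are illustrative stand-ins (only BdbEvent, the domain names, and DchHitListP appear on this page), and ordinary Python references play the role of persistent pointers.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Leaf of the tree: holds the actual event data."""
    type_name: str
    payload: object = None


@dataclass
class TreeHeader:
    """One per data domain; lists pointers to data objects."""
    domain: str
    objects: list = field(default_factory=list)


@dataclass
class Event:
    """Top node (BdbEvent in the text); lists pointers to treeheaders."""
    headers: dict = field(default_factory=dict)   # domain name -> TreeHeader


@dataclass
class EventCollection:
    """A named set of pointers to events."""
    name: str
    events: list = field(default_factory=list)


hits = DataObject("DchHitListP", ["hit1", "hit2"])
event = Event(headers={"dch": TreeHeader("dch", [hits]),
                       "raw": TreeHeader("raw")})
coll_a = EventCollection("A", [event])

# Walking from the collection down to the data follows three pointer hops.
found = coll_a.events[0].headers["dch"].objects[0]
```

Because every level holds pointers rather than copies, an output event can in principle share any subtree with the input event, which is what makes the choice described next possible.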

This means that there is a choice to be made when we "write" an event to an output collection which is different from the input collection. At one extreme, we can copy the actual data objects and replicate the entire event afresh, along with any new objects added during this processing pass. At the other extreme, we can place pointers in the output event which point to the original data objects in the input event, and create new objects in the database only for the objects produced in this processing pass. Or we can do something in between these two extremes.

What we have chosen to do is to allow the user to select which objects to incorporate in the new output event as pointers, and which ones to create afresh. For ease of management, this choice is made at the treeheader level. The user may specify (as a tcl parameter to the output module) which headers are to be renewed (i.e., zero these headers completely and add to them only the objects produced in the current processing step). Any existing header that is not renewed will merely be a pointer to the corresponding header from the input event.
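A minimal sketch of this per-treeheader choice, under the assumption that an event can be modeled as a dictionary from domain name to header and a header as a dictionary of objects. The function `clone_event` and the type name "RawData" are hypothetical; only the behavior (renewed headers start empty, all others are shared with the input event) is taken from the description above.

```python
def clone_event(input_headers, renew):
    """Build the output event's header table from the input event's.

    Headers whose domain is listed in `renew` start out empty; every other
    header is shared by reference with the input event.
    """
    output_headers = {}
    for domain, header in input_headers.items():
        if domain in renew:
            output_headers[domain] = {}        # fresh, empty header
        else:
            output_headers[domain] = header    # pointer to the input header
    return output_headers


input_event = {
    "raw": {("RawData", "Default"): "raw payload"},
    "dch": {("DchHitListP", "Default"): "old hits"},
}

output_event = clone_event(input_event, renew={"dch"})

# The current processing pass can now fill the renewed header
# without colliding with the object from the earlier pass.
output_event["dch"][("DchHitListP", "Default")] = "new hits"
```

In this model the input event is untouched: its dch header still holds the old hits, while the raw header is one object shared by both events.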

NOTE:  when a treeheader is renewed, no checks are made to ensure that any dependent treeheaders that contain data based on the earlier version (i.e., any which contain objects that were created partially or fully using data from a renewed treeheader)  are also renewed. It is up to the user to make sure that what is done is consistent.

VERY IMPORTANT NOTE:  there is no checking done to ensure that a given event occurs in a given collection only once. If you run a job that reads events and writes them to a non-identical collection, and then you run the job again, the same events will be added to the output collection a second time.
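Since the collection itself does not guard against duplicates, any check has to come from the user. The sketch below models a collection as a plain list and events as integer identifiers (both assumptions made for illustration; `run_job` is a hypothetical stand-in for a read-and-write job) to show how a second run doubles the contents and how the repeats could be spotted afterwards.

```python
from collections import Counter


def run_job(input_events, output_collection):
    """Toy job: append every input event to the output collection."""
    for event_id in input_events:
        output_collection.append(event_id)


collection_a = [101, 102, 103]
collection_b = []

run_job(collection_a, collection_b)
run_job(collection_a, collection_b)   # same job again: nothing prevents this

# User-side check: list every event that appears more than once.
counts = Counter(collection_b)
duplicates = sorted(e for e, n in counts.items() if n > 1)
```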

Case 3:  "Pointer copy only".    (NOT YET IMPLEMENTED)

In this case, the input collection and output collection are different, as in case 2, but the only action taken is to add pointers to the existing events to the output collection. No event extension or cloning occurs.

Example:  Renewing Treeheaders created by Bear.

Assume you have run a processing pass with SimApp followed by BearApp, where each wrote to the same collection "A". This could be written as:

xdr file -> SimApp -> Collection A -> BearApp -> Collection A

Then assume you follow this by attempting to run BearApp again reading from collection "A" and writing to a different collection "B":

Collection A -> BearApp ->  Collection B

Then you will encounter an "object collision" failure something like this:

** Fatal Error #2012015: BdbGenericHdr::put(): Collision: object type "DchHitListP" with key "Default" already exists in dch event  header

This happens because the treeheaders from collection "A" have been cloned into the output event, and they already contain this object from the first processing pass. To avoid this problem, you must renew the entire set of headers that BearApp will fill. Typically you need to include the following in your tcl:

     > module talk BdbEventOutput
     >     renewHdrList set "drc ifr trk bta svt emc dch"
     > exit

Another Important Note:  For this treeheader renewal to work, the sets of treeheaders populated by the different processing passes (e.g., SimApp, BearApp, Beta, etc.) must be disjoint. For example, if BearApp and Beta were to share a particular treeheader, in the sense that both insert objects into it, then it would be impossible to rerun Beta without rerunning Bear, because there would be no way to wipe Beta's output from that treeheader without also wiping Bear's.
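The disjointness requirement can be checked mechanically if one knows which treeheaders each pass fills. In the sketch below, the BearApp list is taken from the renewHdrList example above; the SimApp and Beta lists are hypothetical, chosen so that Beta overlaps with BearApp as in the scenario just described. The function `shared_headers` is an illustrative helper, not part of the Framework.

```python
from itertools import combinations


def shared_headers(fills):
    """Return {(pass_a, pass_b): common headers} for every overlapping pair."""
    overlaps = {}
    for (a, headers_a), (b, headers_b) in combinations(sorted(fills.items()), 2):
        common = set(headers_a) & set(headers_b)
        if common:
            overlaps[(a, b)] = common
    return overlaps


fills = {
    "SimApp": {"tru", "sim", "raw"},                               # hypothetical
    "BearApp": {"drc", "ifr", "trk", "bta", "svt", "emc", "dch"},  # from renewHdrList
    "Beta": {"bta"},                                               # hypothetical
}

overlaps = shared_headers(fills)
```

With these inputs the only overlapping pair is BearApp and Beta, sharing bta, so renewing bta to rerun Beta would also discard whatever BearApp stored there.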

Work in progress:

It has been asked whether it is possible to make event skims, i.e., to output only selected events to an output collection which is different from the input collection. Presently this is not possible in the Framework, except via a possible (untested) kludge that involves disabling the output module, then adding it to a path after a filter module.

A possibly useful extension would be to have several output modules, each writing to a different collection, to perform multiple skims in one pass. In such cases the same event might need to be written to more than one output collection.

This also has implications regarding clustering, and whether or not this is feasible only at the output of PR.