Event Reprocessing: Overall Scheme
16 March 1999
Ray F. Cowan
There are two possibilities at present regarding event reprocessing:
event "extension" and event "cloning". A third case is under consideration,
which simply adds a pointer to an existing event to the output collection.
Case 1: "Input collection == Output collection".
If the input and output collections are the same collection, then
any new, unique persistent objects produced during processing will be added
to the collection. This is called event extension. This happens
whenever the collectionName values specified to the input and output
modules are identical.
These new objects are added to the existing treeheaders in the event
(if this is the first object to be placed in a given treeheader, the treeheader
itself will be created too). This is subject to the requirement that each
object be unique within its treeheader. Object identity is determined
by the Objectivity ooTypeNumber of
the class, combined with the value of an optional "secondary key" character
string. If an attempt is made to put an object into a treeheader where
an object of the same class and with the same secondary key string already
exists, an assert failure will occur. In other words, if you run
a processing stage and write to the input collection, then run the same
job again, the second pass will fail: the newly-recreated objects already
exist in the treeheaders.
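The uniqueness rule and the resulting second-pass failure can be illustrated with a minimal Python sketch. This is a toy model, not the Objectivity or BaBar API: the TreeHeader class, its put method, the run_pass function, and the use of a class-name string in place of the ooTypeNumber are all assumptions made for illustration.

```python
class TreeHeader:
    """Toy stand-in for a treeheader: objects keyed by (type, secondary key)."""

    def __init__(self, domain):
        self.domain = domain
        self.objects = {}

    def put(self, type_name, obj, key="Default"):
        # Identity is the class (modelled here by a type-name string)
        # combined with the optional secondary key string.
        identity = (type_name, key)
        # The real code fails an assert on a collision; this sketch
        # raises an exception instead.
        if identity in self.objects:
            raise RuntimeError(
                f'Collision: object type "{type_name}" with key "{key}" '
                f"already exists in {self.domain} event header"
            )
        self.objects[identity] = obj


def run_pass(event):
    """Hypothetical processing pass that writes one object to the dch header."""
    # The treeheader is created the first time an object is placed in it.
    header = event.setdefault("dch", TreeHeader("dch"))
    header.put("DchHitListP", object())


event = {}           # input collection == output collection: same event
run_pass(event)      # first pass extends the event
try:
    run_pass(event)  # second pass recreates an object that already exists
    second_pass_failed = False
except RuntimeError:
    second_pass_failed = True
```

In this model, an object of the same class with a different secondary key is still accepted, since the two identities differ.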
Case 2: "Input collection != Output collection".
If the input and output collections are different collections, then
any persistent objects to be stored will be "written" to the output collection,
without changing the input collection. This is called event cloning.
This happens whenever the collectionName values
specified to the input and output modules are not identical.
A brief digression regarding event structure as seen by the persistent
database is necessary to explain cloning. An "event collection" is a set
of persistent pointers to persistent "event objects" in the database. An
"event object" is a tree structure with the top node in the tree being
a BdbEvent object. This top node
contains a list of persistent pointers to treeheaders. There is
one treeheader for each data domain within the event (e.g., tru, raw, sim,
drc, ifr, bta). A treeheader object contains a list of pointers
to persistent data objects, which contain the actual data for the event.
See the figure.
This means that there is a choice to be made when we "write" an event
to an output collection which is different from the input collection: we
can either copy actual data objects and replicate the entire event afresh
along with any new objects which were added during this processing pass,
or we can place pointers in the output event which point to the original
data objects in the input event and only create new objects in the database
for the objects which were created in this processing pass. Or we can do
something in between these two extremes.
What we have chosen to do is to allow the user to select which objects
to incorporate in the new output event as pointers, and which ones to create
afresh. For ease in management, this choice is made at the treeheader level.
The user may specify (as a tcl parameter to the output module) which headers
are to be renewed (i.e., emptied completely, so that they receive only objects
produced in the current processing step). Any existing
header that is not renewed will merely be a pointer to the corresponding
header from the input event.
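The choice between renewed and shared treeheaders can be sketched in Python as follows. This is an illustrative model only: the clone_event function, the use of plain dicts for events and treeheaders, and the object name "SimHits" are assumptions, not part of the actual output module.

```python
def clone_event(input_event, renew):
    """Build an output event from an input event.

    input_event maps a domain name (e.g. "dch") to a treeheader, modelled
    here as a dict of data objects. renew is the set of domains to renew.
    """
    output_event = {}
    for domain, header in input_event.items():
        if domain in renew:
            # Renewed: start from an empty header, so that only objects
            # produced in the current processing step end up in it.
            output_event[domain] = {}
        else:
            # Not renewed: the output event points at the input header.
            output_event[domain] = header
    return output_event


input_event = {
    "sim": {("SimHits", "Default"): "sim data"},          # hypothetical object
    "dch": {("DchHitListP", "Default"): "old dch hits"},
}
output_event = clone_event(input_event, renew={"dch"})

# The current pass fills the renewed header; the input event is unchanged.
output_event["dch"][("DchHitListP", "Default")] = "new dch hits"
```

Because the non-renewed "sim" header is shared rather than copied, no new storage is needed for it, while the renewed "dch" header is independent of the input event.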
NOTE: when a treeheader is renewed, no checks are made
to ensure that any dependent treeheaders that contain data based on the
earlier version (i.e., any which contain objects that were created partially
or fully using data from a renewed treeheader) are also renewed.
It is up to the user to make sure that what is done is consistent.
VERY IMPORTANT NOTE: there is no checking done to ensure
that a given event occurs in a given collection only once. If you run a
job that reads events and writes them to a non-identical collection, and
then you run the job again, the same events will be added to the output
collection a second time.
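A short Python sketch of this behaviour, with collections modelled as plain lists and a hypothetical run_clone_job function standing in for the job (neither is part of the real Framework):

```python
events = ["event-1", "event-2"]   # stand-ins for persistent event objects
collection_a = list(events)       # input collection
collection_b = []                 # output collection, different from input


def run_clone_job(source, destination):
    # Each run appends one output event per input event, with no check
    # for whether that event is already present in the destination.
    for event in source:
        destination.append(f"clone of {event}")


run_clone_job(collection_a, collection_b)
run_clone_job(collection_a, collection_b)   # rerun of the same job
```

After the second run, the output collection holds two entries for every input event.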
Case 3: "Pointer copy only". (NOT YET IMPLEMENTED)
In this case, the input collection and output collection are different,
as in case 2, but the only action taken is to add pointers to the existing
events to the output collection. No event extension or cloning occurs.
Example: Renewing Treeheaders created by Bear.
Assume you have run a processing pass with SimApp followed by BearApp,
where each wrote to the same collection "A". This could be
written as:
xdr file -> SimApp -> Collection A -> BearApp -> Collection A
Then assume you follow this by attempting to run BearApp again reading
from collection "A" and writing to a different collection "B":
Collection A -> BearApp -> Collection B
Then you will encounter an "object collision" failure something like
this:
** Fatal Error #2012015: BdbGenericHdr::put():
Collision: object type "DchHitListP" with key "Default" already exists
in dch event header
This happens because the treeheaders from collection "A" have been cloned
into the output event, and they already contain this object
from the first processing pass. To avoid this problem, you must
renew the entire set of headers that BearApp will fill. Typically you
need to include the following in your tcl:
> module talk BdbEventOutput
> renewHdrList set "drc ifr trk bta svt emc dch"
> exit
Another Important Note: For this treeheader renewal to
work, the treeheaders populated by each processing pass (e.g., SimApp,
BearApp, Beta) must be disjoint between the processing passes.
For example, if BearApp and Beta should share a particular treeheader in
the sense that they both insert objects into it, then it will be impossible
to rerun Beta without rerunning Bear because there would be no way to wipe
Beta's output without also wiping Bear's output in that treeheader as well.
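One way to check the disjointness requirement ahead of time is sketched below in Python. The shared_headers function is hypothetical, and the assignment of treeheaders to passes is made up for illustration; it does not describe what SimApp, BearApp, or Beta actually write.

```python
from itertools import combinations


def shared_headers(pass_headers):
    """Return {(pass_a, pass_b): shared treeheader names} for overlapping passes."""
    overlaps = {}
    for (name_a, hdrs_a), (name_b, hdrs_b) in combinations(pass_headers.items(), 2):
        common = set(hdrs_a) & set(hdrs_b)
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


# Hypothetical layout, for illustration only.
pass_headers = {
    "SimApp": {"sim", "tru"},
    "BearApp": {"dch", "svt", "trk"},
    "Beta": {"bta", "trk"},   # shares "trk" with BearApp in this made-up layout
}
overlaps = shared_headers(pass_headers)
```

An empty result means every pass can be rerun on its own by renewing exactly the headers it fills; any entry identifies a pair of passes that can only be rerun together.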
Work in progress:
It has been asked whether it is possible to make event skims, i.e.,
to output only selected events to an output collection which is different
from the input collection. Presently in the Framework this is not possible,
except via a possible (untested) kludge that involves disabling the output
module, then adding it to a path after a filter module.
A possibly useful extension would be to have several output modules,
each writing to a different collection to perform multiple skims in one
pass. It is possible in such cases that the same event might need to be
written to more than one output collection.
This also has implications regarding clustering, and whether or not
this is feasible only at the output of PR.