In classic Kanga, there was little distinction between a collection and a
file. You might use a "collection" like:
/groups/skims/isPhysicsEvents/xxxxx/KanGA/AllEventsKanga
If you add that collection to the input module in your Beta application, you
would wind up reading the file:
$BFROOT/kanga/EventStore/groups/skims/isPhysicsEvents/xxxxx/KanGA/AllEventsKanga-micro.root
That is to say, a "collection" and a "file" had a one-to-one relationship
in classic Kanga. (In implementation it was actually just a single ROOT TTree
within that file.)
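To make the one-to-one mapping concrete, here is a minimal Python sketch of
how a classic Kanga collection name turns into a file name. It is an
illustration only: the function name classic_kanga_file is hypothetical, and
the "kanga/EventStore" prefix and "-micro.root" suffix are generalized from
the single example above, not taken from the actual implementation.

```python
def classic_kanga_file(collection, bfroot):
    """Return the single file that backs a classic Kanga collection.

    Assumption: every collection follows the pattern seen in the one
    example in the text, i.e. <bfroot>/kanga/EventStore/<collection>-micro.root
    """
    return (bfroot.rstrip("/") + "/kanga/EventStore/"
            + collection.strip("/") + "-micro.root")


# bfroot stands in for the value of $BFROOT; "/bfroot" is a placeholder.
path = classic_kanga_file(
    "/groups/skims/isPhysicsEvents/xxxxx/KanGA/AllEventsKanga", "/bfroot")
print(path)
```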
In the new CM2 Kanga implementation we wanted to do several things beyond
what existed in classic Kanga. Two that I will discuss here are:
- Allow for multiple data "components" distributed over multiple files
- Allow for "borrowing" of components
These have a few consequences for the users which will be explained in what
follows. (Please be patient, this is the first iteration of this writeup. I'll
improve it in a 2nd iteration based on any feedback you can give me.)
Multiple data "components" in multiple files
You have most probably seen this if you have run Moose. If you specify
an output collection like:
/work/users/elmer/pre14.2.1/mooseevents4
you will most likely find that two files are created:
mooseevents4.01.root
mooseevents4.02E.root
Looking more closely at these files with the KanFileUtil utility, you can
see that there are multiple "components", some clustered into one file, some
clustered into the other file:
shire03> KanFileUtil -L -tall /work/users/elmer/pre14.2.1/mooseevents4.01.root
TREE aod tid= 4 cycle= 1 entries= 200
TREE aod__Meta tid= 4 cycle= 2 entries= 1
TREE cnd tid= 3 cycle= 1 entries= 200
TREE cnd__Meta tid= 3 cycle= 2 entries= 1
TREE hdr tid= 0 cycle= 1 entries= 200
TREE hdr__Meta tid= 0 cycle= 2 entries= 1
TREE tag tid= 2 cycle= 1 entries= 200
TREE tag__Meta tid= 2 cycle= 2 entries= 1
TREE tru tid= 5 cycle= 1 entries= 200
TREE tru__Meta tid= 5 cycle= 2 entries= 1
shire03> KanFileUtil -L -tall /work/users/elmer/pre14.2.1/mooseevents4.02E.root
/nfs/kan001/vol6//work/users/elmer/pre14.2.1/mooseevents4.02E.root
TREE esd tid= 6 cycle= 1 entries= 200
TREE esd__Meta tid= 6 cycle= 2 entries= 1
Those components "clustered" into the .01 file are the "micro", while the
component in the .02E file is the "remainder" of the mini. (Recall that the
micro is a subset of the mini.) The motivation for this is to allow you to
import (or keep on disk) the micro without the rest of the mini.
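If you want to check which components ended up in which file without reading
the listing by eye, something like the following Python sketch could be used
on the KanFileUtil output shown above. It only parses text in the format of
that listing. The function name is hypothetical, and treating the "__Meta"
trees as per-component bookkeeping (rather than event data) is my assumption.

```python
import re

# Matches lines such as: "TREE aod tid= 4 cycle= 1 entries= 200"
TREE_LINE = re.compile(
    r"^TREE\s+(\S+)\s+tid=\s*(\d+)\s+cycle=\s*(\d+)\s+entries=\s*(\d+)")


def components_in_listing(listing):
    """Return {component name: number of entries} for one file's listing."""
    components = {}
    for line in listing.splitlines():
        match = TREE_LINE.match(line.strip())
        if match is None:
            continue  # e.g. the line that echoes the file path
        name = match.group(1)
        if name.endswith("__Meta"):
            continue  # assumption: metadata tree, not an event component
        components[name] = int(match.group(4))
    return components


listing_02E = """\
/nfs/kan001/vol6//work/users/elmer/pre14.2.1/mooseevents4.02E.root
TREE esd tid= 6 cycle= 1 entries= 200
TREE esd__Meta tid= 6 cycle= 2 entries= 1
"""
print(components_in_listing(listing_02E))
```

Applied to the .01 listing, the same function would return the five micro
components (aod, cnd, hdr, tag, tru), each with 200 entries.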
If you wrote enough events that these files approached 2GB (ROOT has a 2GB
file size limit), you would eventually see additional files named like:
mooseevents4.03E.root
mooseevents4.04HBCAT.root
and so on.
The bottom line here is that the relationship between "collection" and "files"
is no longer one-to-one.
Borrowing of components from input collections
In the skimming, for example, the application reads one or more collections
(from PR or SP) and writes a number of output skim collections. For each
skim you have a number of possibilities for what you write:
- tagOnly
- pointer
- deepCopyMicro
- deepCopyMini
For example, I wrote a collection with Elf (as we do in PR):
shire03> KanCollUtil /work/users/elmer/pre14.2.1/elfevents3
/work/users/elmer/pre14.2.1/elfevents3 (713 events)
and then ran the skim application on it. Among the output collections were
examples of each of the above types of skim output. One example of
"deepCopyMicro" is the output collection:
/work/users/elmer/pre14.2.1/031211-15:23:24/BSemiExclKan
Now I will again use the KanCollUtil utility to illustrate something more
about how the new eventstore differs from the old:
shire03> KanCollUtil -L -f -n 5 /work/users/elmer/pre14.2.1/031211-15:23:24/BSemiExclKan
/work/users/elmer/pre14.2.1/031211-15:23:24/BSemiExclKan (8 events)
EVT 000001 hdr=000:0 usr=000:0 tag=000:0 cnd=000:0 aod=000:0 esd=001:117
EVT 000002 hdr=000:1 usr=000:1 tag=000:1 cnd=000:1 aod=000:1 esd=001:125
EVT 000003 hdr=000:2 usr=000:2 tag=000:2 cnd=000:2 aod=000:2 esd=001:173
EVT 000004 hdr=000:3 usr=000:3 tag=000:3 cnd=000:3 aod=000:3 esd=001:207
EVT 000005 hdr=000:4 usr=000:4 tag=000:4 cnd=000:4 aod=000:4 esd=001:283
LFN 000 /work/users/elmer/pre14.2.1/031211-15:23:24/BSemiExclKan.01.root (owned)
LFN 001 /work/users/elmer/pre14.2.1/elfevents3.02E.root (borrowed)
Each event can consist of multiple components in multiple files (as
described above) and we need to keep track of that. This is done in the
"event header" and the command above just prints this for the first five
events. The things to note are:
- For each event you can see that it knows about a number of "components"
- Next to each component is a code which keeps track of where the relevant
data is located. A code like "tag=000:3" simply means that the tag data
for this event is in a TTree called "tag" in file 000 (the "LFN 000" line
of the listing) and the data is entry 3 of that TTree (entries are counted
from 0, as the first event shows).
- Almost all of the components (hdr+usr+tag+cnd+aod) are in file 000. This
is what is expected as those components constitute the micro and this
skim was requested to be "deepCopyMicro".
- For the esd component, the event headers point sparsely to events in
a file associated with the input Elf/PR collection. The ability to do
this is new to the CM2 Kanga and is what we mean by "borrowing".
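The location codes described in the list above are easy to take apart. The
following Python sketch parses one "EVT" line of the KanCollUtil output into
a per-component (file index, entry) pair. It works purely on the printed
text; the function name is hypothetical, and I am assuming the line format
is exactly what the example shows.

```python
def parse_event_line(line):
    """Parse 'EVT 000004 hdr=000:3 ... esd=001:207' into a number and a dict.

    Returns (event_number, {component: (file_index, entry)}).
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "EVT":
        raise ValueError("not an EVT line: %r" % line)
    event_number = int(fields[1])
    locations = {}
    for field in fields[2:]:
        component, code = field.split("=")
        file_index, entry = code.split(":")
        # file_index refers to an "LFN nnn" line; entry is the TTree entry
        locations[component] = (int(file_index), int(entry))
    return event_number, locations


number, locations = parse_event_line(
    "EVT 000004 hdr=000:3 usr=000:3 tag=000:3 cnd=000:3 aod=000:3 esd=001:207")
print(number, locations["tag"], locations["esd"])
```

For this event the micro components all come back as file 0, entry 3, while
esd comes back as file 1, entry 207, which is the borrowed Elf file.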
Here is an example output for another class of output skims (a so-called
"pointer" skim):
shire03> KanCollUtil -L -f -n 5 /work/users/elmer/pre14.2.1/031211-15:23:24/BRecoToDDstarKan
/work/users/elmer/pre14.2.1/031211-15:23:24/BRecoToDDstarKan (62 events)
EVT 000001 hdr=000:0 usr=000:0 tag=000:0 cnd=000:0 aod=001:15 esd=002:15
EVT 000002 hdr=000:1 usr=000:1 tag=000:1 cnd=000:1 aod=001:18 esd=002:18
EVT 000003 hdr=000:2 usr=000:2 tag=000:2 cnd=000:2 aod=001:21 esd=002:21
EVT 000004 hdr=000:3 usr=000:3 tag=000:3 cnd=000:3 aod=001:26 esd=002:26
EVT 000005 hdr=000:4 usr=000:4 tag=000:4 cnd=000:4 aod=001:35 esd=002:35
LFN 000 /work/users/elmer/pre14.2.1/031211-15:23:24/BRecoToDDstarKan.01.root (owned)
LFN 001 /work/users/elmer/pre14.2.1/elfevents3.01.root (borrowed)
LFN 002 /work/users/elmer/pre14.2.1/elfevents3.02E.root (borrowed)
In this example:
- The hdr+usr+tag+cnd components have been written into the skim itself
(file 000), while aod+esd are being "borrowed" from the collection which
was input to the skim application (files 001 and 002).
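To see at a glance which components of a skim are borrowed, one can combine
the per-event location codes with the "LFN" lines. The Python sketch below
does this for the first event of the pointer skim above. The function names
are hypothetical, the location dictionary is typed in by hand from the
listing, and I am assuming the LFN lines always have the form
"LFN <index> <path> (<owned|borrowed>)".

```python
def parse_lfn_line(line):
    """Parse 'LFN 001 /path/file.root (borrowed)' into (index, path, status)."""
    fields = line.split()
    if len(fields) != 4 or fields[0] != "LFN":
        raise ValueError("not an LFN line: %r" % line)
    return int(fields[1]), fields[2], fields[3].strip("()")


def borrowed_components(locations, lfn_lines):
    """Return the sorted names of components stored in borrowed files."""
    status_by_index = {}
    for line in lfn_lines:
        index, _path, status = parse_lfn_line(line)
        status_by_index[index] = status
    return sorted(component
                  for component, (file_index, _entry) in locations.items()
                  if status_by_index[file_index] == "borrowed")


lfn_lines = [
    "LFN 000 /work/users/elmer/pre14.2.1/031211-15:23:24/BRecoToDDstarKan.01.root (owned)",
    "LFN 001 /work/users/elmer/pre14.2.1/elfevents3.01.root (borrowed)",
    "LFN 002 /work/users/elmer/pre14.2.1/elfevents3.02E.root (borrowed)",
]
# Location codes of EVT 000001 in the pointer skim, as (file index, entry)
first_event = {"hdr": (0, 0), "usr": (0, 0), "tag": (0, 0),
               "cnd": (0, 0), "aod": (1, 15), "esd": (2, 15)}
borrowed = borrowed_components(first_event, lfn_lines)
print(borrowed)
```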
Not finished... More to come....