Collections, Files and Borrowing

(The Good, the Bad and the Ugly)


In classic Kanga, there was little distinction between a collection and a file. You might use a "collection" like:
 
/groups/skims/isPhysicsEvents/xxxxx/KanGA/AllEventsKanga
If you added that collection to the input module in your Beta application, you would wind up reading the file:
 
$BFROOT/kanga/EventStore/groups/skims/isPhysicsEvents/xxxxx/KanGA/AllEventsKanga-micro.root
That is to say, a "collection" and a "file" had a one-to-one relationship in classic Kanga. (In implementation it was actually just a single ROOT TTree within that file.)
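
The one-to-one mapping above can be written down in a few lines. This is only a sketch of the naming rule as it appears in the example (prefix `$BFROOT/kanga/EventStore`, suffix `-micro.root`); the function name `classic_kanga_file` is made up for illustration and is not part of any Kanga tool.

```python
import os


def classic_kanga_file(collection, bfroot=None):
    """Map a classic Kanga collection name to its single ROOT file.

    Assumption: the rule is exactly the one visible in the example above,
    i.e. <BFROOT>/kanga/EventStore<collection>-micro.root.
    """
    if bfroot is None:
        # Fall back to the literal string if the variable is not set.
        bfroot = os.environ.get("BFROOT", "$BFROOT")
    # Collection names start with "/", so plain concatenation is enough.
    return f"{bfroot}/kanga/EventStore{collection}-micro.root"


coll = "/groups/skims/isPhysicsEvents/xxxxx/KanGA/AllEventsKanga"
print(classic_kanga_file(coll, bfroot="$BFROOT"))
```

In CM2 Kanga no such single-file rule exists, which is the point of the sections that follow.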

In the new CM2 Kanga implementation we wanted to do several things beyond what existed in classic Kanga. Two that I will discuss here are writing multiple data "components" to multiple files, and borrowing components from input collections. These have a few consequences for users, which are explained in what follows. (Please be patient, this is the first iteration of this writeup. I'll improve it in a 2nd iteration based on any feedback you can give me.)

Multiple data "components" in multiple files

You have most probably seen this if you have run Moose. If you specify an output collection like:
 
/work/users/elmer/pre14.2.1/mooseevents4
you will most likely find that two files are created:
 
mooseevents4.01.root
mooseevents4.02E.root
Looking more closely at these files with the KanFileUtil utility, you can see that there are multiple "components", some clustered into one file, some clustered into the other file:
 
shire03> KanFileUtil -L -tall /work/users/elmer/pre14.2.1/mooseevents4.01.root 
   TREE        aod  tid=  4  cycle=  1  entries= 200
   TREE  aod__Meta  tid=  4  cycle=  2  entries= 1
   TREE        cnd  tid=  3  cycle=  1  entries= 200
   TREE  cnd__Meta  tid=  3  cycle=  2  entries= 1
   TREE        hdr  tid=  0  cycle=  1  entries= 200
   TREE  hdr__Meta  tid=  0  cycle=  2  entries= 1
   TREE        tag  tid=  2  cycle=  1  entries= 200
   TREE  tag__Meta  tid=  2  cycle=  2  entries= 1
   TREE        tru  tid=  5  cycle=  1  entries= 200
   TREE  tru__Meta  tid=  5  cycle=  2  entries= 1
shire03> KanFileUtil -L -tall /work/users/elmer/pre14.2.1/mooseevents4.02E.root
/nfs/kan001/vol6//work/users/elmer/pre14.2.1/mooseevents4.02E.root
   TREE        esd  tid=  6  cycle=  1  entries= 200
   TREE  esd__Meta  tid=  6  cycle=  2  entries= 1
Those components "clustered" into the .01 file are the "micro", while the component in the .02E file is the "remainder" of the mini. (Recall that the micro is a subset of the mini.) The motivation for this is to allow you to import (or keep on disk) the micro without the rest of the mini.
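
If you want to check which components ended up in which file without reading the listing by eye, the KanFileUtil output shown above is easy to parse. The following is a minimal sketch that assumes the line layout is exactly as printed above (`TREE <name> tid= <n> cycle= <n> entries= <n>`) and that the `__Meta` trees are companions of the component trees rather than components themselves; the function names are hypothetical.

```python
import re

# One TREE line as printed by "KanFileUtil -L -tall" in the listing above.
TREE_LINE = re.compile(
    r"^\s*TREE\s+(\S+)\s+tid=\s*(\d+)\s+cycle=\s*(\d+)\s+entries=\s*(\d+)\s*$"
)


def parse_tree_listing(text):
    """Return {tree_name: entries} for every TREE line in the listing."""
    trees = {}
    for line in text.splitlines():
        match = TREE_LINE.match(line)
        if match:
            name, _tid, _cycle, entries = match.groups()
            trees[name] = int(entries)
    return trees


def components(trees):
    """Component names, i.e. trees that are not '__Meta' companions (assumption)."""
    return sorted(name for name in trees if not name.endswith("__Meta"))


# Excerpts copied from the two listings above.
LISTING_01 = """\
   TREE        aod  tid=  4  cycle=  1  entries= 200
   TREE  aod__Meta  tid=  4  cycle=  2  entries= 1
   TREE        cnd  tid=  3  cycle=  1  entries= 200
   TREE        hdr  tid=  0  cycle=  1  entries= 200
   TREE        tag  tid=  2  cycle=  1  entries= 200
   TREE        tru  tid=  5  cycle=  1  entries= 200
"""
LISTING_02E = """\
/nfs/kan001/vol6//work/users/elmer/pre14.2.1/mooseevents4.02E.root
   TREE        esd  tid=  6  cycle=  1  entries= 200
   TREE  esd__Meta  tid=  6  cycle=  2  entries= 1
"""

print(components(parse_tree_listing(LISTING_01)))   # micro components
print(components(parse_tree_listing(LISTING_02E)))  # remainder of the mini
```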

If you wrote enough events that these files approached 2GB (ROOT has a 2GB file size limit), you would eventually see additional files with names like:
 
mooseevents4.03E.root
mooseevents4.04HBCAT.root
and so on.
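
Because one collection now maps to several files, it can be handy to group file names back by collection. The sketch below assumes the pattern seen in the examples above, `<collection>.<two-digit sequence><optional uppercase tag>.root`; that pattern is inferred from those names only, the meaning of the tags is not interpreted here, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern: <collection>.<NN><TAG>.root, e.g. ".01", ".02E".
FILE_NAME = re.compile(r"^(?P<collection>.+)\.(?P<seq>\d{2})(?P<tag>[A-Z]*)\.root$")


def split_file_name(path):
    """Split a file name into (collection, sequence number, tag)."""
    match = FILE_NAME.match(path)
    if match is None:
        raise ValueError(f"not a recognised collection file name: {path}")
    return match.group("collection"), int(match.group("seq")), match.group("tag")


def group_by_collection(paths):
    """Return {collection: [(sequence, tag), ...]} sorted by sequence number."""
    groups = defaultdict(list)
    for path in paths:
        collection, seq, tag = split_file_name(path)
        groups[collection].append((seq, tag))
    return {coll: sorted(files) for coll, files in groups.items()}


paths = [
    "/work/users/elmer/pre14.2.1/mooseevents4.02E.root",
    "/work/users/elmer/pre14.2.1/mooseevents4.01.root",
]
print(group_by_collection(paths))
```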

The bottom line here is that the relationship between "collection" and "files" is no longer one-to-one.

Borrowing of components from input collections

In skimming, for example, the application reads one or more collections (from PR or SP) and writes a number of output skim collections. For each skim you have a number of possibilities for what you write (two of them, "deepCopyMicro" and "pointer" skims, appear in the examples below).

For example, I wrote a collection with Elf (as we do in PR):
 
shire03> KanCollUtil /work/users/elmer/pre14.2.1/elfevents3
/work/users/elmer/pre14.2.1/elfevents3 (713 events)
and then ran the skim application on it. Among the output collections were examples of each of the above types of skim output. One example of "deepCopyMicro" is the output collection:
 
   /work/users/elmer/pre14.2.1/031211-15:23:24/BSemiExclKan
Now I will again use the KanCollUtil utility to illustrate something more about how the new eventstore differs from the old:
 
shire03> KanCollUtil -L -f -n 5 /work/users/elmer/pre14.2.1/031211-15:23:24/BSemiExclKan
/work/users/elmer/pre14.2.1/031211-15:23:24/BSemiExclKan (8 events)
   EVT 000001  hdr=000:0 usr=000:0 tag=000:0 cnd=000:0 aod=000:0 esd=001:117 
   EVT 000002  hdr=000:1 usr=000:1 tag=000:1 cnd=000:1 aod=000:1 esd=001:125 
   EVT 000003  hdr=000:2 usr=000:2 tag=000:2 cnd=000:2 aod=000:2 esd=001:173 
   EVT 000004  hdr=000:3 usr=000:3 tag=000:3 cnd=000:3 aod=000:3 esd=001:207 
   EVT 000005  hdr=000:4 usr=000:4 tag=000:4 cnd=000:4 aod=000:4 esd=001:283 
   LFN 000  /work/users/elmer/pre14.2.1/031211-15:23:24/BSemiExclKan.01.root (owned)
   LFN 001  /work/users/elmer/pre14.2.1/elfevents3.02E.root (borrowed)
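Each event can consist of multiple components in multiple files (as described above), and we need to keep track of that. This is done in the "event header"; the command above just prints it for the first five events. The things to note are that the hdr, usr, tag, cnd and aod components of each event point into LFN 000, a file owned by this collection, while the esd component points into LFN 001, a file borrowed from the input collection elfevents3.

Here is an example output for another class of output skims (a so-called "pointer" skim):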
 
shire03> KanCollUtil -L -f -n 5 /work/users/elmer/pre14.2.1/031211-15:23:24/BRecoToDDstarKan
/work/users/elmer/pre14.2.1/031211-15:23:24/BRecoToDDstarKan (62 events)
   EVT 000001  hdr=000:0 usr=000:0 tag=000:0 cnd=000:0 aod=001:15 esd=002:15 
   EVT 000002  hdr=000:1 usr=000:1 tag=000:1 cnd=000:1 aod=001:18 esd=002:18 
   EVT 000003  hdr=000:2 usr=000:2 tag=000:2 cnd=000:2 aod=001:21 esd=002:21 
   EVT 000004  hdr=000:3 usr=000:3 tag=000:3 cnd=000:3 aod=001:26 esd=002:26 
   EVT 000005  hdr=000:4 usr=000:4 tag=000:4 cnd=000:4 aod=001:35 esd=002:35 
   LFN 000  /work/users/elmer/pre14.2.1/031211-15:23:24/BRecoToDDstarKan.01.root (owned)
   LFN 001  /work/users/elmer/pre14.2.1/elfevents3.01.root (borrowed)
   LFN 002  /work/users/elmer/pre14.2.1/elfevents3.02E.root (borrowed)
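
The two listings above can also be compared mechanically. The sketch below parses KanCollUtil output of the form shown and reports which components live in borrowed files. It assumes that each `comp=AAA:B` entry means "file LFN AAA, entry B" (my reading of the listings, since the first number always matches one of the LFN lines); the function names are hypothetical.

```python
import re

EVT_LINE = re.compile(r"^\s*EVT\s+(\d+)\s+(.*)$")
LFN_LINE = re.compile(r"^\s*LFN\s+(\d+)\s+(\S+)\s+\((owned|borrowed)\)\s*$")
REF = re.compile(r"(\w+)=(\d+):(\d+)")  # assumed meaning: component=LFN:entry


def parse_listing(text):
    """Return (events, files) parsed from 'KanCollUtil -L -f' style output."""
    events, files = {}, {}
    for line in text.splitlines():
        match = EVT_LINE.match(line)
        if match:
            refs = REF.findall(match.group(2))
            events[int(match.group(1))] = {
                comp: (int(lfn), int(entry)) for comp, lfn, entry in refs
            }
            continue
        match = LFN_LINE.match(line)
        if match:
            files[int(match.group(1))] = (match.group(2), match.group(3))
    return events, files


def borrowed_components(events, files):
    """Names of components that point into at least one borrowed file."""
    borrowed = set()
    for refs in events.values():
        for comp, (lfn, _entry) in refs.items():
            if files[lfn][1] == "borrowed":
                borrowed.add(comp)
    return sorted(borrowed)


# Excerpts copied from the two listings above.
DEEP_COPY_MICRO = """\
   EVT 000001  hdr=000:0 usr=000:0 tag=000:0 cnd=000:0 aod=000:0 esd=001:117
   EVT 000002  hdr=000:1 usr=000:1 tag=000:1 cnd=000:1 aod=000:1 esd=001:125
   LFN 000  /work/users/elmer/pre14.2.1/031211-15:23:24/BSemiExclKan.01.root (owned)
   LFN 001  /work/users/elmer/pre14.2.1/elfevents3.02E.root (borrowed)
"""
POINTER = """\
   EVT 000001  hdr=000:0 usr=000:0 tag=000:0 cnd=000:0 aod=001:15 esd=002:15
   EVT 000002  hdr=000:1 usr=000:1 tag=000:1 cnd=000:1 aod=001:18 esd=002:18
   LFN 000  /work/users/elmer/pre14.2.1/031211-15:23:24/BRecoToDDstarKan.01.root (owned)
   LFN 001  /work/users/elmer/pre14.2.1/elfevents3.01.root (borrowed)
   LFN 002  /work/users/elmer/pre14.2.1/elfevents3.02E.root (borrowed)
"""

print(borrowed_components(*parse_listing(DEEP_COPY_MICRO)))
print(borrowed_components(*parse_listing(POINTER)))
```

Run on these excerpts, the first collection borrows only the esd component, while the second borrows both aod and esd.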
In this example:

Not finished... More to come....

Last modified 4-May-2004, Peter.Elmer@cern.ch