The Pico Analysis Framework (PAF) is a ROOT-based framework for performing physics analyses on ROOT files. PAF leverages the full power of the ROOT system and is a lightweight product that lets you reuse Kanga data files together with the original Beta analysis libraries. Its main emphasis lies in the interactive regime, hence it is designed for speed. On a 400 MHz PC the event loop runs at rates between 500 kHz (reading no data) and 2 kHz (reading the complete TAG and micro information). For example, a tag-based selection of two-photon events proceeds at about 50 kHz, and particle selectors work at up to 30 kHz. On a modern PC it should thus be possible to do data mining in a set of up to 100 million events in less than 1 hour (a sustained rate of about 28 kHz), i.e., it is possible to start an analysis from the complete dataset in order to learn the correct settings for a perfect skim. The main motivation to develop PAF was twofold:
Further advanced features of the PAF system include the following (these features are not covered in this basic tutorial):
Several analyses are currently performed with PAF, see e.g., http://www.slac.stanford.edu/~helmut/dsk0k0pi/dsk0k0pi07072k.html or http://www.slac.stanford.edu/BFROOT/www/Organization/CollabMtgs/2000/feb/thursday/miftakov.pdf.
PAF has been developed mainly by the German collaborators and is currently supported by Bochum University. While it is an interesting tool, it is not yet verified for official BaBar use.
This tutorial demonstrates how to run a very simple PAF/Beta job in various modes:
and then, how to run a longer, more complicated job in batch (Exercise 5). The output of this last job can be used with ROOT. The tutorial finishes with homework.
I am assuming that you are already familiar with the ROOT tutorial and the BetaKanga tutorial documentation.
This tutorial is created partially from material found in the BaBar Offline Workbook, the Kanga Home Page, and the Physics Analysis Recipes.
To execute this tutorial successfully, you will need:
At the end of each section you will find links to the relevant source code in case you want to download the files rather than edit them (or just compare in case of trouble).
In Exercise 1 we will:
Let's begin:
PAF> setenv ROOTVER 2.23-12
PAF> cvs -d $BFROOT/unsupported/PAF/repo co PAFMakefiles
PAF> PAFMakefiles/addPAF
PAF> PAFMakefiles/buildPAF
As this might take some time, a faster alternative is to clone my PAF installation instead of running addPAF and buildPAF:
PAF> PAFMakefiles/clonePAF
Although PAF does not strictly need it (programs run from anywhere), you might want to add and initialize workdir:
PAF> addpkg workdir
PAF> gmake workdir.setup
PAF> cp $BFROOT/www/doc/workbook/examples/NTrkExample/NTrkExample.hh PAFUser/
PAF> cp $BFROOT/www/doc/workbook/examples/NTrkExample/NTrkExample.cc PAFUser/
A thorough discussion of the example program is available in the BABAR workbook (NTrkExample.hh, NTrkExample.cc). As we are running the program in the ROOT based PAF framework, however, we have to apply slight modifications to the original program following the guideline in PAF.Beta.html.
First of all, the header files referring to the BABAR framework are not needed any more. Hence we can comment them out safely in PAFUser/NTrkExample.hh and replace them with a few additional PAF header files:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
//-------------------------------
// Collaborating Class Headers --
//-------------------------------
//#include "Framework/APPModule.hh"
//#include "Framework/AbsParmIfdStrKey.hh"
#include "PAFAdapters/PAFBbrModule.hh"
#include "AbsParm/AbsParmIfdStrKey.hh"
#include "PAFAdapters/BbrSyntax.hh"
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
Secondly, we have to edit PAFUser/NTrkExample.cc and move this class's header to the very end of the include section (take care of renaming the directory to PAFUser):
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
//-----------------------------------------------------------------------
// Local Macros, Typedefs, Structures, Unions and Forward Declarations --
//-----------------------------------------------------------------------
#include "PAFAdapters/PAFBbrAnalysis.hh"
#include "PAFUser/NTrkExample.hh"
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
HepTupleManager* manager = TPico::Instance()->GetPersistenceManager();
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
PAF> cp PAFUser/WorkBook1Main.cc PAFUser/NTrkExampleMain.cc
Then edit NTrkExampleMain.cc and replace each occurrence of WorkBook1 with NTrkExample.
BINS := NTrkExampleMain PAFroot WorkBook1Main ...
PAF> bsub -q bldrecoq -o Exercise1.log gmake PAFUser.lib PAFUser.bin
.tcl to select the input collection (we will use run 12125). Here we use the skimData program to query the BABAR run database:
PAF> cd workdir
workdir> skimData -s AllEventsKanga -g 12125 -t
This will produce the file AllEventsKanga.tcl, which we'll use to define the input. Create a new file myAnalysis.tcl that holds the following lines:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
#This is an example TCL file for PAF
# 1) Framework related settings
# Define the location of the event store (overrides environment)
store set /afs/slac.stanford.edu/g/babar/kanga/EventStore
# Define the location of the parameter store (overrides environment)
param set /afs/slac.stanford.edu/g/babar/unsupported/PAF
# Define which conditions database to use
RootConditionsFile set /afs/slac.stanford.edu/g/babar/kanga/CondDB/BaBarConditions.root
# Set the magnetic field
FixedFieldStrength set 1.51007
# Read a TCL file to define the collections
tcl set AllEventsKanga.tcl
# 2) Module related settings
# Define which tracks to use
module talk NTrkExample
trackCandidates set chargedDefault
exit
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
workdir> NTrkExampleMain -nev 10000
workdir> PAFroot
...
root [0] TFile f("NTrkExample.root"); (read in the file)
root [1] TBrowser t; (display the histogram)
root [2] .q (exit Root)
In PAF, parameters can be set in three ways. Assume you want to set the strength of the magnetic field. This can be done:
The order of execution (or override) is such that the command line input ranks highest. The most versatile usage is option 1 which allows us to leverage the full power of a high-level programming language to set up and run an application interactively as a macro (see exercise 2). PAF knows four types of parameters: bool, int, double and string.
If you want to address the parameter of a specific module, you can make use of the C++ scope operator to identify the target:
workdir> NTrkExampleMain -nev 10000 -NTrkExample::trackCandidates chargedDefault
Framework related parameters do not need a special scope (although Framework::FixedFieldStrength works). New parameters can be added dynamically anywhere, even from the command line. When the program finishes, all parameter settings that were actually in use are recorded in a file PAFAnalysisSave.tcl for bookkeeping purposes. The file can be printed or used as TCL input in a further run.
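The scoping rule for command line options can be illustrated with a small stand-alone sketch. It is not PAF code: the struct ScopedParm and the function parseScopedOption are hypothetical names, and the fallback scope "Framework" is an assumption based on the remark that Framework::FixedFieldStrength is equivalent to the unscoped form.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type; PAF's real option parser is not shown in this tutorial.
struct ScopedParm {
    std::string scope;  // module name, "Framework" when no scope is given
    std::string key;    // parameter name
    std::string value;  // raw value as typed on the command line
};

// Split an option such as "-NTrkExample::trackCandidates" plus its value
// into scope, key and value. Unscoped options are assigned to "Framework"
// (an assumption made for this sketch).
inline ScopedParm parseScopedOption(const std::string& option,
                                    const std::string& value) {
    assert(!option.empty() && option[0] == '-');  // options start with a hyphen
    const std::string name = option.substr(1);
    const std::string::size_type pos = name.find("::");
    if (pos == std::string::npos) {
        return ScopedParm{"Framework", name, value};
    }
    return ScopedParm{name.substr(0, pos), name.substr(pos + 2), value};
}
```

With this sketch, "-NTrkExample::trackCandidates chargedDefault" yields the scope NTrkExample and the key trackCandidates, while "-FixedFieldStrength 1.51007" falls back to the Framework scope.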
Warning: CINT is not perfect. In some cases the interpreter is not able to choose properly among polymorphic (overloaded) functions. The recommendation therefore is not to use the generic SetParm(key,value) in a C++ macro but the explicit variants SetBoolParm(key,value), SetIntParm(key,value), SetDoubleParm(key,value) to pin down the correct parameter type. The same holds if you plan to retrieve parameter values with GetParm(key): use the specific variants GetBoolParm(key), GetIntParm(key), GetDoubleParm(key) instead.
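Why explicitly typed setters help can be seen in a minimal stand-alone sketch. The class TypedParmStore below is a hypothetical stand-in, not PAF's implementation; it only borrows the method names quoted above. Because each method name fixes the parameter type, the caller never relies on overload resolution.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a parameter store with one setter/getter pair
// per type. The method names follow the tutorial text; the implementation
// is an illustration only.
class TypedParmStore {
public:
    void SetBoolParm(const std::string& key, bool v)     { bools_[key] = v; }
    void SetIntParm(const std::string& key, int v)       { ints_[key] = v; }
    void SetDoubleParm(const std::string& key, double v) { doubles_[key] = v; }

    bool   GetBoolParm(const std::string& key) const   { return lookup(bools_, key); }
    int    GetIntParm(const std::string& key) const    { return lookup(ints_, key); }
    double GetDoubleParm(const std::string& key) const { return lookup(doubles_, key); }

private:
    // Shared lookup; in this sketch an unknown key is a programming error.
    template <typename T>
    static T lookup(const std::map<std::string, T>& m, const std::string& key) {
        typename std::map<std::string, T>::const_iterator it = m.find(key);
        assert(it != m.end());
        return it->second;
    }

    std::map<std::string, bool>   bools_;
    std::map<std::string, int>    ints_;
    std::map<std::string, double> doubles_;
};
```

A call such as SetDoubleParm("FixedFieldStrength", 1) stores a double even though the argument is an integer literal, which is the kind of decision the generic form leaves to the interpreter.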
Working material for this section: NTrkExampleMain.cc, NTrkExample.hh, NTrkExample.cc, myAnalysis.tcl, AllEventsKanga.tcl
PAF> cd PAFUser
PAFUser> cp NTrkExample.hh NTrkExample.rdl
Next, we have to edit NTrkExample.rdl and add the following C++ ClassDef macro at the very end of our class definition:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
private:
AbsParmIfdStrKey _btaChargedList;//! Do not stream
HepHistogram* _numTrkHisto;//! Do not stream
public:
ClassDef(NTrkExample,1) //Simple Beta example
};
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
As shown above, we have to put an additional special comment after the data member definitions to tell ROOT that those entries are transient only and should not go into a file (this is because ROOT does not know how to persist a HepHistogram and would bail out). Another general recommendation and a word of warning: avoid including BABAR headers in a .rdl file, as most BABAR headers are not CINTable and cause strange problems. The same holds for RogueWave. CINT is only 95% C++ and most programmers in BABAR like to write 105% C++! A forward class declaration in place of the include does the job in most cases and is just fine with CINT.
Save the file and edit NTrkExample.cc. Here we have to add another macro at the end of the include section that will be expanded to yield the dictionary implementation:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
ClassImp(NTrkExample)
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
Edit the file PAFUser_LinkDef.rdl that holds the names of the classes and functions that you want to use interactively. Add an entry for NTrkExample:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
#pragma link C++ class NTrkExample;
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
Now we are ready to build the interactive PAFUser library. Note that we do not need to link an executable as we want to use the interpreter.
PAFUser> cd ..
PAF> bsub -q bldrecoq -o Exercise3.log gmake PAFUser.rootlib
Next, we have to modify our main driver program in order to use it as a named C++ macro. You might want to execute the named macro from workdir. Copy the main program and edit it as follows:
PAF> cd workdir
workdir> cp ../PAFUser/NTrkExampleMain.cc .
The following conditional extension allows the program to be executed as a macro while preserving the ability to compile and link it into an executable, just as we did in the previous exercise. Your file should look like this:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
// This little main program drives the analysis:
// Parameters are set using SetParm instead of TCL...
#ifndef __CINT__
#include "PAFAdapters/PAFBbrAnalysis.hh"
#include "PAFUser/NTrkExample.hh"
Int_t NTrkExampleMain(int argc, char**argv);
Int_t main(int argc, char**argv) { return NTrkExampleMain(argc, argv); }
#endif
Int_t NTrkExampleMain(int argc=0,char* argv[]=0)
{
// Create an application manager and register services
TPico PAF("Sample Beta Analysis");
// Define output file for histograms etc.
PAF.SetPersistenceManager(new TPicoPersistenceManager("NTrkExample.root"));
// Instantiate the analysis module and set some parameters
PAFModule *theModule = new NTrkExample("NTrkExample","Sample Beta Module");
// Instantiate an analysis and pass the command line
PAFBbrAnalysis *myAnalysis = new PAFBbrAnalysis(argc,argv);
// Set default parameters for the framework
// Read commands from a TCL file
myAnalysis->SetParm("RootConditionsFile","$BFROOT/kanga/CondDB/BaBarConditions.root");
myAnalysis->SetParm("tcl","myAnalysis.tcl");
myAnalysis->SetIntParm("nev",10000); // Run 10000 events
myAnalysis->Add(theModule); // Add the analysis module
PAF.RegisterService((PAFAnalysis*) myAnalysis); // Execute this analysis
PAF.Run(); // Run the analysis
return 0;
}
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
Save the NTrkExampleMain.cc file. Note that CINT wants an explicit cast from PAFBbrAnalysis to PAFAnalysis and SetIntParm() to set the number of events rather than the polymorphic SetParm().
Now we are ready to run the NTrkExample macro from the C++ interpreter! First of all, launch a ROOT session. The PAFroot executable is already linked against PAF and Beta and pre-loads the corresponding libraries:
workdir> PAFroot
At the ROOT command prompt, load the PAFUser library, check with the .class directive that our NTrkExample class actually arrived in the interpreter and execute the macro; wait until the job finishes and display the histogram using the object browser:
workdir> PAFroot
root [0] loadSrtLib("PAFUser")
root [1] .class NTrkExample
...
root [2] .x NTrkExampleMain.cc
...
root [3] TFile f("NTrkExample.root"); (read in the file)
root [4] TBrowser t;
root [5] .q
Working material for this section: NTrkExampleMain.cc, NTrkExample.rdl, NTrkExample.cc, PAFUser_LinkDef.rdl
In Exercise 3 we will:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
# Do a fast filtering based on the tag (Do not use blanks in expression)
module talk Framework
tagcut set TAG_i_nTracks>=6&&TAG_i_nGoodTrkTight>=4&&TAG_b_aJpsiCand
exit
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
or on the command line if you start the program in your workdir, prefixing the tagcut setting with a hyphen:
workdir> NTrkExampleMain -nev 10000 -tagcut "TAG_i_nTracks>=6&&TAG_i_nGoodTrkTight>=4&&TAG_b_aJpsiCand"
The dynamic cut works at a speed of up to 40 kHz with Kanga files and up to 100 kHz with PAF native files.
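What the tag cut expresses can be written out as a plain C++ predicate. This is a stand-alone sketch and not PAF code: the struct TagRecord and the functions passesTagCut and selectEvents are hypothetical, and only the three tag variable names are taken from the expression above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-event tag record; the field names mirror the tag
// variables used in the cut expression.
struct TagRecord {
    int  nTracks;        // TAG_i_nTracks
    int  nGoodTrkTight;  // TAG_i_nGoodTrkTight
    bool aJpsiCand;      // TAG_b_aJpsiCand
};

// C++ equivalent of:
//   TAG_i_nTracks>=6&&TAG_i_nGoodTrkTight>=4&&TAG_b_aJpsiCand
inline bool passesTagCut(const TagRecord& tag) {
    return tag.nTracks >= 6 && tag.nGoodTrkTight >= 4 && tag.aJpsiCand;
}

// Return the indices of the events that survive the tag cut, i.e. the only
// events for which the more expensive micro data would have to be read.
inline std::vector<std::size_t> selectEvents(const std::vector<TagRecord>& tags) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < tags.size(); ++i) {
        if (passesTagCut(tags[i])) selected.push_back(i);
    }
    return selected;
}
```

The point of the sketch is the ordering: the cheap tag test runs first, so rejected events never touch the micro.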
The speed of the micro can be tuned by using PAF classes that make use of so-called smart members (a PAF invention). The basic idea is that data stay untouched on disk or tape until they are actually used. Let us replace the BtaCandidate with its PAF twin, the PAFCandidate, and the HepAList with its PAF twin, the PAFList. Edit NTrkExample.cc and make the following modification:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
// get list of input track candidates
//HepAList<BtaCandidate>* trkList;
//getTmpAList (anEvent, trkList, _btaChargedList.value() );
//histogram number of tracks in event
//_numTrkHisto->accumulate( trkList->length() );
// get list of input track candidates
static PAFList *pafList = new PAFList;
TPico::Instance()->GetEventManager()->FillCharged(pafList);
//histogram number of tracks in event
_numTrkHisto->accumulate( pafList->GetNumberOfCandidates() );
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
Compile, link and run the program:
PAF> bsub -q bldrecoq -o Exercise2.log gmake PAFUser.lib PAFUser.bin
Run the program in your workdir:
workdir> NTrkExampleMain -nev 10000
The program now runs approximately five times as fast as the original version.
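The idea behind smart members (leave data untouched until they are actually used) can be sketched in a few lines of stand-alone C++. The template LazyMember is a hypothetical illustration and not PAF's implementation; the loader function stands in for whatever reads the data from disk or tape.

```cpp
#include <functional>
#include <utility>

// Hypothetical lazy member: the loader runs only on first access and the
// result is cached afterwards. T must be default-constructible.
template <typename T>
class LazyMember {
public:
    explicit LazyMember(std::function<T()> loader)
        : loader_(std::move(loader)), loaded_(false), value_() {}

    // Load on first use, then return the cached value.
    const T& get() {
        if (!loaded_) {
            value_ = loader_();
            loaded_ = true;
        }
        return value_;
    }

    bool isLoaded() const { return loaded_; }

private:
    std::function<T()> loader_;  // stand-in for the actual read from storage
    bool loaded_;
    T value_;
};
```

An event whose members are never accessed costs no read at all in this scheme, which is where the speed-up comes from when a module needs only a small part of each event.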
workdir> Skim -tcl AllEventsKanga.tcl -nev 10000 -Skim::name $MYWORK/test
Note that it is possible to override the name parameter for the output file interactively from the command line, using the C++ scope operator in conjunction with the module name (this is some kind of magic and works with each module's parameters). In order to avoid conflicts with your AFS quota, the Skim has been directed to write the output for 10000 events into your work area (the files consume about 20 MB). There you will find two files, testTag.root and testAod.root, that you can now use to run your program. In order to do so, you might want to change the name of the collection in your AllEventsKanga.tcl file to $MYWORK/test. However, as we want to look at a single run only, it is enough to comment out the tcl directive in the myAnalysis.tcl file
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
# Read a TCL file to define the collections
#tcl set AllEventsKanga.tcl
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
and specify the input collection on the fly using PAF's file parameter instead
workdir> NTrkExampleMain -file $MYWORK/test
Starting at about 100 Hz using the original implementation, the track
lists are now filled at a speed of more than 2000 Hz from the micro after
tuning.
Working material for this section: NTrkExample.cc
// get list of input track candidates
HepAList<BtaCandidate>* trkList;
trkList = Ifd< IfdHepAList<BtaCandidate> >::get(anEvent, "BtaMicroCandidates" );
This does the same as the older form
getTmpAList(anEvent, trkList, "BtaMicroCandidates" );
In PAF, both instances of data access are supported and all relevant ProxyDict entries are generated if you request classic ProxyDict support with the boolean command line option "-proxy true".
Now let's extend our NTrkExample to generate a momentum spectrum of charged tracks. Edit PAFUser/NTrkExample.rdl to add a further histogram:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
HepHistogram* _pHisto;//! Do not stream
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
Then add a few lines of code to the beginJob() function in PAFUser/NTrkExample.cc
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
// book the momentum distribution histogram
_pHisto = manager->histogram("momentum", 100, 0.0, 5.0 );
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
and a few lines of ProxyDict and looping code to the event() function (at the place where you commented out the original code earlier)
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
// get list of input track candidates
HepAList<BtaCandidate>* trkList;
//getTmpAList (anEvent, trkList, _btaChargedList.value() );
trkList = Ifd< IfdHepAList<BtaCandidate> >::get(anEvent, _btaChargedList.value());
//histogram number of tracks in event
_numTrkHisto->accumulate( trkList->length() );
// Loop over track candidates to plot momentum
HepAListIterator<BtaCandidate> iterTrk(*trkList);
BtaCandidate* trk;
while ( 0 != ( trk = iterTrk()) ) {
_pHisto->accumulate( trk->p() );
}
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
Compile and link the executable
PAF> bsub -q bldrecoq -o Exercise4.log gmake PAFUser.lib PAFUser.bin
Run the program in your workdir (edit myAnalysis.tcl first and reactivate the line "tcl set AllEventsKanga.tcl" to read the Kanga file):
workdir> NTrkExampleMain -nev 1000 -NTrkExample::proxy true
In addition to the classical ProxyDict, PAF has implemented a smart ProxyDict that generates the entries automatically if and only if they are requested (this is considerably faster than unconditionally generating all entries again for each event). In order to make use of the smart ProxyDict, edit PAFUser/NTrkExample.cc and make the following modification:
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
#include "PAFAdapters/PAFIfd.hh"
...
trkList = PAFIfd< IfdHepAList<BtaCandidate> >::Get(anEvent, _btaChargedList.value() );
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
That's it (Note the capital "Get"). You can now build and execute the program without the -proxy option.
The PAF ProxyDict is even smart enough to dynamically generate candidate lists on actual demand using a TPicoPidSelector or a BetaPidSelector, i.e. the following works:
#include "PAFBetaPid/PAFPidMuonMicroSelector.hh"
...
static PAFPidMuonMicroSelector muonSelector("muonSelector");
trkList = PAFIfd< IfdHepAList<BtaCandidate> >::Get(muonSelector,
"myMuons" );
Note that the execution of the muon selector is triggered if and only if you actually need the muon list. After generation of the list it is put into the proxy dictionary and can be referred to in other modules using the name "myMuons" (NB. PAFPidMuonMicroSelector is just a wrapper class to give interactive access to the original BABAR PidMuonMicroSelector).
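The on-demand behaviour described here (run the selector only when its list is requested, then serve later requests from the dictionary) can be sketched as a stand-alone C++ class. OnDemandDict, its List and Producer typedefs, and clearEvent are hypothetical names chosen for this illustration; they are not part of PAF or of the BABAR ProxyDict.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical on-demand dictionary: a list is produced the first time it
// is requested under a given key and served from the cache afterwards.
class OnDemandDict {
public:
    typedef std::vector<int> List;           // stand-in for a candidate list
    typedef std::function<List()> Producer;  // stand-in for a selector

    // Register how to build the list named `key`; nothing is built yet.
    void registerProducer(const std::string& key, Producer producer) {
        producers_[key] = producer;
    }

    // Return the list for `key`, building and caching it on first request.
    const List& get(const std::string& key) {
        std::map<std::string, List>::iterator hit = cache_.find(key);
        if (hit != cache_.end()) return hit->second;
        std::map<std::string, Producer>::iterator p = producers_.find(key);
        assert(p != producers_.end());  // sketch only: unknown keys are an error
        List& slot = cache_[key];
        slot = p->second();
        return slot;
    }

    // Drop all cached lists, e.g. when moving on to the next event.
    void clearEvent() { cache_.clear(); }

private:
    std::map<std::string, Producer> producers_;
    std::map<std::string, List> cache_;
};
```

In this sketch a second module asking for "myMuons" within the same event gets the cached list, and a module that never asks for it never triggers the producer.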
You may have noticed that the program does not go very fast (30 Hz with Ifd, about 100 Hz with PAFIfd). As we already saw in the tuning section, you might gain speed using a PAF replacement for the candidate loop. Add the corresponding header files, comment out the ProxyDict call and replace the loop with
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
#include "PAFManager/PAFListIterator.hh"
#include "PAFManager/PAFCandidate.hh"
...
// get list of input track candidates
//HepAList<BtaCandidate>* trkList;
//trkList = PAFIfd< IfdHepAList<BtaCandidate> >::Get(anEvent, _btaChargedList.value());
//histogram number of tracks in event
//_numTrkHisto->accumulate( trkList->length() );
// Loop over track candidates to plot momentum
//HepAListIterator<BtaCandidate> iterTrk(*trkList);
//BtaCandidate* trk;
//while ( 0 != ( trk = iterTrk()) ) {
// _pHisto->accumulate( trk->p() );
//}
// get list of input track candidates
static PAFList *pafList = new PAFList;
TPico::Instance()->GetEventManager()->FillCharged(pafList);
//histogram number of tracks in event
_numTrkHisto->accumulate( pafList->GetNumberOfCandidates() );
// Loop over track candidates to plot momentum
PAFListIterator iterTrk(*pafList);
PAFCandidate* trk;
while ( 0 != ( trk = iterTrk.Next()) ) {
_pHisto->accumulate( trk->p() );
}
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
Rebuild the program. The program should now run considerably faster, around
500 Hz from a Kanga file and at 2200 Hz from PAF files.
Working material for this section: NTrkExample.hh,
NTrkExample.cc
JpsiK0sTagFilter.cc, JpsiK0sMicroFilter.cc
and JpsiK0sAnalysis.cc from the BetaExamples package. You find an already PAForized version of the files in the
PAFUser directory together with the main driver program PAFUser/JpsiK0sMain.cc.
Edit the driver program, disable the tag filter and enable the micro filter. As you see in the code fragment, a sequence in PAF is simply defined by sequentially adding modules to the analysis.
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
PAFModule *tagModule = new JpsiK0sTagFilter("JpsiK0sTagFilter","Tag filter beta module");
tagModule->Disable();
PAFModule *microModule = new JpsiK0sMicroFilter("JpsiK0sMicroFilter","Micro filter beta module");
microModule->Enable();
PAFModule *analModule = new JpsiK0sAnalysis("JpsiK0sAnalysis","Analysis beta module");
analModule->Disable();
// Instantiate an analysis and pass the command line
PAFBbrAnalysis *myAnalysis = new PAFBbrAnalysis(argc,argv);
myAnalysis->SetParm("tcl","myAnalysis.tcl");
myAnalysis->Add(tagModule); // Add the tag filter module
myAnalysis->Add(microModule); // Add the micro filter module
myAnalysis->Add(analModule); // Add the analysis module
PAF.RegisterService((PAFAnalysis*)myAnalysis); // Execute this analysis
PAF.Run(); // Run the analysis
--8<--8<--8<--8<--8<--8<-- Code Snippets --8<--8<--8<--8<--8<--8<--
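How a sequence of enabled and disabled modules behaves can be illustrated with a stand-alone sketch. SketchModule and SketchSequence are hypothetical classes written for this illustration, and the assumption that a disabled module is simply skipped during the event loop is not confirmed by this tutorial.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal module: a name, an enabled flag and an event counter.
class SketchModule {
public:
    explicit SketchModule(const std::string& name)
        : name_(name), enabled_(true), nEvents_(0) {}
    void Enable()  { enabled_ = true; }
    void Disable() { enabled_ = false; }
    bool IsEnabled() const { return enabled_; }
    void Event() { ++nEvents_; }               // called once per processed event
    int  EventsSeen() const { return nEvents_; }
    const std::string& Name() const { return name_; }

private:
    std::string name_;
    bool enabled_;
    int nEvents_;
};

// Hypothetical sequence: modules run in the order in which they were added,
// and disabled modules are skipped (an assumption of this sketch).
class SketchSequence {
public:
    void Add(SketchModule* module) { modules_.push_back(module); }

    void Run(int nEvents) {
        for (int ev = 0; ev < nEvents; ++ev) {
            for (std::size_t i = 0; i < modules_.size(); ++i) {
                if (modules_[i]->IsEnabled()) modules_[i]->Event();
            }
        }
    }

private:
    std::vector<SketchModule*> modules_;  // not owned by the sequence
};
```

With the tag filter and the analysis module disabled and the micro filter enabled, only the micro filter sees events in this sketch, which mirrors the configuration of the driver program above.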
PAF> bsub -q bldrecoq -o Exercise5.log gmake PAFUser.lib PAFUser.bin
workdir directory.
PAF> cd workdir
To run the job with the same collection as defined above, type:
workdir> bsub -q kanga -o Exercise5.log JpsiK0sMain -tcl AllEventsKanga.tcl -nev 1000
The batch queue kanga is specially set up to have access to a local copy of the kanga datasets, and therefore runs more efficiently.
JpsiK0s.root.
Working material for this section: JpsiK0sMain.cc
Version 1.6, Last modified: 18 August 2000