
Reskimming

Last Updated: 25 May 2004
Guglielmo De Nardo, SLAC


Introduction

Reskimming is one of the key functionalities of the Analysis Model. Users are expected to run over production skims, make a preselection suitable for their analysis, and write the selected events to their AWG or private output collections.
The output collection can borrow the Aod (the "micro") and the Esd (the rest of the "mini") components from the original collection (pointer skim), thereby saving disk space, or can clone the components into the output collection (deep-copy skim), which is needed in order to copy the collections to other sites later.
Whether the skim is a pointer or a deep copy, the output collections can be augmented with additional custom information: composite BtaCandidates created by the user analysis modules, and user-defined data attached either to the whole event or to the composite BtaCandidates. Further possible customizations will be discussed later.

Schematically, a reskimming job is set up to

Because of its nature and purpose, the reskimming process is not managed centrally, unlike skim production.
Instead, Analysis Working Groups and final users have to configure their own packages to do it. Reskimming is the Analysis Model replacement for the massive AWG N-tuple productions, which should be abandoned.

User packages and reskimming jobs

Existing user packages (like BRecoUser, CharmUser and so on) have to be upgraded to support reskimming. This is easy to do: it is limited to updating the AppUserBuild.cc file to include the BetaMiniWriteSequence and updating link_<PackageName>.mk to manage the new dependencies of the binary files.
If you have already done something like this once, it should look obvious to you. Since the BetaMiniUser package already supports reskimming, you can use it as a reference.

To run a reskimming job, users execute the user package application (the equivalent of BetaMiniApp in the user package) on a job configuration Tcl file (snippet). The single snippet

For example, the following snippet

#-------- job configuration file  -------

set stream BToSomething
set ConfigPatch MC
set MCTruth true
    
set inputList input_collection
set outputCollection /work/users/user_id/output_collection

lappend outputBtaCandidates MyList1 MyList2 ... MyListN
lappend outputCndUsrBlocks  MyUserBlock1 MyUserBlock2 ... MyUserBlockN
set     outputEvtUsrBlocks  MyEventUserBlock

sourceFoundFile UserPackage/UserPackageProduction.tcl

#----------------------------------------
completely defines the configuration needed to run on a Monte Carlo collection and produce an output collection.

Notes

At the end of the snippet, the production script <UserPackage>/<UserPackage>Production.tcl is sourced, where UserPackage is the actual package being used. The purpose of the production file consists in

A production file that defines the interface used by the snippet example above should look like

#-------- UserPackage Production file  -------
sourceFoundFile ErrLogger/ErrLog.tcl
sourceFoundFile FrameScripts/FwkCfgVar.tcl
sourceFoundFile FrameScripts/talkto.tcl
sourceFoundFile FrameScripts/setProduction.tcl

FwkCfgListRequire inputList              ;# input collection list
FwkCfgVar stream                         ;# stream ( i.e. analysis ) to run
FwkCfgVar outputCollection ""            ;# Output collection name
                                         ;# if (empty) default value is overridden
                                         ;# the MiniWriteSequence is added by runSkim
FwkCfgVar outputBtaCandidates ""         ;# composite BtaCandidate lists to persist
FwkCfgVar outputCndUsrBlocks ""          ;# Candidate User Data Blocks to persist
FwkCfgVar outputEvtUsrBlocks ""          ;# Event User Data Blocks to persist
FwkCfgVar components "deepCopyMicro"     ;# components to write in the event store

sourceFoundFile BetaMiniSequences/BetaMiniSequence.tcl

runSkim $stream UserPackage $outputCollection $components $outputBtaCandidates \
        $outputCndUsrBlocks $outputEvtUsrBlocks
#--------------------------------------------
Notes

The physics configuration file should look like

# ------ BToSomething Physics configuration file ------
FwkCfgVar extPar defaultval ;# set in the job config. snippet

sourceFoundFile CompositionSequences/WhateverINeed.tcl
sequence append BToSomethingPhysics WhateverINeed

module enable myModule ;# it can be the Filter that selects the events
talkto myModule {
  par1 set val1
  par2 set $extPar  ;# the value can be set in the job config. snippet
}
sequence append BToSomethingPhysics myModule
# -----------------------------------------------------
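
The extPar variable above is meant to be set from the job configuration snippet. A minimal sketch of such a snippet follows; the value myValue is hypothetical, and the sketch assumes that FwkCfgVar keeps a value set before the physics file is sourced and falls back to defaultval otherwise.

```tcl
#-------- job configuration file (sketch) -------
set stream BToSomething
set ConfigPatch MC
set MCTruth true

set inputList input_collection

# Hypothetical value; assumed to replace defaultval when the
# physics file evaluates "FwkCfgVar extPar defaultval"
set extPar myValue

sourceFoundFile UserPackage/UserPackageProduction.tcl
#------------------------------------------------
```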

Notes

runSkim interface

runSkim	<stream> <package> 
            [outputColl] [components]
            [cndLists] [cndUsrData] [eventUsrData] 
            [cndConfigOptions]

<stream>
Identifies the analysis. runSkim creates a path named <stream>Path and appends all the needed sequences to it.

<package>
Identifies the package in which to look for the <stream>Physics.tcl file. If <package>/<stream>Physics.tcl is found, a sequence named <stream>Physics is automatically created and added to the path.

[outputColl]
Sets the output collection name. If present, a BetaMiniWriteSequence<stream> is configured and added to the path. The default is to have no output collection and, consequently, no write sequence, which allows the user to run test jobs without writing an output collection.
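
For instance, a test job that exercises the selection without writing anything could simply leave outputCollection unset. The sketch below reuses the names from the job snippet and production file shown earlier and assumes the production file declares outputCollection with an empty default, as in that example.

```tcl
#-------- test job configuration (sketch) -------
set stream BToSomething
set ConfigPatch MC
set MCTruth true

set inputList input_collection

# outputCollection is deliberately not set. With the empty default,
# runSkim is expected to build BToSomethingPath but to add no
# BetaMiniWriteSequenceBToSomething, so nothing is written.

sourceFoundFile UserPackage/UserPackageProduction.tcl
#------------------------------------------------
```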

[components]
Components to write in the output collection. Available options are
pointer: borrow all mini data (mini includes micro)
deepCopyMicro: write micro data and borrow the rest of the mini
deepCopyMini: write all the components
tagOnly: write just the tag and borrow all the rest of the mini
explicit list of components: write exactly the listed components (except usr, see Notes below)
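
Assuming the production file declares components as a FwkCfgVar with default deepCopyMicro, as in the production file example earlier, a job snippet could select another option by setting the variable before sourcing the production file. That a value set this way takes precedence over the FwkCfgVar default is an assumption based on how the other snippet variables are used.

```tcl
#-------- job configuration file (sketch) -------
set stream BToSomething
set ConfigPatch MC
set MCTruth true

set inputList input_collection
set outputCollection /work/users/user_id/output_collection

# Write all components instead of the deepCopyMicro default,
# e.g. when the collection has to be copied to another site later
set components deepCopyMini

sourceFoundFile UserPackage/UserPackageProduction.tcl
#------------------------------------------------
```

Setting components to pointer instead would borrow all the mini data and save disk space, at the price of an output collection that depends on the original one.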

Notes

[cndLists]
Tcl list naming the composite BtaCandidate lists to write

[cndUsrData] [eventUsrData]
Tcl lists naming the BtaCandidate-level and event-level User Data blocks to write

[cndConfigOptions]
Tcl list that specifies which vertexing and four-momentum information is cached for BtaCandidates and configures the branch structure for different particle types. The options are given as keyword=value pairs.

Keyword: cndStoreOpt, allowed options are:
RecoP4: cache 4-momentum for all reconstructed BtaCandidates
CompositeP4: cache 4-momentum for all persisted composite BtaCandidates
CompositeVtx: cache vertex position for all persisted composite BtaCandidates
CompositeDaughters: store cached information also for all the daughters of persisted composites
<Name> <types in PDT> <cache variables>: store candidates by particle type (see example below)

Keyword: trkFitType, allowed options are:
All: store track fits for all stable particle mass hypotheses
any of Electron, Muon, Pion, Kaon, Proton: store the track fits for any of the specified particle types
BtaCandidate: store track fits specified by the type of the track-based BtaCandidates being stored

Keyword: trkFitStorage, allowed options are:
ZAxis: store track fits at the point of closest approach to the z axis
CandPoint: store track fits at the point of closest approach to the BtaCandidate production vertex
None: don't store track fits (usable only when storing the full mini and reading back in refit mode)

Keyword: cluster, allowed options are:
any of Esd, Aod, Tru, Tag, Cnd: cluster the specified components into their own file, e.g., cluster={Esd, Aod}. By default, all components are clustered into the same file

The default is to cache nothing. All the options above are cumulative: any combination of them can be passed as a Tcl list. For example, one can do

lappend cndOptions "cndStoreOpt=CompositeVtx"
lappend cndOptions "cndStoreOpt=RecoP4"
lappend cndOptions "cndStoreOpt=CompositeP4"
lappend cndOptions "cndStoreOpt=Pion pi+ pi- p4"
lappend cndOptions "cndStoreOpt=Kaon K+  K-  p4"
lappend cndOptions "cndStoreOpt=Proton   p+  anti-p- p4"
lappend cndOptions "cndStoreOpt=B0   B0  anti-B0 p4 vtx"
lappend cndOptions "cndStoreOpt=D0   D0  anti-D0 p4 vtx"
lappend cndOptions "trkFitType=All"
lappend cndOptions "trkFitStorage=CandPoint"

... ...
runSkim	<stream> <package> 
            [outputColl] [components]
            [cndLists] [cndUsrData] [eventUsrData] 
            $cndOptions
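
As a concrete instance of the placeholder call above, the runSkim line of the production file shown earlier could be extended to pass the options list as its last argument. This is only a sketch: it reuses that production file's variable names ($stream, $outputCollection and so on), adds the cluster option from the keyword list, and has not been checked against a real release.

```tcl
# Sketch only: the production-file call with an eighth argument.
# Variable names are the ones declared in the production file example.
set cndOptions {}
lappend cndOptions "cndStoreOpt=CompositeP4"
lappend cndOptions "cndStoreOpt=CompositeVtx"
lappend cndOptions "cluster={Esd, Aod}"   ;# same value as the cluster example in the keyword list

runSkim $stream UserPackage $outputCollection $components \
        $outputBtaCandidates $outputCndUsrBlocks $outputEvtUsrBlocks \
        $cndOptions
```

Inside double quotes the braces in cluster={Esd, Aod} are ordinary characters, so the whole keyword=value string is appended as a single list element.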

FAQ

Q: Why should I use two different files, one for general production configuration and another one for my Physics Analysis configuration? I want to put everything in a single file!

A: It separates the configuration aspects common to all the analyses in the package's area of interest from your analysis-specific configuration.
For example, you may have more than one analysis using the same package, and they may share some tools and scripts. Or you may have different configurations for the same analysis, and you can switch from one to another without modifying the existing configurations.