
Workbook for Offline Users: How to generate a set of CM2 ntuples using SimpleComposition and BtaTupleMaker

Introduction

This tutorial will describe how to generate a set of analysis ntuples using SimpleComposition and BtaTupleMaker. These were written by Chris Roat and Chih-hsiang Cheng respectively as a way of simplifying physics analysis on BaBar. They can be used for many publication-quality BaBar analyses.

To do your own analysis, you will need to start from an existing skim, or produce your own. There are something like 120 skims available, so you may find that an existing one is suitable. Ask your convener, or others in your AWG. You should get your AWG and its conveners involved at the earliest possible stage, and make regular reports throughout your analysis. My observation is that analyses that don't interact regularly with their AWGs never get finished.

In general, the ntuples you make should be stored on your own disk resources - your laptop, desktop, or university, for example. Keep two copies - this type of disk space is cheap. (I just bought a 120 GB external HD suitable for backups for $70 after rebate). On the other hand, disk space at SLAC is extremely expensive (~$2k for the same disk), and limited. (It is also way better disk.)

It is easy to produce large sets of ntuples with these tools. If you do this, you will be frustrated by how long it takes to make each and every plot when analyzing them on your laptop. Work a bit with signal and background MC to tighten your selection before making all your ntuples. For an exclusive B decay, I would aim for a few tens of GB at most (including both data and MC); my last couple have been a few GB.

In this tutorial, we will start with a simple case, B+ --> Jpsi K+, with Jpsi --> mu+ mu-. This way we can go through the full process without getting too bogged down in the complexities of Composition. We will then go back and do a full B --> Jpsi pi+ pi- K analysis, including charged and neutral modes, and various intermediate resonances, in the second part of this tutorial.

Before doing this workbook section, you should work through the introductory sections. I would suggest Logging In, A Quick Tour of the BaBar Offline World, Information Resources, CM2 Introduction, Unix for BaBar, Event Store: BaBar Database Summary, Frameworks - the Environment for Running, and Batch Processing. There are other sections on Paw (1, 2) and Root (1, 2, 3), which may be useful.

This tutorial is written assuming you are running on a yakut machine at SLAC. It is often faster to work at a different Tier A site, but you will need to use different commands for the batch system and for scratch space. It is based on analysis-26 and release 16. (Analysis-26 is built to run on Scientific Linux; otherwise, they are identical.) Commands that the user might enter are written in bold blue to make them easier to find on later browsing.

Note you should ssh yakut.slac.stanford.edu. Don't ssh to a specific yakut - different ones have different operating systems. If you get complaints from ssh about changing IP addresses, you need to install the full list of SLAC unix sites into your .ssh/known_hosts file; see http://www.slac.stanford.edu/comp/unix/ssh.html. If you get complaints about "keys of a different kind are already known for this host", try ssh -1 yakut.slac.stanford.edu.


OK, let's begin!


Setting up your release

Build your executable in analysis-26.

You will need about 450 MB of disk space for your release, including libraries and executable. To check your disk space, use the command:


fs listquota
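The output looks something like this (an illustrative volume name and numbers; Quota and Used are in kilobytes):

Volume Name                    Quota       Used %Used        Partition
u.hearty                     2000000    1163520   58%              45%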
If you don't have that much, submit a request to AFS-req before you begin. Now type:


newrel -t analysis-26 Rel26
cd Rel26
srtpath
<enter>  <enter>
cond16boot
addpkg workdir
gmake workdir.setup

            
This will put the library and executable in your own disk space. You should definitely do this when you are producing ntuples for a real analysis. If you are just working through a tutorial for a few days, you could put them in an AFS build scratch area (e.g. /afs/slac/g/babar/build/h/hearty)

newrel -s my-build-scratch-dir -t analysis-26 Rel26
            
If you don't have such an area, run the script bbrCreateBuildArea to create it. You cannot use this area for ntuples, log files, and the like. Conversely, you probably don't want to use your normal NFS scratch space (e.g. $BFROOT/work/h/hearty) for your release, because performance will be poor if someone else is heavily using the same disk space by, for example, writing ntuples from 100 batch jobs simultaneously.

If you use a scratch area, your executable will vanish six days after you create it, and your jobs will all crash, and you won't know why. This is probably OK, if you are just working through the tutorial. 

We are now ready to add tags. Check the Extra Tags web site to see if you need any extra tags to make your code work properly. At the time this tutorial is being written (September 2005), there are no required extra tags for analysis-26. However, there are a number of "optional" extra tags that are relevant for this analysis:

addpkg BbkTools V00-00-41 
addpkg BbkUserTools V00-00-26
addpkg UsrTools V00-07-13 
addpkg SmpCalculation V00-01-08
addpkg BtaTupleMaker V00-02-07-03
As the extra tags page reminds you, you need to use "checkdep" to check for any dependencies whenever you add new tags:

checkdep
This will give you a long message:
Using glimpse index of 16.1.5

checkdep: Error: Old Tag!! BbkTools V00-00-41 (Release uses V00-01-09)

cvs diff: tag V00-00-41 is not in file BbkTools/BbkConditionsTable.pm
cvs diff: tag V00-00-41 is not in file BbkTools/BbkDataDatasetTable.pm
cvs diff: tag V00-00-41 is not in file BbkTools/BbkDataDsEntityTable.pm
cvs diff: tag V00-01-09 is not in file BbkTools/BbkDbiAccessor.pm
cvs diff: tag V00-01-09 is not in file BbkTools/BbkDynamicClass.pm
cvs diff: tag V00-01-09 is not in file BbkTools/BbkTestAccessor
cvs diff: Diffing BbkTools
cvs diff: Diffing BbkUserTools
cvs diff: Diffing BtaTupleMaker
cvs diff: tag V00-01-03 is not in file SmpCalculation/SmpCalcBSFlightSig.hh
cvs diff: tag V00-01-03 is not in file SmpCalculation/SmpCalcDauFlightSig.hh
cvs diff: tag V00-01-08 is not in file SmpCalculation/SmpCalcFlightSig.hh
cvs diff: tag V00-01-08 is not in file SmpCalculation/SmpCalcHellTransAngles.cc
cvs diff: tag V00-01-08 is not in file SmpCalculation/SmpCalcHellTransAngles.hh
cvs diff: tag V00-01-08 is not in file SmpCalculation/SmpCalcPhiTr.hh
cvs diff: tag V00-01-08 is not in file SmpCalculation/SmpCalcTheta1.hh
cvs diff: tag V00-01-08 is not in file SmpCalculation/SmpCalcThetaTr.hh
cvs diff: Diffing SmpCalculation
cvs diff: tag V00-07-04-01 is not in file UsrTools/UsrMakerTools.cc
cvs diff: tag V00-07-04-01 is not in file UsrTools/UsrMakerTools.hh
cvs diff: Diffing UsrTools
cvs diff: Diffing workdir
cvs diff: Diffing workdir/kumac

checkdep: You should add for recompilation:
FilterTools
You can ignore the error message -- it refers to an old version of BbkTools that was added to analysis-26 to fix a bug in the newer tag. But we will follow the instructions and add FilterTools:

addpkg FilterTools
Now when you checkdep again, there will be no more tags to add.

Before compiling and linking, create your .bbobjy file in your release directory. Use the command,

GetFDID
to find the 4-digit FDID numbers assigned to you, and then make a file called .bbobjy in Rel26, with the single line:
FD_NUMBER = ****
where **** is one of those numbers.
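One way to create the file from the shell, working in your release directory (here 1234 is a placeholder; use one of the FDIDs that GetFDID reported for you):

echo "FD_NUMBER = 1234" > .bbobjy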

Now you are ready to compile and link:


bsub -q bldrecoq -o all.log gmake all
or
gmake all
Some people have had problems with gmake all due to Objectivity issues. Unless you are using analysis-24 or 25, you don't need to do a database import, so if you know what you are doing, you can get away with
gmake lib
gmake BtaTupleMaker.bin
Afterwards, check that you actually got library and binary files for each package, and that the dates and sizes are reasonable. If anything is funny, check for error messages in all.log for that package. If you compile a second time, be sure to use a different name than all.log, or else delete the file first.
ls -l lib/$BFARCH

total 36268
-rw-r--r--    1 penguin  br        7923368 Sep 29 14:30 libBtaTupleMaker.a
-rw-r--r--    1 penguin  br       14184168 Sep 29 14:33 libFilterTools.a
-rw-r--r--    1 penguin  br        1599076 Sep 29 14:33 libSmpCalculation.a
-rw-r--r--    1 penguin  br       13376790 Sep 29 14:35 libUsrTools.a
drwxr-xr-x   10 penguin  br           4096 Sep 29 14:11 templates

ls -l bin/$BFARCH

total 56836
-rwxr-xr-x    1 penguin  br           2375 Sep 29 14:35 BbkGetConnectInfo
-rwxr-xr-x    1 penguin  br           8290 Sep 29 14:35 BbkListTables
-rwxr-xr-x    1 penguin  br          16277 Sep 29 14:35 BbkMakeDataset
-rwxr-xr-x    1 penguin  br          10935 Sep 29 14:35 BbkModifyDseFlags
-rwxr-xr-x    1 penguin  br           1242 Sep 29 14:35 BbkMySqlDump
-rwxr-xr-x    1 penguin  br           5021 Sep 29 14:35 BbkSPModes
-rwxr-xr-x    1 penguin  br           9722 Sep 29 14:35 BbkSqlPlus
-rwxr-xr-x    1 penguin  br           8111 Sep 29 14:35 BbkSqlShell
-rwxr-xr-x    1 penguin  br           2921 Sep 29 14:35 BbkTestAccessor
-rwxr-xr-x    1 penguin  br           1984 Sep 29 14:35 BbkTestConfiguration
-rwxr-xr-x    1 penguin  br           1711 Sep 29 14:35 BbkTestConnect
-rwxr-xr-x    1 penguin  br           1107 Sep 29 14:35 BbkTestOptionsManager
-rwxr-xr-x    1 penguin  br            416 Sep 29 14:35 BbkTestTime
-rwxr-xr-x    1 penguin  br       58000116 Sep 29 14:37 BtaTupleApp
-rw-r--r--    1 penguin  br            604 Sep 29 14:37 Index
-rwxr-xr-x    1 penguin  br           6076 Sep 29 14:35 relBbkDatasetHistory
-rwxr-xr-x    1 penguin  br           7872 Sep 29 14:35 relBbkDatasetTcl
-rwxr-xr-x    1 penguin  br           7698 Sep 29 14:35 relBbkExpertTcl
-rwxr-xr-x    1 penguin  br          10709 Sep 29 14:35 relBbkUser
            
BtaTupleApp is our analysis executable (it is the code that creates and fills the ntuple). The other objects in the bin directory are automatically created by "gmake all", but you do not need them for this tutorial.

A typical problem is that your disk fills up and your binary doesn't get created, so that when you run, you pick up the binary of the same name from the release, which does not have the tags we just added. Beware also of core files, which can be created by an unsuccessful compilation or by a job that fails to run - these can take up vast quantities of space and may need to be deleted before you can carry on. So you should always be sure to check the size and date of the binary, as shown above.

That is it - unless new problems are found, and new tags created, you will not have to compile/link code again to analyze Runs 1-4 data.

Recall that every time you log on, you need to run srtpath and cond16boot from your release directory (i.e., Rel26).


Analysis Code

Analysis Code - General

Our core analysis code (the part used for every job we run) consists of a single tcl file. There is an additional small tcl file (a "snippet") that specializes the code for each job, setting file names and so forth. There are four components to this core code: a general part that is the same for everyone, a tag filter specific to this analysis, the SimpleComposition section, and the ntuple-dumping (BtaTupleMaker) part. These will be the topics of the following sections. Here is the code we will have at the end of the initial section (note the non-standard file name and extension, Analysis-Simple.tcl.txt). Copy this file to your workdir and rename it Analysis.tcl.

Let's go through the first part of Analysis.tcl. Quotes from this file are shown in green to distinguish them from sample output, commands that you might enter, etc.

#..Analysis.tcl

#..Main tcl file for B+ --> Jpsi K+ [Jpsi --> mu+ mu-] workbook tutorial

#..General setup needed in all jobs
sourceFoundFile ErrLogger/ErrLog.tcl
sourceFoundFile FrameScripts/FwkCfgVar.tcl
sourceFoundFile FrameScripts/talkto.tcl
Lines starting with # are comments. sourceFoundFile executes the tcl in the specified package or gives an error if it can't find it. If you have checked out the package (i.e., you did addpkg xxx) it will use that version, otherwise the version in the release. The path for package xxx is workdir/PARENT/xxx.

These three files turn on error logging, and define Framework Configuration Variables (FwkCfgVar), and "talkto", which we will use to communicate with code modules.

The following are the Framework Configuration Variables for our analysis. They allow you to adjust parameters from one job to another. For example, you will want to change the ntuple file name, the number of events to run, and whether or not the input is data or MC. The first couple (objectivity vs Kanga, and what type of data to use) are not generally adjusted, but are here if you did need to. We will discuss the specific meaning of each as they come up in the tutorial.

The format is: FwkCfgVar variable_name default_value; i.e., the default value is set in Analysis.tcl. You can override the default in the tcl snippet for a particular job:

set variable_name value_for_this_job 

With FwkCfgVars, you will not need to use environment variables, which were an ongoing source of problems and confusion.
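As a concrete sketch, using a variable that appears below: the default lives in Analysis.tcl, and the per-job value lives in the snippet.

#..In Analysis.tcl (the default):
FwkCfgVar NEvents 0
#..In the tcl snippet for one particular job (the override):
set NEvents 1000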


#-------------- FwkCfgVars needed to control this job --------------------
set ProdTclOnly true
#..allowed values of BetaMiniReadPersistence are "Kan", "Bdb"
FwkCfgVar BetaMiniReadPersistence Kan
#..allowed values of levelOfDetail are "micro", "cache", "extend" or "refit"
FwkCfgVar levelOfDetail "cache"
#..allowed values of ConfigPatch are "Run2" or "MC".
FwkCfgVar ConfigPatch "MC"
#..Filter on tag bits by default
FwkCfgVar FilterOnTag   "true"
#..Print Frequency
FwkCfgVar PrintFreq 1000
#..Ntuple type and name 
FwkCfgVar BetaMiniTuple "root"
FwkCfgVar histFileName "Tutorial.root"
#..Number of Events defaults to 0 (run on full tcl file)
FwkCfgVar NEvents 0
There is a standard set of physics code you will need, which includes particle ID and standard track definitions. It includes a couple of things we don't need, which we might disable later. btaMiniPhysics.tcl is the standard set to use, but you can get more (btaMiniPhysProdSequence.tcl) or less (btaMini.tcl) if it suits you better.

#--------------------------------------------------------------------------
#..General physics sequences
#..set to 'trace' to get info on a configuration problem
ErrLoggingLevel warning
#..btaMiniPhysics is the basic physics sequences;
sourceFoundFile BetaMiniUser/btaMiniPhysics.tcl
At the end of the Analysis.tcl file, we use some of the FwkCfgVars to control the actual running of the job. PrintFreq prints a line into your log file at the specified interval. This is useful for tracking progress in batch jobs, but you don't want to bury important log file information in millions of such lines.

If $NEvents is 0, all events in the collection are run. If it is negative, the "ev beg" command is not executed, so you just get a framework prompt after Analysis.tcl is finished. This is useful if you want to "mod talk" to a module before running.

I generally include a "path list" so I can check the log file to see what was actually run in the job.


#-----------------------------------------------------------------------
#..Run time options
mod talk EvtCounter
  printFreq set $PrintFreq
exit
path list
if { $NEvents>=0 } {
    ev beg -nev $NEvents
    exit
}
This general part of Analysis.tcl was derived from MyMiniAnalysis.tcl in package BetaMiniUser. If you are switching to a new release, or want extra information, take a look.

Analysis Code - Tag Filter (and a review of data processing)

It is useful to briefly review data processing to put the idea of a Tag Filter into context. Data is processed in PR ("prompt reconstruction"), which creates the AllEvents collections from the XTC files. For new data, this is done in Padova; for reprocessing, it can be done at SLAC or another Tier A site as well.

At a later stage, the skim executable is run. It includes a block of physics code which sets a large number (>100) of booleans (tag variables) to be true or false. Based on these tag variables and other criteria, such as trigger or BGFilter booleans, events are placed into various collections called skims. Every event, whether selected by a skim or not, is placed into the AllEventsSkim collection. It differs from the original AllEvents collection because it contains the tag variables set by the skim executable.

For "deep copy" skims, the event is copied into the new collection. We can end up with multiple copies of the event on disk, so we try to not to do this for skims that select a significant fraction of all events. For these large skims, we create "pointer collections", which consist of pointers to the corresponding events, which are physically located in the AllEventsSkim collection.

Several versions of the skim executable can be run on the same data set, as people get new ideas for skims. For example, there is the original Release 14 skim of runs 1 - 4, identified by "R14" in the file name of the collection. A small number of skims were then rerun on Runs 1 - 3, again in Release 14, and are identified by "R14a" in the file name. Similarly, the reskim of runs 1 - 5 in Release 16 is identified by "R16a" in the file name.

In SP6 and earlier, the block of physics code used in the skim executable is also run as part of Moose (the executable used to produce MC), so the resulting AllEvents collections contain the corresponding booleans. Note that the full skim executable is not run - there are no skims created. These tag bits, of course, are the ones for the release used to generate the MC in the first place. For example, SP6 corresponds to the "R14a" set in data, while SP5 corresponds to Release 12. (See the skims page for the details of each release). Therefore, the tag bit you are interested in may not exist in a particular MC collection, or it may exist, but correspond to a different version of the PID selectors than the current release.

OK, we are ready to tackle Tag Filters. A Tag Filter allows you to check the values of the tag variables and read the full event from disk only if the desired ones are true. This can easily save an order of magnitude in wall-clock time compared to running directly on the AllEventsSkim collection in data or the AllEvents collection in SP. Of course, it isn't useful if the tag variable doesn't exist in the collection, such as in the data AllEvents collection. Or, as mentioned above, it may exist in an SP AllEvents collection, but correspond to something slightly different than what you expect.

Tag filters may not help at all if you are running on a skim, or they may help a bit. For example, the Jpsitoll skim includes both Jpsi to l+l- and Psi2s to l+l- tag bits. If you are only interested in Psi2s decays, you can gain with a tag filter.

The Tag Filter in Analysis.tcl checks for the two Jpsi to l+l- tag bits. The option andList set xxxx also exists. I have set an option to crash the job if the tag bits don't exist in the event, so that I will know there is something wrong with the collection I am using. Note that the Tag Filter is appended to the sequence (i.e., actually executed) only if the FwkCfgVar FilterOnTag is equal to "true". From above, you can see this is the default. The JpsiELoose and JpsiMuLoose tag bits have been present since Release 8, so we can use this filter on SP4 or later MC.


#----------------------------------------------------------------------------
#..Use $FilterOnTag to disable tag filtering if desired.
module clone TagFilterByName TagJpsill
module talk TagJpsill
   orList set JpsiELoose
   orList set JpsiMuLoose
   assertIfMissing set true
exit
if { $FilterOnTag=="true" } {
    sequence append BetaMiniReadSequence -a KanEventUpdateTag TagJpsill
}
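For illustration, here is a hypothetical filter using andList, which requires all of the listed bits to be true (SomeOtherTagBit is a placeholder, not a real tag variable; the pattern otherwise mirrors the filter above):

module clone TagFilterByName TagMyAndFilter
module talk TagMyAndFilter
   andList set JpsiMuLoose
   andList set SomeOtherTagBit
   assertIfMissing set true
exit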
The nano also contains some integers (e.g. numbers of tracks) and floats (e.g. R2All). You can filter on these as well, although you will need to add a couple of tags, recompile and relink. See the hypernews posting on this subject.

Filtering on Signal MC

Data and generic MC are both routinely skimmed, so you should be able to find the collections you want, containing the newest tag bits. Signal MC (which I will define in the bookkeeping section later) is not, so you need a different strategy. Here are a few options:

  • Signal MC modes are skimmed, if an AWG requests it. So pass along your requests to your convener. You need to do this before the appropriate deadline, normally in the middle of the skimming cycle.
  • If you are confident that you understand the tag bits in the SP AllEvents collection, and are not sensitive to changes in PID selectors or EMC calibration, go ahead and use them.
  • If the selection you apply in SimpleComposition when creating your ntuple is tighter than the criteria used to create the tag bit, and your signal MC collections are not too large, just run on the AllEvents collection and ignore the tag bits.
  • With a small amount of work, you can add your actual skim selection code to your path as a filter. I have not tried this myself but Will Roethel indicates in this email that it is not too difficult.
  • You could also just run the Skimming executable yourself.

Analysis Code - Candidate composition using SimpleComposition

It might be useful to check the SimpleComposition web site while working on this section. For now, we will do only B+ --> Jpsi K+, with Jpsi --> mu+ mu-. Once everything is working, we will expand to the full analysis. For educational purposes, we are creating our own lists from scratch. If you are writing code for a skim, you should instead start from existing lists defined in the SimpleComposition package. This way, we only need to do the combinatorics once per event, instead of doing it once per skim. Note you should NEVER "talkto" one of these central selectors - you would then be modifying the list for everyone, not just your skim. Instead, you should refine or sublist the general list.

The first step is easy - we create a sequence called AnalysisSequence and add it to the path, which is called Everything.


#--------------------------------------------------------------------------
#..Use SimpleComposition to make the candidates of interest
#..Create Analysis sequence and append it to the path.
#  All the composition modules get added to this sequence
sequence create AnalysisSequence
path append Everything AnalysisSequence
Now let's make a Jpsi from mu+ mu-.

We use the neural-net muon selector, which is the recommended one. There are actually 8 levels available, with names like muNNxxx or muNNxxxFakeRate. (The latter aims for lower fake rate at the cost of lower efficiency). For plots of performance, see the release 16 selectors webpage. The list of recommended selectors is at selector_changes.html.


#------------------ Jpsi Lists --------------------------------------
#..Basic Jpsi to mu mu list, no mass constraint (for ntuple).
#  We may want to add a cut on the chisquare of the fit as well.
mod clone SmpMakerDefiner MyJpsiToMuMu
seq append AnalysisSequence MyJpsiToMuMu
talkto MyJpsiToMuMu {
    decayMode set "J/psi -> mu+ mu-"
    daughterListNames  set muNNVeryLoose
    daughterListNames  set muNNVeryLoose
    fittingAlgorithm   set "Cascade"
    fitConstraints     set "Geo"
    preFitSelectors    set "Mass 2.8:3.4"
    postFitSelectors   set "Mass 2.9:3.3"
}
The names in the decayMode must match PDT syntax; see the file pdt.table in the package PDT (workdir/PARENT/PDT/pdt.table).  We are using the Cascade fitter, and applying a constraint ("Geo") to force the two daughters to come from a common vertex. We are not applying a mass constraint. Cascade is a good choice for standard fitting of charged tracks. If you are working with something that has a non-negligible lifetime, TreeFitter may be more appropriate. See the fittersAndConstraints webpage. The question of what fitter to use under what circumstances can be tricky, and if you are not sure, I would suggest posting a question to the Vertexing and Composition HN.

We will now constrain the mass of the Jpsi candidates to the PDG value. In general, to get better resolution on the B candidate, you should always mass-constrain daughters with width narrower than detector resolution: pi0, eta, eta', Ks, D, D*, Ds, Ds*, Jpsi, chi_c1, chi_c2, psi2s. (I may have missed some here.)

We could have done this in the step above by including the line fitConstraints set "Mass". We don't, because we want to calculate the unconstrained mass and store it in the ntuple for use in distinguishing signal from background.


#..Now add the mass constraint
mod clone SmpRefitterDefiner MyJpsiToMuMuMass
seq append AnalysisSequence MyJpsiToMuMuMass
talkto MyJpsiToMuMuMass {
    unrefinedListName  set MyJpsiToMuMu
    fittingAlgorithm   set "Cascade"
    fitConstraints     set "Geo"
    fitConstraints     set "Mass"
}
There are pitfalls when creating self-conjugate states or decays to CP eigenstates, which we will discuss when we re-visit SimpleComposition below.

We make the B+ candidates by combining a Jpsi list and a K+ list:


#------------------------ B+ ---------------------------------------
#..B+ --> Jpsi K+
mod clone SmpMakerDefiner BchtoJpsiKch
seq append AnalysisSequence BchtoJpsiKch
talkto BchtoJpsiKch {
    decayMode             set "B+ -> J/psi K+"
    daughterListNames     set "MyJpsiToMuMuMass"
    daughterListNames     set "KLHVeryLoose"
    fittingAlgorithm      set "Cascade"
    fitConstraints        set "Geo"
    preFitSelectors       set "DeltaE -0.20:0.20"
    preFitSelectors       set "Mes 5.19:5.30"
    postFitSelectors      set "ProbChiSq 0.001:"
    postFitSelectors      set "DeltaE -0.12:0.12"
    postFitSelectors      set "Mes 5.20:5.30"
    postFitSelectors      set "CmsCosTheta"
    postFitSelectors      set "Mmiss"
    createUsrData         set true
}
To save some CPU time, we make loose cuts on DeltaE and Mes before we do the fit.

The last line stores all Selector quantities as user data associated with the B candidate. If we were writing a skim (or sub skim), we could write out the list of B candidates and the associated user data in the selected events. In our case, we will store the quantities in the ntuple. This is why we list postFitSelectors such as CmsCosTheta - we want it calculated and stored, even if we don't want to cut on it right now.

Mmiss is a kinematic variable that partners with the candidate mass, in the same way Mes and DeltaE go together. It may be a better choice if your final state contains exactly one poorly measured particle. Wouter Hulsbergen gave an interesting talk about the different kinematic variables.

Note that using a Selector to add UsrData can be inefficient, particularly if you end up having to rerun the kinematic fits at the end just to create UsrData. (This can happen if you combine separate B0 and B+ lists to form the list to store in the ntuple, and then want UsrData attached to this combined list).  Chung Khim Lae is working on a tcl-based tool to add UsrData to lists, which is the better way to do this.


Analysis Code - Corrections to MC

When running on MC you need to consider corrections for differences with data with respect to tracking, photon and pi0 reconstruction, and particle ID.

There are a variety of related techniques for correcting for PID data/MC differences. You should use only one of these - in other words, don't apply PID tweaking at a sub skim level, then make an ntuple set storing PID weights. See pidonmc.html. Post questions to the Particle ID tools HN.  The ntuple structure below uses PID weights, the ratio of the efficiency for a track to satisfy the specified selector in real data to that for simulated data. It is a function of the track momentum, theta, and phi.  You should check the status is 1 (OK) before using the weight. And you should ask around to make sure there really are PID weights in the conditions database if you are running on new data. (The weight requires that the PID group has analyzed both data and MC for a run period).

You need to tell the selectors to operate in "weight" mode (as opposed to "tweak" or "kill", for example). Somewhere in your Analysis.tcl, before "ev beg", place:

pidCfg_mode weight *

The ascii tables used in earlier analysis releases probably don't work, and if you are upgrading your tcl, you will need to switch to the conditions database using this syntax. 

The tracking correction consists of a weight that gives the relative efficiency for the MC track compared to data. The correction is available from the conditions database; no tcl is required on your part. BtaTupleMaker knows how to read in the correction table and store it in the ntuple, as described below. See the tracking efficiency web page for more details. Note that no correction is required to the tracking resolution.

The efficiency corrections for photons and pi0's are simple correction calculations that are not applied at run time. See the neutrals web page for details.  However, it is necessary to apply additional calibrators (corrections to energy) at run time. In general, these may be applied to either data or MC. Once activated, the core neutrals code will handle both cases correctly. To do so,  place the following lines somewhere before "ev beg":

talkto EmcNeutCorrLoader {
    correctionOn set true
    endcapShift set true
}



Analysis Code - Ntuple Structure

Take a look at the BtaTupleMaker web page to understand the options and syntax below, which I will not explain.

First, add the ntuple-dumping module to the path:


#--------------------------------------------------------------------
#..Use BtuTupleMaker to write out ntuples for SimpleComposition job
path append Everything BtuTupleMaker
We determine the quantities to be stored in the ntuple by setting tcl parameters that control the behavior of BtuTupleMaker. (Note that while the executable is called BtaTupleApp, and it is built with a gmake BtaTupleMaker.bin command, the module actually appended to your path is called BtuTupleMaker.) I'll intersperse comments with the code below.

talkto BtuTupleMaker {
This first block contains information per event (not per candidate). I keep the center-of-mass four-momentum, the beam spot, primary vertex, and a couple of other items from the Tag (number of charged tracks, and R2 calculated using both tracks and neutral clusters).

By default, every event that BtuTupleMaker sees (every event passing the TagFilter, if you have one) is dumped to the ntuple. This can increase the size by a substantial factor, so set this option to false, unless you have a really good reason. (For example, I sometimes keep every signal MC event so that I can study the MC truth distributions).


#..Event information to dump
    eventBlockContents   set "EventID CMp4 BeamSpot"
    eventTagsInt set "nTracks"
    eventTagsFloat set "R2All xPrimaryVtx yPrimaryVtx zPrimaryVtx"
    writeEveryEvent set false
You definitely want to keep the MC truth. You don't need to do anything differently when running on data - the block will still be created, so that your code structure can be the same - but the number of entries in the block will always be 0.

#..MC truth info
    fillMC set true
    mcBlockContents set "Mass CMMomentum Momentum Vertex"
Now we are at the heart of the code. BtuTupleMaker starts from a specified list, in our case, of B candidates. If you were doing a recoil analysis, where you fully reconstruct both B's in the event, you could form Upsilon(4S) candidates from the two and store that list. If your analysis were of inclusive Jpsi, that would be the list.

BtuTupleMaker stores information for particles on the list and for all its daughters (and granddaughters, and so forth), and includes links between the ntuple blocks for each particle type. Every particle type in the decay chain must be assigned to an ntuple block. You can have more than one particle type per block, if you prefer. For example, I prefer to put both B+ and B0 candidates into a single "B" block; others put them in separate blocks.

Don't worry if you forget the ntpBlockConfigs command for a particular particle type - the code will just abort and give you an error message; it won't do anything wrong. Note that ntuples are packed, so that increasing the maximum size of the block increases the ntuple size only for those events with extra candidates.

The ntpBlockContents command specifies the items to store. UsrData(BchtoJpsiKch) stores the User data for our B list, which is created by the "createUsrData set true" command in the SimpleComposition block.


#..Particle blocks to store
    listToDump set BchtoJpsiKch
    ntpBlockConfigs set "B-      B    2      50"
    ntpBlockContents set "B: MCIdx Mass Momentum CMMomentum Vertex VtxChi2 
UsrData(BchtoJpsiKch)"
(where the last line above was split for formatting purposes and should be entered as a single line). We don't have any Usr data for the Jpsi list, but we do want to store the unconstrained Jpsi mass. This is an unusual item, in the sense that it is not obtained directly from the Jpsi list used to make the B candidates (MyJpsiToMuMuMass), but rather from MyJpsiToMuMu. However, the ntpAuxListContents command exists for this purpose. BtuTupleMaker will search through the list you specify, find the corresponding Jpsi, and store the requested quantities, mass in this case. The "Unc" is just a text string to allow you to distinguish ntuple quantities from this auxiliary list.

    ntpBlockConfigs set "J/psi   Jpsi   2      50"
    ntpBlockContents set "Jpsi: MCIdx Mass Momentum Vertex VtxChi2"
    ntpAuxListContents set "Jpsi: MyJpsiToMuMu : Unc : Mass"
The interesting things in the K+ and mu+ blocks are the PIDWeights that are stored for each MC candidate. (Weights are 1 for data.) When filling histograms with MC data, weight the event with the product of the PID weights for the involved tracks (and the corresponding tracking efficiency weights) to correct for MC/data differences. You get not only the weight in your ntuple, but also its uncertainty and a status integer. You should check that the status is 1 (OK) before using the weight.

    ntpBlockConfigs set "K+      K      0      50"
    ntpBlockContents set "K: MCIdx Momentum PIDWeight(KLHVeryLoose,KLHLoose,KLHTight)"
    ntpBlockConfigs set "mu+      mu      0      50"
    ntpBlockContents set "mu: MCIdx Momentum PIDWeight(muNNVeryLoose,muNNLoose)"
I find it useful to store every track and gamma in the event, in addition to those used in forming a B candidate. It can allow you to recover from a variety of mistakes. But you should test how much this option increases the size of your ntuples.

#..Want to save all CalorNeutrals in the gamma block
    ntpBlockConfigs set "gamma   gamma  0      60"
    ntpBlockContents set "gamma: MCIdx Momentum"
    gamExtraContents set EMC
    fillAllCandsInList set "gamma CalorNeutral"
#..TRK block. Save all of them as well.
    fillAllCandsInList set "TRK ChargedTracks"
#..remember to change this back to K pi mu e
    ntpBlockToTrk   set "K mu"
    ntpBlockContents set "TRK: Doca DocaXY"
    trkExtraContents set "BitMap:pSelectorsMap,KSelectorsMap,piSelectorsMap,
                                            muSelectorsMap,eSelectorsMap,TracksMap"
    trkExtraContents set HOTS
    trkExtraContents set Eff:ave
}
(where in the above the first trkExtraContents line was only split for formatting purposes, and should be entered as a single line). In the track block you can see the tracking efficiency weight (Eff:ave), which corrects for the MC/data difference in the efficiency for this track to be reconstructed as a GoodTracksLoose. A GoodTracksLoose track is required to come from near the interaction point and to have a minimum number of drift chamber hits, so that there is a good momentum measurement. In most cases, you should require your tracks to be GoodTracksLoose. If your tracks have low pt --- slow pions from D* decays, for example --- use GoodTracksVeryLoose instead. If they don't come from the origin --- pions from Ks decays, for example --- then just use ChargedTracks. You can't use the tracking efficiency tables for anything other than GoodTracksLoose, but flat corrections (i.e., not a function of the particular track in question) are available from the tracking efficiency web page.

If you aren't using GoodTracksLoose, consider checking the pt distribution of your candidates that don't satisfy GoodTracksLoose. Any track with pt more than ~120 MeV ought to have drift chamber hits if it is measured correctly.

You select the track type in SimpleComposition by sublisting, which we will discuss later, or by using the TracksMap bitmap. The bits set to 1 in this word indicate which track requirements are satisfied by the candidate. The safest way to get the correspondence between the bits and the track types is to talk to the module that creates this word, TrkMicroDispatch, and type "show". (More on this later.)

You can likewise find out which muon ID selectors selected this track using the muSelectorsMap bitmap. To get the correspondence between these bits and the selectors, "mod talk" to MuonMicroDispatch. (We will do this later.) The bitmap is useful if you want to tighten the criteria you used to make your B candidates without having to rerun your jobs.


Preparing to run some jobs

We will next run some sample jobs on MC samples. We will use book keeping tools to locate the collections.

Some subdirectories

I find it convenient to create a subdirectory of workdir called maketcl, which I use for creating the tcl files and other bookkeeping activities. I then create another subdirectory of workdir called tcl, which contains soft links to the tcl files that are actually located in maketcl. I do this so that I can then delete the links in "tcl" as the corresponding job successfully finishes, without losing the original file itself. But feel free to do whatever you like best.
mkdir maketcl
mkdir tcl
            
We will also need directories to store the log files and ntuples you create. You should probably put these in your scratch (work) area and make soft links to workdir - you won't have enough space in your own area. Recall that files in this area vanish after 6 days. Be sure to move both the ntuples and log files to permanent storage within that time.

mkdir $BFROOT/work/h/hearty/Rel26/
mkdir $BFROOT/work/h/hearty/Rel26/log
mkdir $BFROOT/work/h/hearty/Rel26/hbookdir
ln -s $BFROOT/work/h/hearty/Rel26/log log
ln -s $BFROOT/work/h/hearty/Rel26/hbookdir hbookdir
I will be disappointed with you if you actually create your directories in /h/hearty.

I find it useful to have a couple of other directories called "failed" and "successful": the batch system writes to "log" and "hbookdir", then I manually move the files to "failed" or "successful" as appropriate.

mkdir $BFROOT/work/h/hearty/Rel26/failed
mkdir $BFROOT/work/h/hearty/Rel26/successful
ln -s $BFROOT/work/h/hearty/Rel26/failed failed
ln -s $BFROOT/work/h/hearty/Rel26/successful successful
Other people keep all of their log files and ntuples in their local area. To do this, you will want to gzip everything as it is created, and keep a careful tab on your disk space. You can use commands like zgrep, zcat, zless and so forth to work with gzipped files.
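A minimal sketch of that workflow, using one of the log files from this tutorial:

gzip log/SP-1235-Jpsitoll-Run1-R16a-1.log
zgrep "Successfully completed" log/SP-1235-Jpsitoll-Run1-R16a-1.log.gz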

While we are at it, let's create a subdirectory for tcl snippets, which we will need shortly:

mkdir snippets
Will Roethel has written a prototype "analysis task management" system that is undoubtedly a better way to work. See SJMMain.htm.

I haven't tried it yet.

Since many of the bookkeeping commands used are obscure and difficult to remember, I recommend keeping a record of them. I use a file called commands.txt in maketcl.

Finding the available MC modes

You will need to know the mode number of the MC sample you intend to use. You can get some information by browsing the old SP5 inventory web site.

(Recall that SP5 corresponds to runs 1 - 3, SP6 is run4, and SP7 is run 5. SP8 will cover the full data set.)

You should also ask your conveners and other colleagues in your AWG.

More generally, you can use BbkSPModes to get lists of different types of MC. You can search the descriptions (also called "runtypes") of all modes by:


BbkSPModes --runtype "J/psi"
BbkSPModes --runtype "Jpsi"
To search in the titles of the decay (.dec) files:

BbkSPModes --decfile "Jpsi"
The resulting output can be quite large. It is perhaps easier to write this to a file so you can look at it in your editor:

BbkSPModes --decfile "Jpsi" > Jpsi-2.txt
BbkSPModes --decfile "jpsi" >> Jpsi-2.txt
Browsing through this, we find mode 1121, which is Inclusive Jpsi (generic B events that contain a Jpsi --> e+e- or mu+mu- decay), and 989, which is the exclusive decay B- --> Jpsi K-. Mode 4817, B+ --> X3872 K+ might be interesting as well.

You will undoubtedly want generic MC as well. Here are the mode numbers and the cross sections, which you will need for calculating the equivalent luminosity.

B+B-           1235    550 pb
B0B0           1237    550 pb
cc             1005   1300 pb
uds             998   2090 pb
tau tau kk2f   3429    890 pb
mu mu kk2f     3981   1150 pb

Making tcl files for the selected modes

Generic MC is skimmed, while non-generic is generally not skimmed. If you really need to have another mode skimmed, talk to your convener.

It is very much faster to run on skimmed generic MC than on the AllEventsSkim collection - you can save an order of magnitude in time. To make tcl files for the Jpsitoll skim of generic mode 1235 (B+B-) in Run 1, there are several possible BbkDatasetTcl commands you could use:


BbkDatasetTcl -t -ds SP-1235-Jpsitoll-Run1-R16a
or
BbkDatasetTcl -t 100k -ds SP-1235-Jpsitoll-Run1-R16a
or
BbkDatasetTcl -t 100k --splitruns -ds SP-1235-Jpsitoll-Run1-R16a
Eventually we will use the third command, but let's look at all three. Make and go to a directory called temp in your workdir:

mkdir temp
cd temp
Now try the first command:
BbkDatasetTcl -t -ds SP-1235-Jpsitoll-Run1-R16a

BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a.tcl
Selected 9 collections, 1385035/27176000 events, ~0.0/pb
            
This command produced the file SP-1235-Jpsitoll-Run1-R16a.tcl. The Jpsitoll skim contains 1385035 events, from an original 27176000 B+B- events in Run 1. (So the skim rate is 5.1%; the equivalent luminosity, assuming a B+B- cross section of 0.54 nb, is 50.3 fb-1, 2.6x the luminosity for Run 1.)
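To make that arithmetic explicit (a sketch using only the numbers above; 19.3 fb-1 is the Run 1 data luminosity, consistent with the BbkLumi output later in this tutorial):

skim rate = 1385035 / 27176000             = 5.1%
L(MC)     = 27176000 events / 0.54 nb      = 50.3 fb-1
MC weight = L(data) / L(MC) = 19.3 / 50.3  ~ 0.38  (to scale this MC to Run 1 data)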

Make sure you keep track of these numbers - you will need them when you want to scale your MC to the data luminosity. Write the numbers in your log book, or put the output in your commands.txt file. In any case, be sure to keep your tcl files. The tcl file SP-1235-Jpsitoll-Run1-R16a.tcl is not a practical way to analyze data - your job will certainly run out of CPU time before it can go through the full file. So we come to our second BbkDatasetTcl command:


BbkDatasetTcl -t 100k -ds SP-1235-Jpsitoll-Run1-R16a


BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-1.tcl (1 collections, 121187/2316000 events, ~0.0/pb)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-2.tcl (1 collections, 215235/4150000 events, ~0.0/pb)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-3.tcl (1 collections, 155716/3026000 events, ~0.0/pb)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-4.tcl (1 collections, 149685/2928000 events, ~0.0/pb)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-5.tcl (1 collections, 158301/3120000 events, ~0.0/pb)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-6.tcl (1 collections, 147325/2930000 events, ~0.0/pb)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-7.tcl (1 collections, 252653/5026000 events, ~0.0/pb)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-8.tcl (1 collections, 33861/670000 events, ~0.0/pb)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-9.tcl (1 collections, 151072/3010000 events, ~0.0/pb)
Selected 9 collections, 1385035/27176000 events, ~0.0/pb
            
This version of the command splits the events into several tcl files. However, it will not split collections, so the resulting number of events in each file can vary quite a bit. No matter how small your request, the files can still be larger than what you require. To get around this, use the third BbkDatasetTcl command to request that the collections be broken into smaller pieces:
BbkDatasetTcl -t 100k --splitruns -ds SP-1235-Jpsitoll-Run1-R16a

BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-1.tcl (1 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-2.tcl (2 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-3.tcl (1 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-4.tcl (2 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-5.tcl (2 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-6.tcl (1 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-7.tcl (2 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-8.tcl (1 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-9.tcl (2 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-10.tcl (2 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-11.tcl (1 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-12.tcl (1 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-13.tcl (3 collections, 100000 events)
BbkDatasetTcl: wrote SP-1235-Jpsitoll-Run1-R16a-14.tcl (1 collections, 85035 events)
Selected 9 collections, 1385035/27176000 events, ~0.0/pb
            
If you prefer a different file name, use the option --basename blahblah.
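For example (myBpJpsiK is a hypothetical base name):

BbkDatasetTcl -t 100k --splitruns --basename myBpJpsiK -ds SP-1235-Jpsitoll-Run1-R16a

which should write myBpJpsiK-1.tcl, myBpJpsiK-2.tcl, and so on.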

How do you decide how many events to put in a tcl file? There are a few criteria. If you are writing hbook ntuples, you should keep them less than 30 MB per job, or you risk corrupting them. (Root can handle bigger files, I believe.) Your job can fail due to CPU time limits on the batch queue, or wall-clock limits. Try a couple of your longest jobs before you submit a whole bunch. The CPU limit for the kanga queue, which you should normally use, is 720 "slac minutes" (bqueues -l kanga). However, 1 CPU min on the batch machine generally counts as more than 1 "slac minute". To get the scale factor CPUF for batch machine barb0309 (for example), type bhosts -l barb0309. So for this machine, the actual CPU limit is 720*60/2.11 = 20,474 sec. You can use other queues - short, long, or xlong, for example - but you will be sharing the resources with other SLAC experiments like Glast and ILC.
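A minimal sketch of that check (2.11 is just the example scale factor from the text; your machine will differ):

bqueues -l kanga              # the CPULIMIT line gives the limit in "slac minutes" (720)
bhosts -l barb0309            # the CPUF line gives the scale factor for this host (e.g. 2.11)
echo "720*60/2.11" | bc -l    # actual CPU limit in seconds, ~20474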

Now that you've experimented a bit with the different BbkDatasetTcl commands, you can choose the ones you want to use. In this case we will use the third BbkDatasetTcl command, with the "--splitruns" option to get the same number of events in each file. Also, we will use the first BbkDatasetTcl command to make one big tcl file with the information about the whole set. First, you can delete your experimentation directory:


rm -r temp
Next, go to your maketcl directory, and issue the commands:

BbkDatasetTcl -t -ds SP-1235-Jpsitoll-Run1-R16a --basename info-SP-1235-Jpsitoll-Run1-R16a
BbkDatasetTcl -t 100k --splitruns -ds SP-1235-Jpsitoll-Run1-R16a
            

Exclusive decay mode MC is generally not skimmed, so there is no skim name in the collections. You will probably need to put fewer events per job as well. Let's make some B+ --> Jpsi K+ signal MC tcl files:


BbkDatasetTcl -t -ds SP-989-Run1 --basename info-SP-989-Run1
BbkDatasetTcl -t 10k --splitruns -ds SP-989-Run1
Remember to copy and paste these commands into your "commands.txt" file. And if you are using the maketcl and tcl structure that I use, make soft links to the tcl files you made:

cd ../tcl
ln -s ../maketcl/SP-1235*.tcl .
ln -s ../maketcl/SP-989*.tcl .

Splitting Skimmed Signal MC into Run Blocks

When skimmed collections are created for generic MC, the different run blocks are separated into different collections. Unfortunately, doing the same for signal MC would create too many small files for the file handling system to efficiently deal with. So a particular skimmed Signal MC collection could contain events from both Runs 1 and 2, for example. (Run 4, which is SP6 instead of SP5, is separated). In our case - and in most cases - this doesn't matter, but if you ever do need to split a collection by date, you can use BbkExpertTcl to select by "condalias", the month to which the MC corresponds. For example (you do not need to issue this command for the tutorial):


BbkExpertTcl --basename SP-1059-AllEventsSkim-Run1-R16a 
--condalias_select=200002-200010 SP-1059-AllEventsSkim-R16a
with the condalias_select set to the range of the specific run period.

        begin    end
Run1    200002   200010
Run2    200102   200206
Run3    200212   200306
Run4    200309   200407

tcl files for data; luminosity and B counting

The commands for data are essentially the same. Check the Data Quality page for information on the latest datasets.

To make tcl files, use BbkDatasetTcl as before. The format of the dataset names is slightly different - in particular, data can be on peak or off peak, unlike MC.


BbkDatasetTcl -ds Jpsitoll-Run1-OnPeak-R16a --basename info-Jpsitoll-Run1-OnPeak-R16a
BbkDatasetTcl -t 100k --splitruns -ds Jpsitoll-Run1-OnPeak-R16a
Depending on which skim you are using, yours might be -R14a or -R16b instead. Again, you will need soft links to these tcl files:

cd ../tcl
ln -s ../maketcl/Jpsitoll*.tcl .

To get the B counting (and exact luminosity) for your sample, you can use the BbkLumi script on your single info-Jpsitoll-Run1-OnPeak-R16a.tcl file (from maketcl):


BbkLumi -t info-Jpsitoll-Run1-OnPeak-R16a.tcl

===> info-Jpsitoll-Run1-OnPeak-R16a.tcl

Using B Counting release 16 from collections in TCL file
Using collections:
 /store/PRskims/R12/16.0.2e/Jpsitoll/28/Jpsitoll_2899
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2900
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2901
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2902
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2903
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2904
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2906
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2907
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2908
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2909
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2910
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2911
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2924
 /store/PRskims/R12/16.0.2e/Jpsitoll/29/Jpsitoll_2925
==============================================
Run by penguin at Sat Sep 24 17:34:11 2005
First run = 9932 : Last Run 17106
== Your Run Selection Summary =============
 ***** NOTE only runs in B-counting release 16 considered *****
 ***** Use --OPR or --L3 options to see runs without B-counting *****
 Number of Data Runs                 3206
 Number of Contributing Runs         3206
-------------------------------------------
 Y(4s)   Resonance            ON         OFF
 Number  Recorded           3206           0

== Your Luminosity (pb-1) Summary =========
 Y(4s)  Resonance             ON         OFF
 Lumi   Processed          19293.553      0.000

== Number of BBBar Events Summary =========
             Number    |             ERROR
                       |   (stat.)   (syst.)   (total)
Total       21051921.9 |    24149.6   231571.1   232827.0

==For On / Off subtraction======
Nmumu(ON)            =    9405607.0 +/-       3066.9 (stat)
Nmumu(OFF)           =          0.0 +/-          0.0 (stat)
Nmh(ON)              =   66275935.0 +/-       8141.0 (stat)
Nmh(OFF)             =          0.0 +/-          0.0 (stat)
            
The BbkLumi script knows to remove runs that are declared bad in the tcl file. Be sure to check that all the runs have B counting information - you will see a warning message if any are missing.

Updating your tcl files

Some time after you start your analysis, it may be that more data or MC have been skimmed, and you would like to make tcl files from just these new collections. To do this, check the time stamp in the last line of the last tcl file of the previous set. For example, in SP-1235-Jpsitoll-Run1-R16a-14.tcl it is
## Last collection added to dataset: 2005/05/31-17:12:15-PDT
Then request only collections created after this time, and start the numbering of these new tcl files at 15 (the last set ended at 14).

BbkDatasetTcl -t 100k,15 --splitruns -ds SP-1235-Jpsitoll-Run1-R16a 
-s 2005/05/31-17:12:16-PDT
Here a second has been added to the time stamp to be sure you don't get the same collections back.

Making tcl snippets

For each job you run, you need not just the tcl file we just made, but a tcl "snippet" to set the FwkCfgVar flags to the values appropriate for this job. This could be tedious to do manually, so with Enrico Robutti's help, I wrote a perl script called make_snippet. Copy this to your workdir.

To make snippets for the tcl files we just created:


make_snippet -MC -tcldir snippets -logdir log -ntpdir hbookdir tcl/SP-1235*.tcl
make_snippet -MC -tcldir snippets -logdir log -ntpdir hbookdir tcl/SP-989*.tcl
make_snippet -data -tcldir snippets -logdir log -ntpdir hbookdir tcl/Jpsitoll*.tcl
If you skip the flags specifying the directories, everything will end up in workdir. Type make_snippet -h if you can't remember the command options.

For each filename.tcl, it creates a file run_filename.tcl. Let's look at one:

cat snippets/run_SP-1235-Jpsitoll-Run1-R16a-1.tcl

#..See Analysis.tcl for description of FwkCfgVars.
sourceFoundFile tcl/SP-1235-Jpsitoll-Run1-R16a-1.tcl
set ConfigPatch "MC"
set FilterOnTag "false"
set BetaMiniTuple "hbook"
set histFileName hbookdir/SP-1235-Jpsitoll-Run1-R16a-1.hbook
set NEvents 0
sourceFoundFile Analysis.tcl
            
You should edit make_snippet to match your own taste. root vs hbook, for example, or FilterOnTag "true". The selection between MC and data is done via an option flag of make_snippet - you don't need to edit this.
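For example, if you switched the output to root and turned the tag filter on, the changed lines of the snippet above would read (a hypothetical edit):

set FilterOnTag "true"
set BetaMiniTuple "root"
set histFileName hbookdir/SP-1235-Jpsitoll-Run1-R16a-1.root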

make_snippet also creates a script to submit all of the jobs to the batch queue, called sub_SP-1235-Jpsitoll-Run1-R16a-1 in this case. We will come back to it when we talk about the batch system.


Running jobs

Running jobs interactively

Check your path

Before running an actual job, it is a good idea to look at your path. You might notice things are missing, or extra things you don't need. We will use run_SP-1235-Jpsitoll-Run1-R16a-1.tcl. By default, the NEvents flag is set to run on all events ("0"). To set up your job, but not execute an "ev beg" command, edit the file and change NEvents to -1.

To run the job (from workdir), type


BtaTupleApp snippets/run_SP-1235-Jpsitoll-Run1-R16a-1.tcl
Here is the resulting output, ending at a framework prompt: path.txt.

The first part lists the values of various FwkCfgVars. Notice, for example, that PrintFreq is not set in the snippet, so we just get the default value of 1000.

Most of the output is the path list, which starts with Everything and ends with BtuTupleMaker. Read through the one-line descriptions to get an idea of what is actually happening in your job. Note your AnalysisSequence is a pretty small part of the whole operation.

While we are here, let's check the meaning of the muon ID bit map:

mod talk MuonMicroDispatch
MuonMicroDispatch> show

Current value of item(s) in the "MuonMicroDispatch" module:

      Value of verbose for module MuonMicroDispatch: f
      Value of production for module MuonMicroDispatch: f
      Value of enableFrames for module MuonMicroDispatch: f
      Value of outputMap for module MuonMicroDispatch: IfdStrKey(muSelectorsMap)
      Value of inputList for module MuonMicroDispatch: IfdStrKey(ChargedTracks)
      Value of inputMaps for module MuonMicroDispatch:
inputMaps[0]=muMicroMinimumIonizing
inputMaps[1]=muMicroVeryLoose
inputMaps[2]=muMicroLoose
inputMaps[3]=muMicroTight
inputMaps[4]=muMicroVeryTight
inputMaps[5]=muNNVeryLoose
inputMaps[6]=muNNLoose
inputMaps[7]=muNNTight
inputMaps[8]=muNNVeryTight
inputMaps[9]=muNNVeryLooseFakeRate
inputMaps[10]=muNNLooseFakeRate
inputMaps[11]=muNNTightFakeRate
inputMaps[12]=muNNVeryTightFakeRate
inputMaps[13]=muLikeVeryLoose
inputMaps[14]=muLikeLoose
inputMaps[15]=muLikeTight
            
I print out this sort of thing and put it in my log book. (How did I know this was the correct module to check? From the description in the path list.) You can do the same thing for the other four charged particles, and for TrkMicroDispatch, which fills the track bit map.
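For example, to see the corresponding settings for the track bit map (assuming TrkMicroDispatch responds to the same talk commands):

mod talk TrkMicroDispatch
TrkMicroDispatch> show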

There are a lot of sequences listed under SimpleComposition, which you might think you want to disable to save time. In fact, a key feature of SimpleComposition is that it doesn't make the lists unless they are requested somewhere, so there is essentially no overhead.

After you are finished browsing, type exit to end your talk with MuonMicroDispatch, and exit again to exit the framework (i.e., end this job).

Run an interactive job

Edit run_SP-1235-Jpsitoll-Run1-R16a-1.tcl and set the number of events to 1000. It is easiest to run the job in background mode so that you get a log file.

BtaTupleApp snippets/run_SP-1235-Jpsitoll-Run1-R16a-1.tcl >& 
log/SP-1235-Jpsitoll-Run1-R16a-1.log &
(where the above command should be executed as a single line). Here is the resulting file: SP-1235-Jpsitoll-Run1-R16a-1.log

There are quite a few messages in here that look alarming, but in fact, this job completed successfully.

From the log file, you can get the total number of events processed and the CPU time:

EvtCounter:EvtCounter:     total number of events=1000
total number of events processed=1000
total number of events skipped=0
EvtCounter:Total CPU usage: 123 User: 119 System: 4
There is also a report from the Tag Filter, which is pretty boring in this case, since we didn't turn it on:
TagJpsill:TagJpsill: endJob summary:
Events processed: 0
Events passed : 0
Events prescaled: 0
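With many log files, a quick grep collects these summaries in one place (adjust the pattern to whatever you want to tabulate):

grep "total number of events processed" log/*.log
grep "Total CPU usage" log/*.log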

Submit a batch job

Edit run_SP-989-Run1-1.tcl to run on 1000 events, then submit it to the kanga queue, which you should be using for analysis at SLAC. (Other Tier A sites will have other commands.)

bsub -q kanga -o log/SP-989-Run1-1.log ../bin/$BFARCH/BtaTupleApp
snippets/run_SP-989-Run1-1.tcl 
            
(where the above command should be entered as a single line).

Check on the status of your job:

bjobs

JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
710693  hearty  RUN   kanga      yakut05     barb0309    *un4-1.tcl Feb  8 14:58
Or check the log file:
bpeek 710693 | tail 
(or bpeek -f 710693). Here is the resulting log file: SP-989-Run1-1.log

The log contains some extra information at the end compared to the interactive job; "Successfully completed" is what you want to see.

Dealing with many batch jobs

The script created by make_snippet, sub_SP-1235-Jpsitoll-Run1-R16a-1 for example, will submit all the jobs for you at once. You should probably not put more than a couple of hundred jobs in the queue at one time.

source sub_SP-1235-Jpsitoll-Run1-R16a-1 
As the jobs finish, move the successful ones into the successful directory (grep for "Successfully completed") and the failures (grep for "exit code" or "Abort") into the failed directory. If you have a few mysterious failures, you might try just resubmitting them.

You might find it useful to make some small, temporary scripts to do this sort of thing. For example,


grep "Successfully completed" log/*.log > move-2
cat move-2

log/SP-989-Run1-1.log:Successfully completed.
            
Then edit move-2 to change lines like
log/SP-989-Run1-1.log:Successfully completed.
into
mv log/SP-989-Run1-1.log successful/
mv hbookdir/SP-989-Run1-1.hbook successful/
/bin/rm tcl/SP-989-Run1-1.tcl
/bin/rm snippets/run_SP-989-Run1-1.tcl
gzip maketcl/SP-989-Run1-1.tcl
To use the script, just say source move-2
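If the hand-editing becomes tedious, the same command list can be generated automatically. Here is a sketch using awk, assuming the directory layout and naming conventions above:

grep -l "Successfully completed" log/*.log | awk -F'[/.]' '{print "mv log/" $2 ".log successful/"; print "mv hbookdir/" $2 ".hbook successful/"; print "/bin/rm tcl/" $2 ".tcl"; print "/bin/rm snippets/run_" $2 ".tcl"; print "gzip maketcl/" $2 ".tcl"}' > move-2

(where the above command should be entered as a single line). Then source move-2 as before.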

As I mentioned before, there might be a real task manager to do this sort of thing for you.

Be sure to keep track of all your failures so that you can correct the luminosity weighting of your MC sample.

If you are using hbook files (vs root), you will want to gzip them before you transfer them to your local disks. Either way, you will definitely want to gzip the log files.
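For example, when you are ready to copy a batch of finished jobs to your local disk:

gzip successful/*.hbook successful/*.log

(Gzip the hbook files only after you have finished looking at them at SLAC; paw does not read gzipped files.)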

Ntuple Structure

Take a look at the resulting ntuple (now located in the "successful" directory). To start paw, type:
paw
and then "enter" to accept the default workstation type. Now, to look at your ntuple, do:
PAW > hist/file 1 successful/SP-989-Run1-1.hbook 
PAW > nt/list


 ===> Directory :
          3 (N)   myNtuple
          2 (N)   Analysis Ntuple
          1 (N)   SmpListInfo
            
This gives you a list of ntuples in your hbook file. The one you are interested in is myNtuple. To see a list of its contents, type:
PAW > nt/print 3
            
Here is the printout of the ntuple structure: ntuple-structure.txt

You will want to navigate between the different blocks and different candidates.

BLund tells you what type of B this candidate is

Bd1Lund is the particle type of the first daughter

How do you know what the numbers mean? One way to find out is to look in the file:

workdir/PARENT/PDT/pdt.table
The lund ID numbers of the different particles are given in the fifth column. For example, a J/psi has ID number 443.
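You can also search the table directly; for example, to find the J/psi entry:

grep -w 443 workdir/PARENT/PDT/pdt.table

(The -w keeps grep from matching 443 inside longer numbers, though you may still pick up a few extra lines.)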

Bd1Idx is the index of that daughter in the relevant block (Jpsi in this case). Note that the indices all use C++ numbering, so Bd1Idx = 0 refers to the first entry in the Jpsi block. If you are actually going to use paw, which numbers entries from 1 in the Fortran style, you will need to add 1 to all the indices. You will probably get this wrong numerous times.
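For example, to plot a Jpsi-block quantity for the Jpsi daughter of the first B candidate, the pattern is (JpsiMass is an illustrative variable name; use nt/print 3 to check the real ones):

nt/plot 3.JpsiMass(Bd1Idx(1)+1)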

You can similarly navigate from the Jpsi block to the mu block (or e block, once you add Jpsi --> e e decays), and from the mu block to the TRK block.

When using electrons with bremsstrahlung recovery (in which radiated photons are added back into the reconstructed electron), the navigation is a bit tricky. The brem-recovered electron is a composite, consisting of an electron plus one, two, or three photons. So it does not point to the TRK block (the index is -1); you need to navigate to the daughter electron, which does point to the TRK block. This is the only case where a daughter of a particle is the same type of particle.

The Root III chapter of the workbook gives an example of an analysis on a similar type of ntuple. Note that the ntuples in that chapter were produced using the old CompositionUtil package, and have the opposite indexing problem of the current ntuples (i.e., in Root, you need to subtract 1 from all indices).

The Paw II chapter discusses macros a bit. Here's a fortran macro called smallcode.f, which you can copy to your workdir. When doing analyses in paw, I use nt/loop.

PAW > hist/file 1 successful/SP-989-Run1-1.hbook
PAW > nt/loop 3 smallcode.f(0)
PAW > h/pl 1000
            

This will give you a nice plot of the Jpsi mass for 1000 events.

On some systems, you can say nt/loop 3 smallcode.f77(0) to compile the code before it is run.

Note on MC truth Matching

The MC block contains the full MC truth for the event at the four-vector level, and is very useful, for example, for categorizing different types of backgrounds.

The names such as JpsiMCIdx connect the reconstructed candidate to the MC truth block. However, the MC matching is a bit dicey for any state that can radiate. (And we may be using Photos not just for leptons, but also for hadrons in some of our MC). For example, if you plot JpsiMCIdx, you will see that it is -1 in many cases (i.e., no MC truth match was found). But if you plot the lund ID of the MC-truth matched partners of the muon daughters of the Jpsi, you see that they are actually muons:


nt/plot 3.mclund(muMcidx(1)+1)
nt/plot 3.mclund(muMcidx(2)+1)

(Why muons? Because the plots show that all entries fall in the bins [-13, -12] and [13, 14] - that is, they have ID numbers of +/-13. And looking in workdir/PARENT/PDT/pdt.table, you see that 13 = mu- and -13 = mu+.)

The problem is that in the MC truth, the Jpsi actually decayed to mu+ mu- gamma, while the reconstructed Jpsi contains only the muons. In principle, this gamma could be 1 MeV, and of no real consequence. The lesson is that you cannot blindly use the MC truth match for composites. Actually, it is probably better to avoid MC truth matching entirely if you can. (In this case, if I did need to match, I would declare the Jpsi matched if the two muon daughters match the MC-truth daughters of the Jpsi.)

Next steps

Here are a few things to try before we go on to the more complex SimpleComposition case.

Make a decent Jpsi mass plot (with binning suitable for the detector resolution), and observe how it changes if you require muNNLoose on both legs instead of muNNVeryLoose. It might be interesting to check the scatterplot of momentum vs theta for your muons to see where they fall in the detector.

What happens if you require GoodTracksLoose for both muon legs and the kaon? You can do this by checking the bitmap, but you should also try sublisting in SimpleComposition. Another approach (suitable for your own analysis code, not for skimming) would be to change the input lists to the muon and kaon selectors. Look through the path to figure out where to do this.

Perhaps it would be worth adding a cut on the quality of the Jpsi vertex; see the sketch below. Note that we don't want to cut on the chi-square of the mass-constrained Jpsi - that would be like cutting on the Jpsi mass. You will need to add this quantity to your ntuple, and to compare signal MC to data to see if there is anything to gain.
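As a sketch, the request might look like the following, added to your Jpsi maker module. I am assuming here that a ProbChiSq selector exists alongside Mass, and the module name JpsiToMuMu is illustrative; check the SimpleComposition documentation for the exact selector name and range syntax:

talkto JpsiToMuMu {
  postFitSelectors set "ProbChiSq 0.001:1.0"
}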

Here is an example of a slightly more complicated analysis. It will be the basis of the next section of this tutorial.

Self-Conjugate Modes and Clones in SimpleComposition

Before expanding our analysis, we should discuss the issues of self-conjugate modes and clones, which can be quite confusing if you don't know the underlying issues. HyperNews is full of questions about them, including some from me. The tag of SimpleComposition in analysis-26 has fixed many of the self-conjugate problems, but it is still worth being aware of the issues.

Self-Conjugate modes

If you build the decay A --> B C in SimpleComposition, it will, in most cases, automatically build A-bar --> B-bar C-bar. If B = B-bar, or C = C-bar (Jpsi, for example, or Ks), it knows to use the B or C list instead of the B-bar or C-bar. So if you ask for B+ --> Jpsi K+, you will also get B- --> Jpsi K-.

You will not get the A-bar list, however, if A = A-bar. There can be unwanted consequences if you are not careful. Consider, for example, creating a Jpsi with one muon required to satisfy muNNLoose and the other muNNVeryLoose:

mod clone SmpMakerDefiner BadJpsiToMuMu
seq append AnalysisSequence BadJpsiToMuMu
talkto BadJpsiToMuMu {
  decayMode set "J/psi -> mu+ mu-"
  daughterListNames set muNNLoose
  daughterListNames set muNNVeryLoose
  fittingAlgorithm set "Cascade"
  fitConstraints set "Geo"
  preFitSelectors set "Mass 2.8:3.4"
  postFitSelectors set "Mass 2.9:3.3"
}
The problem is that SimpleComposition will do exactly what you asked - require the mu+ to be on the muNNLoose list and the mu- to be on the muNNVeryLoose list. It does not make the charge-conjugate combination, J/psi --> mu- mu+. If you want this case, you need to create it yourself and merge the two, as sketched below. Personally, I find it easier just to use the same list when a particle appears twice in the final state.
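For completeness, here is a sketch of building the missing ordering, following the maker pattern above. You would still need to merge this list with the original one; I have not verified the name of the merging module here, so check the SimpleComposition documentation for that step:

mod clone SmpMakerDefiner BadJpsiToMuMuConj
seq append AnalysisSequence BadJpsiToMuMuConj
talkto BadJpsiToMuMuConj {
  decayMode set "J/psi -> mu+ mu-"
  daughterListNames set muNNVeryLoose
  daughterListNames set muNNLoose
  fittingAlgorithm set "Cascade"
  fitConstraints set "Geo"
  preFitSelectors set "Mass 2.8:3.4"
  postFitSelectors set "Mass 2.9:3.3"
}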

Clones

A related topic is that of clones. When SimpleComposition merges two lists --- the A and A-bar lists, for example --- it checks for clones and, by default, removes them. Two candidates are considered to be clones if the final state contains the same candidates (in any order) with the same mass hypotheses. Note that it does not check whether or not the initial state is the same.

As an example, suppose you make a D0 --> pi+ pi- list. SimpleComposition will also make a D0-bar --> pi- pi+ list, but will discover upon merging the two lists that it has the same pair of tracks on both. It will then delete all of the D0-bar candidates. So when you reconstruct B+ --> D0 K+ decays, you will end up only with B+ candidates, and no B- candidates. Note that if you use different lists for the pi- and pi+, it will delete only some of the D0-bar candidates, so you will end up with wrong answers, but perhaps not obviously wrong answers.

To override this default behaviour, add to your module the line

disableCloneCheck set true
Note that if you were merging D0 --> pi+ pi- (both GoodTracksLoose) and D0 --> K+ K- (also both GoodTracksLoose), you would keep both candidates even if they use the same tracks, because the candidates have different mass hypotheses.
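Putting the pieces together, a D0 maker with the clone check disabled might look like this (same maker pattern as above; the mass windows are illustrative):

mod clone SmpMakerDefiner D0ToPiPi
seq append AnalysisSequence D0ToPiPi
talkto D0ToPiPi {
  decayMode set "D0 -> pi+ pi-"
  daughterListNames set GoodTracksLoose
  daughterListNames set GoodTracksLoose
  fittingAlgorithm set "Cascade"
  fitConstraints set "Geo"
  preFitSelectors set "Mass 1.7:2.0"
  postFitSelectors set "Mass 1.75:1.95"
  disableCloneCheck set true
}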

Your Analysis

Let's end with a few suggestions about your own analysis.
  • Discuss it with your conveners and the AWG before you get started, and frequently thereafter.
  • Consider posting your analysis tcl and list of tags to your AWG hypernews before you make a lot of ntuples - this might save you a lot of problems.
  • Along the same line, check every quantity in your ntuple on a signal MC and a data job before you start.

Back to Workbook Front Page

Author: Chris Hearty
Last modified: September 2005
Last significant update: September 2005