Workbook for BaBar Offline Users - Generation, Simulation and Reconstruction
In the Quicktour, you learned
how to run BetaMiniApp, a Beta application. For Beta applications,
the input is a collection from the CM2 event store. These collections
have already undergone several stages of reconstruction. In this section
you will learn about the three executables that produce the collections
in the event store, and how to use them.
Reconstruction of real data
The journey of real data begins in the detector. A high-energy
e+e- collision results in a shower of particles, which spread out,
interact with and pass through the various layers of the detector.
Thus the first format of data is raw detector signals. Next, the raw
detector signals are digitized. Finally, events that pass the Trigger
(a very loose filter) are put in XTC files to await reconstruction
(actually, "prompt reconstruction").
The program that performs the reconstruction of real data is called Elf
(executable = ElfUserXtcApp). Reconstruction takes place in three steps.
First, the hits are reconstructed into the basic objects corresponding
to individual particles: tracks in the SVT and DCH, and clusters in the
EMC and IFR. Second, particle identification (PID) algorithms are used to
assign probable identities to the particles. This produces the AllEvents
data set. Finally, tagging creates a database of tag bits, simple boolean
or boolean-like flags which are useful for quick data skims. The result
is the AllEventsSkim data set.
Production and reconstruction of
simulated data
The aim of simulation production is to create simulated (Monte Carlo)
collections that mimic real data collections as closely as possible.
Therefore, it is not enough just to generate a given decay -- one must
also simulate the propagation of the particles through
the detector, and the detector response to those events.
Several stages are required to produce these simulated data:
- Generation of the underlying physics event;
- Particle transport and calculation of the idealized energy
deposits in the detector;
- Overlaying of backgrounds and digitization of the energy deposits;
- Reconstruction of the event
The last step, reconstruction, is performed by an executable called
BearApp from the Bear package. Bear is the simulation equivalent of the
Elf package for real data reconstruction. Like Elf, it takes collections
of digits and runs the full reconstruction chain, invoking the
reconstruction modules within the SVT, DCH, DRC, EMC and IFR
sub-systems. And as in Elf, the output from Bear is a set of collections
designed to be used in physics analyses (i.e. accessible to
Beta). The only important difference is that Bear is for Monte Carlo
collections, and Elf is for real data collections.
The other steps of course exist on the simulation side only, since
for real data there is no need to simulate events and the detector
response.
The first two steps -- generation and propagation of
particles -- are performed by a program called Bogus. (The actual
package name is BgsApp, which is also the name of the executable.)
BOGUS, the 'BaBar Object-oriented Geant4-based
Unified Simulation', is BaBar's detector simulation layer over
GEANT4 running in the BaBar Framework. The output of Bogus is in a data structure
called GHits, which lists the (idealized) energy deposited by the particles
passing through the detector, and the location of each energy deposit.
Stage 3 takes these idealized GHits and applies digitization, that
is, it transforms them into signals which look like the real
data collected by the detector electronics. Backgrounds are also
overlaid at this stage; the final output looks as similar as
possible to real data collected by the detector. The package for
this stage is called SimApp (executable = SimAppApp).
In the past, the simulation was always performed by the
3-step Bgs-SimApp-Bear procedure. However, the majority of users do
not need the intermediate collections
produced by Bogus and SimApp -- all they want are the Bear output
collections. Therefore, a new program called Moose (Monolithic
Object-Oriented Simulation Executable) was created. Moose does
all three stages in one step.
Now most users use Moose instead of the 3-step procedure.
The only reasons a person might want to run the 3-step procedure
instead would be (a) to study the intermediate collections (output
collections of Bogus or SimApp), (b) to run software from before release 12,
or (c) to test Moose against the 3-step method.
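The relationship between the 3-step procedure and Moose can be pictured as function composition. The sketch below is purely illustrative Python, not BaBar code: the function names and the dictionary standing in for an event are hypothetical, chosen only to mirror Bogus, SimApp, Bear and Moose as described above.

```python
# Illustrative only: each function stands in for one BaBar executable.
# The dict-based "event" and all of its field names are hypothetical.

def bogus(decay_file):
    """Generation and particle transport: produces idealized GHits."""
    return {"decay_file": decay_file, "format": "GHits"}

def simapp(event):
    """Background overlay and digitization of the GHits."""
    return {**event, "format": "digis", "background_mixed": True}

def bear(event):
    """Full reconstruction chain run on the digitized event."""
    return {**event, "format": "reconstructed"}

def moose(decay_file):
    """Same chain in a single job, with no intermediate collections kept."""
    return bear(simapp(bogus(decay_file)))

three_step = bear(simapp(bogus("B0B0bar_generic.dec")))
one_step = moose("B0B0bar_generic.dec")
print(one_step == three_step)
```

In this picture, the only thing the 3-step procedure gives you that Moose does not is access to the intermediate return values of bogus() and simapp().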
Skim production
The CM2 event store also contains skims, subsets of the
AllEventsSkim collection that contain a given set of tag bits. Skims are
produced by running the SkimMiniApp executable (from the SkimMini package)
over the AllEventsSkim collection. SkimMiniApp can be run over
the output collections from Elf or Bear, provided they contain the
required tag bits.
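To make the idea of a skim concrete, here is a minimal Python sketch of selecting events by tag bits. It is not how SkimMiniApp is implemented: the event layout and the tag bit names (TagA, TagB) are hypothetical, and the rule "keep an event if every required bit is set" is an assumption made for illustration.

```python
# Hypothetical events, each carrying a dictionary of boolean tag bits.
events = [
    {"id": 1, "tags": {"TagA": True, "TagB": False}},
    {"id": 2, "tags": {"TagA": True, "TagB": True}},
    {"id": 3, "tags": {"TagA": False, "TagB": True}},
]

def skim(events, required_bits):
    """Return the events in which every required tag bit is set."""
    return [
        event for event in events
        if all(event["tags"].get(bit, False) for bit in required_bits)
    ]

selected = skim(events, ["TagA", "TagB"])
print([event["id"] for event in selected])
```

Because the selection only looks at a handful of boolean flags per event, it is much cheaper than rerunning any reconstruction, which is why tag bits are useful for quick skims.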
The names of the skims can be found in FilterTools/defineMiniSkims.tcl.
The main code for the skims is often in other packages, but you can find
it by looking at the corresponding FilterTools/XXXPath.tcl file,
where XXX is the name of the skim.
Skims in BaBar Webpage
This section covers the reconstruction of the raw data that comes from
the detector or the simulation. It gives more detail about the
reconstruction software and the objects it produces (e.g., tracks and clusters).
BaBar Reconstruction Software Webpage
As you saw above, BaBar reconstruction is performed by two packages:
Elf for real data, and Bear (or Moose, which includes Bear) for
simulated data. The outputs of Elf and Bear (or Moose) are CM2
collections, ready to be analyzed with Beta applications.
The reconstruction algorithms are usually classes within modules,
placed in the appropriate XxxReco and XxxPid packages (where Xxx =
sub-system name). The Elf and Moose applications select which
reconstruction modules are to be run via a Framework/tcl
interface, described earlier. The types of objects produced by the
reconstruction algorithms are usually defined in the appropriate
XxxData packages. Each object has data and code needed to provide the
information the user wants and is loaded into the event store for
easy access.
What Elf and Bear (or Moose) load to the event store
Elf and Bear (or Moose) load all of the important reconstruction
objects from each sub-system into the event store so that they can
be accessed easily later, within a Beta analysis program for example.
A complete list of what Elf and Bear (or Moose) store in the event
would be too long to put here, but a few important examples are tracks
from the SVT and DCH reconstruction algorithms (TrkRecoTrk objects),
DRC candidate tracks (DrcTracks), candidates identified in the EMC
(EmcCands) and IFR clusters.
The main task of the reconstruction software is to take the
hits (digis) in the different subdetectors, and reconstruct
them into the basic particle objects: tracks in the SVT and DCH,
and clusters in the EMC and IFR. Then particle identification (PID)
algorithms are used to assign probable identities to the particles.
This section describes how this is done in more detail.
The SvtReco and SvtData packages contain code for some of the
reconstruction of tracks in the SVT.
Related documents
Track Reconstruction page
The reconstruction and particle identification algorithms for the
calorimeter are in the EmcReco and EmcPid packages respectively. The
important data objects created are EmcClusters, EmcBumps and EmcCands.
An EmcCluster is made up from a continuously connected region of
EmcDigis (where connected means that either the sides or the corners
of the EmcDigi locations (single crystals) are touching). Each cluster
is split into local maxima known as bumps (EmcBumps). Both types of
objects represent energy deposits in the calorimeter and for each type
you can ask how much energy was deposited and where. These energy
deposits are produced either by neutral or charged particles (tracks),
so it is important to know whether a track belongs to a cluster/bump.
Tracks are extrapolated from the drift chamber and are intersected
with the EMC. Track-cluster/bump algorithms are used to find out
which clusters/bumps are matched to tracks. The clusters/bumps that
are matched to tracks are considered charged candidates, while those
that are not matched are considered neutral candidates. These
candidates are objects of the type EmcCand. An EmcCand contains the
cluster/bump and the charged track if there is a match. Particle
identification information, such as the probability of a candidate
being an electron or charged pion, is added to each EmcCand by
algorithms in the EmcPid package.
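The definition of a cluster as a connected region of crystals, where touching sides or corners both count, can be sketched as a flood fill over a grid. The Python below is a simplified illustration and not the EmcReco code: the (row, column) crystal indexing, the function names and the sample energies are all assumptions, and real detector geometry (for example wrap-around in azimuth) is ignored.

```python
from collections import deque

def find_clusters(digis):
    """Group crystals into clusters of touching neighbours.

    digis maps (row, col) crystal indices to deposited energy.
    Two crystals are connected if they share a side or a corner.
    """
    unvisited = set(digis)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            row, col = queue.popleft()
            # Visit all eight neighbours (sides and corners).
            for d_row in (-1, 0, 1):
                for d_col in (-1, 0, 1):
                    neighbour = (row + d_row, col + d_col)
                    if neighbour in unvisited:
                        unvisited.remove(neighbour)
                        queue.append(neighbour)
                        members.append(neighbour)
        clusters.append(sorted(members))
    return sorted(clusters)

def cluster_energy(digis, cluster):
    """Total energy deposited in the crystals of one cluster."""
    return sum(digis[crystal] for crystal in cluster)

# Hypothetical energies in GeV.
digis = {
    (0, 0): 0.50, (0, 1): 0.20, (1, 2): 0.10,  # joined by a side and a corner
    (5, 5): 0.30, (5, 6): 0.05,                # a separate group
}
clusters = find_clusters(digis)
for cluster in clusters:
    print(cluster, round(cluster_energy(digis, cluster), 3))
```

Splitting each cluster into bumps would be a further step that looks for local energy maxima inside the cluster; that step is not shown here.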
Related Documents:
EMC software page.
Instrumented flux return output - neutral and charged clusters
Related Documents:
IFR software (CHEP 98 paper, Luca Lista et al)
Related Documents:
BAD 116: Kaon Selection at BaBar
BAD 165: An Event Likelihood Algorithm for DIRC Based Particle Identification
BAD 23: A Maximum Likelihood Method for DIRC Particle Identification
Particle ID from the EMC - e, gamma and pi0
Particle identification for the EMC is done by the EmcPid package.
The Pid output is put in EmcCands and can be accessed via EmcPidInfo
objects. Each EmcPidInfo object stores the statistical information
(significance levels and likelihoods) for the candidate being of a
particular type (electron or pion, for example). The variable and the
name of the algorithm used to get the statistical information are also
stored in the EmcPidInfo objects. Neutral candidates, such as pi0's
and gammas, are made from clusters not matched to charged tracks,
while charged candidates, such as electrons and charged pions, are
made from clusters matched to charged tracks. Different discriminating
variables are used to identify the candidates. The E/p ratio (energy
of cluster / momentum of track) is used to identify
electrons/positrons, while only the energy deposited by a cluster is
used to identify muons and other minimum ionizing particles. The
photon and pi0 reconstruction algorithms look at how the energy within
a cluster is distributed about a given set of axes (second moments).
Currently, each algorithm designed to identify a particular particle
type only produces statistical results for that hypothesis. This is
inadequate for likelihood analyses, where the statistical information
about whether the E/p value of a candidate is electron-like, pion-like
or muon-like, for example, is required (only the electron-like value
is supplied). This situation should improve in the near future.
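As an illustration of the E/p discriminating variable, the Python sketch below computes the ratio of cluster energy to track momentum and applies a simple window cut. It is not the EmcPid algorithm: the function names and the window limits (0.9 to 1.1) are hypothetical values chosen for the example, based only on the general fact that an electron deposits nearly all of its energy in the calorimeter, so its E/p is close to 1.

```python
import math

def e_over_p(cluster_energy, px, py, pz):
    """Ratio of EMC cluster energy to the momentum of the matched track."""
    momentum = math.sqrt(px * px + py * py + pz * pz)
    if momentum == 0.0:
        raise ValueError("track momentum must be non-zero")
    return cluster_energy / momentum

def looks_electron_like(ratio, low=0.9, high=1.1):
    """Hypothetical window cut: the limits are illustrative, not BaBar values."""
    return low <= ratio <= high

# Hypothetical candidates (energies and momenta in GeV).
electron_like = e_over_p(1.9, 0.0, 0.0, 2.0)
pion_like = e_over_p(0.4, 0.0, 0.0, 2.0)
print(looks_electron_like(electron_like), looks_electron_like(pion_like))
```

A single yes/no cut like this only tests the electron hypothesis, which mirrors the limitation described above: a likelihood analysis would instead need the electron-like, pion-like and muon-like probabilities for the same E/p value.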
Muon ID from the IFR
Related Documents:
BAD 474: Studies of A Neural Net Based Muon Selector for the BaBar Experiment
BAD 60: Muon Identification in the BaBar Experiment
Particle ID at BaBar (CHEP 98 paper, Gautier HdM et al, draft due late August)
Data begins as signals in the BaBar detector, and must pass
through several stages before it is translated into a collection in
the BaBar event store. The signals are digitized and reconstructed
into tracks and clusters. Particle ID algorithms identify each track or
cluster as a candidate for a given type of particle. Once the final-state
particles are identified, they are used to reconstruct other particles in the
decay chain.
Simulated data consists of generated particles, each with their own
identities and four-vectors, which propagate through and leave signals in
a GEANT4-simulated BaBar detector. These signals have the same format as
the signals left by real data in a real detector, so from that point
on reconstruction is the same as for real data.
Skims are subsets of the full data set that contain particular
tag bits needed for a given analysis.
So far, this page has described how the reconstruction applications
work. Now it is time to see for yourself. The next section will teach
you how to run the Moose, Elf, and SkimMini applications.
The following examples will show you how to run the main executables
for the packages used to produce CM2 collections: Moose, Elf, and SkimMini.
They are for the most part identical to the examples in the CM2 intro doc.
Note that in practice, user collections (ones you make yourself) are
for testing purposes only. For a real analysis, you are
expected to use only official BaBar collections from the CM2 event store.
Elf, Moose, and SkimMini are set up to write output collections.
To make a space for these collections, enter the command:
> KanUserAdmin createuser
This will create a directory /work/users/<username> where
you can put your collections. The following examples are for
a user with username elephant. To make them work for you,
you will of course need to put in your own username instead
of elephant, and your own initial instead of e.
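If you keep the example files as text, the substitution of your own username can be automated. The following Python sketch is only a convenience and not part of any BaBar tool: the function names are hypothetical, and it assumes that every occurrence of the word elephant in the text is the example username.

```python
def collection_space(username):
    """Collection space path of the form /work/users/<username>."""
    return f"/work/users/{username}"

def initial(username):
    """First letter of the username, used wherever the examples show 'e'."""
    return username[0]

def personalise(text, username, example_user="elephant"):
    """Replace the example username with your own in a block of text."""
    return text.replace(example_user, username)

line = "set MooseOutputCollection /work/users/elephant/MooseCol"
print(personalise(line, "myname"))
print(collection_space("myname"), initial("myname"))
```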
So far in the Workbook, you have always used analysis-30 as your
test release. analysis-xx releases are the recommended releases
for running Beta applications like BetaMiniApp. However, for
simulation production, reconstruction and skimming, you should
instead use a recommended production release.
Production releases for reconstruction are built regularly and stored
in the directory $BFROOT/reco. A full description of the tags and changes
for each release can be found at the Summary of all lettered N.M.Lx releases,
or at the Software Release Announcements HN forum for the main N.M.L
releases. To find out which release is recommended for your specific
application, check the relevant HN forum: Simulation Production
for Moose, Design of the Common Reconstruction Software for Elf, and Skim Task Force for SkimMini.
For simplicity, the following examples all use the same production release
18.6.1d. Note, however, that production releases become obsolete rather
quickly. This tutorial was created in January 2006, but within a few
months you may need to use a later release in order for the tutorial
to work.
To begin, set up your test release:
> newrel -s $BFROOT/work/e/elephant -t 18.6.1d my-18.6.1d
> cd my-18.6.1d
Set up your path:
my-18.6.1d> srtpath <enter> <enter>
Set up the correct conditions database:
my-18.6.1d> cond18boot
Check out and set up the workdir package:
my-18.6.1d> addpkg workdir
my-18.6.1d> cd workdir
workdir> gmake setup
See the Quicktour for more
information about any of these commands.
Purpose
This is the application used to produce MC events, including generation,
simulation, digitization, and reconstruction.
Input
The user specifies a decay file for an exclusive or inclusive decay process.
Output
The output is the same as the output of the reconstruction program. In
addition, the tagbits used to define the skims for physics and detector
studies are also stored in the output collection. The format of the output
collection is a ROOT file.
Requirements
- Decay file for your favorite decay mode. These can be found
in workdir/PARENT/ProdDecayFiles
- A tcl file to configure the I/O correctly (see the example below).
Example
In workdir, create a tcl "snippet" with your favorite options for
each job you want to run. Here is an example mymoose.tcl:
# -------- mymoose.tcl: begin -----------
set ProdTclOnly true
set RUNNUM 987654
set CONDALIAS Aug2001
set SimAppBkndInputCollection /store/SP/BkgTriggers/BkgTriggers_200108_V01
set SimAppBkndFirstEvent 1
set SimAppDigiMix true
set SimAppKanDigiMix true
set NEVENT 100
set UDECAY PARENT/ProdDecayFiles/B0B0bar_generic.dec
set MooseHBookFile myMoose.hbook
set MooseOutputCollection /work/users/elephant/MooseCol
mod talk KanEventOutput
allowDirectoryCreation set true
exit
sourceFoundFile Moose/MooseProduction.tcl
# -------- mymoose.tcl: end -----------
As you can see, the tcl snippet is quite simple: it just sets a few
configuration parameters specific to the particular job you would like to run:
- RUNNUM - Set the (arbitrary) run number. Also used as the seed for
the random number generator
- CONDALIAS - set the data taking period for which you want to generate MC.
- NEVENT - number of events to generate.
- UDECAY - The input to Moose is a dec file from the release's
ProdDecayFiles directory. There are different dec files for different
decay modes. Here you have chosen B0B0bar_generic.dec, so you will
be generating generic B0B0bar events.
- SimAppDigiMix - enables/disables background mixing. Must be set true
for Moose to work. (Default is false.) Background trigger
event mixing, or "digimixing", overlays radiation, beam gas, beam wall
events, detector noise, and other sources.
- SimAppBkndInputCollection - set the collection to use for background
mixing. Must be consistent with the CONDALIAS setting (note that
Aug2001 = 200108). For a list of background collections use the command
"BbkSqlShell select * from bbrmdc.prod_bkg".
- SimAppBkndFirstEvent - set the first event from the background collection to use
- SimAppKanDigiMix - use a CM2 (true) or BDB (false) background collection.
- MooseHBookFile - name of the monitoring plots file. The extension (.root
or .hbook) tells Moose what kind of file to produce (ROOT or PAW).
- MooseOutputCollection - name of the output collection. You should put
your collections in the space created with KanUserAdmin, as shown above. That is, the name of the output collection should
begin with /work/users/elephant/.
- The KanEventOutput "module talk" that sets "allowDirectoryCreation" to
true allows any necessary subdirectories in your /work/users/elephant
area to be created automatically.
- sourceFoundFile Moose/MooseProduction.tcl passes the job
to MooseProduction.tcl
Look in MooseProduction.tcl to see some of the other available options.
Note: If you copy-and-paste this tcl file, don't forget to change "elephant" to your
own username!
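The requirement that the background collection be consistent with CONDALIAS (Aug2001 = 200108) is easy to get wrong when editing a snippet by hand. The Python sketch below shows one way to check it before submitting a job. It is not a BaBar utility: the function names are hypothetical, and it assumes that background collection names always embed the period as _YYYYMM_, which is inferred from the single example BkgTriggers_200108_V01.

```python
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

def condalias_to_yyyymm(alias):
    """Convert a CONDALIAS such as 'Aug2001' into '200108'."""
    month, year = alias[:3], alias[3:]
    if month not in MONTHS or len(year) != 4 or not year.isdigit():
        raise ValueError(f"unexpected CONDALIAS format: {alias!r}")
    return f"{year}{MONTHS[month]:02d}"

def background_matches(alias, background_collection):
    """True if the collection name contains the period implied by the alias."""
    return f"_{condalias_to_yyyymm(alias)}_" in background_collection

collection = "/store/SP/BkgTriggers/BkgTriggers_200108_V01"
print(condalias_to_yyyymm("Aug2001"))
print(background_matches("Aug2001", collection))
```

The authoritative list of background collections remains the BbkSqlShell query given above; this check only catches an obvious mismatch between the two settings.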
Now the command to run your job is:
workdir> MooseApp mymoose.tcl
This will run your job interactively. Since you are running only 100
events, this is not a problem. For long jobs, however, it is best to submit
your jobs to the batch queue, like this:
workdir> bsub -q long -o MyMoose.log MooseApp mymoose.tcl
Once the job is done, go to the next section to learn how to examine
your CM2 collections.
Related Documents:
Moose: BaBar Simulation Application
The original example from the CM2 intro doc, on which this example is based.
You already used KanUserAdmin to create the collection space
/work/users/elephant. KanUserAdmin is also the tool that you use to keep
track of the output collections in your collection space.
To see a list of your collections, enter the command:
> KanUserAdmin list /work/users/elephant
/work/users/elephant/MooseCol
So you have one collection, /work/users/elephant/MooseCol.
The collections produced by Moose are just ordinary collections,
so you can run on them with BetaMiniApp. All you have to do is
change the input collection given to KanEventInput in the
Quicktour example:
> mod talk KanEventInput
KanEventInput> input add /work/users/elephant/MooseCol
KanEventInput> exit
If you want to delete your collection, the command is:
KanUserAdmin delete /work/users/elephant/MooseCol
Note that Moose will not overwrite collections that already exist,
so if you want to rerun Moose you must either set a new collection
name or delete the old collection first.
To learn more about KanUserAdmin and its options, use the help option:
KanUserAdmin help
Normally, KanUserAdmin is all you need to manage your collections.
However, it does not tell you very much about them. If you want to
know more about your user collections, or any other CM2 collections,
the tool you need is KanCollUtil.
For example, you can see the number of events in your collection,
KanCollUtil /work/users/elephant/MooseCol
/work/users/elephant/MooseCol (100 events)
see a list of the actual files,
KanCollUtil -L /work/users/elephant/MooseCol
/work/users/elephant/MooseCol (100 events)
LFN 000 /work/users/elephant/MooseCol.01.root (owned)
LFN 001 /work/users/elephant/MooseCol.02E.root (owned)
and see the physical location of the files (it's NOT /work/users/elephant):
KanCollUtil -P /work/users/elephant/MooseCol
/work/users/elephant/MooseCol (100 events)
PFN 000 /nfs/kan001/vol6//work/users/elephant/MooseCol.01.root
PFN 001 /nfs/kan001/vol6//work/users/elephant/MooseCol.02E.root
You can see that your collection is made up of two ROOT files.
"MooseCol.01.root" is the micro part of the mini collection,
and "MooseCol.02E.root" is the rest.
(See the Event Store
section for more information about these terms.)
Larger collections are made of more than two files.
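If you want to use this information in a script, the listing can be parsed with a few lines of Python. The sketch below is an illustration rather than a supported interface: the function name is hypothetical, and it assumes the output of KanCollUtil -L (or -P) has exactly the layout shown above, which may differ between releases.

```python
import re

# Sample text copied from the KanCollUtil -L output shown above.
SAMPLE = """\
/work/users/elephant/MooseCol (100 events)
LFN 000 /work/users/elephant/MooseCol.01.root (owned)
LFN 001 /work/users/elephant/MooseCol.02E.root (owned)
"""

def parse_listing(text):
    """Extract the collection name, event count and file names."""
    lines = text.strip().splitlines()
    header = re.match(r"(\S+) \((\d+) events\)", lines[0])
    if header is None:
        raise ValueError("unexpected header line: " + lines[0])
    files = [
        line.split()[2]
        for line in lines[1:]
        if line.startswith(("LFN", "PFN"))
    ]
    return {
        "collection": header.group(1),
        "events": int(header.group(2)),
        "files": files,
    }

info = parse_listing(SAMPLE)
print(info["collection"], info["events"], len(info["files"]))
```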
For more information about KanUtils, check out:
Utilities for accessing
CM2 collections
(from the CM2 intro doc)
Purpose
This is the application used to reconstruct the events recorded in
IR-2 with the online system.
Input
The output of the online system is stored in the xtc files which are
used as input to run Elf.
Output
The output of the reconstruction are charged tracks and neutral particles
that are then used to perform physics analysis. The output of Elf is by
default a ROOT file. You can still (for test purposes) store the output as a
collection in Objectivity, but this requires some modification of the default
tcl parameters.
Requirements
- The xtc file for a run. These are usually in /nfs/farm/babar/tcfiles.
- A tcl file to configure the I/O correctly (see the example below).
Example
Because it takes up so much space, raw data is kept on tape and must
be "staged in" before being used. So to begin, you need to stage the xtc
file for the run you are interested in:
workdir> tcstage 0020029-001
The system might respond,
File /nfs/babar/tcfiles/babar-0020029-001.xtc is already on disk
in which case you are ready to go. Or it could take a while, during which
you might see responses from the system like
Fetching runs-0020000/babar-0020029-001.xtc from HPSS thru tcstage ...
tcstage: Waiting for 1 other clients to finish ...
before you are told that the run has been successfully staged to disk,
with a response like:
File /nfs/babar/tcfiles/babar-0020029-001.xtc staged from HPSS
This indicates that the file has been staged and is ready to use.
Next, in workdir, you must create a tcl snippet with your favorite
options for each job you want to run. Here is an example myelf.tcl:
# -------- myelf.tcl: begin -----------
set ElfConfigPatchSet Run2
set ElfHistoFileName myElf.hbook
set ElfOutputCollection /work/users/elephant/ElfCol
sourceFoundFile Elf/ElfProduction.tcl
# -------- myelf.tcl: end -----------
As you can see, the tcl snippet is quite simple: it just sets a few
configuration parameters specific to the particular job you would like to run:
- ElfConfigPatchSet - the config patch: MC for MC, Run1 for Run1, Run2 for Run2/3/4
- ElfHistoFileName - name of the output hbook file
- ElfOutputCollection - name of the output collection
The full list of available options is documented in Elf/ElfProduction.tcl
which is always kept up to date.
The run command is (all one line):
workdir> ElfUserXtcApp -n 100 -s 20 -f
/nfs/farm/babar/tcfiles/babar-0020029-001.xtc myelf.tcl
where the command line options are:
- -n: total number of events to run, including those skipped
- -s: events to skip. In this case you run on events 21 - 100
- -f: location of XTC file
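Because -n counts the skipped events as well, the number of events actually processed is n minus s. The short Python sketch below spells out that arithmetic; the function name is hypothetical and it simply encodes the description of -n and -s given above.

```python
def event_range(n_total, n_skip):
    """Return (first, last, count) of the events processed for -n and -s."""
    if n_skip < 0 or n_skip >= n_total:
        raise ValueError("need 0 <= skip < total")
    return n_skip + 1, n_total, n_total - n_skip

first, last, count = event_range(100, 20)
print(first, last, count)
```

For the command above (-n 100 -s 20) this gives events 21 to 100, that is, 80 events processed.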
Once the job is done, use the instructions in the
KanUtils section to examine your new CM2 collection.
Related Documents:
Elf: BaBar Event Reconstruction Application
The original example from the CM2 intro doc, on which this example is based.
Purpose
Make skim collections by filtering on the tagbits. Optionally write user
configured (and skim-specific) lists of composite candidates as well as user
data to the skim output.
Input
You can run on an existing data or MC collection produced with Elf, Moose, or another SkimMiniApp job.
Output
A collection in the CM2 event store.
Requirements
- A collection for data or MC events
- A tcl file to configure the I/O correctly (see the example below).
Example
To begin, create a directory for all of the skims you are
going to produce:
KanUserAdmin mkdir /work/users/elephant/SkimDir
Next, in workdir you need to create a tcl snippet with your
favorite options for each job you want to run. Here is an example
myskim.tcl:
# -------- myskim.tcl: begin -----------
set SkimNEvent 100
set SkimsToRun all
set SkimsToWrite all
set SkimOutputDir /work/users/elephant/SkimDir
#set SkimMC yes
set SkimInputCollection /store/PR/R18/AllEvents/0001/81/18.1.0c/AllEvents_00018190_18.1.0cV00
sourceFoundFile SkimMini/SkimMiniProduction.tcl
# -------- myskim.tcl: end -----------
Note that the "set SkimInputCollection" setting must be given all on one
line, even if it appears wrapped here for formatting purposes.
As you can see, the tcl snippet is quite simple: it just sets a few
configuration parameters specific to the particular job you would like to run:
- SkimNEvent - number of input events to read. If this isn't set, all events
from the input collection are read.
- SkimsToRun - Skims whose selection code should run. Possible
values are {all, none, or specific skims}. You can determine
the names of possible skims from FilterTools/defineMiniSkims.tcl.
- SkimsToWrite - Skims to write. Possible values are {all, none, or
a list of specific skims}. If this isn't set the default
is "all".
- SkimOutputDir - output directory (in collection space) where the output
skim collections are to be written. Here you put your
collections in the /work/users/elephant/SkimDir
directory you created above.
- SkimMC - set this for MC skims
- SkimInputCollection - input collection you want to skim
The full list of available options (and most up-to-date documentation of
them) can be found in SkimMini/SkimMiniProduction.tcl.
You are now ready to run:
workdir> SkimMiniApp myskim.tcl
Once the job is done, check your SkimDir directory:
KanUserAdmin list /work/users/elephant/SkimDir
Because you set "SkimsToRun all" and "SkimsToWrite all", the system should
respond with a long list of skims.
SkimMiniApp: BaBar Skimming Application
The original example from the CM2 intro doc, on which this example is based.
If you have made it this far, then give yourself a pat on the back.
You have successfully run the programs for the four main BaBar packages:
BetaMini, Moose, Elf, and SkimMini.
General related documents
Authors:
John Back,
Shahram Rahatlou
Contributors:
Bob Jacobsen,
Peter Elmer
Last modification: 13 April 2006
Last significant update: 25 July 2005