Data Management in the BaBar Event Store
To avoid having data disappear off disk
when the staging system gets busy, we can manually
load both complete runs and specific collections to disk. The
following sections describe that process.
The amount of disk space used by these collections is normally reported
every other week to the Physics/Reconstruction/Simulation
Forum.
Complete Runs
We explicitly manage the raw and reco information for complete runs
that is kept on disk. The aim is to provide stability for people
who need large samples of raw and reco to study, while still being able
to accommodate new specific requests.
Several web pages maintain the lists of on-disk data. At the moment this service is not available for physboot2 and newer federations.
For physboot, please see:
http://www.slac.stanford.edu/babar-internal/keptdata/ana3/requests.html.
For analboot2, please see:
http://www.slac.stanford.edu/babar-internal/keptdata/ana2/requests.html.
For data in sp3analboot, please see:
http://www.slac.stanford.edu/babar-internal/keptdata/sp3/requests.html
For data in simuboot (sp4analboot), please see:
http://www.slac.stanford.edu/babar-internal/keptdata/sp4/requests.html
If you discover that one of these collections does not have all the
listed data on disk, that's an error; please let
us know.
We do the actual loading, etc., during the scheduled event
store outages.
Please make requests at least 12 hours before
the outage starts so we have
time to handle them.
Collections of Specific Digis
The above method stores large numbers of contiguous events on disk.
This is only efficient if you want to look at a large fraction of those
events. For certain purposes, you want to take a comparatively sparse
collection of events and reprocess them from digis. We provide
"digi copies of collections" to make this more efficient. Once you
have a collection of events, even if quite sparse, we can load a copy
of just the digis from just these events onto disk, where they will
stay until you delete them.
Note that we are currently only copying the raw data (digis); you
can then run Bear on these collections as often as needed to reprocess
the data for your own use.
At present we do this only for the
analboot2 federation.
Paul Raines has provided a web-based system for making and checking on requests.
It's available from
http://www.slac.stanford.edu/babar-internal/collreq/requests.html.
You need a BaBar account name and password to use it.
To check on a particular run or collection, enter the run range and press show.
The status codes are:
- REQUESTED - in the queue
- PENDING - actually being processed
- DONE - finished and thought to be OK
- FAILED - failed in processing, will be retried
- ON HOLD - failed twice, now pending some intervention
- CANCELLED - has been removed from the queue
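As a quick reference, the status codes above can be grouped by whether a request is still moving through the system. The following Python sketch is only an illustration: the grouping into "active", "finished" and "stopped", and the names STATUS_MEANING and request_state, are assumptions made here and are not part of the web-based system.

```python
# Meanings copied from the status list above.
STATUS_MEANING = {
    "REQUESTED": "in the queue",
    "PENDING": "actually being processed",
    "DONE": "finished and thought to be OK",
    "FAILED": "failed in processing, will be retried",
    "ON HOLD": "failed twice, now pending some intervention",
    "CANCELLED": "has been removed from the queue",
}

# Assumed grouping (not defined by the request system itself).
_ACTIVE = {"REQUESTED", "PENDING", "FAILED"}   # FAILED will be retried
_STOPPED = {"ON HOLD", "CANCELLED"}            # needs intervention, or removed


def request_state(status):
    """Return 'active', 'finished' or 'stopped' for a status code."""
    code = status.strip().upper()
    if code not in STATUS_MEANING:
        raise ValueError("unknown status code: %r" % status)
    if code in _ACTIVE:
        return "active"
    if code in _STOPPED:
        return "stopped"
    return "finished"
```

Under this assumed grouping, only an ON HOLD or CANCELLED request would need you to follow up.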
To add a collection to the list to be copied, use the "Make request" button.
It will open a panel in which to enter the input and output collections, and
the run number of the start of the data. We use that run number to optimize the
order in which we process tapes.
Generally, the output digi collections should have "digis" in their collection
names.
For example, copy from "/groups/GroupName/collection" to "/groups/GroupName/digis/collection";
see also the web page above for examples.
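If you are preparing many requests, the naming convention in the example above can be applied mechanically. This is a minimal Python sketch, assuming that input names always have the form /groups/GroupName/... and that "digis" goes directly after the group name; the helper digi_collection_name is hypothetical and is not one of the provided tools.

```python
def digi_collection_name(collection):
    """Insert a 'digis' component right after the group name.

    Assumes names of the form /groups/<GroupName>/<rest>, as in the
    example above; anything else is rejected rather than guessed at.
    """
    parts = collection.split("/")
    # "/groups/G/x".split("/") gives ["", "groups", "G", "x"]
    if (len(parts) < 4 or parts[0] != "" or parts[1] != "groups"
            or not all(parts[2:])):
        raise ValueError(
            "expected /groups/<GroupName>/<collection>: %r" % collection)
    if "digis" in parts[3:]:
        return collection  # already follows the convention
    return "/".join(parts[:3] + ["digis"] + parts[3:])
```

Calling digi_collection_name("/groups/GroupName/collection") returns "/groups/GroupName/digis/collection", matching the example above.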
There are also command-line tools for making large numbers of requests. If you'd like
to use these, please contact me directly.
Temporarily staging data to disk
The "collstagein" command can temporarily load data to disk. The "-help"
option describes usage and options; additional information is also available in an initial
Hypernews post. Please search Hypernews for more recent updates.
A typical command to stage the raw data for collection /users/me/myevents would be
collstagein -wait -include raw /users/me/myevents
Note that you need to have OO_FD_BOOT set (e.g. via the analboot2 command) and
have done a srtpath to release 8.6.2d or a more recent one.
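If you call collstagein from a script, it can help to check the OO_FD_BOOT prerequisite before building the command. The following Python sketch is an illustration only: build_stagein_command is a hypothetical helper, it uses just the options from the example above, and it does not check which release srtpath was set to.

```python
import os


def build_stagein_command(collection, env=None):
    """Return the argv list for the collstagein call shown above.

    Raises RuntimeError if OO_FD_BOOT is not set, since the command
    needs a federation to be selected first.
    """
    env = os.environ if env is None else env
    if not env.get("OO_FD_BOOT"):
        raise RuntimeError("OO_FD_BOOT is not set; select a federation first "
                           "(e.g. with the analboot2 command)")
    # Only the options from the example above are used here.
    return ["collstagein", "-wait", "-include", "raw", collection]
```

The returned list can be handed to subprocess.call in a session where a suitable release has already been set up.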
Disk Usage
There is a rough report available that contains information on
the disk space used by a particular user or group, but you have to dig for it. Look for your group name.
The "aio" files are the "all-in-one" data produced by the digi-copying process and similar.
You can also use the "lscoll" script to get summaries. For example,
lscoll /groups/.../digis/...
will list all the groups that have copied digis using the standard names
(not all have, though), along with the total number of events and collections.
Page maintained by Bob Jacobsen, Bob_Jacobsen@lbl.gov
Last updated July 5, 2001