Event Store Data Management

Normally, only micro- and tag-level data is kept on disk permanently. Raw, rec, and esd (mini) data is kept on tape and must be staged in to disk before a job is run.

There are three ways to get tape-resident data staged onto disk; which one to use depends on your needs.

Temporarily staging data to disk

The collstagein command temporarily stages data onto disk. This is the right choice when you only need to run a job on a specific collection once.

The -help option describes usage and options. Additional information is available in the Selected Application section (see collstagein).

A typical command to stage the raw data for the collection /users/me/myevents is:

  collstagein -wait -include raw /users/me/myevents

When the data has been staged, e-mail is sent to you (one message for each host the data is staged onto).

To use this command in a script that you submit to a batch queue, add the -wait option, which blocks until all the files appear on disk. Note that -wait has no timeout, so the script will simply keep waiting if the data cannot be staged.

Data staged with the collstagein command becomes eligible for purging from disk 12 hours after it was last accessed. The actual purge time depends on overall disk usage and other staging activity; when room is needed for new data, the oldest data eligible for purging is purged first.

You should therefore not assume that the data is still on disk the day after the collstagein command was run. The best way to ensure the data is on disk at job run time is to call collstagein with the -wait option inside the batch submission script itself, as sketched below.
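The following is a minimal sketch of such a script, using the collstagein options described above. MyAnalysisApp and MyAnalysis.tcl are placeholders for your own analysis executable and its configuration file, not real application names.

  #!/bin/sh
  # Stage the raw data for the collection and block, with no timeout,
  # until all the files are on disk.
  collstagein -wait -include raw /users/me/myevents
  # This line is reached only after staging has completed;
  # replace the placeholders below with your own job.
  MyAnalysisApp MyAnalysis.tcl

You would then submit this script to the batch queue in the usual way (for example with bsub, if your site uses LSF); the job simply waits in the staging step until the data is available.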

Long Term Kept Data

You can use the KeptData service when you need a collection to be available on disk for a relatively long time (weeks or months), or when you need to access the data several times over a shorter period (a few days). To use the KeptData service, either to submit a request or to check what is already on disk, go to one of the following URLs (a command-line version is not available yet):

For physboot1 see:

  http://www.slac.stanford.edu/babar-internal/keptdata/ana3/requests.html

For physboot2 see:

  http://www.slac.stanford.edu/babar-internal/keptdata/ana3/requests.html

For analboot2 see:

  http://www.slac.stanford.edu/babar-internal/keptdata/ana2/requests.html

For sp3analboot see:

  http://www.slac.stanford.edu/babar-internal/keptdata/sp3/requests.html

For simuboot (SLAC data only) see:

  http://www.slac.stanford.edu/babar-internal/keptdata/sp4/requests.html

Data from newer physics federations, and MC data produced at remote sites, is not yet covered by this service; support for it is being worked on.

If you discover that one of the collections listed on the KeptData pages does not have all of its listed data on disk, that is an error; please let us know.

Collections of Specific Digis

The above method stores large numbers of contiguous events on disk, which is only efficient if you want to look at a large fraction of those events. For certain purposes you may instead want to take a comparatively sparse collection of events and reprocess them from digis. We provide "digi copies of collections" to make this more efficient.

Once you have a collection of events, even a quite sparse one, we can load a copy of just the digis from just those events onto disk, where they will stay until you delete them. This is also useful when you want to export such skim collections to your institution. Note that we currently copy only the raw data (digis); you can then run Bear on these collections as often as needed to reprocess the data for your own use.

Note that we are currently only doing this for the analboot2 federation.

Paul Raines has provided a web-based system for making and checking the requests. It's available from http://www.slac.stanford.edu/babar-internal/collreq/requests.html. You need a BaBar account name and password to use it.

To check a particular run or collection, enter the run range and press show. The status codes are:

To add a collection to the list to be copied, use the Make request button. It opens a panel in which to enter the input and output collections and the run number of the start of the data; we use that run number to optimize the order in which we process tapes. Generally, the output digi collection should have "digis" in its collection name: for example, copy from /groups/GroupName/collection to /groups/GroupName/digis/collection. See also the examples in the information above.

There are also command-line tools for making large numbers of requests. If you'd like to use these, please contact Bob Jacobsen directly.

 



Page Owner: Jacek Becla
Last Update: June 13, 2002