[WARNING] This page is still under construction. The old documentation, despite being rather out of date, may still be of some help.

Importing BaBar Data to a Local Site

This page describes the procedures for importing data from SLAC to your local site, as well as a few hints for managing the files on your local disk. It assumes that you have installed a local bookkeeping database. It is also possible to use the SLAC bookkeeping database to manage the import (not documented here), but that keeps no record of which collections are available locally, so this option is only suitable for small sites that make one-off transfers.

There are many ways of selecting which collections to import. Here we outline a fairly general procedure that should simplify a setup where you wish to import data as it becomes available at SLAC (e.g. from skimming).

In summary, we select which datasets are required, and use that to select files. The selected files are imported to local disk and prepared for analysis use (e.g. inserted into xrootd), and then the collection is marked as local in the bookkeeping.

Summary

Selecting Datasets

A dataset can be selected for local use by setting the ds_is_local flag. This makes the dataset visible to BbkDatasetTcl, and the flag is later used to select the files to import.
BbkUser --site=MY_SITE --dbuser=bfactory --dbname=bbkr18 \
        --ds_name=A0-Run6-OnPeak-R22d --set ds_is_local=1
Note that the base dataset name should be used, not a tag like A0-Run6-OnPeak-R22d-v07. This command can be run whenever a new dataset is required. Alternatively, a more general selection can be used and the command put in a cron job, e.g.
BbkUser --site=MY_SITE --dbuser=bfactory --dbname=bbkr18 \
        --ds_is_local=0 --ds_type=PRskims,SPskims -i ds_skim=local_skims.lis \
        ds_name --set ds_is_local=1
The --ds_is_local=0 selection ensures that already-selected datasets are not changed unnecessarily. --ds_type=PRskims,SPskims selects skim datasets (not PR or SP). -i ds_skim=local_skims.lis selects the skims listed in the file local_skims.lis (one skim name per line). As well as changing ds_is_local, this command lists the names of the selected datasets, which keeps a record in the cron job's log file.
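
A cron job built around the command above might look like the following minimal sketch. The names SITE, SKIM_LIST, DRY_RUN, and select_datasets are illustrative and are not part of BbkUser; the check on the skim list is an added precaution, since this page does not say how BbkUser treats a missing or empty -i file.

```shell
#!/bin/sh
# Cron wrapper sketch for the dataset-selection command shown above.
# SITE, SKIM_LIST and DRY_RUN are illustrative names, not BbkUser features.
SITE=${SITE:-MY_SITE}
SKIM_LIST=${SKIM_LIST:-local_skims.lis}

select_datasets() {
    # Refuse to run without a non-empty skim list (one skim name per line).
    if [ ! -s "$SKIM_LIST" ]; then
        echo "skim list '$SKIM_LIST' is missing or empty" >&2
        return 1
    fi
    # With DRY_RUN set, print the command instead of running it.
    ${DRY_RUN:+echo} BbkUser --site="$SITE" --dbuser=bfactory --dbname=bbkr18 \
        --ds_is_local=0 --ds_type=PRskims,SPskims -i ds_skim="$SKIM_LIST" \
        ds_name --set ds_is_local=1
}
```

Running DRY_RUN=1 select_datasets prints the command without touching the database, which is a cheap way to check the setup. In the cron job itself, call select_datasets and redirect its output to the log file.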

Datasets can be selected by ds_name, ds_type, ds_skim (or ds_stream), ds_on_peak, ds_run_cycle, ds_skim_cycle, ds_modenum, or ds_is_signal (run BbkUser --list-columns ds for a full list). Most of these can be specified as a list (like the --ds_type selection above) and can include wildcards, e.g. --ds_name='A0-Run*-R22d'. The same selection could be obtained with --ds_type=PRskims --ds_skim=A0 --ds_skim_cycle=R22d, which is faster and probably more robust.

Selecting Files

Datasets consist of a list of collections, and each collection consists of one or more files. It is the files that we import. We can use the datasets' ds_is_local flags to select the files needed for the required datasets. Files are selected with the file_status flag.
BbkUser --site=MY_SITE --dbuser=bfactory --dbname=bbkr18 \
        --file_status=2 --ds_is_local=1 --components=E --set file_status=-1

BbkUser --site=MY_SITE --dbuser=bfactory --dbname=bbkr18 \
        --file_status=2 --ds_is_local=1 --components=^E --set file_status=1M \
        dse_type skim dse_modenum components \
        dse_id dse_lumi events file_id gbytes --summary
New files start with a file_status value of 2, so these commands select on that. --ds_is_local=1 limits the changes to the datasets that we require.

Most analysis users don't require the mini information, so we mark files with only mini information (--components=E) with --set file_status=-1. Note that it is important not to leave the mini files with status 2 here; otherwise the system won't later know to ignore these files when determining whether the collection is complete. If the mini files are required (e.g. for PR skimming, though not SP, or for Event Display detail), just leave out the first command and remove --components=^E from the second command.

The remaining files (with header, tag bit, and micro information) in our datasets are selected with the second command. It uses --set file_status=1M. Files with statuses beginning with 1 are flagged for import, with the second character specifying the priority (A=top, Z=bottom).

This BbkUser command also prints a summary of the selected files, for the cron job's log file.
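
The two cases described above (with and without the mini files) can be sketched as a single shell function. This is only an illustration: bbk, flag_files_for_import, WITH_MINI, DRY_RUN, and SITE are hypothetical names introduced here, while the BbkUser options are exactly those used in the commands above. The ^E value is quoted as a precaution for shells that treat ^ specially.

```shell
#!/bin/sh
# Run (or, with DRY_RUN set, just print) a BbkUser command for this site.
bbk() {
    ${DRY_RUN:+echo} BbkUser --site="${SITE:-MY_SITE}" --dbuser=bfactory \
        --dbname=bbkr18 "$@"
}

# Flag new files (file_status=2) in locally selected datasets for import.
# WITH_MINI is an illustrative switch, not a BbkUser option.
flag_files_for_import() {
    if [ -n "${WITH_MINI:-}" ]; then
        # Mini files wanted: no exclusion step and no --components filter.
        bbk --file_status=2 --ds_is_local=1 --set file_status=1M \
            dse_type skim dse_modenum components \
            dse_id dse_lumi events file_id gbytes --summary
    else
        # Default: mark mini-only files as "don't import" first,
        # then flag everything else with priority M.
        bbk --file_status=2 --ds_is_local=1 --components=E \
            --set file_status=-1 &&
        bbk --file_status=2 --ds_is_local=1 '--components=^E' \
            --set file_status=1M \
            dse_type skim dse_modenum components \
            dse_id dse_lumi events file_id gbytes --summary
    fi
}
```

The && means that the main selection is only attempted if the mini-only files were marked successfully, so those files are not left at status 2 by a half-completed run.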

File Status Codes

The standard values for the file_status flag are:

file_status values
Value   Meaning
2       Initial status - file disposition to be determined
1       to import
1A      to import with top priority
...
1Z      to import with lowest priority
0       on disk (or at least available locally)
-1      don't import

but you can also use other values for other purposes, e.g.

Other suggested uses of file_status
Value   Meaning
3       file imported - to be checked
4       file imported OK - to be swept into xrootd system
0T      file is on tape

Any file_status value starting with 0 indicates that the file is local.
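
For local scripts that have to interpret these codes, the rules above can be sketched as two small shell helpers. The function names and the category labels are made up for this illustration and are not produced by BbkUser. How a bare 1 (with no priority letter) ranks against 1A to 1Z is not stated here, so the ordering helper only considers the lettered values.

```shell
#!/bin/sh
# Map a file_status value to the categories described above.
# The labels are illustrative wording, not BbkUser output.
classify_file_status() {
    case "$1" in
        0*) echo "local" ;;          # 0, 0T, ...: file is available locally
        1*) echo "to import" ;;      # 1, 1A ... 1Z: flagged for import
        2)  echo "undetermined" ;;   # initial status
        -1) echo "do not import" ;;
        *)  echo "site-defined" ;;   # e.g. 3 or 4 in the suggested scheme
    esac
}

# Read status values (one per line) on stdin and print the prioritised
# import statuses, top priority (1A) first, lowest (1Z) last.
order_by_priority() {
    LC_ALL=C grep -x '1[A-Z]' | LC_ALL=C sort
}
```

For example, classify_file_status 0T reports the file as local, and piping a list of status values through order_by_priority leaves only the lettered import statuses, sorted from A to Z.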

Importing

Importing with Multiple Processes

Importing to Multiple Disks


http://www.slac.stanford.edu/BFROOT/www/Computing/Offline/DataDist/bbk-import.html created 21st January 2008 by
Tim Adye, <T.J.Adye@rl.ac.uk>