This page describes the procedures for importing data from SLAC to your local site, as well as a few hints for managing the files on your local disk. It assumes that you have installed a local bookkeeping database. It is also possible to use the SLAC bookkeeping database to manage the import (not documented here), but that keeps no record of which collections are available locally, so it is only suitable for small sites that make one-off transfers.
There are many ways of selecting which collections to import. Here we outline a fairly general procedure that should simplify a setup where you wish to import data as it becomes available at SLAC (eg. from skimming).
In summary, we select which datasets are required, and use that to select files. The selected files are imported to local disk, prepared for analysis use (eg. inserted into xrootd), and then the collection is marked as local in the bookkeeping.
Datasets are selected with the ds_is_local flag.
Setting it makes the dataset visible to BbkDatasetTcl, and the flag will be used to select files to import.
    BbkUser --site=MY_SITE --dbuser=bfactory --dbname=bbkr18 \
        --ds_name=A0-Run6-OnPeak-R22d --set ds_is_local=1
Note that the base dataset name should be used, not a tag like A0-Run6-OnPeak-R22d-v07.
This command can be run whenever a new dataset is required, or
a more general selection can be used, and the command put in a cron job. Eg.
    BbkUser --site=MY_SITE --dbuser=bfactory --dbname=bbkr18 \
        --ds_is_local=0 --ds_type=PRskims,SPskims -i ds_skim=local_skims.lis \
        ds_name --set ds_is_local=1
The --ds_is_local=0 selection ensures that already-selected datasets
are not changed unnecessarily. --ds_type=PRskims,SPskims selects
skim datasets (not PR or SP). -i ds_skim=local_skims.lis selects the skims
listed in the file local_skims.lis (one skim name per line). As well as changing
ds_is_local, this command lists the names of the selected datasets, which keeps a record in the cron job's log file.
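The cron job described above could be wrapped in a small shell script. The following is a minimal sketch, not a tested production script. The function name `mark_local_datasets`, the `SITE`, `DBNAME` and `SKIM_LIST` variables, the `run` argument, and the paths in the crontab comment are all hypothetical. Only the BbkUser options are taken from the command above.

```shell
#!/bin/sh
# Sketch of a cron wrapper for the dataset-selection step.
# SITE, DBNAME and SKIM_LIST are illustrative shell variables, not BbkUser settings.
SITE="${SITE:-MY_SITE}"
DBNAME="${DBNAME:-bbkr18}"
SKIM_LIST="${SKIM_LIST:-local_skims.lis}"

# Flag every not-yet-local skim dataset named in $SKIM_LIST as local.
# BbkUser prints the dataset names, so they end up in the cron log.
mark_local_datasets() {
    if [ ! -r "$SKIM_LIST" ]; then
        echo "skim list '$SKIM_LIST' is not readable" >&2
        return 1
    fi
    BbkUser --site="$SITE" --dbuser=bfactory --dbname="$DBNAME" \
        --ds_is_local=0 --ds_type=PRskims,SPskims -i ds_skim="$SKIM_LIST" \
        ds_name --set ds_is_local=1
}

# Only act when called with the (hypothetical) "run" argument, e.g. from cron:
#   30 2 * * * /path/to/mark_local.sh run >> /path/to/mark_local.log 2>&1
if [ "${1:-}" = "run" ]; then
    mark_local_datasets
fi
```

Checking that the skim list is readable before calling BbkUser makes a missing file show up as a clear message in the log.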
Datasets can be selected by ds_name, ds_type, ds_skim (or ds_stream),
ds_on_peak, ds_run_cycle, ds_skim_cycle, ds_modenum, or
ds_is_signal (run BbkUser --list-columns ds for a full list).
Most of these can be specified as a list (like the --ds_type selection above)
with wildcards (eg. --ds_name='A0-Run*-R22d', though the same selection
could be obtained with --ds_type=PRskims --ds_skim=A0 --ds_skim_cycle=R22d,
which is faster and probably more robust).
The next step uses the ds_is_local flags
to select the files needed for the required datasets.
Files are selected with the file_status flag.
    BbkUser --site=MY_SITE --dbuser=bfactory --dbname=bbkr18 \
        --file_status=2 --ds_is_local=1 --components=E --set file_status=-1

    BbkUser --site=MY_SITE --dbuser=bfactory --dbname=bbkr18 \
        --file_status=2 --ds_is_local=1 --components=^E --set file_status=1M \
        dse_type skim dse_modenum components \
        dse_id dse_lumi events file_id gbytes --summary
New files start with a file_status value of 2, so these commands
select on that. --ds_is_local=1 limits the changes to the
datasets that we require.
Most analysis users don't require the mini information, so we mark
files with only mini information (--components=E) with
--set file_status=-1.
NB. it is important not to leave the mini files with status 2 here,
otherwise the system won't later know to ignore these files when determining whether the collection is complete.
If the mini files are required (eg. for PR skimming (not SP), or Event Display detail),
then just leave out the first command and remove the --components=^E from
the second command.
The remaining files (with header, tag bit, and micro information) in our datasets
are selected with the second command.
It uses --set file_status=1M. Files with statuses
beginning with 1 are flagged for import, with the second character
specifying the priority (A=top, Z=bottom).
This BbkUser command also prints a summary of the selected files, for the cron job's log file.
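The two file-selection commands, and the variation for sites that need the mini files, could be combined in one script. This is a sketch under stated assumptions. The function names `bbk_new_local_files` and `flag_files_for_import` and the `keep_mini` argument are hypothetical, and the mini-file branch simply follows the instruction above (skip the first command and drop --components=^E from the second).

```shell
#!/bin/sh
# Run BbkUser on new files (file_status=2) in locally required datasets,
# passing any further options through unchanged.
bbk_new_local_files() {
    BbkUser --site=MY_SITE --dbuser=bfactory --dbname=bbkr18 \
        --file_status=2 --ds_is_local=1 "$@"
}

# flag_files_for_import [keep_mini]
#   keep_mini=no (default): mini-only files get -1, the others get 1M.
#   keep_mini=yes         : every new file in a local dataset gets 1M.
flag_files_for_import() {
    keep_mini="${1:-no}"
    if [ "$keep_mini" = "yes" ]; then
        bbk_new_local_files --set file_status=1M \
            dse_type skim dse_modenum components \
            dse_id dse_lumi events file_id gbytes --summary
    else
        # Mark mini-only files first, so none is left with status 2.
        bbk_new_local_files --components=E --set file_status=-1 || return 1
        # ^E is quoted only to keep the shell from interpreting the caret.
        bbk_new_local_files '--components=^E' --set file_status=1M \
            dse_type skim dse_modenum components \
            dse_id dse_lumi events file_id gbytes --summary
    fi
}
```

Calling `flag_files_for_import` with no argument reproduces the two commands shown earlier. `flag_files_for_import yes` gives the variant for sites that import the mini files.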
The standard values of the file_status flag are:
| file_status value | Meaning |
|---|---|
| 2 | Initial status - file disposition to be determined |
| 1 | to import |
| 1A | to import with top priority |
| ... | |
| 1Z | to import with lowest priority |
| 0 | on disk (or at least available locally) |
| -1 | don't import |
but you can also use other values for other purposes, eg.
| Other suggested file_status value | Meaning |
|---|---|
| 3 | file imported - to be checked |
| 4 | file imported OK - to be swept into xrootd system |
| 0T | file is on tape |
Any file_status value starting with 0 indicates
that the file is local.
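Scripts that read file_status values back from the bookkeeping may need to interpret them. The function below is a minimal sketch of the conventions in the tables above. The function name `describe_file_status` and the category words it prints are hypothetical and are not part of the bookkeeping tools.

```shell
#!/bin/sh
# describe_file_status STATUS
# Print a short category for a file_status value, following the tables above.
describe_file_status() {
    case "$1" in
        0*)     echo "local" ;;          # 0, 0T, ...: any value starting with 0 is local
        1)      echo "import" ;;         # to import, no priority letter given
        1[A-Z]) echo "import:${1#1}" ;;  # 1A..1Z: to import, priority letter follows the 1
        2)      echo "new" ;;            # initial status, disposition to be determined
        -1)     echo "skip" ;;           # don't import
        *)      echo "site-defined" ;;   # eg. 3 or 4 in the suggested scheme
    esac
}
```

For example, `describe_file_status 0T` prints `local` and `describe_file_status 1M` prints `import:M`.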