We can start by getting a list of datasets
% BbkDatasetTcl
BbkDatasetTcl: 1703 datasets found:-
A0-Run1-OffPeak-R14
A0-Run1-OnPeak-R14
A0-Run2-OffPeak-R14
A0-Run2-OnPeak-R14
A0-Run3-OffPeak-R14
A0-Run3-OnPeak-R14
AlignCal-Run1-OnPeak-R14
AllEvents-Run1-OffPeak-R12
AllEvents-Run1-OnPeak-R12
AllEvents-Run1-R12
AllEvents-Run2-OffPeak-R12
AllEvents-Run2-OnPeak-R12
AllEvents-Run2-R12
AllEvents-Run3-OffPeak-R12
AllEvents-Run3-OnPeak-R12
AllEvents-Run3-R12
AllEvents-Run4-OffPeak-R14
AllEvents-Run4-OnPeak-R14
...
To get a summary of one of these datasets
% BbkDatasetHistory -ds AllEvents-Run3-R12
Dataset AllEvents-Run3-R12
CREATED             ADDED REMOVED TOT_NEV   TOT_LUMI   INPUT_DS MAINTAINER DESCRIPTION
=================== ===== ======= ========= ========== ======== ========== =====================================================
2004/03/18 14:09:15  3685       0 473551540 33444975.6 NA       douglas    AllEvent stream PR collections produced for run 3
2004/04/09 21:27:54     0       4         0        0.0          douglas    Removing bad collections due to changes in dse status
2004/04/09 23:44:16    25       0   3766263      268.9 NA       douglas    AllEvents stream PR collection for Run3
2004/04/12 15:06:39     0       8         0        0.0          douglas    Removing bad collections due to changes in dse status
DBD::Proxy::db disconnect failed: Can't call method "Call" on an undefined value at (eval 10) line 5 during global destruction.
^-- this is a spurious message you can ignore
This shows when collections were added or removed from the dataset.
Now we can create a tcl file
% BbkDatasetTcl -ds AllEvents-Run3-R12
BbkDatasetTcl: wrote AllEvents-Run3-R12.tcl (476389055 events)
Selected 3698 collections, 476389055/476389055 events, 33368.8/pb
That command is roughly equivalent to
% skimData -t -G good_run3.txt -s AllEvents --tableprefix=objy
The file AllEvents-Run3-R12.tcl contains all the collections in the dataset. For large datasets like this, it is usually necessary to split it into smaller chunks, so each job finishes in a reasonable time.
% BbkDatasetTcl -ds AllEvents-Run3-R12 -t 200000000
BbkDatasetTcl: wrote AllEvents-Run3-R12-1.tcl (199838611 events)
BbkDatasetTcl: wrote AllEvents-Run3-R12-2.tcl (199944398 events)
BbkDatasetTcl: wrote AllEvents-Run3-R12-3.tcl (76606046 events)
Selected 3698 collections, 476389055/476389055 events, 33368.8/pb
(you'd probably want to split into smaller chunks still, but you get the idea).
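The effect of -t can be pictured with a small Python sketch that packs whole collections into chunks without exceeding an event budget. This is only an illustration of splitting on collection boundaries; it is not taken from the BbkDatasetTcl source, the function name split_on_boundaries is hypothetical, and the collection names and sizes are made up.

```python
from typing import List, Tuple

Collection = Tuple[str, int]  # (collection name, number of events)


def split_on_boundaries(collections: List[Collection],
                        max_events: int) -> List[List[Collection]]:
    """Greedily pack whole collections into chunks of at most max_events.

    Assumption: a simple greedy strategy; the real tool may differ.
    A single collection larger than max_events still gets a chunk of its
    own, which is the "too-long job" edge effect of boundary splitting.
    """
    chunks: List[List[Collection]] = []
    current: List[Collection] = []
    current_events = 0
    for name, nev in collections:
        # Close the current chunk if this collection would overflow it.
        if current and current_events + nev > max_events:
            chunks.append(current)
            current, current_events = [], 0
        current.append((name, nev))
        current_events += nev
    if current:
        chunks.append(current)
    return chunks


# Hypothetical collections, not real bookkeeping entries.
demo = [("coll-a", 120), ("coll-b", 70), ("coll-c", 150), ("coll-d", 30)]
chunks = split_on_boundaries(demo, 200)
sizes = [sum(nev for _, nev in chunk) for chunk in chunks]
print(sizes)  # [190, 180]
```

Because only whole collections are moved, each chunk comes in at or under the requested size rather than exactly on it, as in the 199838611-event file above.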
Many CM2 collections are much larger than collections of old (due to merging runs). This is more efficient, but can lead to nasty edge effects where splitting the job on collection boundaries makes for a too-long or too-short job. In that case, you can use the --splitruns option.
% BbkDatasetTcl -ds AllEvents-Run3-R12 -t 200000000 --splitruns
BbkDatasetTcl: wrote AllEvents-Run3-R12-1.tcl (200000000 events)
BbkDatasetTcl: wrote AllEvents-Run3-R12-2.tcl (200000000 events)
BbkDatasetTcl: wrote AllEvents-Run3-R12-3.tcl (76389055 events)
Selected 3698 collections, 476389055/476389055 events, 33368.8/pb
The tcl contains code to stop one job in the middle of the collection, and start the next from where it left off.
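The idea of stopping inside a collection and resuming from the same place can be sketched in Python as below. This is an assumption about the general approach, not the code that BbkDatasetTcl writes into the tcl file; split_exact and the three collections are hypothetical, with sizes chosen only so that they add up to the 476389055 events of the example.

```python
from typing import List, Tuple

Collection = Tuple[str, int]   # (collection name, number of events)
Piece = Tuple[str, int, int]   # (collection name, first event, event count)


def split_exact(collections: List[Collection],
                max_events: int) -> List[List[Piece]]:
    """Cut jobs at exactly max_events, splitting collections if needed."""
    if max_events <= 0:
        raise ValueError("max_events must be positive")
    jobs: List[List[Piece]] = []
    current: List[Piece] = []
    room = max_events  # events still available in the current job
    for name, nev in collections:
        offset = 0  # next unread event within this collection
        while offset < nev:
            take = min(room, nev - offset)
            current.append((name, offset, take))
            offset += take
            room -= take
            if room == 0:
                # Job is full: the next job resumes at `offset`.
                jobs.append(current)
                current, room = [], max_events
    if current:
        jobs.append(current)
    return jobs


# Hypothetical collections summing to 476389055 events.
demo = [("coll-a", 250000000), ("coll-b", 150000000), ("coll-c", 76389055)]
jobs = split_exact(demo, 200000000)
job_sizes = [sum(count for _, _, count in job) for job in jobs]
print(job_sizes)  # [200000000, 200000000, 76389055]
```

Every job except the last is filled exactly, which matches the event counts reported with --splitruns above.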
Another option, --basename, allows you to specify a different output file name (.tcl and the sequence numbers are still added).
When using a dataset that is still ongoing (run 4 at the moment), you need to take care not to reanalyse data you have already included (at best this wastes CPU; at worst it artificially inflates your luminosity!). You can use the --marker option to record the state of the dataset at the time the tcl file was created (a file is created in ~/.bbk/BbkDatasetTcl/lastsession.sav). The next time you use BbkDatasetTcl, specify --newer (or --older) to include just the collections added to the dataset after (or before) that point.
You can also specify an explicit start or end point with --start or --end, e.g.
% BbkDatasetTcl -ds AllEventsSkim-Run4-OnPeak-R14 --start AllEventsSkim-Run4-OnPeak-R14-GreenCircle
BbkDatasetTcl: wrote AllEventsSkim-Run4-OnPeak-R14.tcl (437640532 events)
Selected 40 collections, 437640532/437640532 events, ~30492.2/pb
wrote : AllEventsSkim-Run4-OnPeak-R14-bad-runs.txt (36 runs, ~170.4/pb)
The events associated with these runs at the start time are now known to be bad.
Please removed or block events with these run numbers to protect from possible double counting and use of bad data.
As you can see, this includes only the 40 collections added since the Green Circle dataset was defined (see the BbkDatasetHistory output above). You can also use a date as the argument to --start; for example, "04/05/28" would produce the same output, because no collections were added to the dataset between Green Circle and that date.
You should also note that some runs have been removed from the dataset since Green Circle. Because of this, a warning is printed and a file listing the removed runs is written.
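One way to act on that warning is to filter your own event list against the removed runs. The Python sketch below assumes the bad-runs file holds one run number per line; that format is a guess and should be checked against the real file. The helper names and the run and event numbers are hypothetical.

```python
from typing import List, Set, Tuple

Event = Tuple[int, int]  # (run number, event number)


def parse_bad_runs(text: str) -> Set[int]:
    """Read run numbers, ASSUMING one integer per non-empty line."""
    runs: Set[int] = set()
    for line in text.splitlines():
        line = line.strip()
        if line:
            runs.add(int(line))
    return runs


def drop_bad_runs(events: List[Event], bad_runs: Set[int]) -> List[Event]:
    """Keep only events whose run is not in the bad-run set."""
    return [(run, ev) for run, ev in events if run not in bad_runs]


# Hypothetical file contents and events, for illustration only.
bad_text = "42401\n42403\n"
events = [(42400, 1), (42401, 2), (42402, 3), (42403, 4)]
bad = parse_bad_runs(bad_text)
kept = drop_bad_runs(events, bad)
print(kept)  # [(42400, 1), (42402, 3)]
```

In practice the text would be read from the file named in the BbkDatasetTcl output rather than from a string.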
All these commands access the SLAC database by default (this default will probably be changed to access the local database if you are at another site). You can access another site by specifying it on the command line
% BbkDatasetTcl --site=ral
When the bookkeeping tools are used for the first time, you may notice a short delay. A directory ~/.bbk is created containing the connection information for SLAC, Tier A, and some Tier C databases (this is copied from SLAC AFS). This is only updated when there is a problem connecting.
For more information, including other bookkeeping subjects such as database mirroring, see the Bookkeeping Documentation page.
BbkUser allows arbitrary queries with arbitrary selection of most of the information available in the core bookkeeping.
For example, the following command lists the input and output events, PR luminosity, and collection name for collections in the AllEvents-Run4-R14 dataset containing runs in the range 42400-42404.
% BbkUser --dataset=AllEvents-Run4-R14 --run=42400-42404 \
    events_in events pr_lumi collection
EVENTS_IN EVENTS PR_LUMI    COLLECTION
   202182  84177  5322.3819 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042400_14.3.2V00
   284197 117071  8022.5873 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042401_14.3.2V00
   256763 106387  6852.1561 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042402_14.3.2V00
   401839 166754 10675.5647 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042403_14.3.2V00
   493289 204189 13537.9877 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042404_14.3.2V00
5 rows returned
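Output like this is easy to post-process. The Python sketch below parses the text of the table above and totals the columns; it assumes whitespace-separated columns with no spaces inside values and a trailing "N rows returned" line, as in this example. The function name parse_bbkuser_table is hypothetical and is not part of the bookkeeping tools.

```python
from typing import List

SAMPLE = """\
EVENTS_IN EVENTS PR_LUMI COLLECTION
202182 84177 5322.3819 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042400_14.3.2V00
284197 117071 8022.5873 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042401_14.3.2V00
256763 106387 6852.1561 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042402_14.3.2V00
401839 166754 10675.5647 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042403_14.3.2V00
493289 204189 13537.9877 /store/PR/R14/AllEvents/0004/24/14.3.2/AllEvents_00042404_14.3.2V00
5 rows returned
"""


def parse_bbkuser_table(text: str) -> List[dict]:
    """Turn the printed table into a list of typed row dictionaries."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = [name.lower() for name in lines[0].split()]
    rows = []
    for ln in lines[1:]:
        if ln.endswith("rows returned"):
            continue  # skip the summary line at the end
        raw = dict(zip(header, ln.split()))
        rows.append({
            "events_in": int(raw["events_in"]),
            "events": int(raw["events"]),
            "pr_lumi": float(raw["pr_lumi"]),
            "collection": raw["collection"],
        })
    return rows


rows = parse_bbkuser_table(SAMPLE)
total_in = sum(r["events_in"] for r in rows)
total_out = sum(r["events"] for r in rows)
total_lumi = sum(r["pr_lumi"] for r in rows)
print(len(rows), total_in, total_out, round(total_lumi, 4))
```

The totals are computed client-side here only to show the parsing; they cover just the five collections in the example.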
See BbkUser -h for a full list of selection and query options (there are currently 50 of each!).
A few caveats on the use of BbkUser
BbkUser provides summary query items such as tot_lumi and nruns.
BbkUser is based on the SQL selection API.