SLAC PEP-II
BABAR
SLAC<->RAL
Babar logo
HEPIC E,S & H Databases PDG HEP preprints
Organization Detector Computing Physics Documentation
Personnel Glossary Sitemap Search Hypernews
Unwrap page!
Comp. Search
Who's who?
Meetings
FAQ Homepage
Archive
Environment
Administration
New User Info.
Web Info/Tools
Monitoring
Training
Tools & Utils
Programming
C++ Standard
SRT, AFS, CVS
QA and QC
Remedy
Histogramming
Operations
PromptReco
Simulation Production
Online SW
Dataflow
Detector Control
Evt Processing
Run Control
Calibration
Databases
Offline
Workbook
Coding Standards
Simulation
Reconstruction
Prompt Reco.
BaBar Grid
Data Distribution
Beta & BetaTools
Kanga & Root
Analysis Tools
RooFit Toolkit
Data Management
Data Quality
Event display
Event Browser
Code releases
Databases
Check this page for HTML 4.01 Transitional compliance with the
W3C Validator
(More checks...)
Index See also: The Bookkeeping SQL API (talk in the Bookkeeping / Data Distribution Parallel of the December 2004 Collaboration Meeting).

Relational Database Query Classes

The primary purpose of the SQL Selection API is to allow OO access to the bookkeeping without needing to know the table and column structure of the database. In order to maintain modularity of our code and encapsulation of our classes we would, for example, like to be able to find the luminosity without having to know whether that is stored in the dsentities, dstorun, or runs table.

This is complicated by the fact that efficient access to a relational database requires queries that return information from several tables at once (using a 'join').

There are two pieces to the Selection API. Firstly, every table column is given a unique logical name. The user does not need to know which table that column is in - it can even be generated from information in several places. Using these logical names, the user can specify selection conditions (which will be added to the WHERE clause) and return values (for the SELECT column list). An SQL SELECT query is generated, automatically selecting the required tables and join conditions.

When the results are returned, the information is accessible via automatically-generated OO accessor functions (rather than elements of an array or hash, which is what Perl's DBI gives you). The automatic accessors can be overridden, allowing class-local post-processing of the returned information.

These two pieces (the logical column naming with SQL statement generation, and OO accessor functions) can be used separately, but together provide the desired encapsulation.

A nice side-effect is that, once the table definitions have been created, it is often simpler and clearer to use the Selection API than a hardcoded SQL statement - even when that statement is just for the class's own table. On the other hand, the Selection API can be used alongside hardcoded SQL queries.

Example Usage in BbkUser

BbkUser provides a simple command-line interface to the Selection API. For example, to return all collection names, with their number of events, for 14-series releases (limiting the output to 4 collections to avoid too much printout here):-

% BbkUser --release='14.*' --limit=4 events collection

BbkUser shows the SQL command it generates, followed by the output - in this case:-

SELECT bbkr14.bbk_dsentities.output_nev AS events,
       bbkr14.bbk_dsentities.name AS collection
  FROM bbkr14.bbk_releases,
       bbkr14.bbk_dsentities
 WHERE bbkr14.bbk_releases.id=bbkr14.bbk_dsentities.rel_id
   AND bbkr14.bbk_releases.name LIKE '14.%'
   AND ROWNUM<=4;

EVENTS COLLECTION
177930 /store/PR/R14/AllEvents/0004/22/14.2.0b/AllEvents_00042282_14.2.0bV00
61910 /store/PR/R14/TriggerStream/0004/22/14.2.0b/TriggerStream_00042282_14.2.0bV00
88395 /store/PR/R14/AllEvents/0004/23/14.1.3b/AllEvents_00042399_14.1.3bV00
31153 /store/PR/R14/TriggerStream/0004/23/14.1.3b/TriggerStream_00042399_14.1.3bV00
4 rows returned

As you can see, BbkUser converted the logical column names collection, events, and release into their full SQL specifications, added the required tables bbk_releases and bbk_dsentities, and applied the join condition between the two of them.

Here is a more complex example that uses multiple selections and five tables:-

% BbkUser --dataset='AllEvents*' --release '>=12.4.0h' --lfn-file=sample.lis \
          dataset events bytes lfn

SELECT bbkr14.bbk_releases.precedence AS rel_precedence
  FROM bbkr14.bbk_releases
 WHERE bbkr14.bbk_releases.name='12.4.0h';
-> 120400080000

SELECT bbkr14.bbk_dataset.name AS dataset,
       bbkr14.bbk_dsentities.output_nev AS events,
       bbkr14.bbk_files.bytes,
       bbkr14.bbk_dsentities.name AS collection,
       bbkr14.bbk_files.suffix AS file_suffix
  FROM bbkr14.bbk_dataset,
       bbkr14.bbk_dsentities,
       bbkr14.bbk_releases,
       bbkr14.bbk_dstodse,
       bbkr14.bbk_files
 WHERE bbkr14.bbk_dataset.id=bbkr14.bbk_dstodse.ds_id
   AND bbkr14.bbk_dsentities.id=bbkr14.bbk_files.dse_id
   AND bbkr14.bbk_releases.id=bbkr14.bbk_dsentities.rel_id
   AND bbkr14.bbk_dstodse.dse_id=bbkr14.bbk_dsentities.id
   AND bbkr14.bbk_dataset.name LIKE 'AllEvents%'
   AND bbkr14.bbk_releases.precedence>=120400080000
   AND ((bbkr14.bbk_dsentities.name='/store/PR/R12/AllEvents/0003/49/12.4.0g/AllEvents_00034938_12.4.0gV01_C14.2.0bV00' AND
         bbkr14.bbk_files.suffix='.01') OR
        (bbkr14.bbk_dsentities.name='/store/PR/R12/AllEvents/0003/29/12.4.0j/AllEvents_00032955_12.4.0jV01_C14.2.0bV00' AND
         bbkr14.bbk_files.suffix='.02E'));

DATASET EVENTS BYTES LFN
AllEventsRun3 49839 182888177 /store/PR/R12/AllEvents/0003/29/12.4.0j/AllEvents_00032955_12.4.0jV01_C14.2.0bV00.02E.root
1 rows returned

This returns the dataset, number of events, file size, and LFN (logical file name) for all files listed in sample.lis (just two of them), selecting only those in release 12.4.0h or later, that are also in one of the AllEvents datasets. Some of the magic that this takes care of for you is: converting the release name into a precedence number, splitting the LFNs into collection names and file suffixes, combining them to output the full filename, and working out which tables are required for this information, and which additional ones are required to link those tables together (bbk_dstodse).

You can see the full list of selection options and display values with BbkUser -h.

Example Usage from Perl

Of course your script can choose to use only a few of those options (though even a general-purpose command such as BbkUser does nearly all it's work via the API). Here is an example of calling the API from another program (for the same selection as the first example above).

# Standard setup of the database connection and
# Selection API object.
my $config = BbkTools::BbkConfiguration->new;
$config->getConnection();
my $query = BbkTools::BbkSqlSelect->new($config);
BbkTools::BbkAllTables::addModules($query);

# Specify return values and selection.
$query->addValues('collection', 'events');
$query->addSelector('release', '14.*');

# Execute query and return results.
my $sta= $query->execute or exit 1;
while (my $row = $sta->fetch) {
  print $row->events, ' ', $row->collection, "\n";
}

The call to addModules just defines which tables to consider (in this case, all those defined in the BbkTools package). The returned values are accessed via the automatically-generated OO accessors.

Command-line utilities (like BbkUser) can obtain a full list of options, so the caller doesn't need to be modified when new selectors are added. In fact, the caller only needs to know the logical column names of the information it uses (collection and events in this case), as well as any hard-coded selectors (release='14.*' here). The tables, joins, and SQL selections are all provided automatically. (The one exception is that if tables from other packages are required, then the caller has to call the Package::BbkAllTables::addModules routine to include them.) To allow our example to accept all the selection options, we just replace the addValues and addSelector calls.

# Standard setup of the database connection and
# Selection API object.
my $config = BbkTools::BbkConfiguration->new;
my $query = BbkTools::BbkSqlSelect->new($config);
BbkTools::BbkAllTables::addModules($query);

# Parse options.
my $opt = $query->init;

# Specify return values and selection.
$query->createQuery(['collection', 'events'], $opt);

# Execute query and return results.
my $sta= $query->execute or exit 1;
while (my $row = $sta->fetch) {
  print $row->events, ' ', $row->collection, "\n";
}

Table Configuration

If you just plan to use the selection API, you can stop reading here. What follows is a brief summary of what needs to be specified to define additional logical column names. If the selection API is to be used with tables other than the core bookkeeping (eg. task management and production tables), then it will be necessary to specify their layout. In many cases this will be easier than writing your own SQL, and will allow for user-defined selections encompassing different packages.

The table characteristics are defined in classes Bbk*Table (eg. BbkDsEntityTable), which mostly inherit from BbkDefaultTable (which defines common columns like the row id), which inherits from BbkTableBase. The tables to consider can be added by hand, or defined all at once with BbkTools::BbkAllTables::addModules.

The table configuration mostly just defines the relevant logical column names and their associated table column names, specifying what type of selection is to be used: wildcard (eg. '/store/*.??E.root'), match, relation (eg. '>=12.4.0j<=12.5.2b'), or range (eg. '5-7,9,15-'). Wildcard selection is the default. Columns that require special handling can specify a custom Bbk*Table method. This should all be defined in a method called Bbk*Table::tableConfig (see perldoc BbkTools::BbkTableBase for a few details), eg. in BbkTools::BbkFileTable we have the following

sub tableConfig {
  return {
    alias      => 'file',
    columns    => [
      file_suffix => 'suffix',
      components  => 'components',
      bytes       => { column => 'bytes', selectorType => 'relation' },
      lfn         => { valueAction => 'addLfnValue', selectorAction => 'lfnSelector' },
      ...
    ],
    joins      => [
      dse_id      => 'file_dse_id',
    ],
}}

This specifies the methods addLfnValue or lfnSelector should be called to handle the lfn column. It also defines the possible joins to other tables: in this case to bbk_dsentities. In all cases (including joins), the table configuration class should use only logical column names to refer to other tables (eg. BbkFileTable::addLfnValue refers to the bbk_dsentities.name column by its logical name collection). This maintains strict encapsulation of each table's behaviour.

Future Improvements

  1. Implement more efficient algorithm to choose which tables are needed (current method uses a recursive search that can be very slow if many tables are involved). I need to learn some graph theory. Done (though I didn't learn much graph theory)!
  2. Allow aggregate queries, which allows counts, sums, etc over some values (eg. total luminosity for all runs within a merged collection, or in an entire dataset; number of files in a collection). I have a working implementation of this, but it needs tidying up. Done!
  3. Automatically prune deleted collections for a dataset. Currently, BbkUser selection by dataset returns all the collections that were ever in that dataset, even those that have been subsequently removed. Done!
  4. Reimplement standard tools (BbkDatasetTcl etc) using the Selection API. Done for BbkDatasetTcl!
  5. Add the other bookkeeping tables (from PR, SP, TM?).

Index Valid HTML 4.01! Best viewed with ANY browser! /BFROOT/www/Computing/Distributed/Bookkeeping/Documentation/BbkSqlSelect.html last last modified on 14th April 2004 by
Tim Adye, <T.J.Adye@rl.ac.uk>