SLAC PEP-II
BABAR
SLAC<->RAL
Babar logo
HEPIC E,S & H Databases PDG HEP preprints
Organization Detector Computing Physics Documentation
Personnel Glossary Sitemap Search Hypernews
Unwrap page!
Comp. Search
Who's who?
Meetings
FAQ Homepage
Archive
Environment
Administration
New User Info.
Web Info/Tools
Monitoring
Training
Tools & Utils
Programming
C++ Standard
SRT, AFS, CVS
QA and QC
Remedy
Histogramming
Operations
PromptReco
Simulation Production
Online SW
Dataflow
Detector Control
Evt Processing
Run Control
Calibration
Databases
Offline
Workbook
Coding Standards
Simulation
Reconstruction
Prompt Reco.
BaBar Grid
Data Distribution
Beta & BetaTools
Kanga & Root
Analysis Tools
RooFit Toolkit
Data Management
Data Quality
Event display
Event Browser
Code releases
Databases
Check this page for HTML 4.01 Transitional compliance with the
W3C Validator
(More checks...)

Accessing E-logbook Data With Perl and SQL

This page will give you a brief introduction into writing Perl scripts that use SQL statements to query the data in the electronic logbook. This page is not a tutorial on Perl and SQL itself. To learn Perl or SQL, there are many books and online references. I list a few at the bottom of this page.

WARNING: Please subscribe to the Oracle and other SQL databases in BaBar HyperNews forum if you write any scripts in order to stay informed of changes to servers and related Perl packages.


E-logbook Data

First one must understand the layout of the data in the electronic logbook database to know how to query it. The figure below shows the major Oracle tables containing the electronic logbook data. Each box is a table with the names of the columns listed followed by their data types. Columns with plus signs signify those columns that make up the primary key of the table. In most cases the key must be unique.

Elog table diagram [pdf]

The arrows between tables represent how these tables are related (or joined) and the multiplicity of the relation. For instance, the arrow from the shift_summary table to the shift_run which connects the shiftid column in each tables and goes from "1" to "n" signifies that for every record in shift_summary with a given shiftid there are "0" to "n" records in the shift_run table with the same shiftid. This particular relation is due to the fact that during a particular shift any number of runs may be taken.

The shiftid column is an import key (also known as a foreign key) in several tables and identifies which shift each record belongs too. This ID is a nine digit number (yes, I know it says ten in the diagram) representing the date and time of the shift with the format YYYYMMDDS. YYYY is the four digit year, MM is the two digit month, DD is the two digit day, and S is a single digit represent the shift time during the day. Specifically, S=0 is the owl shift, S=1 is the day shift, and S=2 is the swing shift. So day shift on February 12, 2000 is 200002121.

Constraints

Constraints are used to centralize enforcement of business rules defined in the design phase of a database system. Database constraints can only indicate invalid data after an attempt to insert or update data. In the electronic logbook a combination of database constraints plus validation by the application (i.e. the elogbook scripts) are used to ensure the integrity of the electronic logbook data.

Symbols legend for database attributes:

  • + = attribute is a primary key for that table (cannot be null)
  • > = attribute is a foreign key and refers to another attribute
  • * = attribute is indexed
  • ~ = attribute is unique (null allowed as a value)
A plus (+) next to an attribute indicates that the attribute is the primary key for that table. If there are multiple attributes with plus signs in a table then the combined attributes together make up the primary key. Attributes that are designated as the primary key or part of a primary key can not have a null value. Such keys are used to uniquely identify a row in a table.
A greater than sign ( > ) next to an attribute indicates the attribute is a foreign key and the value is identical and must always be identical to the primary key attribute it is referring to. This is called a referential integrity constraint.
An asterisk (*) next to an attribute indicates the attribute has been indexed. An index is normally used to enhance data retrieval.
A tilde ( ~ ) next to an attribute indicates the attribute is unique. Values for these attributes must be unique. For example, the attribute 'run' in the Shift_Run table is unique and will not accept non-unique values for 'run'.

More to come soon on describing the table data


Writing Perl Query Scripts

The following is an example of a Perl script to query the run database for all cosmic runs with a given minimum number of events.

 1 
 2 
 3 
 4 
 5 
 6 
 7 
 8 
 9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
#!/usr/local/bin/perl

use strict vars;
use lib "/afs/slac/g/babar/lib.shared/perl";
use BaBar::SQL;

my $dbh = BaBar::SQL->connect or die "$DBI::errstr\n";

my $nevents = shift or die "No events specified\n";

my $sth = $dbh->prepare(qq{
  select run,nevents from bbrora.shift_run where 
  runtype = 'Cosmics' and nevents > $nevents 
});
$sth->execute;
    
while(my($run,$nevts) = $sth->fetchrow_array) {
  printf("%5d  %7d\n",$run,$nevts);
}
$sth->finish;

$dbh->disconnect;

Lets dissect this script line by line.
Line 1: Yes, this is a Perl script. Make sure to make it executable too.
Line 3: All variables must be declared with my. This is not strictly necessary but good practice.
Line 4: Add BaBar-specific package directory to package search path.
Line 5: Load the BaBar SQL package for connecting to BaBar Oracle databases. All query scripts will need to start with these first five lines.
Line 7: Get a connection handle to the database.
Line 9: Get minimum number of events to query for from command arguments.
Lines 11-14: Prepare the SQL statement for our query and get the statment handle for the query. All e-log tables are in the bbrora account so table names will need to be prefixed with that name.
Line 15: Tell the database to execute the query on the server.
Lines 17-19: Loop over the results storing the selected columns in variables $run and $nevts. Print each record to standard out. Notice that the SQL select statement fetches two columns so we use two variables here.
Line 20: Tell the database we are done with this query.
Line 22: Disconnect from the database.

Here is a slightly more complicated query to extract all IFR cosmic runs and the number of events in each run. This example demonstrates doing a join of two tables. We are depending on the fact that for each run record in the shift_run table, there can be a corresponding run record in shift_runnotes for each subsystem (i.e. a "1" to "n" relation).

 1 
 2 
 3 
 4 
 5 
 6 
 7 
 8 
 9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
#!/usr/local/bin/perl

use strict vars;
use lib "/afs/slac/g/babar/lib.shared/perl";
use BaBar::SQL;

my $dbh = BaBar::SQL->connect or die "$DBI::errstr\n";

my $sth = $dbh->prepare(qq{
  select R.run,R.nevents from bbrora.shift_run R, bbrora.shift_runnotes N 
  where R.run = N.run and N.subsys = 'ifr' 
  and LOWER(N.note) LIKE 'ifr plateau%'
});
$sth->execute;
    
while(my($run,$nevts) = $sth->fetchrow_array) {
  printf("%5d  %7d\n",$run,$nevts);
}
$sth->finish;

$dbh->disconnect;

The above select statement shows two new concepts: table joins and the LIKE operator. The first step for a join is to place a comma separated list of two or more tables after the from keyword. Notice also we assign table aliases to each table. This is necessary to differentiate the columns between the tables in the other parts of the select statement. This is really critical when both tables have columns with the same name which is often the case for columns the tables are joined on. In this case, they are joined together by the run column. This connection, R.run = N.run, is the last critcal step of a join.

Line 12 of the above select statement also demonstrates the LIKE operator which is used for string matching. There are two special characters possible in the LIKE string. The % character matches any number of characters like * in your unix shell. The _ character matches exactly one character liek the ? character in your unix shell. Note that LIKE is case-sensitive so I apply the LOWER function to the N.note column to obtain case insensitivity.

This example demonstrates a more complicated outer join.

 1 
 2 
 3 
 4 
 5 
 6 
 7 
 8 
 9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
#!/usr/local/bin/perl

use strict vars;
use lib "/afs/slac/g/babar/lib.shared/perl";
use BaBar::SQL;

my %QASTATS = ( '0' => 'Data Unusable', '1' => 'Detector Studies',
    '2' => 'Data Flawed', '3' => 'Data OK', '' => 'UNSET' );

my $subsys = shift or die "No subsystem given\n";
my $version = shift or die "No Elf version given\n";

my $dbh = BaBar::SQL->connect or die "$DBI::errstr\n";

my $sth = $dbh->prepare(qq{
  select R.run,O.nevents,Q.status from 
  bbrora.shift_run R, bbrora.shift_oprrun O, bbrora.shift_runqa Q 
  where R.run = O.run and R.run = Q.run (+)
  and LOWER(R.runtype) LIKE 'collid%' and O.status = 'done'
  and Q.subsys (+) = ? and O.procspec LIKE ?
  order by R.run
});
$sth->execute($subsys,"P${version}%");
    
while(my($run,$nevts,$stat) = $sth->fetchrow_array) {
  printf("%5d  %7d  %s\n",$run,$nevts,$QASTATS{$stat});
}
$sth->finish;

$dbh->disconnect;

An outer join is different than a regular join in that it also returns rows in the regularly joined tables that have no direct match in the outer joined table. In the above example, we want to get the data from the shift_run and shift_oprrun tables even when no record exists in the shift_runqa table for the given run and subsystem. To do this, all columns from the "outer" table in the where clause of the statement must have the (+) sequence placed after it. One can then read a clause like R.run = Q.run (+) as meaning "Q.run = R.run or Q.run is nonexistant".

Note that "nonexistant" is different than NULL (a column in an existing record that has no assigned value). One tests for this condition with a where clause like Q.subsys is NULL. An implementation decision was made in the shift_runqa table that no record would exist in the table for a run and subsystem that has not been QA'ed. This saves storage space. If the implementation was to create a record with an "unassigned" status for each subsytem in the shift_runqa table when a run is created in shift_run, then using an outer join in the example above would be unnecessary.

This example also demonstrates "binding". Note in the SQL statement on line 20 that the ? character serves as a place holder to by replaced by data in a later execute call on line 23. The advantage of inserting user supplied data into the SQL statement this way rather than embedding it directly is that Perl takes care of escaping any special characters (like quotes) in the data.


Perl and SQL Resources

The best resources for Perl are the various books on Perl published by O'Reilly. Online resources can be found at www.perl.com. From your shell, you have quick access to the Perl reference using the perldoc command. Type "perldoc perldoc" for info on using this command.

A reference for Oracle's SQL can be found onsite at SLAC here. Complete Oracle documentation is here. A generic SQL tutorial can be found here. Note that some of the more esoteric SQL operations like outer joins have a different syntax in Oracle than in generic SQL.


Maintained by Charlotte Hee and Douglas Smith. Send suggestions and additions to
douglas@slac.stanford.edu.
Last modified 18 Apr 2003