Accessing E-logbook Data With Perl and SQL
This page will give you a brief introduction into writing Perl scripts that
use SQL statements to query the data in the electronic logbook. This page
is not a tutorial on Perl and SQL itself. To learn Perl or SQL, there
are many books and online references. I list a few at the bottom of this page.
WARNING: Please subscribe to the
Oracle and other SQL databases in BaBar HyperNews forum if you
write any scripts in order to stay informed of changes to servers and
related Perl packages.
E-logbook Data
First one must understand the layout of the data in the electronic logbook
database to know how to query it. The figure below shows the major
Oracle tables containing the electronic logbook data. Each box is a
table with the names of the columns listed followed by their data types.
Columns with plus signs signify those columns that make up the primary
key of the table. In most cases the key must be unique.
Elog table diagram [pdf]
The arrows between tables represent how these tables are related (or joined)
and the multiplicity of the relation. For instance, the arrow from the
shift_summary table to the shift_run which connects the shiftid
column in each tables and goes from "1" to "n" signifies that for every
record in shift_summary with a given shiftid there are "0" to "n"
records in the shift_run table with the same shiftid. This
particular relation is due to the fact that during a particular shift
any number of runs may be taken.
The shiftid column is an import key (also known as a foreign key)
in several tables and identifies which shift each record belongs too. This
ID is a nine digit number (yes, I know it says ten in the diagram)
representing the date and time of the shift with the format YYYYMMDDS.
YYYY is the four digit year, MM is the two digit month,
DD is the two digit day, and S is a single digit
represent the shift time during the day. Specifically, S=0
is the owl shift, S=1 is the day shift, and S=2 is
the swing shift. So day shift on February 12, 2000 is 200002121.
Constraints
Constraints are used to centralize enforcement of business rules
defined in the design phase of a database system. Database constraints can
only indicate invalid data after an attempt to insert or update data.
In the electronic logbook a combination of database constraints plus
validation by the application (i.e. the elogbook scripts) are used to
ensure the integrity of the electronic logbook data.
Symbols legend for database attributes:
- + = attribute is a primary key for that table (cannot be null)
- > = attribute is a foreign key and refers to another attribute
- * = attribute is indexed
- ~ = attribute is unique (null allowed as a value)
A plus (+) next to an attribute indicates that the attribute is the primary
key for that table. If there are multiple attributes with plus signs in a
table then the combined attributes together make up the primary key. Attributes
that are designated as the primary key or part of a primary key can not have a
null value. Such keys are used to uniquely identify a row in a table.
A greater than sign ( > ) next to an attribute indicates the attribute is a
foreign key and the value is identical and must always be identical to the
primary key attribute it is referring to. This is called a referential
integrity constraint.
An asterisk (*) next to an attribute indicates the attribute has been
indexed. An index is normally used to enhance data retrieval.
A tilde ( ~ ) next to an attribute indicates the attribute is unique. Values
for these attributes must be unique. For example, the attribute 'run' in the
Shift_Run table is unique and will not accept non-unique values for 'run'.
More to come soon on describing the table data
Writing Perl Query Scripts
The following is an example of a Perl script to query the run database
for all cosmic runs with a given minimum number of events.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
|
#!/usr/local/bin/perl
use strict vars;
use lib "/afs/slac/g/babar/lib.shared/perl";
use BaBar::SQL;
my $dbh = BaBar::SQL->connect or die "$DBI::errstr\n";
my $nevents = shift or die "No events specified\n";
my $sth = $dbh->prepare(qq{
select run,nevents from bbrora.shift_run where
runtype = 'Cosmics' and nevents > $nevents
});
$sth->execute;
while(my($run,$nevts) = $sth->fetchrow_array) {
printf("%5d %7d\n",$run,$nevts);
}
$sth->finish;
$dbh->disconnect;
|
Lets dissect this script line by line.
| Line 1: |
Yes, this is a Perl script. Make sure to make it executable too. |
| Line 3: |
All variables must be declared with my. This is
not strictly necessary but good practice. |
| Line 4: |
Add BaBar-specific package directory to package search path. |
| Line 5: |
Load the BaBar SQL package for connecting to BaBar Oracle databases.
All query scripts will need to start with these first five lines. |
| Line 7: |
Get a connection handle to the database. |
| Line 9: |
Get minimum number of events to query for from command arguments. |
| Lines 11-14: |
Prepare the SQL statement for our query and get the statment
handle for the query. All e-log tables are in the bbrora account so
table names will need to be prefixed with that name. |
| Line 15: |
Tell the database to execute the query on the server. |
|---|
| Lines 17-19: |
Loop over the results storing the selected columns
in variables $run and $nevts. Print each record to standard out.
Notice that the SQL select statement fetches two columns so we
use two variables here. |
| Line 20: |
Tell the database we are done with this query. |
| Line 22: |
Disconnect from the database. |
Here is a slightly more complicated query to extract all IFR cosmic runs
and the number of events in each run. This example demonstrates doing
a join of two tables. We are depending on the fact that for
each run record in the shift_run table, there
can be a corresponding run record in shift_runnotes for each
subsystem (i.e. a "1" to "n" relation).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
#!/usr/local/bin/perl
use strict vars;
use lib "/afs/slac/g/babar/lib.shared/perl";
use BaBar::SQL;
my $dbh = BaBar::SQL->connect or die "$DBI::errstr\n";
my $sth = $dbh->prepare(qq{
select R.run,R.nevents from bbrora.shift_run R, bbrora.shift_runnotes N
where R.run = N.run and N.subsys = 'ifr'
and LOWER(N.note) LIKE 'ifr plateau%'
});
$sth->execute;
while(my($run,$nevts) = $sth->fetchrow_array) {
printf("%5d %7d\n",$run,$nevts);
}
$sth->finish;
$dbh->disconnect;
|
The above select statement shows two new concepts: table joins and
the LIKE operator. The first step for a join is to
place a comma separated list of two or more tables after the
from keyword. Notice also we assign table aliases to each
table. This is necessary to differentiate the columns between the tables
in the other parts of the select statement. This is really critical when
both tables have columns with the same name which is often the case for
columns the tables are joined on. In this case, they are joined together
by the run column. This connection, R.run = N.run, is the
last critcal step of a join.
Line 12 of the above select statement also demonstrates the LIKE operator which
is used for string matching. There are two special characters possible
in the LIKE string. The % character matches any number of
characters like * in your unix shell. The _
character matches exactly one character liek the ?
character in your unix shell. Note that LIKE is case-sensitive so
I apply the LOWER function to the N.note column to obtain case insensitivity.
This example demonstrates a more complicated outer join.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
|
#!/usr/local/bin/perl
use strict vars;
use lib "/afs/slac/g/babar/lib.shared/perl";
use BaBar::SQL;
my %QASTATS = ( '0' => 'Data Unusable', '1' => 'Detector Studies',
'2' => 'Data Flawed', '3' => 'Data OK', '' => 'UNSET' );
my $subsys = shift or die "No subsystem given\n";
my $version = shift or die "No Elf version given\n";
my $dbh = BaBar::SQL->connect or die "$DBI::errstr\n";
my $sth = $dbh->prepare(qq{
select R.run,O.nevents,Q.status from
bbrora.shift_run R, bbrora.shift_oprrun O, bbrora.shift_runqa Q
where R.run = O.run and R.run = Q.run (+)
and LOWER(R.runtype) LIKE 'collid%' and O.status = 'done'
and Q.subsys (+) = ? and O.procspec LIKE ?
order by R.run
});
$sth->execute($subsys,"P${version}%");
while(my($run,$nevts,$stat) = $sth->fetchrow_array) {
printf("%5d %7d %s\n",$run,$nevts,$QASTATS{$stat});
}
$sth->finish;
$dbh->disconnect;
|
An outer join is different than a regular join in that it also
returns rows in the regularly joined tables that have no
direct match in the outer joined table. In the above example, we
want to get the data from the shift_run and shift_oprrun
tables even when no record exists in the shift_runqa table
for the given run and subsystem. To do this, all columns from the
"outer" table in the where clause of the statement must have the
(+) sequence placed after it. One can then read a clause like
R.run = Q.run (+) as meaning "Q.run = R.run or
Q.run is nonexistant".
Note that "nonexistant" is different than
NULL (a column in an existing record that has no assigned value).
One tests for this condition with a where clause like Q.subsys is NULL.
An implementation decision was made in the shift_runqa table
that no record would exist in the table for a run and subsystem that has
not been QA'ed. This saves storage space. If the implementation was to
create a record with an "unassigned" status for each subsytem in
the shift_runqa table when a run is created in shift_run,
then using an outer join in the example above would be unnecessary.
This example also demonstrates "binding". Note in the SQL statement
on line 20 that the ? character serves as a place holder
to by replaced by data in a later execute call on line 23. The advantage
of inserting user supplied data into the SQL statement this way rather
than embedding it directly is that Perl takes care of escaping any
special characters (like quotes) in the data.
Perl and SQL Resources
The best resources for Perl are the various books on Perl published
by O'Reilly. Online resources can be found at
www.perl.com. From your shell,
you have quick access to the Perl reference using the perldoc
command. Type "perldoc perldoc" for info on using this command.
A reference for Oracle's SQL can be found onsite at SLAC
here. Complete Oracle documentation is
here.
A generic SQL tutorial can be found
here. Note that some of the more esoteric SQL operations like outer joins
have a different syntax in Oracle than in generic SQL.
Maintained by Charlotte Hee and Douglas Smith. Send suggestions and additions
to
douglas@slac.stanford.edu.
Last modified 18 Apr 2003
|