Very Large Databases at Stanford Linear Accelerator Center

Steven Meyer, PEP-II Project Database Group, Stanford Linear Accelerator Center

High Performance Storage System

BABAR Event Data

The BABAR event data will be stored on a High Performance Storage System (HPSS) consisting of mirrored tapes plus disk space onto which the tapes are staged. Data will be available transparently to the user, whether it currently resides on disk or on tape; the only performance penalty is the time needed to load a dataset from tape.
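To make "transparent" concrete, the access pattern can be sketched as a read-through disk cache in front of tape. This is a minimal sketch under that assumption only: `StagingStore`, `read()` and `stage_delay` are hypothetical names, not the HPSS interface, and real HPSS also handles eviction, migration and tape mirroring, which are omitted here.

```python
import time
from typing import Dict


class StagingStore:
    """Hypothetical two-tier store: a disk staging area in front of tape.

    Callers use read() for every dataset and never need to know whether
    the data is already on disk or must first be loaded from tape.
    """

    def __init__(self, tape: Dict[str, bytes], stage_delay: float = 0.0):
        self.tape = tape                  # authoritative copy (the mirrored tapes)
        self.disk: Dict[str, bytes] = {}  # datasets currently staged on disk
        self.stage_delay = stage_delay    # simulated tape mount/seek/read cost
        self.stages = 0                   # number of tape loads performed

    def read(self, name: str) -> bytes:
        if name not in self.disk:
            # Not on disk yet: pay the one-time cost of loading from tape.
            time.sleep(self.stage_delay)
            self.disk[name] = self.tape[name]
            self.stages += 1
        # Same call path whether the data was already staged or not.
        return self.disk[name]


store = StagingStore({"run1": b"event records"})
store.read("run1")   # first access: staged from tape
store.read("run1")   # second access: served from disk
print(store.stages)  # 1
```

The caller's code is identical for both reads; only the elapsed time differs, which is the "performance hit only on load" described above.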

Object Oriented Database - Why OODB and not RDB?

Justification for an Object Oriented Database

For a database product to be useful for storing event data from a high energy physics experiment, it must support rich, complex data structures and offer performance advantages over conventional file-based techniques. Object oriented databases provide a direct programming language interface, which gives a natural way to express the complex hierarchical structure of event data. This is not the case for relational databases, where the objects that make up the event structure must be mapped into tables, and queries require a specialized query language such as SQL.

Several performance studies within the high energy physics community have compared object oriented and relational databases for storing experimental data. In all of them (e.g. the PASS project at SSCL and the RD45 project at CERN), the object oriented database significantly outperformed the relational database. Much of the data access is navigational, following direct inter-object relationships, and this access pattern is poorly matched to the capabilities of a relational database.
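The difference between navigational and relational access can be sketched in Python. The classes `Event`, `Track` and `Hit`, their fields, and the table layouts are hypothetical stand-ins for a real event model. In the object version each event holds direct references to its tracks and each track to its hits; in the relational version the same data lives in tables linked by foreign keys, so every traversal becomes a join.

```python
from dataclasses import dataclass, field
from typing import List


# Object model: direct references from event -> tracks -> hits.
@dataclass
class Hit:
    layer: int
    energy: float


@dataclass
class Track:
    momentum: float
    hits: List[Hit] = field(default_factory=list)


@dataclass
class Event:
    number: int
    tracks: List[Track] = field(default_factory=list)


def total_energy_navigational(event: Event) -> float:
    # Simply follow the references; no lookup by key is needed.
    return sum(h.energy for t in event.tracks for h in t.hits)


# Relational mapping of the same data: one table per class, with
# foreign keys standing in for the object references.
tracks = [(10, 1, 2.5), (11, 1, 1.2)]                              # (track_id, event_id, momentum)
hits = [(100, 10, 1, 0.4), (101, 10, 2, 0.3), (102, 11, 1, 0.5)]  # (hit_id, track_id, layer, energy)


def total_energy_relational(event_id: int) -> float:
    # Equivalent to: SELECT SUM(h.energy) FROM tracks t JOIN hits h
    #                ON h.track_id = t.track_id WHERE t.event_id = ?
    track_ids = {tid for (tid, eid, _) in tracks if eid == event_id}
    return sum(energy for (_, tid, _, energy) in hits if tid in track_ids)


event = Event(1, [Track(2.5, [Hit(1, 0.4), Hit(2, 0.3)]), Track(1.2, [Hit(1, 0.5)])])
print(total_energy_navigational(event), total_energy_relational(1))
```

Both functions compute the same total, but the relational version must rediscover the event-to-track-to-hit relationships by matching keys on every query, whereas the object version already holds them.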

The PASS project at the SSCL compared the performance of the following three techniques on a sample of data from the CDF experiment at FNAL (PASS Note 93-1):

1. Conventional sequential files using a FORTRAN data access package

2. Object oriented database (name withheld)

3. Relational database (name withheld)

The results were the following:

Technique              Benchmark 1 (seconds)   Benchmark 2 (seconds)
Sequential file        734                     840
OO database            12                      435
Relational database    640                     1964

These benchmarks attempted to replicate typical physicist queries. In both benchmarks the object oriented database outperformed the other techniques, giving the shortest times.
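The size of the OO database's advantage differs greatly between the two benchmarks, which is easier to see as ratios. The code below only restates the published timings from the table; `speedups()` is a helper introduced here for illustration.

```python
# Timings in seconds, taken from the PASS Note 93-1 results table.
timings = {
    "Sequential file":     (734, 840),
    "OO database":         (12, 435),
    "Relational database": (640, 1964),
}


def speedups(table, reference="OO database"):
    """Return how many times slower each technique was than the reference."""
    ref1, ref2 = table[reference]
    return {
        name: (b1 / ref1, b2 / ref2)
        for name, (b1, b2) in table.items()
        if name != reference
    }


for name, (s1, s2) in speedups(timings).items():
    print(f"{name}: benchmark 1 {s1:.1f}x, benchmark 2 {s2:.1f}x")
# Sequential file: benchmark 1 61.2x, benchmark 2 1.9x
# Relational database: benchmark 1 53.3x, benchmark 2 4.5x
```

On benchmark 1 the OO database was roughly 53 to 61 times faster than the alternatives; on benchmark 2 the margin narrows to roughly 2 to 4.5 times.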

Why Objectivity/DB

Within the OO database arena, Objectivity/DB offers several unique features relative to its competitors. These include:

In 1996 and 1997 the BABAR experiment performed a competitive evaluation of the ObjectStore and Objectivity object oriented databases. A summary of the performance results from that evaluation is:

Quantity                                       Objectivity   ObjectStore
Database open/close overhead (seconds)         0.89          1.17
Transaction begin/commit overhead (seconds)    0.28          0.87
Write performance, with index (% of Unix)      28            21
Search time scaling (multiple of ln(N))        ~8            ~80
Fixed overhead per database (kbytes)           ~160          ~250
Overhead per object (bytes)                    14-80         8-200

In almost all cases Objectivity/DB exhibited a small but measurable advantage relative to ObjectStore.
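Most rows of this table are "lower is better", but write performance is "higher is better", so a direction-aware comparison makes the per-row winner explicit. This is an illustrative sketch: `rows` and `compare()` are introduced here, "~" values are taken at face value, and the per-object overhead row is left out because its ranges overlap (14-80 vs 8-200 bytes), so neither product clearly wins on it.

```python
# Rows from the 1996-97 evaluation table (approximate values used as-is).
rows = [
    # (quantity, Objectivity, ObjectStore, lower_is_better)
    ("Database open/close overhead (s)", 0.89, 1.17, True),
    ("Transaction begin/commit overhead (s)", 0.28, 0.87, True),
    ("Write performance with index (% of Unix)", 28, 21, False),
    ("Search time scaling (multiple of ln N)", 8, 80, True),
    ("Fixed overhead per database (kbytes)", 160, 250, True),
]


def compare(table):
    """Return (quantity, winner, factor) for each row, honouring its direction."""
    results = []
    for quantity, objy, ostore, lower_is_better in table:
        objy_wins = objy < ostore if lower_is_better else objy > ostore
        factor = max(objy, ostore) / min(objy, ostore)
        results.append((quantity, "Objectivity" if objy_wins else "ObjectStore", factor))
    return results


for quantity, winner, factor in compare(rows):
    print(f"{quantity}: {winner} by {factor:.1f}x")
```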

Outstanding Issues

Several important issues remain unresolved. Among these are:

Restoring from backup

When restoring from a backup, the entire Objectivity/DB Federated Database must be restored together. However, this will most likely not be a major issue for the BABAR event data, because the data will be stored on mirrored HPSS tapes. The data on disk will not itself be mirrored; it is simply a copy staged from the tapes.

Locking Database Objects

Currently, Objectivity/DB locks at the container level. A lock therefore covers more than the one record the user wishes to access, and may lock out other users who would not be blocked if locking occurred at the "row" level. Objectivity is currently working on moving to "row"-level locking.
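The effect of lock granularity can be illustrated with a toy exclusive-lock table. `LockManager`, its `granularity` setting and `acquire()` are hypothetical and are not the Objectivity/DB locking API; the sketch also ignores shared (read) locks, lock release and deadlock detection.

```python
from typing import Dict, Hashable


class LockManager:
    """Hypothetical exclusive-lock table with configurable granularity."""

    def __init__(self, granularity: str):
        if granularity not in ("container", "object"):
            raise ValueError("granularity must be 'container' or 'object'")
        self.granularity = granularity
        self.owners: Dict[Hashable, str] = {}  # lock key -> user holding it

    def _key(self, container: str, obj: str) -> Hashable:
        # Container-level locking maps every object in a container to one key;
        # object ("row") level locking gives each object its own key.
        return container if self.granularity == "container" else (container, obj)

    def acquire(self, user: str, container: str, obj: str) -> bool:
        key = self._key(container, obj)
        holder = self.owners.get(key)
        if holder is None or holder == user:
            self.owners[key] = user
            return True
        return False  # held by another user: the caller would have to wait


coarse = LockManager("container")
print(coarse.acquire("alice", "tracks", "t1"))  # True
print(coarse.acquire("bob", "tracks", "t2"))    # False: same container

fine = LockManager("object")
print(fine.acquire("alice", "tracks", "t1"))    # True
print(fine.acquire("bob", "tracks", "t2"))      # True: different objects
```

Bob never touches Alice's object, yet under container-level locking he is still blocked; this is the contention that finer-grained locking removes.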

Journal Files

Objectivity/DB uses journal files while writing data to the database. Under certain circumstances, losing a journal file can result in the loss of the whole database. Objectivity is working on reducing the database's vulnerability to damaged journal files.
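Why a lost journal is dangerous can be seen in a generic undo (rollback) journal. This sketch illustrates the general technique only and is not Objectivity/DB's actual journal format or recovery procedure; `JournaledStore` and its methods are hypothetical, and the journal is kept in memory for brevity. Before a page is first overwritten inside a transaction, its original contents are saved to the journal; if the writer dies before commit, recovery copies those originals back. Without the journal, recovery has nothing to restore from and the data is left half-updated.

```python
from typing import Dict, Optional


class JournaledStore:
    """Hypothetical page store protected by an undo (rollback) journal."""

    def __init__(self, pages: Dict[int, bytes]):
        self.pages = pages
        self.journal: Optional[Dict[int, bytes]] = None  # None = no open transaction

    def begin(self) -> None:
        self.journal = {}

    def write(self, page: int, data: bytes) -> None:
        if self.journal is None:
            raise RuntimeError("write outside a transaction")
        # Save the original image the first time a page is touched.
        if page not in self.journal:
            self.journal[page] = self.pages[page]
        self.pages[page] = data

    def commit(self) -> None:
        self.journal = None  # originals no longer needed

    def recover(self) -> None:
        """After a crash, undo any transaction that never committed."""
        if self.journal is None:
            return  # nothing to undo, or the journal was lost
        for page, original in self.journal.items():
            self.pages[page] = original
        self.journal = None


# Crash after a partial update, journal intact: recovery rolls back cleanly.
db = JournaledStore({0: b"old0", 1: b"old1"})
db.begin()
db.write(0, b"new0")  # crash happens before page 1 is written
db.recover()
print(db.pages)       # {0: b'old0', 1: b'old1'}

# Same crash, but the journal is lost: page 0 stays half-updated.
db = JournaledStore({0: b"old0", 1: b"old1"})
db.begin()
db.write(0, b"new0")
db.journal = None     # simulate losing the journal file
db.recover()
print(db.pages)       # {0: b'new0', 1: b'old1'}
```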

SQL++ Availability

SQL++, the Objectivity/DB counterpart to Oracle's SQL*Plus, is currently available on only a limited number of platforms. In particular, there is no AIX version. Objectivity is working on making SQL++ available on additional platforms.

Acknowledgments

The author wishes to thank David R. Quarrie of Lawrence Berkeley National Lab for providing much useful information about HPSS, Objectivity/DB and the needs of the BABAR detector project for event data storage.


Author: Steven Meyer