
Schema Management for the BABAR Database

David Quarrie

BABAR Database Group
BABAR Computing

 

Version Information

Original: 12th September 1998

Draft: 19th October 1998

This document is still under development. If you have any questions or comments, please address them to David Quarrie (DRQuarrie@LBL.Gov).



Introduction

The BABAR Database uses an Object Oriented Database Management System (OODBMS) for the management of persistent objects. Each such object is an instance of a persistent-capable class. Each persistent-capable class is assigned a unique type number or schema id in order to facilitate run-time type checking. These type numbers are maintained within the schema catalog that is part of the federated database. They are also saved within the source code during schema generation processing, and hence within the linkable libraries. It is essential that the type numbers in the schema catalog and the linkable library are identical. If they are not, a schema mismatch occurs and any application affected by it will either fail catastrophically or generate unpredictable results.

As long as the schema catalog and libraries are generated at the same time, this correspondence of schema ids is guaranteed. However, there are occasions when the catalog and library are not generated simultaneously. An example of this is when two developers each have their own federations, both based on the same basic set of classes, but both add their own classes to their own federations, and then wish to exchange their new classes and data. These have to be merged in a compatible fashion. Another goal is that the schema catalog can be regenerated from scratch with the guarantee that it will be identical each time, even when more persistent classes are added to a common base set.

Finally, as the requirements and understanding change as a function of time, it is likely that changes will have to be made to existing classes (e.g. the addition of data members).
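The schema-mismatch condition described above can be illustrated with a minimal Python sketch. It assumes that the type numbers from the schema catalog and from a linkable library have already been extracted into two dictionaries keyed by class name; the function name, the dictionaries and the numbers are hypothetical and are not part of any BABAR or Objectivity tool.

```python
def find_schema_mismatches(catalog_ids, library_ids):
    """Return {class_name: (catalog_id, library_id)} for every class whose
    type number differs between the schema catalog and the library.

    Both arguments are hypothetical dicts mapping class name -> type number.
    A class present on only one side is reported with None for the other.
    """
    mismatches = {}
    for name in sorted(set(catalog_ids) | set(library_ids)):
        cat_id = catalog_ids.get(name)
        lib_id = library_ids.get(name)
        if cat_id != lib_id:
            mismatches[name] = (cat_id, lib_id)
    return mismatches


# Made-up type numbers: C2 was generated against a different catalog.
catalog = {"C1": 1000, "C2": 1001}
library = {"C1": 1000, "C2": 1002}
print(find_schema_mismatches(catalog, library))
```

An empty result is the condition that must hold before an application is linked against the library and run against the federation.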



Terminology

The following terminology will be used throughout this document:

  • Schema ID. This is the unique number that identifies which persistent-capable class every persistent object within a federation belongs to. It is a synonym for Type Number.
  • Type Number. This is a synonym for Schema ID.
  • Named Schema. An assigned schema id range is associated with a unique schema name.
  • Federation. A synonym for Federated Database.
  • Registered Developer. A registered developer is a member of BABAR who has signed the Objectivity license agreement and has been assigned a unique schema id range. This range is guaranteed not to overlap with those of other developers and packages, and to be fixed across multiple federations. An unregistered developer will be assigned a range that is guaranteed not to overlap with those of registered developers and packages, but that is not guaranteed to be fixed across multiple federations. Each developer is assigned a named schema, "User_<user>", where <user> is the SLAC Unix account name (e.g. User_quarrie).
  • Registered Package. The BABAR software is based on the concept of packages. A registered package is guaranteed to be assigned a unique range of schema ids which is fixed for the duration of the experiment. An unregistered package will be assigned a schema id range corresponding to the developer; this range is guaranteed not to collide with the ranges for the registered packages. However, it is not guaranteed not to collide with the ranges assigned to other unregistered packages within other federations. The mechanism for creating a registered package is discussed in a later section of this document. Each registered package is assigned a named schema, which is the package name.
  • Registered Class.  A registered class is a class within a registered package that has a uniquely specified schema id assigned to it. This id is guaranteed to be fixed across all federations and is within the schema id range associated with the registered package to which this class belongs. A registered package can contain both registered and unregistered classes. The mechanism for creating a registered class is discussed in a later section of this document. An unregistered class is assigned a schema id that is within the schema id range associated with the developer rather than the package to which the class belongs. By default this is the developer performing the DDL processing, but this can be modified by commands to the BABAR makefiles to refer to other developers.
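The rule for which named schema supplies a class's schema id can be sketched in Python as follows. This is only an illustration of the definitions above: the function, the dictionary of registered classes and the names P1, C1 and C2 are hypothetical, and the real selection is made by the DDL processing and the BABAR makefiles.

```python
def named_schema_for_class(class_name, package, registered, developer):
    """Pick the named schema that a class's schema id is drawn from.

    registered: hypothetical dict mapping package name -> set of the
                registered class names in that package.
    developer:  account name of the developer performing the DDL
                processing (or the one selected through the makefiles).
    """
    if class_name in registered.get(package, set()):
        # Registered class: the id comes from the package's own range.
        return package
    # Unregistered class: the id comes from the developer's range.
    return "User_" + developer


registered = {"P1": {"C1"}}
print(named_schema_for_class("C1", "P1", registered, "quarrie"))  # P1
print(named_schema_for_class("C2", "P1", registered, "quarrie"))  # User_quarrie
```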



Requirements

Schema management for the BABAR database must meet the following requirements. The mechanisms that address them are described in more detail in the following sections.

  • Stabilize schema ids such that building a new reference federation is guaranteed to have the same schema ids for registered classes as previous versions of the reference federation, even if new packages and/or classes have been added.
  • Allow a developer to send to another developer databases containing persistent objects based on unregistered classes and to merge such databases into a destination federation that might itself contain other unregistered classes, providing that both federations are based upon the same underlying reference federation.
  • Provide a managed mechanism to allow for persistent classes to be modified (evolved or versioned) in such a way that existing persistent objects can be accessed by both old and new applications, and new applications can in addition access the modified schema definitions.



Federations

The following set of federations are involved in schema management:

  • Production Federation. This is the federation that contains the experiment data. It therefore contains both a schema catalog and a database catalog. One of the goals of schema management is to ensure that this federation is robust: additions to the schema must not affect the existing data, and changes to existing schema must not affect other schema and data. There is only a single production federation in the experiment.
  • Reference Federation. This is equivalent to the Production Federation but only has a schema catalog. This catalog is identical to that of the production federation. The main role of this federation is to ensure that any problems are caught at this stage rather than in the production federation itself. All schema in this federation have been assigned explicit and unique schema ids. There is only a single reference federation in the experiment.
  • Release Federations. A release federation is created for each software release, and there is one such federation per release. Currently it is created for each release from scratch, but this will change in a production-based environment rather than the current development-based environment. In production, this federation will be based upon the Reference Federation as a starting point for the release build.
  • Developer or Test Federations. These are the federations against which developers create and test new classes in their test releases. Each such federation is based upon the release federation for the corresponding release. Multiple developer federations can exist simultaneously.



Identifying the Developer

The following discussion is based upon a unique method of identifying the developer independent of which site they are developing at. The proposal is to use their SLAC Unix account name for this purpose. The question then arises of how they should identify themselves when running at a site other than SLAC. In the following I assume the use of an environment variable BABARUSERNAME. This should be set by their login procedure so that it is specified both at build time (i.e. when the SoftRelTools are being used to compile BABAR software) and at run time (i.e. when BABAR applications are being executed).

This is certainly inadequate and should be reviewed.



The BdbSchema Package

The BdbSchema package is the principal schema management tool that deals with registered developers, packages and classes. It contains the following primary DDL files:

BdbRegistry.ddl
<Pkg>TypeNums.ddl

Two other files, BdbDevelopers.ddl and BdbPackages.ddl, contain the initial list of registered developers and packages respectively. These files are write-protected and should not be modified. They are only for administrative convenience.

The role of these files is the following:

  • BdbRegistry.ddl
    This sets up one entry for each of the registered developers and packages. This entry creates a unique schema id range for each such developer or package, having a schema name that is the same as the developer or package name. New entries must be added at the end of the BdbRegistry.ddl file.
  • <Pkg>TypeNums.ddl
    There is one such file for each registered package (where <Pkg> is the name of the package) that has assigned schema ids or type numbers for the persistent classes declared in that package. This file is created by the database administrator once the schema has been successfully registered and fixes the schema ids for the duration of the project. These files should not be modified by anyone apart from the Package Coordinator for this package.
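One way to see why new entries must go at the end of BdbRegistry.ddl is the following Python sketch. It assumes, purely for illustration, that schema id ranges are handed out in file order with a fixed size; the source does not say how ranges are actually allocated, and RANGE_SIZE, FIRST_ID and the function name are hypothetical.

```python
RANGE_SIZE = 1000   # assumed size of each schema id range
FIRST_ID = 100000   # assumed first id available for named schemas


def assign_ranges(registry_entries):
    """Map each named schema to a (first_id, last_id) range by file order."""
    ranges = {}
    for position, name in enumerate(registry_entries):
        start = FIRST_ID + position * RANGE_SIZE
        ranges[name] = (start, start + RANGE_SIZE - 1)
    return ranges


before = assign_ranges(["User_quarrie", "BdbNewPackage"])
appended = assign_ranges(["User_quarrie", "BdbNewPackage", "User_newdev"])
inserted = assign_ranges(["User_newdev", "User_quarrie", "BdbNewPackage"])

# Appending leaves every existing range untouched ...
print(all(appended[name] == before[name] for name in before))  # True
# ... whereas inserting at the top shifts them.
print(all(inserted[name] == before[name] for name in before))  # False
```

Under this assumption, appending is what keeps the ranges of existing developers and packages fixed across rebuilds.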



How to register a developer

This operation should only be performed by the package coordinator in charge of the BdbSchema package.

Check out the BdbSchema package and add the following lines at the end of the BdbRegistry.ddl file:

#pragma ooschema User_<user> // <Lastname>, <Firstname>
class User_<user>Schema : public ooObj {};

Where <user> is the SLAC Unix account name for the new developer (e.g. user quarrie becomes named schema User_quarrie). It is recommended that their actual names be included in a comment as shown above.

Commit the change back to the BABAR code repository, and then tag & checklist it. The new developer will be registered at the next production release build or rebuild.

Ideally this operation should be performed as soon as the developer has signed the Objectivity license agreement form and has been entered into the corresponding database, and should therefore be automated.

Note that the User_<user>Schema class that is declared in this procedure is only for establishing the named schema; it should not be used for any other purpose.
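Since this registration is a candidate for automation, the two lines to be appended can be generated mechanically. The following Python helper is a hypothetical sketch, not an existing BABAR tool; it only formats the text shown above for a given account name.

```python
def developer_registry_entry(user, lastname, firstname):
    """Build the two BdbRegistry.ddl lines that register a developer.

    user is the SLAC Unix account name; the real name goes in the comment.
    """
    schema = "User_" + user
    return (
        "#pragma ooschema {0} // {1}, {2}\n"
        "class {0}Schema : public ooObj {{}};\n"
    ).format(schema, lastname, firstname)


print(developer_registry_entry("quarrie", "Quarrie", "David"), end="")
```

For the account quarrie this prints the pragma for the named schema User_quarrie followed by the declaration of the User_quarrieSchema class.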



How to register a package

This operation should only be performed by the package coordinator in charge of the BdbSchema package.

Check out the BdbSchema package and add the following lines at the end of the BdbRegistry.ddl file:

#pragma ooschema <Pkg> //
class <Pkg>Schema : public ooObj {};

Where <Pkg> is the package name for the new package (e.g. BdbNewPackage).

Commit the changes to the BdbSchema package back to the code repository and then tag & checklist it. The new package will be registered at the next production release build or rebuild.

Ideally this operation should be performed as soon as the package is created and therefore should be automated.

Note that the <Pkg>Schema class that is declared in this procedure is only for establishing the named schema; it should not be used for any other purpose.



How to register a class

This operation should only be performed by the package coordinator in charge of the package that the class resides in.

Check out the package to which the new class belongs and add the file <Pkg>Schema.hh if it doesn't already exist. The file BdbSchema/PkgSchema.hh can be used as a template for this purpose.

Change the package name in the following line:

#pragma ooschema <Pkg>

Add the following lines for each class that is being registered to this file in the location identified for that purpose:

class <class>;

Where <class> is the name for the new class (e.g. BdbNewClass). Add new classes at the end of the existing list if any.

Commit the changes back to the BABAR code repository and then tag & checklist it. The new classes will be registered at the next production release build or rebuild.



Rules for creation of persistent classes and their DDL files

One limitation of the DDL compiler is that it needs to be informed about the named schema a class resides in. This is the motivation behind the <Pkg>/<Pkg>Schema.hh files in each registered package. They specify the registered classes for each package and should be included in the DDL files as appropriate in order to reference a class in one package from one in another package.

Consider the situation when Class C1 in package P1 has a dependency on a class C2 in package P2. The C1.ddl file should have the following lines:

#include "P2/P2Schema.hh"
#include "P1/P1Schema.hh"

These lines should appear before any other reference to the C1 class.

Thus the coding rules for DDL files within a package are:

  • A <Pkg>Schema.hh file must be added to each package containing persistent classes (identified by the corresponding DDL files).
  • Every DDL file within the package should include the named schema declarations header files for the package itself, as well as any other package upon which the schema depend.
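The include rule above can be sketched as a small Python helper that produces the header lines for a DDL file. The function is hypothetical; it places the headers of the other packages first and the package's own header last simply because that is the order used in the C1/P1/P2 example, and the source does not say whether the order matters.

```python
def schema_includes(package, depends_on):
    """Return the #include lines for a DDL file in `package`.

    depends_on lists the other packages whose classes are referenced.
    Dependencies come first and the package's own header comes last;
    duplicates and self-references are dropped.
    """
    ordered = []
    for pkg in depends_on:
        if pkg != package and pkg not in ordered:
            ordered.append(pkg)
    ordered.append(package)
    return ['#include "{0}/{0}Schema.hh"'.format(pkg) for pkg in ordered]


for line in schema_includes("P1", ["P2"]):
    print(line)
```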

Templated Classes

Since there is no concept of forward declaration of a templated class (as opposed to a class template), templated classes must have their specific named schema declared within the body of the DDL file rather than within the <Pkg>Schema.hh file. The syntax for this is:

#ifdef USE_NAMED_SCHEMA
#pragma ooschema <Pkg>
#endif

template class BdbRWVector<ooObj>;

#ifdef USE_NAMED_SCHEMA
#pragma ooschema
#endif

Or for inheritance:

#ifdef USE_NAMED_SCHEMA
#pragma ooschema <Pkg>
#endif

class Foo : public BdbRWVector<ooObj> {
[...]
};

#ifdef USE_NAMED_SCHEMA
#pragma ooschema
#endif

I wish I could find a better way of doing this.
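Until a cleaner mechanism exists, the guarded pragmas could at least be generated rather than typed by hand. The following Python sketch is a hypothetical helper that wraps a single declaration in the construct shown above; the package name in the example is only a placeholder.

```python
def wrap_in_named_schema(package, declaration):
    """Surround a templated-class declaration with the guarded pragmas."""
    return "\n".join([
        "#ifdef USE_NAMED_SCHEMA",
        "#pragma ooschema " + package,   # switch to the package's named schema
        "#endif",
        "",
        declaration,
        "",
        "#ifdef USE_NAMED_SCHEMA",
        "#pragma ooschema",              # switch back to the default schema
        "#endif",
    ])


print(wrap_in_named_schema("BdbNewPackage", "template class BdbRWVector<ooObj>;"))
```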



The procedure for introducing new schema into the Production Federation

The procedures and relationships between the various federated databases are as follows:

  1. For each test release, a Developer or Test Federation is created from the Release Federation for the production release upon which this test release is based. Actually, this is the so-called exported copy of the release federation, rather than the federation itself. This allows this creation to take place anywhere within the BABAR collaboration. This is exactly the same procedure as is currently used within BABAR. New classes can be created within such a test release, but they should remain unregistered. That is to say that
  2. For each production release build, a Release Federation is created from the Reference Federation and thus establishes the initial schema. As for stage 1 above, this actually uses the exported copy of the reference federation rather than the federation itself. The build then proceeds to conclusion and undergoes some QA to determine whether it is a valid build. If there are rebuilds, these also use the Reference Federation to establish the initial schema. Following the appropriate QA, new schema are identified by using the Objectivity ooshow utility and comparing the schema listings produced for the current release and the Reference Federation. Part of the QA process will be to ensure that no schema fall within the "*" or unnamed schema range. Every schema must be associated with a known package.
  3. Once new schema have been identified and another review process has determined whether they should be accepted into the production environment, the corresponding schema ids are entered into the appropriate BdbSchema/<Pkg>TypeNums.ddl file or files (creating them if necessary). The updated BdbSchema package is tagged & checklisted. A further schema generation is performed within a full test release (created using newrel -a -F) with this updated BdbSchema package to ensure that this transcription is correct (can we accelerate this process?). This check confirms the schema id assignments and guarantees that the schema ids within the libraries created in stage 2 above will correspond to the pinned schema ids within the production federation. [Should this test build be another rebuild, only proceeding as far as the schema pass?]
  4. The schema from this test build is extracted using the Objectivity ooschemadump utility.
  5. The new schema is introduced to the Reference Federation using the Objectivity ooschemaupgrade utility. We should think of necessary QA tests to be made at this point. If there is a problem, the reference federation is restored from its exported copy.
  6. The Production Federation is stalled using the BABAR utility to make it temporarily unavailable. This causes all running applications to close any open transactions and stall until the federation is marked as being available again. Note that waiting for all transactions to terminate might take several minutes.
  7. A backup of the Production Federation catalog (.FDB) file is taken using a simple Unix copy script.
  8. The new schema is introduced to the Production Federation using the Objectivity ooschemaupgrade utility. We should think of necessary QA checks to be made at this point. If there is a problem, we can revert to the backup that was taken at stage 7.
  9. The Production Federation is made available again. This will cause the stalled applications to resume.
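The comparison of schema listings in stage 2 can be sketched in Python. The sketch assumes that each class appears in the ooshow listing on a line of the form "%assign C1 %% 123456 ; /* (0x00100010) */", which is the only example of that output given in this document; the function names and the second sample line are hypothetical.

```python
import re

# Matches lines such as: %assign C1 %% 123456 ; /* (0x00100010) */
ASSIGN_LINE = re.compile(r"^%assign\s+(\S+)\s+%%\s+(\d+)\s*;")


def parse_listing(text):
    """Extract {class_name: schema_id} from ooshow-style '%assign' lines."""
    ids = {}
    for line in text.splitlines():
        match = ASSIGN_LINE.match(line.strip())
        if match:
            ids[match.group(1)] = int(match.group(2))
    return ids


def new_schema(release_text, reference_text):
    """Return the classes present in the release listing but not in the
    reference listing, with their schema ids."""
    release = parse_listing(release_text)
    reference = parse_listing(reference_text)
    return {name: release[name] for name in release if name not in reference}


reference_listing = "%assign C1 %% 123456 ; /* (0x00100010) */"
release_listing = reference_listing + "\n%assign C2 %% 123457 ;"
print(new_schema(release_listing, reference_listing))  # {'C2': 123457}
```

The resulting class names and ids are what would then be reviewed and transcribed into the <Pkg>TypeNums.ddl files in stage 3.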

Note that in this procedure all upgrade activity is first applied to the Reference Federation, so that tests can be made before the upgrades are applied to the Production Federation itself. I believe this is a necessary safety requirement. Big question: is it sufficient?

Note also that I expect this procedure to be time-consuming because of the multiple stages and the necessary QA that must take place. We should think of scheduling this on a regular, but infrequent (1-2 weeks?) basis.
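The ordering of stages 6 to 9 (stall, back up, upgrade, revert on failure, make available) can be sketched in Python as follows. All five arguments are hypothetical callables standing in for the BABAR stall utility, the Unix copy script and ooschemaupgrade; nothing here reflects the real interfaces of those tools, and making the federation available again after a revert is an assumption.

```python
def upgrade_production(stall, backup, upgrade, restore, resume):
    """Run stages 6 to 9, restoring the backup if the upgrade fails.

    Returns True if the upgrade succeeded and False if it was reverted.
    The federation is made available again in either case.
    """
    stall()                      # stage 6: make the federation unavailable
    try:
        backup()                 # stage 7: copy the catalog file
        try:
            upgrade()            # stage 8: introduce the new schema
            succeeded = True
        except Exception:
            restore()            # stage 8: revert to the stage 7 backup
            succeeded = False
    finally:
        resume()                 # stage 9: make the federation available
    return succeeded


log = []

def failing_upgrade():
    log.append("upgrade")
    raise RuntimeError("schema upgrade failed")

ok = upgrade_production(
    stall=lambda: log.append("stall"),
    backup=lambda: log.append("backup"),
    upgrade=failing_upgrade,
    restore=lambda: log.append("restore"),
    resume=lambda: log.append("resume"),
)
print(ok, log)
```

With the failing upgrade above, the log shows the restore step running before the federation is made available again.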



Development Scenario

A possible development scenario is:

  • Developer D1 creates a new class C1 in package P1. This is developed and debugged in a test release of D1, and then the changes are committed to CVS and the package is tagged. Note that at this stage the class is associated with a schema id in the range assigned to the developer.
  • In order to share data with developer D2, developer D1 determines the schema id for C1 using the ooshow command. Developer D2 checks out the new version of package P1 and builds the library using the DEVELOPER_NAME flag:

    gmake P1.schema DEVELOPER_NAME="D1:C1:123456"

    Where "D1" is the registered name of the developer, C1 is the class name and 123456 is the schema id that was assigned to it.

  • The package P1 is checklisted and accepted for a release build. During the build, the new class C1 will be associated with a schema id within the range assigned to package P1. Once the release is successful (e.g. passes some acceptance criterion such as reaching test status), the developer D1 asks the database manager to register the new class C1 in the package P1. The database manager does this by using ooshow to determine the schema id associated with C1, and modifies the BdbSchema/P1TypeNums.ddl file (creating it if necessary) to include the new class and assigned schema id:

    #pragma ooassign C1 123456 ; /* (0x00100010) */

    The database manager then commits this change back to CVS, tags the BdbSchema package, and checklists it.
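The DEVELOPER_NAME values that appear in this document take the form "developer:class:id", a space-separated list of such triples, or a bare developer name. The following Python sketch parses those forms; it is only an illustration of the notation, and how the BABAR makefiles actually interpret the flag is not specified here.

```python
def parse_developer_name(value):
    """Split a DEVELOPER_NAME value into (developer, class_name, schema_id).

    Accepts a bare developer name ("D1") or one or more space-separated
    "developer:class:id" triples. Missing parts are returned as None.
    """
    entries = []
    for item in value.split():
        parts = item.split(":")
        if len(parts) == 1:
            entries.append((parts[0], None, None))
        elif len(parts) == 3:
            entries.append((parts[0], parts[1], int(parts[2])))
        else:
            raise ValueError("unrecognised DEVELOPER_NAME entry: " + item)
    return entries


print(parse_developer_name("D1:C1:123456"))  # [('D1', 'C1', 123456)]
```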



Importing an unregistered package or class from another developer

Consider the following scenario:

Registered developer D1 has created an unregistered class C1 in registered package P1, based on the production release R1. They have created a test federation based on this extended schema and have created some persistent objects of type C1. Meanwhile registered developer D2 has created an unregistered class C2 in registered package P2, based on the production release R2. They have also created a test federation based on their extended schema and have created some persistent objects of type C2. Developer D1 now wants to ship D2 their data and allow D2 to merge it in with their federation.

The sequence of operations and underlying mechanisms are the following:

  • Since developer D1 created an unregistered class C1, that class is assigned a schema id in the D1 named schema rather than in the P1 named schema. This is done automatically by the SRT makefiles. Developer D1 uses the ooshow command to determine the schema id of class C1:

ooshow -a -d $OO_FD_BOOT | grep C1
%assign C1 %% 123456 ; /* (0x00100010) */

  • Developer D1 adds the following line to the end of their C1.ddl file and commits it to CVS:

#pragma ooassign C1 123456

  • Developer D2 adds the modified package to their test release, identifying the source of the new class by the following construct:

gmake P1.schema DEVELOPER_NAME=D1

  • Developer D2 can now attach database files from D1.

Notes:

  • It doesn't matter whether packages P1 & P2 are identical in the above scenario. [need to check this - might need to modify the makefile flag to something like the following]

gmake P1.schema DEVELOPER_NAME="D1:C1:123456 D1:Cn:nnnnn"

  • Need to think about implications of whether the two releases must be identical.



Modifying Schema

This section is not ready yet.



Package and Developer Name Conflicts

Two different types of conflicts are possible:

  1. A package name is not a valid Objectivity Schema name.
  2. A package name conflicts with a developer name.

Since developer names are Unix usernames they are compatible with Objectivity schema names. [Is this true - need to confirm it.] Unfortunately that's not true of package names and in fact several of the existing packages do not have valid Objectivity schema names. These are currently packages that end in "++" (e.g. Cornelius++). I would propose that the "++" characters are automatically converted to "pp" characters for the purposes of naming the schema. Thus "Cornelius++" becomes "Corneliuspp". I also propose that we adopt a policy of disallowing further packages to be created that do not conform to this restriction.

The situation where a package name conflicts with a developer name has been addressed by prefixing the developer schema names with "User_". Since this prefix violates the BABAR package naming guidelines, it cannot occur in a package name.
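The two naming conventions proposed in this section can be sketched in Python. The function names are hypothetical, and rejecting package names that start with "User_" is an assumption drawn from the statement that the prefix violates the package naming guidelines.

```python
DEVELOPER_PREFIX = "User_"


def package_schema_name(package):
    """Schema name for a package, with "++" rewritten as "pp"."""
    if package.startswith(DEVELOPER_PREFIX):
        # Assumed policy: this prefix is reserved for developer schemas.
        raise ValueError("package names may not start with " + DEVELOPER_PREFIX)
    return package.replace("++", "pp")


def developer_schema_name(user):
    """Schema name for a developer, from the SLAC Unix account name."""
    return DEVELOPER_PREFIX + user


print(package_schema_name("Cornelius++"))  # Corneliuspp
print(developer_schema_name("quarrie"))    # User_quarrie
```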



Unresolved Issues

I don't yet know the following:

  1. When exactly should new persistent classes be registered, i.e. associated with the package named schema in the <Pkg>Schema.hh file? If it is done during test development, before committing the package to CVS, then two developers who are both working on the same package cannot exchange private data based on that package, since in their test releases they will not know about the class or classes created by the other developer. If we wait until after the package has been tagged & checklisted, there's another round of editing required in order to add the classes to the <Pkg>Schema.hh file. However, since in this case the new classes are unregistered, they will be associated with the developer's own named schema rather than the package named schema and so can be exchanged between two developers, even if those developers are working on the same package (I think...).
  2. How to automate the registering of packages & developers?
  3. What QA should we do at each stage?
  4. I wish I could think of a better way of dealing with templated classes. For non-templated classes, all forward declarations are in the <Pkg>Schema.hh files and there is nothing else visible in the DDL files. This is much cleaner than the situation for templated classes, where the #pragma ooschema statements are embedded in the DDL files themselves.
  5. Do we really need the Reference Federation? A possible alternative would be to use the output of ooschemadump as the equivalent (called Reference Schema for the following). Then a release build would start with an empty release federation which is ooschemaupgrade'd with the Reference Schema. Once that release has been validated, then a new Reference Schema would be created using ooschemadump from it, this being used to upgrade the Production Federation.



e-mail DRQuarrie@LBL.Gov