Basic Information on How Objectivity/DB Works
- Introduction
- Accessing data stored in Objectivity/DB
- Example: reading objects
- Interactions with Objectivity/DB servers and what can go wrong
- Why do we need locking?
- Updating objects
- How locking works - interactions with other jobs
- What can go wrong due to locking?
- What to do after an application died?
- What next?
Introduction
For many people an object-oriented database system is a black box. Indeed, it is usually a complicated, distributed system with many hidden interactions with tens of servers. This section explains to an Objectivity novice how a client application interacts with Objectivity servers in the context of BaBar. In particular it covers:
- How a simple job interacts with the Objectivity database system
- How simultaneous access to the same data is synchronized
- How jobs may interfere with each other
- What can go wrong while accessing the Objectivity database
Accessing data stored in Objectivity/DB
Data stored in Objectivity/DB can be accessed by applications written in an object-oriented language such as C++ or Java. Each such application is written using an Objectivity/DB programming interface. In BaBar, many such applications have already been built, so you do not need to develop them yourself.
More than half a million lines of C++ code have been written on top of Objectivity/DB to make access to the database as transparent and easy as possible. Still, to use the BaBar database you need a basic understanding of the system. Blindly running jobs that use the Objectivity database may:
- Conflict with other (possibly production) jobs
- Put an extra load on production servers and slow down the whole system
Objectivity/DB stores persistent objects in databases. A single database corresponds to a file and is divided into logical units, called containers. Containers serve a number of purposes. They:
- group basic objects; basic objects within a container are physically clustered together,
- serve as the unit of locking.
A group of databases using a common object model (schema) is organized into a unit called a federation.
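The hierarchy described above (federation, database, container, object) can be pictured with a small data model. This is only an illustrative sketch: the class and field names are hypothetical and are not part of the Objectivity/DB programming interface; the example values are the ones used in the reading example on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Unit of locking; groups objects that are clustered together on disk."""
    name: str
    objects: list = field(default_factory=list)


@dataclass
class Database:
    """Corresponds to one file on some host; divided into containers."""
    name: str
    host: str
    containers: dict = field(default_factory=dict)


@dataclass
class Federation:
    """A group of databases that share one schema."""
    boot_file: str
    databases: dict = field(default_factory=dict)


# Hypothetical contents, using the names from the reading example.
fd = Federation(boot_file="BaBar.BOOT")
db = Database(name="myDB", host="DataServer1")
db.containers["X"] = Container(name="X", objects=["obj1", "obj2"])
fd.databases[db.name] = db

# Locating objects means walking federation -> database -> container.
container = fd.databases["myDB"].containers["X"]
print(container.objects)
```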
Example: reading objects
Let's consider an example: a user runs a job that reads persistent objects from a federation "BaBar.BOOT". Objects are stored in a database "myDB", in a container "X" and the database is stored on a host DataServer1. At this point we will forget that there might be other users accessing the same data. Also, let's not go into details about how the federation was created.
We need an application that was written using an Objectivity/DB programming interface. Here is an example of how such an application might look. This example does not cover BaBar-specific functions, such as checking whether a user is authorized to access the data, or whether the federation is locked. Reminder: in BaBar, there is no need for users to write applications that access BaBar's persistent data. Doing that correctly requires detailed knowledge about where the data is placed, how it is arranged, what the schema looks like, and so on. Many applications are already available. See "How do I choose which application I should run" for details (this example is not one of the official BaBar applications, so you will not find it there).
Before we run the application, we need to set the environment variable OO_FD_BOOT so that the application can find the bootfile of the federation. The next section discusses what happens when we run the application.
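As a pre-flight check before launching a job, the setting of OO_FD_BOOT can be inspected with a few lines of script. The sketch below assumes that the variable holds a plain local file path, which may not hold at every site; the function name `check_bootfile` is hypothetical and is not a BaBar tool.

```python
import os


def check_bootfile(environ=None):
    """Return a description of the problem, or None if the bootfile looks usable.

    Assumption: OO_FD_BOOT contains a plain local path to the bootfile.
    """
    env = os.environ if environ is None else environ
    path = env.get("OO_FD_BOOT")
    if not path:
        return "OO_FD_BOOT is not set"
    if not os.path.isfile(path):
        return f"bootfile not found: {path}"
    if not os.access(path, os.R_OK):
        return f"bootfile not readable: {path}"
    return None


if __name__ == "__main__":
    problem = check_bootfile()
    print(problem or "bootfile looks usable")
```

A check like this only catches the simplest mistakes; the application still validates the bootfile itself during initialization.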
Interactions with Objectivity/DB servers and what can go wrong
What really happens when we run the application?
1) Initialization
The application:
- Verifies whether the specified bootfile is valid and accessible. If not, an error is reported and the application terminates.
- Reads the bootfile.
- Starts a transaction.
- Opens a federation for reading. It may fail if the federated database file is not accessible (for example due to wrong file permissions). This operation involves:
  - Checking with a lock server whether the federation is available for reading. It may fail if the federation is locked. See How locking works - interactions with other jobs.
  - Putting a read lock on the federation.
  - Reading the federated database file to load the schema. This involves communication with AMS on a catalog server. It may fail if the federated database file is not accessible.
In BaBar, in addition to the items above:
- Each time a transaction is started, the application verifies that the federation is not locked. If it is locked, the application pauses until the federation is unlocked, printing a message every minute. See Locking federation for more details.
- Each time the application enters a new domain, it verifies whether the user that started the application is authorized to perform the requested operation. If not, an error is reported and the application terminates.
2) Opening database
The application opens a database "myDB" for reading. This involves:
- Checking with a lock server to determine whether the database is locked for reading. It may fail if the database is locked. See How locking works - interactions with other jobs.
- Locking the database for reading. This involves communication with a lock server.
- Reading the federated database file to get the catalog, to find out where the database is located (which host, and the full path). This involves communication with AMS on a catalog server. An error is reported if a database with the specified name does not exist.
- Reading the database page table into memory (internal metadata maintained by Objectivity). This involves communication with AMS on the server where the database is physically located (DataServer1). It will fail if the database file is not available (for instance if the file is not on disk).
3) Opening container
The application opens container "X" for reading. This involves:
- Checking with a lock server whether the container is locked for reading. It may fail if the container is locked. See How locking works - interactions with other jobs.
- Locking the container for reading. This involves communication with a lock server.
- Checking whether a container with the specified name exists in the database "myDB". If not, an error is reported and the application terminates.
- Reading some container level internal metadata maintained by Objectivity. This involves communication with AMS on DataServer1.
4) Reading objects
Finally, objects are read from the opened container. Depending on the size and number of the objects, this might involve one or more transfers from DataServer1. Each transfer moves one page, and the page size is 16 KB.
This operation does not involve the lock server, unless the objects contain references to other persistent objects that need to be opened and are located in a different container.
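The cost of reading can be estimated from the 16 KB page size. The following is a rough sketch that assumes objects are packed back to back in pages, so it gives a lower bound only; the object sizes are made-up examples and `min_page_transfers` is a hypothetical helper.

```python
import math

PAGE_SIZE = 16 * 1024  # bytes moved per transfer, as stated above


def min_page_transfers(n_objects, object_size):
    """Lower bound on page transfers, assuming objects are packed back to back."""
    total_bytes = n_objects * object_size
    return math.ceil(total_bytes / PAGE_SIZE)


print(min_page_transfers(1, 200))     # 1: even one small object costs a full page
print(min_page_transfers(1000, 200))  # 13: 200,000 bytes do not fit in 12 pages
```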
5) Shutdown
The transaction is closed (committed) and the application terminates. All the locks acquired during that transaction are released, which involves communication with the lock server.
All of the above
Any of the operations described above can fail if the application loses communication with any of the servers (AMS, lock server) while a transaction is open. Possible causes include: somebody killed or restarted the lock server or AMS, a server machine was rebooted, the network connection was broken, and so on.
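The steps above can be summarized as a list of (phase, action, server) entries, which makes it easy to see how often each server is contacted and where a job stops if a server becomes unreachable. This is a simplified model built only from the description on this page: one entry stands for one step, not for one network message, and the function `run` is hypothetical.

```python
from collections import Counter

LOCK = "lock server"
CATALOG = "AMS on catalog server"
DATA = "AMS on DataServer1"

# One entry per step described above (not per network message).
READ_JOB_STEPS = [
    ("initialization", "check federation lock", LOCK),
    ("initialization", "put read lock on federation", LOCK),
    ("initialization", "read federated database file (schema)", CATALOG),
    ("opening database", "check database lock", LOCK),
    ("opening database", "put read lock on database", LOCK),
    ("opening database", "read catalog to locate myDB", CATALOG),
    ("opening database", "read database page table", DATA),
    ("opening container", "check container lock", LOCK),
    ("opening container", "put read lock on container", LOCK),
    ("opening container", "read container metadata", DATA),
    ("reading objects", "transfer pages", DATA),
    ("shutdown", "commit and release locks", LOCK),
]


def run(steps, unreachable=()):
    """Walk through the steps and stop at the first one whose server is unreachable."""
    for phase, action, server in steps:
        if server in unreachable:
            return f"failed during {phase} ({action}): cannot reach {server}"
    return "completed"


contacts = Counter(server for _, _, server in READ_JOB_STEPS)
print(contacts)
print(run(READ_JOB_STEPS))
print(run(READ_JOB_STEPS, unreachable={DATA}))
```

Even in this coarse model, the lock server is involved in more steps than either data server, which is one reason a lock server outage affects every job.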
Why do we need locking?
Let's now focus on locks. During initialization, a federation is opened for reading and the application obtains a read lock on the federation. When the database is opened, a read lock on the database is obtained. Similarly, opening a container results in locking it for reading. After the application finishes reading the objects and the transaction is committed, all the locks it held are released.
You might ask why we need locks, since they introduce extra overhead and make things more complicated. They would indeed be unnecessary in a world where only a single user accesses the data at any given time and where nothing ever goes wrong.
We do not want to let multiple users update the same object or set of objects at the same time, as it is very likely to cause damage to data and leave it in an inconsistent state.
The following example explains how locking helps maintain data consistency during unexpected crashes. A user transfers $40 from account A (initial balance = $100) to account B (initial balance = $0). We have the following choices:
- Extract $40 from account A, then add $40 to account B
- Add $40 to account B, then extract $40 from account A
Without a transaction, the two operations cannot be performed atomically. If the system crashes (for example, because of a power outage) between the two operations, the accounts are left in an inconsistent state. With the first ordering the client has lost money; with the second the client ends up with $40 extra.
The solution is: make the whole transfer a single transaction, lock both accounts, and transfer the money only if both accounts have been locked successfully. Commit the transaction (and unlock the accounts) only if there are no errors; otherwise abort the transaction (roll back the changes) and leave both accounts in their initial state. That ensures that the whole transfer is done atomically.
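The transfer described above can be sketched in a few lines. This is a toy in-memory model, not Objectivity code: the `Account` class and `transfer` function are hypothetical, and the failure between the two operations is simulated with an exception (a real power outage cannot be caught this way and needs on-disk recovery information).

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    # Lock both accounts first; proceed only if both locks were obtained.
    if not src.lock.acquire(blocking=False):
        return False
    if not dst.lock.acquire(blocking=False):
        src.lock.release()
        return False
    saved = (src.balance, dst.balance)  # state to restore on abort
    try:
        src.balance -= amount
        if fail_midway:
            raise RuntimeError("simulated failure between the two operations")
        dst.balance += amount
        return True  # commit
    except RuntimeError:
        src.balance, dst.balance = saved  # abort: roll back the changes
        return False
    finally:
        # Locks are released at the end of the transaction, commit or abort.
        dst.lock.release()
        src.lock.release()


a, b = Account(100), Account(0)
print(transfer(a, b, 40, fail_midway=True), a.balance, b.balance)
print(transfer(a, b, 40), a.balance, b.balance)
```

After the failed attempt both balances are unchanged; after the successful one the $40 has moved and no money was created or lost.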
The role of transactions is to ensure integrity of the data. They ensure that:
- Either all operations of the transaction are reflected properly in the database, or none are (Atomicity)
- When a transaction executes in isolation (that is, with no other transaction executing concurrently), the consistency of the database is preserved (Consistency)
- Each transaction is unaware of other transactions, even though they are executed concurrently (for every pair of transactions, it appears to each one that the other either finished execution before it started, or started execution after it finished) (Isolation)
- After a transaction completes successfully, the changes it has made to the database persist, even if the system crashes (Durability)
These properties are known as ACID properties and are discussed in many database related books and articles. See for instance www.odbmsfacts.com.
Updating objects
Let's now imagine that we need to update objects, not just read them. The first, obvious difference is that an update lock needs to be obtained on the federation, the database, and the container. The differences in interactions with the servers include:
- There is more traffic to the lock server (obtaining an update lock is more complicated).
- The page containing the changed object is written back to disk. This usually happens during commit, but if many objects are changed or created inside one transaction, client code might decide to flush data to disk earlier. This involves more traffic to our DataServer1 machine.
- A journal file is created on a journal server when a transaction is started, all changes to the persistent data are logged in that file, and the file is removed when the transaction is closed. This generates many short communications with the journal server, but ensures that we will be able to recover (roll back to a consistent state) after any failure, even the most unexpected one.
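The role of the journal can be illustrated with a toy store that logs the old value of every item before changing it. This is an assumption about the general technique (logging "before" values so that they can be restored), not a description of the actual Objectivity journal file format; the class `JournaledStore` is hypothetical.

```python
_MISSING = object()  # marks "this key did not exist before the transaction"


class JournaledStore:
    def __init__(self, data):
        self.data = dict(data)
        self.journal = None  # exists only while a transaction is open

    def begin(self):
        self.journal = []  # corresponds to creating the journal file

    def write(self, key, value):
        if self.journal is None:
            raise RuntimeError("no open transaction")
        # Log the old value first, then apply the change.
        self.journal.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        self.journal = None  # corresponds to removing the journal file

    def recover(self):
        """Roll back an interrupted transaction, newest change first."""
        if self.journal is None:
            return  # nothing was in progress
        for key, old in reversed(self.journal):
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self.journal = None


store = JournaledStore({"A": 100, "B": 0})
store.begin()
store.write("A", 60)  # the job dies here, before B is updated
store.recover()
print(store.data)
```

Because the journal still exists when the job dies, recovery can restore the state from before the transaction started.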
So far we have not considered whether other users are in the system. It is possible that the objects we are trying to read are being read or updated by another user or, vice versa, that the object we are trying to update is in use. When we are updating objects, the chance of disrupting other jobs is much higher. So let's look closely at how Objectivity locking works.
How locking works - interactions with other jobs
In Objectivity, the granularity of locking is the container. This means a job cannot lock an individual object. The smallest unit that can be locked is a container (no matter whether it contains one object or millions).
Multiple jobs can read or update a single federation.
Multiple jobs can read or update a single database.
Multiple jobs can lock the same container for reading.
Multiple update locks on the same container are never allowed. There can be only one job updating a container at any time.
The interaction between jobs reading a container and a job updating it depends on the policy used. Objectivity/DB provides two types of concurrent access policies:
- Multiple-Readers-One-Writer (MROW)
- Non-Multiple-Readers-One-Writer (Non-MROW).
Within BaBar, in most places we use the Non-MROW model, and we use MROW only in very well defined places. MROW is more flexible, but it has very high costs (from our perspective) associated with using it. In most cases we do not allow an update of a container while other jobs are reading it.
In addition to read and update locks, Objectivity provides a third type of lock: the exclusive lock. If a resource is locked exclusively, neither a read nor an update lock will be granted. In BaBar, exclusive locks are rarely used. The only occasion when one may occur is when running an administration tool on a database, for instance ootidy.
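The rules for container locks can be condensed into a small compatibility check. This is a simplified reading of the rules on this page: it treats MROW as a single switch for the container, and it assumes that an exclusive lock is granted only when nobody else holds any lock. The function `can_grant` is hypothetical.

```python
def can_grant(requested, held, mrow=False):
    """Decide whether a container lock can be granted.

    requested: "read", "update" or "exclusive"
    held:      lock modes currently held by other jobs on the same container
    mrow:      True if the Multiple-Readers-One-Writer policy applies
    """
    for mode in held:
        if "exclusive" in (mode, requested):
            return False  # exclusive never coexists with another lock (assumption)
        if mode == "update" and requested == "update":
            return False  # never more than one updater per container
        if {mode, requested} == {"read", "update"} and not mrow:
            return False  # Non-MROW: readers and an updater exclude each other
    return True


print(can_grant("read", ["read", "read"]))       # many readers are fine
print(can_grant("update", ["read"]))             # Non-MROW: blocked by a reader
print(can_grant("update", ["read"], mrow=True))  # MROW: one writer next to readers
print(can_grant("update", ["update"], mrow=True))
```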
What can go wrong due to locking?
Let's review what can go wrong when we are reading/updating objects.
1) Initialization
There should be no problems during the initialization. You should always be able to start a transaction and open a federation for reading or writing.
2) Opening database
Similarly, there should be no lock-related problems here, because multiple reads or updates are allowed. The only exception is when a database is exclusively locked.
3) Opening container
This is the place where a lock conflict is likely to occur.
If the application is opening a container for reading, it will succeed only if no other user is updating this container. If someone is, the application will wait for a limited time that depends on your configuration (usually 60 sec). Failure to obtain the lock during that time results in an error. On success, a lock is obtained, and from then until the end of the transaction no other user can update that container.
If the application is opening a container for update, it will succeed only if no other user is accessing this container (reading or updating). If someone is, the application will wait for a limited time that depends on your configuration (usually 60 sec). Failure to obtain the lock during that time results in an error. On success, a lock is obtained, and from then until the end of the transaction no other user can either read or update that container.
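The "wait, then fail" behavior can be pictured as a loop with a deadline. The sketch below polls a caller-supplied function; this is an assumption made for illustration and says nothing about how the lock server actually queues waiting requests. The names `wait_for_lock` and `try_acquire` are hypothetical, and the 60 second default mirrors the usual configuration mentioned above.

```python
import time


def wait_for_lock(try_acquire, timeout_s=60.0, poll_s=1.0):
    """Retry try_acquire() until it succeeds or timeout_s has elapsed.

    Returns True if the lock was obtained, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if try_acquire():
            return True
        if time.monotonic() >= deadline:
            return False  # the application would report a lock error here
        time.sleep(poll_s)


# Simulated container that becomes free on the third attempt.
attempts = iter([False, False, True])
print(wait_for_lock(lambda: next(attempts), timeout_s=5.0, poll_s=0.01))
```

A job that holds its transaction open for a long time therefore forces every conflicting job to either wait or fail.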
4) Reading/updating objects
Locking is not involved during reading/updating of objects. All locks have already been obtained.
5) Shutdown
During transaction commit, altered and new data is written to disk. Sometimes the disk gets full and the transaction cannot be committed. Such a transaction is automatically aborted and the application is terminated.
What to do after an application died?
In most cases, the reason for the failure is described in the error message issued by the application. Depending on the problem, help from a database administrator might be needed (for instance, if a lock server is not running). See troubleshooting.
Should the application die while it is inside a transaction, it may leave one or more locks behind. Objectivity always attempts to abort open transactions (and release the locks associated with them) in an exit handler; however, you cannot count on it. If you issue kill or kill -6, the exit handler will be called. But if you issue kill -9, or exit from the debugger without stopping the application, the exit handler will not be called, and Objectivity will not have a chance to abort the open transaction. It is wise to always check after an application crashes whether it left any locks. See How to check if my job left any locks, and How to cleanup locks.
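The difference between kill and kill -9 comes from the operating system: a process may install its own handler for SIGTERM (plain kill) or SIGABRT (kill -6), but never for SIGKILL (kill -9). The following sketch probes this on a POSIX system from the main thread of a script; it illustrates the general mechanism only and is not Objectivity's exit handler. The function `can_install_handler` is hypothetical.

```python
import signal


def can_install_handler(sig):
    """Return True if this process is allowed to handle `sig` itself."""
    previous = signal.getsignal(sig)
    try:
        signal.signal(sig, lambda signum, frame: None)
    except (OSError, ValueError):
        return False  # the operating system refuses, as it does for SIGKILL
    # Restore whatever was installed before the probe.
    signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
    return True


for name in ("SIGTERM", "SIGABRT", "SIGKILL"):
    sig = getattr(signal, name)
    if can_install_handler(sig):
        print(name, "can be handled: cleanup code gets a chance to run")
    else:
        print(name, "cannot be handled: locks may be left behind")
```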
What next?
After reading this page, you should know how data stored in Objectivity/DB is accessed, and how a single job interacts with the servers and with other jobs in the system. With this knowledge, you have the background needed to understand the more complex interactions between a single analysis job and the BaBar production federations. See: what happens during typical analysis job (bdb point of view). Please send us comments if you find that some information is missing or unclear, or if you have suggestions for other information.
Page Owner: Jacek Becla
Last Update: July 15, 2003