
Maintaining OID server

Introduction
Starting, stopping, and restarting the server
    Performing remote restart
Checking status of the server
Acquiring the server's statistics (counters)
Run-time monitoring of the server
Log files
Reading the journal file
Who to contact for problem resolution?

Introduction

This document covers the basics of OID server management. It assumes that the server has already been properly installed as described in the corresponding installation guide.

Following is a summary of important information found here:

There are five scripts located in the root directory of each installation. Their names all begin with "oidserver_".

There is a log directory beneath the root directory of each installation. It has one subdirectory per federation served through the installation, where the corresponding log files are recorded.

Each installation has one journal file (journal/activity.log), which records all management activities related to that installation.

IMPORTANT:

The proper version of the BaBar Database Authorization utility (BdbAuthCmd) has to be available through the binary path of any user who starts, stops, or restarts the server. This utility is used to check whether the user is authorized to perform server management. The main rule is that only users with the System authorization privilege are allowed to start, stop, and restart the server.

Starting, stopping, and restarting the server

The server can only be started or stopped from a machine where the corresponding installation is performed. The only exception is the remote restart (see the next subsection for details).

Be careful not to stop a running server while it is being used by clients. The clients can crash if this happens.

Two scripts do this: oidserver_start and oidserver_stop. Both expect the nickname of the federation (that is, of the federation whose server is being managed) as their first, mandatory parameter:

oidserver_start <federation> ...
oidserver_stop <federation>

Normally the start script refuses to start the server if it discovers another instance already running under the same name. To override this and force the running server to stop before a fresh copy is started, specify the following optional switch:

oidserver_start <federation> [-restart]

One of the things done during the server's startup sequence is to compress (using the gzip command) the previous log files (if any) left by this server. This may take a few minutes, depending on the size of the files.

Performing remote restart

Once the server is running it can be restarted from a remote host (not the one where the server is running). Use the following command:

oidserver_restart <federation>

Unlike oidserver_start described above, this command does not reopen the log file.

Checking status of the server

The best way to check whether the corresponding server is running is by using the command:

oidserver_status <federation>

The command will do three tests on the server, as illustrated in the following sample output:

.../oidserver_status repro3
<CHECKING IF THE FOLLOWING SERVER IS ALIVE:

FACILITY: Bdb/Conditions/OIDService
SERVICE:  \nfs\objyboot1\objy\databases\production\boot\physics\V1\rep\current\con003\BaBar.BOOT

1: Checking if the server is known to the CORBA Naming Server...
2: Checking if there is an instance of the server for the found name...
3: Getting in touch with the server to see if it's able to respond to requests...

The server is live and is able to respond to requests.

The SERVICE: name is in fact the BOOT file name of the federation associated with the short federation name repro3, with forward slashes replaced by backslashes. This example shows the normal response of the command when the server is running and is able to respond to requests. The command communicates with the server by sending simple requests to it and expecting proper replies. This instance of the server can only be restarted as described in the previous section.
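The slash-to-backslash mapping described above can be sketched in a few lines of Python. This is only an illustration: the function names service_name and boot_file are hypothetical and are not part of the oidserver_ scripts.

```python
def service_name(boot_path: str) -> str:
    """Turn a BOOT file path into the SERVICE name form (slashes become backslashes)."""
    return boot_path.replace("/", "\\")


def boot_file(service: str) -> str:
    """Inverse mapping: recover the BOOT file path from a SERVICE name."""
    return service.replace("\\", "/")


# BOOT file path of the repro3 federation, as it appears in the sample output.
BOOT = "/nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT"

print(service_name(BOOT))
```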

Another example shows what to expect in case there is no active server for a specified federation:

.../oidserver_status test
<CHECKING IF THE FOLLOWING SERVER IS ALIVE:

FACILITY: Bdb/Conditions/OIDService
SERVICE:  \nfs\objyserv2\objy\databases\bootfiles\conds3\BaBar.BOOT

1: Checking if the server is known to the CORBA Naming Server...
ERROR: failed to obtain an object reference through the Naming Service.
       This may also mean that the Name Server is either not running
       or is not available.

The server is not live.

This kind of response is the normal one if the server is not running. It can be started by running the oidserver_start command described in the previous section.
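For automated checks, the verdict lines shown in the two samples can be matched in the command output. The following Python sketch assumes that the final lines always read exactly as in the samples, and that the script can be invoked as ./oidserver_status from the installation root; both are assumptions. Whether the script also sets a meaningful exit code is not documented here, so the sketch relies on the text only.

```python
import subprocess

LIVE_MARKER = "The server is live"
DEAD_MARKER = "The server is not live"


def server_is_live(status_output: str) -> bool:
    """Classify oidserver_status output by its verdict line."""
    if DEAD_MARKER in status_output:
        return False
    return LIVE_MARKER in status_output


def check_federation(federation: str, script: str = "./oidserver_status") -> bool:
    """Run the status script and classify its output.

    The default script path is an assumption; adjust it to the installation root.
    """
    result = subprocess.run([script, federation], capture_output=True, text=True)
    return server_is_live(result.stdout + result.stderr)
```

Output that contains neither verdict line is treated as "not live", which is the cautious choice for a monitoring job.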

Acquiring the server's statistics (counters)

The next command will acquire and print server statistics. This includes various counters as well as the current (Objectivity) transaction status. 

Here is the command line syntax for the command:

oidserver_statistics <federation>

The sample output of this command is here:

.../oidserver_statistics test
TRANSACTION STATUS: <NOT STARTED>

HISTORY
  Startup Time:             11/01/01 18:46:52 (local time) 0 ns
  First Operation Executed: 11/01/32 19:56:43 (local time) 324240000 Ns
  Last  Operation Executed: 11/18/32 14:57:23 (local time) 521555000 Ns

COUNTERS

  COMMIT TRANSACTION REQUESTS
    Total:    239
    Executed: 323

  FIND INTERVAL OID AS FUNCTION OF TIME
    Total:                 1095882
    Cached (out of total): 1083769

  FIND FIRST INTERVAL OID
    Total:                 0
    Cached (out of total): 0

  FIND LAST INTERVAL OID
    Total:                 0
    Cached (out of total): 0

  FIND OBJECT OID
    Total:                 6671696
    Cached (out of total): 6563755

  MISC
    Get Server:  41525
    Reset Cache: 0
    Get Timeout: 0
    Set Timeout: 0
    Get Statistics: 2
    Load  Configuration: 0
    Merge Configuration: 0
    Get Verbose Mode: 1
    Set Verbose Mode: 0

  TOTALS
    Other Requests: 0
    Total Requests:                 7809641
    Failed (out of total) Requests: 0
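The "Total" and "Cached (out of total)" counters can be turned into cache hit ratios. The Python sketch below parses text laid out like the sample above; it assumes that every "FIND ..." block keeps the same two-line layout, and the function name cache_hit_ratios is hypothetical.

```python
SAMPLE = """\
  FIND INTERVAL OID AS FUNCTION OF TIME
    Total:                 1095882
    Cached (out of total): 1083769

  FIND FIRST INTERVAL OID
    Total:                 0
    Cached (out of total): 0

  FIND OBJECT OID
    Total:                 6671696
    Cached (out of total): 6563755
"""


def cache_hit_ratios(stats_output: str) -> dict:
    """Map each FIND section name to cached/total, or None when total is 0."""
    ratios = {}
    section = None
    total = None
    for raw in stats_output.splitlines():
        line = raw.strip()
        if line.startswith("FIND "):
            section, total = line, None
        elif section and line.startswith("Total:"):
            total = int(line.split(":")[1])
        elif section and line.startswith("Cached (out of total):"):
            cached = int(line.split(":")[1])
            ratios[section] = cached / total if total else None
            section = None
    return ratios


for name, ratio in cache_hit_ratios(SAMPLE).items():
    print(name, "n/a" if ratio is None else f"{ratio:.1%}")
```

For the sample counters this gives roughly 98.9% for interval lookups and 98.4% for object lookups.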

Run-time monitoring of the server

The next command monitors the server at run time. It repeatedly acquires and prints the server statistics, including various counters as well as the current (Objectivity) transaction status. Here is the command line syntax for the command:

oidserver_monitor <federation>

If the corresponding OID server does not have an open transaction (no client has been talking to the server during its timeout period), the command keeps producing output like the following sample until it is interrupted with Ctrl-C or until a real transaction is started:

.../oidserver_monitor repro3
The monitor can be interrupted by pressing Control-C at any time.

11/18/32 15:17:14 (local time) 369489000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:15 (local time) 377362000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:16 (local time) 398946000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:17 (local time) 407408000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:18 (local time) 417650000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:19 (local time) 427411000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:20 (local time) 437464000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:21 (local time) 447505000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...

However, if the server's transaction is active the output is different:

.../oidserver_monitor repro3
The monitor can be interrupted by pressing Control-C at any time.

11/18/32 15:21:59 (local time) 184251000 Ns ACTIVE TRANSACTION HAS BEEN DETECTED.
findInterval:  0 [Hz],  0 [%],  delta = 0/0,  total = 1095883/1083769
firstInterval: 0        0               0 0,          0 0
lastInterval:  0        0               0 0,          0 0
findObject:    0        0               0 0,          6671696 6563755
findInterval:  0 [Hz],  0 [%],  delta = 0/0,  total = 1095883/1083769
firstInterval: 0        0               0 0,          0 0
lastInterval:  0        0               0 0,          0 0
findObject:    0        0               0 0,          6671696 6563755
findInterval:  0 [Hz],  0 [%],  delta = 0/0,  total = 1095883/1083769
firstInterval: 0        0               0 0,          0 0
lastInterval:  0        0               0 0,          0 0
findObject:    0        0               0 0,          6671696 6563755

Log files

Each served federation has its own area (subdirectory) where server log files are recorded:

log/<federation>/

These files are in human-readable format. All of them, except the one created by the most recent start operation, are compressed. The file names follow a fixed format that encodes the creation date and time. Here is a sample log directory listing:

ls -al log/repro3
total 4478336
drwxrwxrwx   2 objysrv  ec          8192 Nov  1 18:46 .
drwxrwxr-x   5 objysrv  EC            96 Jul  5 13:10 ..
-rw-rw-rw-   1 objysrv  EC        613194 Jun  4 15:45 2001_June_04_13:44.18.log.gz
-rw-rw-rw-   1 objysrv  ec       15841546 Jun  7 10:37 2001_June_04_15:49.46.log.gz
-rw-rw-rw-   1 objysrv  ec       15808242 Jun 10 16:57 2001_June_07_12:20.23.log.gz
-rw-rw-rw-   1 objysrv  ec       16101747 Jun 15 14:41 2001_June_10_17:06.46.log.gz
-rw-rw-rw-   1 objysrv  ec           602 May 31 17:04 2001_May_31_16:55.42.log.gz
-rw-rw-rw-   1 objysrv  ec           824 May 31 17:06 2001_May_31_17:05.20.log.gz
-rw-rw-rw-   1 objysrv  ec          1189 May 31 19:39 2001_May_31_17:06.28.log.gz
-rw-rw-rw-   1 objysrv  ec           837 May 31 21:33 2001_May_31_19:55.11.log.gz
-rw-rw-rw-   1 objysrv  ec       15184174 Jun  4 13:39 2001_May_31_21:33.49.log.gz
-rw-rw-rw-   1 objysrv  ec       2144050307 Nov 18 14:57 2001_November_01_18:46.04.log
-rw-rw-rw-   1 objysrv  ec           840 Oct 12 08:41 2001_October_11_19:07.11.log.gz
-rw-rw-rw-   1 objysrv  ec       16491809 Oct 16 15:52 2001_October_12_08:41.22.log.gz
-rw-rw-rw-   1 objysrv  ec       5118656 Oct 17 19:57 2001_October_16_15:54.55.log.gz
-rw-rw-rw-   1 objysrv  ec       3393732 Oct 18 13:04 2001_October_17_19:57.26.log.gz
-rw-rw-rw-   1 objysrv  ec           781 Oct 19 00:11 2001_October_19_00:09.42.log.gz
-rw-rw-rw-   1 objysrv  ec       30785775 Oct 25 07:51 2001_October_19_00:13.49.log.gz
-rw-rw-rw-   1 objysrv  ec       11022552 Oct 29 15:41 2001_October_26_15:56.58.log.gz
-rw-rw-rw-   1 objysrv  ec           836 Oct 29 15:55 2001_October_29_15:42.22.log.gz
-rw-rw-rw-   1 objysrv  ec       13455139 Oct 31 15:48 2001_October_29_15:54.59.log.gz
-rw-rw-rw-   1 objysrv  ec       4918793 Nov  1 18:45 2001_October_31_16:46.40.log.gz

Remember that log files are compressed automatically only by the server startup procedure. You can also compress or decompress them by hand at any time; the startup procedure will compress any uncompressed log files the next time it runs.
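Because the month is spelled out in the file name, an alphabetical listing such as the one above is not in chronological order (June sorts before May, November before October). A small Python sketch can order the files by the time encoded in their names. It assumes the name format is year_month_day_hour:minute.second, which is inferred from the samples rather than documented, and that month names are in English.

```python
from datetime import datetime


def log_start_time(filename: str) -> datetime:
    """Parse the start time encoded in a log file name (format inferred from samples)."""
    stem = filename
    for suffix in (".gz", ".log"):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
    # %B expects full English month names in the default (C) locale.
    return datetime.strptime(stem, "%Y_%B_%d_%H:%M.%S")


names = [
    "2001_June_04_13:44.18.log.gz",
    "2001_May_31_16:55.42.log.gz",
    "2001_November_01_18:46.04.log",
    "2001_October_11_19:07.11.log.gz",
]

for name in sorted(names, key=log_start_time):
    print(name)
```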

Here is a sample of the contents of the one uncompressed file in the listing above:

more log/repro3/2001_November_01_18:46.04.log
Thu Nov 1 18:46:51 PST 2001 : Starting the OID Server...

Initializing Objectivity use BdbApplication class
18:46:52.830 [1] OID_LOADER: Object constructed.
18:46:52.831 [1] BdbCondSimpleTimer::BdbCondSimpleTimer() The timer object has been constructed.
18:46:52.831 [1] BdbCondSimpleTimer::setTimeout() Set new timeout: 600
18:46:52.831 [1] OID_SERVANT_OBJECT(0): Start serving federation: /nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT
18:46:52.870 [1] OID_SERVANT_OBJECT(1): Start serving federation: /nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT
18:46:52.890 [1] OID_SERVANT_OBJECT(2): Start serving federation: /nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT
18:46:52.911 [1] OID_SERVANT_OBJECT(3): Start serving federation: /nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT
BdbCondROIDServerApp: Working thread: 4 started.
BdbCondROIDServerApp: Working thread: 5 started.
BdbCondROIDServerApp: Working thread: 6 started.
19:56:43.326 [7] OID_SERVANT_OBJECT(0): Find object. DETECTOR "svt" CONTAINER "SvtTkFindConstP" TIME "3147700405"
19:56:43.327 [7] BdbCondSimpleTimer::startWorkingThread() Working thread started.
19:56:43.327 [7] BdbCondSimpleTimer::addListener() Add new listener.
    The new listener address: 0xffbeecb4
    The old listener address: 0
19:56:44.295 [7] OID_LOADER: Transaction started.
BdbCondPathSingleton::loadFromConfigFile() -- info.
    A configuration file describing a revision/owner path was not found.
    Assuming the default behavior of BdbDatabase::fetch() interface --
    the most recent version of the condition data will always be
    fetched from the database.
19:56:44.551 [7] OID_LOADER: Find object [from cache: ]. DETECTOR "svt" CONTAINER "SvtTkFindConstP" TIME (seconds): "3147700405"

Each line reported by the OID server, with very few exceptions, has a timestamp (no date is given), the thread number, the source of the information, and the message itself. The most important messages are highlighted in the example above.
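The four-part line layout can be picked apart with a regular expression. The Python sketch below is based only on the sample lines shown above; the LogEntry type and the parse_log_line function are hypothetical names, and lines without a leading timestamp (such as the "Working thread" messages) are returned as None.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    time: str      # HH:MM:SS.mmm, no date
    thread: int    # number shown in square brackets
    source: str    # e.g. OID_LOADER or BdbCondSimpleTimer::setTimeout()
    message: str


_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] (\S+)\s*(.*)$")


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its parts, or return None if it has no timestamp."""
    match = _LINE.match(line)
    if match is None:
        return None
    time, thread, source, message = match.groups()
    # Some sources end with a colon ("OID_LOADER:"); strip it for consistency.
    return LogEntry(time, int(thread), source.rstrip(":"), message)


print(parse_log_line("18:46:52.830 [1] OID_LOADER: Object constructed."))
```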

Reading the journal file

The journal file of an installation is a plain text file containing information about all start, stop, and restart operations performed on the servers of the installation. Remember that an installation can be used to serve more than one federation. Handwritten comments can also be added to this file.

The file is located at:

.../journal/activity.log

Sample contents of this file:

...
2001_August_16_15:50.24 OPERATION: oidserver_start FED: test USER: becla REASON: unknown
2001_August_21_15:44.10 OPERATION: oidserver_start FED: reptest USER: objysrv REASON: unknown
2001_August_22_23:52.50 OPERATION: oidserver_restart FED: test USER: becla HOST: objyserv5 REASON: unknown
2001_October_11_19:07.11 OPERATION: oidserver_start FED: repro3 USER: objysrv REASON: unknown
2001_October_12_08:41.11 OPERATION: oidserver_stop FED: repro3 USER: objysrv REASON: unknown
2001_October_12_08:41.22 OPERATION: oidserver_start FED: repro3 USER: objysrv REASON: unknown
2001_October_16_15:52.49 OPERATION: oidserver_stop FED: repro3 USER: objysrv REASON: unknown
2001_October_16_15:54.55 OPERATION: oidserver_start FED: repro3 USER: objysrv REASON: unknown
...

The timestamps of the start operations in this journal file are exactly the same as the names of the corresponding log files created by these commands (see the log/<federation> directory). Commands and timestamps are highlighted in the example above.
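The link between journal entries and log file names can be illustrated with a short Python sketch. It assumes that every journal line has the "TIMESTAMP KEY: value KEY: value ..." layout seen in the sample; the function names are hypothetical, and a log file that has since been compressed carries an extra .gz suffix.

```python
import re
from typing import Optional

# A field is an upper-case key, a colon, and a value that runs up to the next key.
_FIELD = re.compile(r"([A-Z]+): (.*?)(?= [A-Z]+: |$)")


def parse_journal_line(line: str) -> dict:
    """Split a journal line into its timestamp and KEY: value fields."""
    timestamp, _, rest = line.strip().partition(" ")
    entry = {"TIMESTAMP": timestamp}
    entry.update(_FIELD.findall(rest))
    return entry


def log_file_for(entry: dict) -> Optional[str]:
    """Return the log file a start operation created, or None for other operations."""
    if entry.get("OPERATION") != "oidserver_start":
        return None
    return f"log/{entry['FED']}/{entry['TIMESTAMP']}.log"


line = "2001_October_11_19:07.11 OPERATION: oidserver_start FED: repro3 USER: objysrv REASON: unknown"
print(log_file_for(parse_journal_line(line)))
```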

Who to contact for problem resolution?

In case of problems, contact Igor Gaponenko.

 



Page Owner: Jacek Becla
Last Update: June 13, 2002