Maintaining OID server
Introduction
Starting, stopping and restarting the server
Performing remote restart
Checking status of the server
Acquiring the server's statistics (counters)
Run-time monitoring of the server
Log files
Reading the journal file
Who to contact for problem resolution?
Introduction
This document provides the basics on OID server management assuming that the proper installation of the server has already been done as described in the corresponding installation guide.
Following is a summary of important information found here:
There are five scripts located in the root directory of each installation. Their names all begin with "oidserver_".
There is a log directory beneath the root directory of each installation. It has one subdirectory per federation served through the installation, where the corresponding log files are recorded.
There is one journal file per installation (journal/activity.log), which records all management activities related to the installation.
IMPORTANT:
The proper version of the BaBar Database Authorization utility (BdbAuthCmd) has to be available through the binary path of any user who starts, stops, or restarts the server. This utility is used to check whether the user is authorized to perform server management. The main rule is that only users with the System authorization privilege are allowed to start, stop, and restart the server.
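Since the management scripts rely on BdbAuthCmd being found through the binary path, it may be worth checking for it before attempting a start or stop. The following is a minimal sketch and not part of the oidserver_ scripts: the function name require_in_path is hypothetical, and the check only confirms that the utility is found, not that it is the proper version.

```shell
# Hypothetical helper: succeed only if the named command is found in PATH.
require_in_path() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "ERROR: $1 was not found in PATH" >&2
    return 1
}

# Example use before managing the server:
#   require_in_path BdbAuthCmd && ./oidserver_start <federation>
```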
Starting, stopping, and restarting the server
The server can only be started or stopped from the machine where the corresponding installation was performed. The only exception is the remote restart (see the next subsection for details).
Be careful not to stop a running server while it is being used by clients: the clients can crash if this happens.
The following scripts do this:
- oidserver_start
- oidserver_stop
Both scripts expect the nickname of the federation (that is, of the federation served by the server) as their first, mandatory parameter:
oidserver_start <federation> ...
oidserver_stop <federation>
Normally the start script will refuse to start the server if it discovers another instance already running under the same name. To override this limitation and force a running server to stop before a fresh copy is started, specify the following optional switch:
oidserver_start <federation> [-restart]
One of the things done during the server's startup sequence is compressing (with the gzip command) any previous log files left by this server. This may take a few minutes, depending on the size of the files.
Performing remote restart
Once the server is running it can be restarted from a remote host (not the one where the server is running). Use the following command:
oidserver_restart <federation>
Unlike oidserver_start described above, this command will not reopen the log file.
Checking status of the server
The best way to check whether the corresponding server is running is by using the command:
oidserver_status <federation>
The command performs three tests on the server, as illustrated in the following sample output:
.../oidserver_status repro3
<CHECKING IF THE FOLLOWING SERVER IS ALIVE:
  FACILITY: Bdb/Conditions/OIDService
  SERVICE: \nfs\objyboot1\objy\databases\production\boot\physics\V1\rep\current\con003\BaBar.BOOT
1: Checking if the server is known to the CORBA Naming Server...
2: Checking if there is an instance of the server for the found name...
3: Getting in touch with the server to see if it's able to respond to requests...
The server is live and is able to respond to requests.
The SERVICE: name is in fact the BOOT file name of the federation associated with the short federation name repro3, with forward slashes replaced by backslashes. This example shows the normal response of the command when the server is running and able to respond to requests. The command communicates with the server by sending simple requests to it and expecting proper replies. A running instance such as this one can only be restarted as described in the previous section.
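The mapping from a BOOT file name to the SERVICE: name can be reproduced with a one-line filter. This is only an illustrative sketch based on the sample output; the function name boot_to_service is hypothetical and is not one of the oidserver_ scripts.

```shell
# Hypothetical helper: print the SERVICE name for a BOOT file path
# by replacing every forward slash with a backslash.
boot_to_service() {
    printf '%s\n' "$1" | tr '/' '\\'
}

# Example use:
#   boot_to_service /nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT
```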
Another example shows what to expect in case there is no active server for a specified federation:
.../oidserver_status test
<CHECKING IF THE FOLLOWING SERVER IS ALIVE:
  FACILITY: Bdb/Conditions/OIDService
  SERVICE: \nfs\objyserv2\objy\databases\bootfiles\conds3\BaBar.BOOT
1: Checking if the server is known to the CORBA Naming Server...
ERROR: failed to obtain an object reference through the Naming Service. This may also mean that the Name Server is either not running or is not available.
The server is not live.
This kind of response is normal if the server is not running. The server can be started with the oidserver_start command described earlier.
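For use in other scripts, the two sample responses can be told apart by their final verdict line. The sketch below assumes that the verdict wording is exactly as in the samples above; the exit code of oidserver_status is not documented here, so only the text is inspected. The function name oid_status_is_live is hypothetical.

```shell
# Hypothetical helper: read oidserver_status output on standard input and
# succeed only if it contains the "live" verdict shown in the first sample.
# "The server is not live." does not match this pattern.
oid_status_is_live() {
    grep -q 'The server is live'
}

# Example use (from the root directory of the installation):
#   ./oidserver_status repro3 2>&1 | oid_status_is_live && echo "running"
```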
Acquiring the server's statistics (counters)
The next command will acquire and print server statistics. This includes various counters as well as the current (Objectivity) transaction status.
Here is the command line syntax for the command:
oidserver_statistics <federation>
Here is sample output of this command:
.../oidserver_statistics test
TRANSACTION STATUS: <NOT STARTED>
HISTORT
  Startup Time: 11/01/01 18:46:52 (local time) 0 ns
  First Operation Executed: 11/01/32 19:56:43 (local time) 324240000 Ns
  Last Operation Executed: 11/18/32 14:57:23 (local time) 521555000 Ns
COUNTERS
  COMMIT TRANSACTION REQUESTS
    Total: 239
    Executed: 323
  FIND INTERVAL OID AS FUNCTION OF TIME
    Total: 1095882
    Cached (out of total): 1083769
  FIND FIRST INTERVAL OID
    Total: 0
    Cached (out of total): 0
  FIND LAST INTERVAL OID
    Total: 0
    Cached (out of total): 0
  FIND OBJECT OID
    Total: 6671696
    Cached (out of total): 6563755
  MISC
    Get Server: 41525
    Reset Cache: 0
    Get Timeout: 0
    Set Timeout: 0
    Get Statistics: 2
    Load Configuration: 0
    Merge Configuration: 0
    Get Verbose Mode: 1
    Set Verbose Mode: 0
  TOTALS
    Other Requests: 0
    Total Requests: 7809641
    Failed (out of total) Requests: 0
Run-time monitoring of the server
The next command monitors the server at run time. It repeatedly reports the current (Objectivity) transaction status and, while a transaction is active, the request counters. Here is the command line syntax for the command:
oidserver_monitor <federation>
If the corresponding OID server does not have an open transaction (no client has talked to the server within its timeout), the command keeps producing output like the following sample until it is interrupted with Ctrl-C or until a real transaction is started:
.../oidserver_monitor repro3
The monitor can be interrupted by pressing Control-C at any time.
11/18/32 15:17:14 (local time) 369489000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:15 (local time) 377362000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:16 (local time) 398946000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:17 (local time) 407408000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:18 (local time) 417650000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:19 (local time) 427411000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:20 (local time) 437464000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
11/18/32 15:17:21 (local time) 447505000 Ns TRANSACTION IS NOT ACTIVE. WAITING FOR THE START...
However, if the server's transaction is active, the output is different:
.../oidserver_monitor repro3
The monitor can be interrupted by pressing Control-C at any time.
11/18/32 15:21:59 (local time) 184251000 Ns ACTIVE TRANSACTION HAS BEEN DETECTED.
findInterval: 0 [Hz], 0 [%], delta = 0/0, total = 1095883/1083769
firstInterval: 0 0 0 0, 0 0
lastInterval: 0 0 0 0, 0 0
findObject: 0 0 0 0, 6671696 6563755
findInterval: 0 [Hz], 0 [%], delta = 0/0, total = 1095883/1083769
firstInterval: 0 0 0 0, 0 0
lastInterval: 0 0 0 0, 0 0
findObject: 0 0 0 0, 6671696 6563755
findInterval: 0 [Hz], 0 [%], delta = 0/0, total = 1095883/1083769
firstInterval: 0 0 0 0, 0 0
lastInterval: 0 0 0 0, 0 0
findObject: 0 0 0 0, 6671696 6563755
Log files
Each served federation has its own area (subdirectory) where server log files are recorded:
log/<federation>/
These files are in human-readable format. All of them, except the one created by the most recent start operation, are compressed. The file names follow a special format that includes the creation date and time. Here is a sample log directory:
ls -al log/repro3
total 4478336
drwxrwxrwx 2 objysrv ec 8192 Nov 1 18:46 .
drwxrwxr-x 5 objysrv EC 96 Jul 5 13:10 ..
-rw-rw-rw- 1 objysrv EC 613194 Jun 4 15:45 2001_June_04_13:44.18.log.gz
-rw-rw-rw- 1 objysrv ec 15841546 Jun 7 10:37 2001_June_04_15:49.46.log.gz
-rw-rw-rw- 1 objysrv ec 15808242 Jun 10 16:57 2001_June_07_12:20.23.log.gz
-rw-rw-rw- 1 objysrv ec 16101747 Jun 15 14:41 2001_June_10_17:06.46.log.gz
-rw-rw-rw- 1 objysrv ec 602 May 31 17:04 2001_May_31_16:55.42.log.gz
-rw-rw-rw- 1 objysrv ec 824 May 31 17:06 2001_May_31_17:05.20.log.gz
-rw-rw-rw- 1 objysrv ec 1189 May 31 19:39 2001_May_31_17:06.28.log.gz
-rw-rw-rw- 1 objysrv ec 837 May 31 21:33 2001_May_31_19:55.11.log.gz
-rw-rw-rw- 1 objysrv ec 15184174 Jun 4 13:39 2001_May_31_21:33.49.log.gz
-rw-rw-rw- 1 objysrv ec 2144050307 Nov 18 14:57 2001_November_01_18:46.04.log
-rw-rw-rw- 1 objysrv ec 840 Oct 12 08:41 2001_October_11_19:07.11.log.gz
-rw-rw-rw- 1 objysrv ec 16491809 Oct 16 15:52 2001_October_12_08:41.22.log.gz
-rw-rw-rw- 1 objysrv ec 5118656 Oct 17 19:57 2001_October_16_15:54.55.log.gz
-rw-rw-rw- 1 objysrv ec 3393732 Oct 18 13:04 2001_October_17_19:57.26.log.gz
-rw-rw-rw- 1 objysrv ec 781 Oct 19 00:11 2001_October_19_00:09.42.log.gz
-rw-rw-rw- 1 objysrv ec 30785775 Oct 25 07:51 2001_October_19_00:13.49.log.gz
-rw-rw-rw- 1 objysrv ec 11022552 Oct 29 15:41 2001_October_26_15:56.58.log.gz
-rw-rw-rw- 1 objysrv ec 836 Oct 29 15:55 2001_October_29_15:42.22.log.gz
-rw-rw-rw- 1 objysrv ec 13455139 Oct 31 15:48 2001_October_29_15:54.59.log.gz
-rw-rw-rw- 1 objysrv ec 4918793 Nov 1 18:45 2001_October_31_16:46.40.log.gz
Remember that the server itself compresses the files only during its startup procedure. You can also compress or decompress them yourself at any time; the startup procedure will compress any uncompressed log files the next time it runs.
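Compressing leftover logs by hand can be sketched as follows. This is an illustration, not the procedure used by the startup script: the function name compress_old_logs is hypothetical, and it assumes that the most recently modified .log file belongs to the running server and should be left untouched.

```shell
# Hypothetical helper: gzip every uncompressed *.log file in a directory
# except the most recently modified one, which is assumed to be the log
# of the currently running server.
compress_old_logs() {
    dir=$1
    # Newest uncompressed log by modification time (empty if there is none).
    newest=$(ls -t "$dir"/*.log 2>/dev/null | head -n 1)
    for f in "$dir"/*.log; do
        [ -f "$f" ] || continue           # no match: the pattern stays literal
        [ "$f" = "$newest" ] && continue  # leave the active log alone
        gzip "$f"
    done
}

# Example use:
#   compress_old_logs log/repro3
```

Modification time is used rather than the file name because the month names in the log file names do not sort chronologically.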
Here are sample contents of the file highlighted above (the only uncompressed one):
more log/repro3/2001_November_01_18:46.04.log
Thu Nov 1 18:46:51 PST 2001 : Starting the OID Server...
Initializing Objectivity use BdbApplication class
18:46:52.830 [1] OID_LOADER: Object constructed.
18:46:52.831 [1] BdbCondSimpleTimer::BdbCondSimpleTimer() The timer object has been constructed.
18:46:52.831 [1] BdbCondSimpleTimer::setTimeout() Set new timeout: 600
18:46:52.831 [1] OID_SERVANT_OBJECT(0): Start serving federation: /nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT
18:46:52.870 [1] OID_SERVANT_OBJECT(1): Start serving federation: /nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT
18:46:52.890 [1] OID_SERVANT_OBJECT(2): Start serving federation: /nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT
18:46:52.911 [1] OID_SERVANT_OBJECT(3): Start serving federation: /nfs/objyboot1/objy/databases/production/boot/physics/V1/rep/current/con003/BaBar.BOOT
BdbCondROIDServerApp: Working thread: 4 started.
BdbCondROIDServerApp: Working thread: 5 started.
BdbCondROIDServerApp: Working thread: 6 started.
19:56:43.326 [7] OID_SERVANT_OBJECT(0): Find object. DETECTOR "svt" CONTAINER "SvtTkFindConstP" TIME "3147700405"
19:56:43.327 [7] BdbCondSimpleTimer::startWorkingThread() Working thread started.
19:56:43.327 [7] BdbCondSimpleTimer::addListener() Add new listener. The new listener address: 0xffbeecb4 The old listener address: 0
19:56:44.295 [7] OID_LOADER: Transaction started.
BdbCondPathSingleton::loadFromConfigFile() -- info. A configuration file describing a revision/owner path was not found. Assuming the default behavior of BdbDatabase::fetch() interface -- the most recent version of the condition data will always be fetched from the database.
19:56:44.551 [7] OID_LOADER: Find object [from cache: ]. DETECTOR "svt" CONTAINER "SvtTkFindConstP" TIME (seconds): "3147700405"
Each line reported by the OID server (with very few exceptions) has a timestamp (no date is given), the thread number, the source of the information, and the message itself. The most important messages are highlighted in the example above.
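The line layout described above (timestamp, thread number in square brackets, then the source and the message) lends itself to simple filtering. The following sketch assumes that timestamps always have the form HH:MM:SS.mmm as in the sample; the function name log_fields is hypothetical.

```shell
# Hypothetical helper: split timestamped log lines read from standard input
# into "time|thread|rest"; lines without a leading timestamp are skipped.
log_fields() {
    sed -nE 's/^([0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}) \[([0-9]+)\] (.*)$/\1|\2|\3/p'
}

# Example use on an uncompressed and on a compressed log:
#   log_fields < log/repro3/2001_November_01_18:46.04.log
#   gzip -dc log/repro3/2001_October_31_16:46.40.log.gz | log_fields
```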
Reading the journal file
The journal file of an installation is a plain text file containing information about all start, stop, and restart operations performed on the servers of the installation. Remember that an installation can serve more than one federation. Handwritten comments can also be added to this file.
The file is located at:
.../journal/activity.log
Sample contents of this file:
...
2001_August_16_15:50.24 OPERATION: oidserver_start FED: test USER: becla REASON: unknown
2001_August_21_15:44.10 OPERATION: oidserver_start FED: reptest USER: objysrv REASON: unknown
2001_August_22_23:52.50 OPERATION: oidserver_restart FED: test USER: becla HOST: objyserv5 REASON: unknown
2001_October_11_19:07.11 OPERATION: oidserver_start FED: repro3 USER: objysrv REASON: unknown
2001_October_12_08:41.11 OPERATION: oidserver_stop FED: repro3 USER: objysrv REASON: unknown
2001_October_12_08:41.22 OPERATION: oidserver_start FED: repro3 USER: objysrv REASON: unknown
2001_October_16_15:52.49 OPERATION: oidserver_stop FED: repro3 USER: objysrv REASON: unknown
2001_October_16_15:54.55 OPERATION: oidserver_start FED: repro3 USER: objysrv REASON: unknown
...
The timestamps of the start operations in this journal file are exactly the same as the names of the corresponding log files created by these commands (see the log/<federation> directory). Commands and timestamps are highlighted in the example above.
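Because start timestamps and log file names coincide, the journal can be used to locate the log of a given start. The sketch below assumes that the journal holds one entry per line with whitespace-separated fields, as in the sample; the function name start_log_names is hypothetical.

```shell
# Hypothetical helper: read journal entries on standard input and print the
# expected log file path for every oidserver_start of the given federation.
# Logs of earlier starts will normally carry an additional .gz suffix.
start_log_names() {
    awk -v fed="$1" '
        $2 == "OPERATION:" && $3 == "oidserver_start" && $4 == "FED:" && $5 == fed {
            print "log/" fed "/" $1 ".log"
        }'
}

# Example use:
#   start_log_names repro3 < journal/activity.log
```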
Who to contact for problem resolution?
In case of problems, contact Igor Gaponenko.
Page Owner: Jacek Becla
Last Update: June 13, 2002