This document describes the process of starting the Aida system, AIDAPROD, after it has been down (for instance, after a planned or unplanned power outage). The old AIDALCLS system that ran on lcls-daemon4 has been decommissioned. AIDAPROD now runs on the Linux system mccas0, which was a Solaris system until April 21, 2015.
AIDAPROD is now the only Aida system and serves all Aida users, including LCLS Operations, FACET, NLCTA, and physicists in their offices. AIDAPROD depends on two Err processes, oocCosEventService and errClient, to log its messages. These Err processes are not part of AIDAPROD, but they must be started before any of the Aida processes.
The Err and AIDAPROD processes run on two machines: the Linux machine mccas0 and the VMS machine mcc. The two Err processes, the core Aida processes (DaNameServer and DaServer), and the UNIX (Linux) Aida data provider processes run on mccas0. The VMS Aida data provider processes, which now service requests almost exclusively from FACET Aida clients, run on mcc.
After logging onto mccas0 as laci, the command "ps -ef | grep -i java" shows all of the Java processes running on that machine, including the Err and Aida processes. On mcc, the AIDAPROD processes can be identified with the "show sys" command: their names have the prefix "AIDA" and do not have the suffix "LCLS" (processes with the LCLS suffix belonged to the decommissioned AIDALCLS system).
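The "ps -ef | grep -i java" check can be scripted so that missing processes are reported by name. The Python sketch below is illustrative only: it assumes that each process name mentioned in this document appears somewhere on that process's `ps -ef` command line, which this document does not state, and `EXPECTED`, `java_lines`, and `missing_processes` are hypothetical names introduced here.

```python
import subprocess

# Hypothetical list of names to look for, taken from this document.
# That each name appears on its process's `ps -ef` line is an assumption.
EXPECTED = ["oocCosEventService", "errClient", "DaNameServer", "DaServer"]


def java_lines(ps_output: str) -> list:
    """Mimic `grep -i java`: keep lines containing "java" in any letter case."""
    return [line for line in ps_output.splitlines() if "java" in line.lower()]


def missing_processes(ps_output: str, expected=EXPECTED) -> list:
    """Return the expected names that do not appear on any Java process line."""
    lines = java_lines(ps_output)
    return [name for name in expected if not any(name in line for line in lines)]


if __name__ == "__main__":
    # Same `ps -ef` listing the procedure uses; meant to be run on mccas0.
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
    print("missing:", missing_processes(out))
```

An empty list means every expected name was found on some Java process line; it does not by itself show that the process is healthy.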
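The Err and UNIX Aida processes are started on mccas0 in the following order:
1. oocCosEventService (Err)
2. errClient (Err)
3. DaNameServer (core Aida)
4. DaServer (core Aida)
5. The UNIX Aida data providers, in any order: DpCaLclsServer, DpCaServer, DpKlysServer, DpModelServer, DpRdbServer, DpTestHistServer, and DpTestServer.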
The VMS Aida processes are all data provider processes. They may be started on mcc in any order once the Err and core Aida processes are running on mccas0. They are the Aida SLC Control Database server, the Aida SLC History server, the Aida SLC BPM Orbit Data server, the Aida SLC Magnet server, the Aida SLC Utility server, the Aida SLC Master Oscillator server, the Aida SLC Klystron server, the Aida SLC Model server, and the Aida SLC Buffered Data server.
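The mccas0 startup order described in this document (oocCosEventService, then errClient, then DaNameServer, then DaServer, then the UNIX data providers in any order) can be encoded as a list of stages and checked mechanically. This Python sketch is a hypothetical illustration: `STAGES` and `check_start_order` are names introduced here and are not part of any Aida tool.

```python
# Startup stages on mccas0. Every process in earlier stages must be running
# before a process in a later stage starts; processes within one stage
# (the UNIX data providers) may start in any order.
STAGES = [
    ["oocCosEventService"],
    ["errClient"],
    ["DaNameServer"],
    ["DaServer"],
    ["DpCaLclsServer", "DpCaServer", "DpKlysServer", "DpModelServer",
     "DpRdbServer", "DpTestHistServer", "DpTestServer"],
]


def check_start_order(sequence):
    """Return None if `sequence` respects STAGES, else a message naming the first violation."""
    stage_of = {name: i for i, stage in enumerate(STAGES) for name in stage}
    started = set()
    for name in sequence:
        if name not in stage_of:
            return f"unknown process {name}"
        # Every process in every earlier stage must already have been started.
        for earlier in STAGES[: stage_of[name]]:
            for dep in earlier:
                if dep not in started:
                    return f"{name} started before {dep}"
        started.add(name)
    return None
```

For example, `check_start_order(["errClient"])` reports that errClient was started before oocCosEventService. The VMS data providers on mcc depend only on the Err and core processes, not on the UNIX data providers, so a strictly linear list of stages like this one does not model them.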
As mentioned previously, the two Err processes are not part of the AIDAPROD system but must be running before AIDAPROD is started. They are used to log Aida informational, warning, and error messages, including errors encountered while processing Aida client requests. One of the key ways to determine the health of the Aida system when its processes are started is to observe Aida log messages in the Message Log Viewer. Aida processes running on the Linux mccas0 system have a program tag of LCLS, while Aida processes running on the VMS mcc system have a program tag of FACET, since requests serviced by the VMS Aida data providers come almost exclusively from FACET Aida clients.
As described above, it is important to bring up the Err processes, oocCosEventService and errClient, on mccas0 before attempting to start the AIDAPROD system. The Err processes are started by the st.ooc_CosEventService and st.err "type 1" startup files located in the /etc/init.d directory on mccas0. When mccas0 is rebooted, the st.ooc_CosEventService startup file is invoked first, followed by the st.err startup file. To start these processes manually:
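Because the two Err startup files must run in a fixed order, a small wrapper can make that order explicit. This Python sketch only builds and prints the commands unless `dry_run=False` is passed. The helper names are hypothetical, and because this document does not say which arguments the startup files expect, none are passed by default.

```python
import subprocess

# Err startup files in the required order (both in /etc/init.d on mccas0).
ERR_STARTUP_FILES = ["st.ooc_CosEventService", "st.err"]


def startup_commands(files, args=(), init_dir="/etc/init.d"):
    """Build one command per startup file, preserving the given order.

    `args` is empty by default: the arguments these files expect (if any)
    are not stated in this document and are left to the operator.
    """
    return [[f"{init_dir}/{name}", *args] for name in files]


def run_in_order(commands, dry_run=True):
    """Print each command; with dry_run=False, run it and stop on the first failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    run_in_order(startup_commands(ERR_STARTUP_FILES))
```

The same helpers could be pointed at st.DaNameServer and st.DaServer, which likewise must be invoked in that order.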
In the past, the cmlogClientD process was needed on mccas0 to forward Err messages to the iocLogMsgServer process running on lcls-daemon2. It is no longer needed because errClient now forwards these messages directly.
Log files for the startup of the Err processes are located in:
After starting the Err processes, oocCosEventService and errClient, the AIDAPROD system may be started. Assuming that the iocLogMsgServer message logging process is running on lcls-daemon2, log messages from the Aida processes can be observed using the Message Log Viewer. This viewer is brought up by selecting the "Message Log..." button on the "lclshome" display, which in turn can be displayed by running the "lclshome" script under the softegr account on a machine such as lcls-builder. The Message Log Viewer is one of the primary means of determining the health of Aida processes.
The DaNameServer and DaServer processes, which run on mccas0, are the first Aida processes that should be started. They are started by the st.DaNameServer and st.DaServer "type 1" startup files located in the /etc/init.d directory on mccas0. When mccas0 is rebooted, these are the first Aida startup files invoked: st.DaNameServer first, then st.DaServer. To start these processes manually:
Observe the Message Log Viewer display to determine whether the DaNameServer and DaServer processes were successfully started. All Aida processes send the log message "Server ready" if they have been successfully started. One will almost certainly want to use the filter capability of the Message Log Viewer to see only messages for desired processes (e.g., filter LCLS messages having the Host name "mccas0" to view only log messages from Aida processes running on the mccas0 system).
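The filtering described above (by program tag and host name) and the "Server ready" check can be expressed over a list of log records. This Python sketch does not query the Message Log Viewer, which is interactive; `LogMessage` is a hypothetical record type whose fields mirror what this document says the viewer can filter on, and `ready_processes` is likewise a name introduced here.

```python
from dataclasses import dataclass


@dataclass
class LogMessage:
    # Hypothetical record; fields mirror the viewer filters named in this document.
    tag: str      # program tag, e.g. "LCLS" (mccas0) or "FACET" (mcc)
    host: str     # host name, e.g. "mccas0"
    process: str  # e.g. "DaServer"
    text: str     # message text


def ready_processes(messages, tag="LCLS", host="mccas0"):
    """Return the set of processes with this tag and host that logged "Server ready"."""
    return {
        m.process
        for m in messages
        if m.tag == tag and m.host == host and "Server ready" in m.text
    }
```

With the defaults this answers the question above for mccas0; passing `tag="FACET"` and `host="mcc.slac.stanford.edu"` gives the equivalent view for the VMS processes.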
Log files for the startup of these Aida processes running on mccas0 are located in the following directory:
After DaNameServer and DaServer have been successfully started on mccas0, the rest of the AIDAPROD processes may be started. The remaining AIDAPROD processes on mccas0 have the following "type 1" startup files in the /etc/init.d directory: st.DpCaLclsServer, st.DpCaServer, st.DpKlysServer, st.DpModelServer, st.DpRdbServer, st.DpTestHistServer, and st.DpTestServer. These UNIX Aida data providers may be started in any order. For example, they may be started manually as follows:
Observe the Message Log Viewer display to confirm that each Aida data provider process, including the test data providers, has sent the log message "Server ready", indicating that it was successfully started.
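When checking the viewer, it helps to compare against a fixed list of the UNIX data providers. In this Python sketch, the list comes from the startup-file names in this document with the "st." prefix removed; that each process logs under exactly that name is an assumption, and `UNIX_PROVIDERS` and `not_ready` are hypothetical names.

```python
# UNIX data providers started from /etc/init.d on mccas0 (startup-file names
# without the "st." prefix). Logging under exactly these names is an assumption.
UNIX_PROVIDERS = [
    "DpCaLclsServer", "DpCaServer", "DpKlysServer", "DpModelServer",
    "DpRdbServer", "DpTestHistServer", "DpTestServer",
]


def not_ready(ready, expected=UNIX_PROVIDERS):
    """Return, in listed order, the expected providers not in `ready`.

    `ready` is any collection of process names seen with "Server ready".
    """
    ready = set(ready)
    return [name for name in expected if name not in ready]
```

An empty result means every UNIX data provider has reported "Server ready"; any names returned are the processes whose log files should be examined.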
Sometimes a problem occurs while starting the DpCaServer or DpCaLclsServer process, and the "Server ready" log message does not appear for that process. In that case, the log file for the process contains the following line:
The AIDAPROD VMS processes will not start up correctly until the Err processes and then the core Aida processes (DaNameServer and DaServer) are running on mccas0. They may be started as follows (these Aida data provider processes may be started in any order):
MCC::SLCSHR> sho sys /process=AIDA_DP*
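The naming rule given earlier for mcc (prefix "AIDA", no "LCLS" suffix) can be applied to process names taken from "show sys" output. This Python sketch works on a plain list of names and does not parse the DCL output format; the case-insensitive comparison and the names `is_aidaprod_vms_process` and `aidaprod_names` are assumptions introduced here.

```python
def is_aidaprod_vms_process(name: str) -> bool:
    """Apply the AIDAPROD naming rule to one VMS process name.

    AIDAPROD processes start with "AIDA"; names ending in "LCLS" belonged to
    the decommissioned AIDALCLS system. Ignoring case is an assumption.
    """
    upper = name.strip().upper()
    return upper.startswith("AIDA") and not upper.endswith("LCLS")


def aidaprod_names(names):
    """Keep only the AIDAPROD process names, preserving their order."""
    return [n for n in names if is_aidaprod_vms_process(n)]
```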
Observe the Message Log Viewer display to confirm that each Aida server process that was started has sent the log message "Server ready", indicating that it was successfully started. One will almost certainly want to use the filter capability of the Message Log Viewer to see only messages for desired processes (e.g., filter FACET messages having the Host name "mcc.slac.stanford.edu" to view only log messages from Aida processes running on the mcc system).
An AIDAPROD data provider validation script is available to help determine whether the AIDAPROD data providers are functioning correctly. (An Aida data provider is any Aida process other than DaNameServer and DaServer.) For each Aida data provider, the validation script sends a client request to that data provider (server) and checks whether a response is returned.
The AIDAPROD data provider validation script may be invoked as follows:
The output includes the status of each client request to a server (e.g., "slcDbStatus = 0"). A value of 0 indicates that a response was returned from the associated server, which can be taken to mean the server is functioning correctly. A value of 1 indicates that no response was returned, which suggests a problem with that server unless there is a known reason for it. When all of the computer systems are being started after a down period, a server may legitimately fail to respond (e.g., the client request involves a PV whose IOC is not yet running). The results can also be viewed through the UNIX watchdog interface.
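Status lines like "slcDbStatus = 0" can be collected from the script's output so that non-responding servers stand out. This Python sketch assumes every status line has the form `<name>Status = <n>`, generalized from the single example above; `parse_statuses` and `failed` are hypothetical names. It treats any non-zero value as "no response", matching the meaning of 1 described above.

```python
import re

# Matches lines such as "slcDbStatus = 0". The "<name>Status" pattern is
# generalized from one example and is an assumption.
STATUS_LINE = re.compile(r"^\s*(\w+Status)\s*=\s*(\d+)\s*$")


def parse_statuses(output: str) -> dict:
    """Collect name -> status for every matching line; other lines are ignored."""
    statuses = {}
    for line in output.splitlines():
        m = STATUS_LINE.match(line)
        if m:
            statuses[m.group(1)] = int(m.group(2))
    return statuses


def failed(statuses: dict) -> list:
    """Sorted names whose status is non-zero (0 means a response was returned)."""
    return sorted(name for name, code in statuses.items() if code != 0)
```

A name returned by `failed` still needs judgment: during a full restart the underlying cause may simply be an IOC that is not yet running.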
Author: Bob Hall 24-Jan-2012