SLAC CPE Software Engineering Group
Stanford Linear Accelerator Center
System Admin

AIDALCLS

SYSTEM MANAGEMENT GUIDE

 
 

Greg White, SLAC   

1st Working Draft, 2-Mar-2010   

Last Modified: May 16, 2012



RESTARTING THE AIDALCLS NETWORK

 
  1. The Orbacus Event Service and the ErrClient must be started first, if they are not already running.

    1. To check whether they are running, log in to a Solaris AFS host (slcs2, etc.).

      1. source /afs/slac/g/cd/soft/dev/script/ENVS.csh

      2. Check to see if oocCosEventService and errClient are running
        1. errmanager all show
          1. (If both are running, you are done with step 1; go to step 2.)
      3. To start each individually:
        1. errmanager $OOC_COSEVENT_NAME start
        2. errmanager $ERRCLIENT_NAME start
      4. Or, to start both together:
        1. errmanager all restart
         
  2. Start DaNameServer before any other AIDA server. If DaServer was started before DaNameServer, DaServer must be restarted.
    1. Log in to lcls-daemon4 as laci.
    2. You can bring up cmlogviewer to verify service startup
      1. Type: cmlogviewer &  (Look for "Server ready" message)
      2. Setting the filter to "Server" will help
    3. Check whether daNameServer is running:
      1. [laci@lcls-daemon4 ~]$     pgrep -fl daNameServer
    4. Start it if not running:
      1. [laci@lcls-daemon4 ~]$     /etc/init.d/st.DaNameServer restart
       
  3. Start/restart the remaining Unix servers:
    1. Log in to lcls-daemon4 as laci.
      1. /etc/init.d/st.DaServer restart
      2. /etc/init.d/st.DpTestServer restart
      3. /etc/init.d/st.DpTestHistServer restart
      4. /etc/init.d/st.DpChadsLclsServer restart
      5. /etc/init.d/st.DpCaLclsServer restart
      6. /etc/init.d/st.DpRdbServer restart
      7. /etc/init.d/st.DpModelServer restart
      8. /etc/init.d/st.DpKlysServer restart

*Note: If DpCaLclsServer fails to start, or fails to get data, restart it.

If cmlogviewer shows the following error:

Bind Address: dpCa   2       _DaObjectBase failed to initialize. Caused by: bind() failed: java.net.BindException: Address already in use while initializing server

check for a running caRepeater process and kill it.
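The Unix-side steps above (DaNameServer first, then the remaining servers, plus the caRepeater check) can be wrapped in a small bash script. This is a minimal sketch, not a supported tool: it assumes the /etc/init.d/st.* scripts exit with a non-zero status on failure, and INIT_DIR and the function names are hypothetical, added only for illustration. It would be run as laci on lcls-daemon4, after the Event Service and errClient are up.

```shell
#!/bin/bash
# Sketch of the lcls-daemon4 restart sequence (run as laci).
# INIT_DIR and all function names are hypothetical; the default INIT_DIR
# is the /etc/init.d directory used in the steps above.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# Servers restarted after DaNameServer, in the documented order.
SERVERS="DaServer DpTestServer DpTestHistServer DpChadsLclsServer
DpCaLclsServer DpRdbServer DpModelServer DpKlysServer"

# Step 2: start DaNameServer only if no daNameServer process is found.
ensure_nameserver() {
    if pgrep -f daNameServer > /dev/null; then
        echo "daNameServer already running"
    else
        echo "Starting DaNameServer"
        "$INIT_DIR/st.DaNameServer" restart
    fi
}

# Step 3: restart the remaining servers. Assumes each init script exits
# non-zero on failure; returns 1 if any restart failed.
restart_servers() {
    local s failed=0
    for s in $SERVERS; do
        echo "Restarting $s"
        if ! "$INIT_DIR/st.$s" restart; then
            echo "WARNING: st.$s restart failed" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Bind-error workaround: kill any process named exactly caRepeater.
kill_carepeater() {
    if pgrep -x caRepeater > /dev/null; then
        pgrep -lx caRepeater
        pkill -x caRepeater
    else
        echo "no caRepeater found"
    fi
}
```

For example, after sourcing the file, `ensure_nameserver && restart_servers`. Because restart_servers always restarts DaServer, it also covers the case where DaServer had been started before DaNameServer.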

  4. Start the AIDA servers on VMS.
    1. These depend on the SLC control system being up on MCC (AIDALCLS & AIDAPROD) and MCCDEV (AIDADEV).
    2. Log in to MCC as slcshr:
      1. warmslcx AIDA_HIST_LCLS /restart
      2. warmslcx AIDA_KLYS_LCLS /restart
      3. warmslcx AIDA_MGNT_LCLS /restart
      4. warmslcx AIDA_MODL_LCLS /restart
      5. warmslcx AIDA_MOSC_LCLS /restart
      6. warmslcx AIDA_UTIL_LCLS /restart
      7. warmslcx AIDA_SLC_LCLS /restart

 

*****************

FOR AIDAPROD ONLY:

  1. To start the AIDAPROD (and AIDADEV) network:
    1. mccas0's httpd server must be running.
      1. Open the link below; if the page loads, the httpd server is running.
        1. http://mccas0.slac.stanford.edu/aida/
          1. The page should list the DEV and PROD NameServer .ior files.
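The browser check above can also be done from a shell. This is a minimal sketch, assuming curl is installed on the host you run it from; AIDA_URL and check_aida_httpd are hypothetical names, and matching on the string ".ior" is an assumption about how the directory listing looks.

```shell
#!/bin/bash
# Sketch: confirm mccas0's httpd serves the AIDA directory and that the
# page mentions .ior files. AIDA_URL and check_aida_httpd are hypothetical.
AIDA_URL="${AIDA_URL:-http://mccas0.slac.stanford.edu/aida/}"

check_aida_httpd() {
    local page
    # -s: silent, -f: fail on HTTP errors, --max-time: do not hang forever
    if ! page=$(curl -sf --max-time 10 "$AIDA_URL"); then
        echo "httpd not reachable at $AIDA_URL"
        return 1
    fi
    if printf '%s\n' "$page" | grep -q '\.ior'; then
        echo "httpd is up; .ior files listed"
    else
        echo "httpd is up, but no .ior files listed"
        return 1
    fi
}
```

A non-zero return from check_aida_httpd means either the server did not answer or the listing did not contain any .ior names.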

*****************


 

Gotchas / Watch Out / Troubleshooting

  1. Log files: /u1/lcls/sys/log/aida
  2. Do not sync AIDA@MCCO from AIDAPROD@SLACPROD. If you do, all the AIDA_INTERFACES entries will contain IORs of the AIDAPROD servers, and AIDALCLS will end up using the AIDAPROD servers.
 
  3. If you see the following in a log or on stderr, the Event Service is not running. Restart it using step 1 of RESTARTING THE AIDALCLS NETWORK above.

        Tue Feb 02 18:34:58 PST 2010
        ERROR, 'EventService' connect failed:org.omg.CORBA.TRANSIENT: attempt to establish connection failed: java.net.ConnectException: Connection refused  minor code: 0x4f4f0001  completed: No

        Tue Feb 02 18:34:58 PST 2010
        Could not resolve 'EventService' object reference

        Tue Feb 02 18:34:59 PST 2010
        ERROR, 'EventService' connect failed:org.omg.CORBA.TRANSIENT: attempt to establish connection failed: java.net.ConnectException: Connection refused  minor code: 0x4f4f0001  completed: No

        Tue Feb 02 18:34:59 PST 2010
        Could not resolve 'EventService' object reference

        Tue Feb 02 18:34:59 PST 2010
        There is no event service connnection
        Unable to issue:
         Making connection to Name Service
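To find which server logs show this failure, you can search the AIDA log directory (/u1/lcls/sys/log/aida) for the error text. This is a sketch; LOG_DIR and find_eventservice_errors are hypothetical names, and it assumes the logs are plain text files directly under that directory.

```shell
#!/bin/bash
# Sketch: list AIDA log files containing the Event Service errors shown above.
# LOG_DIR and find_eventservice_errors are hypothetical names.
LOG_DIR="${LOG_DIR:-/u1/lcls/sys/log/aida}"

find_eventservice_errors() {
    # -l: print only matching file names; -F: match the strings literally
    grep -lF \
        -e "'EventService' connect failed" \
        -e "Could not resolve 'EventService' object reference" \
        "$LOG_DIR"/* 2> /dev/null
}
```

Empty output means none of the current log files contain either message.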

 


APPENDIX A: Running a Linux AIDALCLS Server

The following describes how to manage (check the status of, optionally stop, and start) an AIDA server on lcls-daemon4 using a DEVELOPMENT script. To run a production script, use the script in /etc/init.d in step 5.

  1. Log in to lcls-daemon4 as laci.
    1. [laci@lcls-daemon4 ~]$ cd /usr/local/lcls/physics/package/aida/common/script
    2. Optionally, check which servers are running. If an instance is already running, kill it before starting a new one.
      1. pgrep -fl AIDA  (or look for a specific server, e.g. "pgrep -fl daNameServer")
    3. Kill it with pkill:
      1. pkill -f DpTestServer (or use the package name, e.g. "detest")
      2. pkill -f aida (to kill all)
    4. Use cmlogviewer to verify that the server started correctly (look for the "Server Ready" message).
      1. Type: cmlogviewer &
    5. Run the server's startup script:
      1. ./st.DaNameServer start
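Steps 2 through 5 can be combined into one helper. This is a sketch only: restart_dev_server and SCRIPT_DIR are hypothetical names, the two-second pause is an arbitrary guess at how long a killed server needs to exit, and the pattern is matched against full command lines exactly as pgrep -f and pkill -f do above, so it should be narrow enough not to match unrelated processes.

```shell
#!/bin/bash
# Sketch of Appendix A: stop any running instance, then start the
# development script. SCRIPT_DIR and restart_dev_server are hypothetical.
SCRIPT_DIR="${SCRIPT_DIR:-/usr/local/lcls/physics/package/aida/common/script}"

# Usage: restart_dev_server PATTERN SCRIPT
#   e.g. restart_dev_server daNameServer st.DaNameServer
restart_dev_server() {
    local pattern="$1" script="$2"
    if pgrep -f "$pattern" > /dev/null; then
        echo "Stopping existing $pattern"
        pkill -f "$pattern"
        sleep 2   # arbitrary pause to let the old instance exit
    fi
    # Subshell, so the caller's working directory is left unchanged.
    (cd "$SCRIPT_DIR" && "./$script" start)
}
```

Watch cmlogviewer for the "Server Ready" message afterwards, as in step 4.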

 


APPENDIX B: AIDALCLS VMS Processes

Which AIDALCLS processes are running on MCC?

-------------

There should be these 7:

NOTE: At the time of writing, the AIDA_BPM_LCLS process is not expected on LCLS.

MCC::SLCSHR> sho sys /process=AIDA*LCLS

OpenVMS V8.3  on node MCC   1-FEB-2010 16:29:31.98  Uptime  11 00:55:40

  Pid    Process Name    State  Pri      I/O       CPU       Page flts  Pages
20802C32 AIDA_MOSC_LCLS  HIB      6     6397   0 00:00:12.89      7817   4027 M
20802C40 AIDA_HIST_LCLS  HIB      6     6368   0 00:00:12.50      7800   4241 M
20801A0A AIDA_MGNT_LCLS  HIB      6     6609   0 00:00:30.76      7885   4196 M
20801A13 AIDA_MODL_LCLS  HIB      6     6401   0 00:00:28.90      7802   4091 M
20801A1F AIDA_UTIL_LCLS  HIB      6     6367   0 00:00:28.60      7780   4247 M
20801A24 AIDA_SLC_LCLS   HIB      6     6472   0 00:00:32.08      8595   4868 M
20802B19 AIDA_KLYS_LCLS  HIB      6     6456   0 00:00:13.34      7830   4062 M

Which AIDAPROD processes are running on MCC?

There should be these 8:

MCC::SLCSHR> sho sys /process=AIDA_DP*

OpenVMS V8.3  on node MCC   1-FEB-2010 16:30:47.88  Uptime  11 00:56:56

  Pid    Process Name    State  Pri      I/O       CPU       Page flts  Pages
20802C93 AIDA_DPSLCBPM   HIB      1     7021   0 00:00:14.76      8398   4775 M
20802C9C AIDA_DPSLCKLYS  HIB      6     6561   0 00:00:12.82      7818   4107 M
20802CA1 AIDA_DPSLCMGNT  HIB      6     6599   0 00:00:14.20      9672   6089 M
20802CAC AIDA_DPSLCUTIL  HIB      6     6538   0 00:00:12.89      7876   4126 M
20802CB4 AIDA_DPSLC      HIB      5     6682   0 00:00:13.18      8584   4920 M
20800540 AIDA_DPSLCMODEL HIB      6    17132   0 00:01:29.28      9755   5991 M
20800549 AIDA_DPSLCMOSC  HIB      6     6363   0 00:01:09.72      7809   4222 M
208005D3 AIDA_DPSLCHIST  HIB      6  3851681   0 09:30:19.40     16458  11675 M
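If the output of "sho sys /process=AIDA*LCLS" is captured to a text file on a Unix host, a short script can report which of the 7 expected AIDALCLS processes are missing. This is a sketch: the capture step, EXPECTED_LCLS and missing_processes are assumptions, and it relies on the process name being the second whitespace-separated column, as in the listings above. Replacing the list with the 8 AIDA_DP* names would check AIDAPROD instead.

```shell
#!/bin/bash
# Sketch: read a captured "sho sys /process=AIDA*LCLS" listing on stdin and
# print each expected AIDALCLS process that does not appear in it.
# EXPECTED_LCLS and missing_processes are hypothetical names.
EXPECTED_LCLS="AIDA_HIST_LCLS AIDA_KLYS_LCLS AIDA_MGNT_LCLS AIDA_MODL_LCLS
AIDA_MOSC_LCLS AIDA_UTIL_LCLS AIDA_SLC_LCLS"

missing_processes() {
    local names name missing=0
    # The process name is the second column of each listing row.
    names=$(awk '{ print $2 }')
    for name in $EXPECTED_LCLS; do
        if ! printf '%s\n' "$names" | grep -qxF "$name"; then
            echo "MISSING: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `missing_processes < aida_lcls.txt`; it prints nothing and returns 0 when all 7 are present.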

 

IMPORTANT AIDALCLS RUNTIME FILES

ON LCLS CA PRODUCTION:

+ executable, scripts, persistent config data

/usr/local/lcls/physics/package/aida

+ runtime data (IOR)

/u1/lcls/package/aida/ior (NameServerLCLS.ior)

There is also a /u1/lcls/tools/aida/ior directory. It contains some files, but it is probably not useful and can be removed.
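A quick sanity check on the NameServer IOR file: a stringified CORBA object reference starts with the prefix "IOR:", so an empty file, or one without that prefix, suggests the file was not written correctly. It cannot tell you whether the IOR is current. This is a sketch; IOR_FILE and check_ior are hypothetical names, and the default path simply joins the directory and file name listed above.

```shell
#!/bin/bash
# Sketch: check that the NameServer IOR file exists, is non-empty and starts
# with the "IOR:" prefix of a stringified CORBA object reference.
# IOR_FILE and check_ior are hypothetical names.
IOR_FILE="${IOR_FILE:-/u1/lcls/package/aida/ior/NameServerLCLS.ior}"

check_ior() {
    if [ ! -s "$IOR_FILE" ]; then
        echo "missing or empty: $IOR_FILE"
        return 1
    fi
    case "$(head -c 4 "$IOR_FILE")" in
        IOR:) echo "ok: $IOR_FILE" ;;
        *)    echo "no IOR: prefix: $IOR_FILE"; return 1 ;;
    esac
}
```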

+ orbacus

/u1/lcls/tools/orbacus (Event Service configuration file)

/usr/local/lcls/package/iona/orbacus/prod/JOB/lib/OB.jar, etc.

ON VMS:

+ Runtime data (IOR)

data_disk_mcc:[slc.aida]NameServerLCLS.ior

+ Server startup .COM files and .SUBMIT files

SLCCOM:AIDA_*_LCLS.SUBMIT (the batch submit files, as used by warmslcx)

SLCCOM:STARTDP*_LCLS.COM  (The meat of the startup process, with the java command)

 

 

 


 


