SLAC CPE Software Engineering Group
Greg White, SLAC. 1st Working Draft: 2-Mar-2010. Last Modified: May 16, 2012
RESTARTING THE AIDALCLS NETWORK
The Orbacus Event Service and the ErrClient must be started first, if they are not already running.
To check whether they are running, log into a Solaris AFS host (slcs2, etc.):
source /afs/slac/g/cd/soft/dev/script/ENVS.csh
- Check to see if oocCosEventService and errClient are running
- errmanager all show
- (If both are running, you're done with step 1; go to step 2.)
- To start each individually:
- errmanager $OOC_COSEVENT_NAME start
- errmanager $ERRCLIENT_NAME start
- Or, to start them both together:
- errmanager all restart
- You must start DaNameServer first. If DaServer was started before DaNameServer, then DaServer must be restarted.
- Login to lcls-daemon4 as laci
- You can bring up cmlogviewer to verify service startup
- Type: cmlogviewer & (Look for "Server ready" message)
- Setting the filter to "Server" will help
- Check whether daNameServer is running:
- [laci@lcls-daemon4 ~]$ pgrep -fl daNameServer
- Start it if not running:
- [laci@lcls-daemon4 ~]$ /etc/init.d/st.DaNameServer restart
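The check-then-start logic above can be sketched as a small shell helper. This is only an illustration: the function name ensure_running and the PGREP override variable are hypothetical and not part of the AIDA scripts. The helper runs the given start command only when pgrep finds no matching process.

```shell
#!/bin/sh
# Sketch: start a server only if no matching process is found.
# PGREP is an illustrative override so the logic can be tried
# without real processes; it defaults to the normal pgrep.
PGREP=${PGREP:-pgrep}

ensure_running() {
    pattern=$1
    shift
    if "$PGREP" -f "$pattern" >/dev/null 2>&1; then
        echo "$pattern: running"
    else
        echo "$pattern: not running, starting"
        # Remaining arguments are the start command, run as given.
        "$@"
    fi
}

# Intended use on lcls-daemon4 as laci:
#   ensure_running daNameServer /etc/init.d/st.DaNameServer restart
```

Because DaServer must come up after DaNameServer, a restart of DaNameServer done this way should still be followed by the DaServer restart in the next step.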
- Start/restart the remaining Unix servers:
- Log into lcls-daemon4 as laci
- /etc/init.d/st.DaServer restart
- /etc/init.d/st.DpTestServer restart
- /etc/init.d/st.DpTestHistServer restart
- /etc/init.d/st.DpChadsLclsServer restart
- /etc/init.d/st.DpCaLclsServer restart
- /etc/init.d/st.DpRdbServer restart
- /etc/init.d/st.DpModelServer restart
- /etc/init.d/st.DpKlysServer restart
*Note: If the DpCaLclsServer fails to start, or fails to get data, restart it.
If cmlogviewer shows:
Bind Address: dpCa 2 _DaObjectBase failed to initialize. Caused by: bind() failed: java.net.BindException: Address already in use while initializing server
then check for a running caRepeater process and kill it.
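The restart sequence for the remaining Unix servers could be scripted roughly as follows. This is a sketch under assumptions: the INITD and DRY_RUN variables and the restart_servers function are illustrative names, not existing AIDA tooling. The server list and its order are taken from the step above, with DaServer first.

```shell
#!/bin/sh
# Sketch: restart the remaining Unix servers in the documented order.
# INITD and DRY_RUN are illustrative knobs, not part of the AIDA scripts.
INITD=${INITD:-/etc/init.d}
SERVERS="DaServer DpTestServer DpTestHistServer DpChadsLclsServer DpCaLclsServer DpRdbServer DpModelServer DpKlysServer"

restart_servers() {
    failed=0
    for name in $SERVERS; do
        if [ -n "$DRY_RUN" ]; then
            # Only print what would be run.
            echo "$INITD/st.$name restart"
        elif ! "$INITD/st.$name" restart; then
            echo "st.$name restart failed" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every restart command succeeded.
    [ "$failed" -eq 0 ]
}

# Intended use on lcls-daemon4 as laci (DaNameServer already running):
#   DRY_RUN=1; restart_servers    # preview the commands
#   DRY_RUN=;  restart_servers    # actually restart
```

A zero exit status from an init script does not by itself show that the server is healthy, so the "Server ready" messages in cmlogviewer remain the real confirmation.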
- Start AIDA Server on VMS
- These depend on the SLC control system being up on MCC (AIDALCLS & AIDAPROD) and MCCDEV (AIDADEV).
- Log into MCC as slcshr
- warmslcx AIDA_HIST_LCLS /restart
- warmslcx AIDA_KLYS_LCLS /restart
- warmslcx AIDA_MGNT_LCLS /restart
- warmslcx AIDA_MODL_LCLS /restart
- warmslcx AIDA_MOSC_LCLS /restart
- warmslcx AIDA_UTIL_LCLS /restart
- warmslcx AIDA_SLC_LCLS /restart
*****************
FOR AIDAPROD ONLY:
- To start the AIDAPROD (and AIDADEV) network:
- mccas0's httpd server must be running.
- Open this link; if the page loads, the httpd server is running:
- http://mccas0.slac.stanford.edu/aida/
- The browser should list the Nameserver DEV/PROD.ior files.
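The same check can be done from a command line instead of a browser. The sketch below assumes curl is available on the host and that the page is a plain directory listing; the function name list_ior_files is illustrative. It extracts any .ior file names from a listing supplied on standard input.

```shell
#!/bin/sh
# Sketch: pull the .ior file names out of a directory listing read on stdin.
list_ior_files() {
    # -o prints only the matched text; sort -u removes the duplicates
    # that appear when a name occurs in both the link and its label.
    grep -o '[A-Za-z0-9_.-]*\.ior' | sort -u
}

# Intended use (assumes curl is installed and mccas0 is reachable):
#   curl -fsS http://mccas0.slac.stanford.edu/aida/ | list_ior_files
```

If curl fails, or no .ior names are printed, the httpd server on mccas0 is the first thing to check.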
*****************
Gotchas / Watch Out / Troubleshooting
- Log files: /u1/lcls/sys/log/aida
- If AIDA@MCCO is synced from AIDAPROD@SLACPROD, all the AIDA_INTERFACES will contain IORs of the AIDAPROD servers. As a result, AIDALCLS would be using AIDAPROD servers, so don't do that.
- If you see the following in a log or on stderr, the Event Service is not running. Restart it using step 1 of RESTARTING THE AIDALCLS NETWORK above:
Tue Feb 02 18:34:58 PST 2010
ERROR, 'EventService' connect failed:org.omg.CORBA.TRANSIENT: attempt to establish connection failed: java.net.ConnectException: Connection refused minor code: 0x4f4f0001 completed: No
Tue Feb 02 18:34:58 PST 2010
Could not resolve 'EventService' object reference
Tue Feb 02 18:34:59 PST 2010
ERROR, 'EventService' connect failed:org.omg.CORBA.TRANSIENT: attempt to establish connection failed: java.net.ConnectException: Connection refused minor code: 0x4f4f0001 completed: No
Tue Feb 02 18:34:59 PST 2010
Could not resolve 'EventService' object reference
Tue Feb 02 18:34:59 PST 2010
There is no event service connnection
Unable to issue:
Making connection to Name Service
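To find out which logs contain this failure, a simple grep over the log directory is usually enough. The following is a sketch: the function name event_service_failures is illustrative, and the search string is copied from the excerpt above. It counts the matching lines in one file.

```shell
#!/bin/sh
# Sketch: count Event Service connection failures in one log file.
event_service_failures() {
    # grep -c prints the number of matching lines (0 if none).
    grep -c "'EventService' connect failed" "$1"
}

# Intended use on the AIDA log directory:
#   for f in /u1/lcls/sys/log/aida/*; do
#       printf '%s: %s\n' "$f" "$(event_service_failures "$f")"
#   done
```

A non-zero count in a recent log suggests the Event Service was down when that server started, so the server may need a restart after the Event Service is back.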
APPENDIX A : To Run a Linux AIDALCLS Server
The following describes how to manage (get status of, optionally stop, and start) an AIDA server on lcls-daemon4 using a DEVELOPMENT script. To run a production script, use the script in /etc/init.d in step 5.
- Log into laci@lcls-daemon4
- [laci@lcls-daemon4 ~]$ cd /usr/local/lcls/physics/package/aida/common/script
- Optionally, check which servers are running. If an instance is running, kill it before starting a new one.
- pgrep -fl AIDA (or look for a specific server, e.g. "pgrep -fl daNameServer")
- Kill it with pkill:
- pkill -f DpTestServer (or use the package name, e.g. "dpTest")
- pkill -f aida (to kill all)
- Use cmlogviewer to verify correct server startup (you're looking for the "Server Ready" message)
- Type: cmlogviewer &
- Run the server's startup file:
- ./st.DaNameServer start
APPENDIX B: AIDALCLS VMS Processes
Which AIDALCLS processes are running on MCC?
-------------
There should be these 7 :
NOTE: At the time of writing, the AIDA_BPM_LCLS process is not expected on LCLS.
MCC::SLCSHR> sho sys /process=AIDA*LCLS
OpenVMS V8.3 on node MCC 1-FEB-2010 16:29:31.98 Uptime 11 00:55:40
Pid      Process Name    State Pri      I/O  CPU           Page flts  Pages
20802C32 AIDA_MOSC_LCLS  HIB     6     6397  0 00:00:12.89      7817   4027 M
20802C40 AIDA_HIST_LCLS  HIB     6     6368  0 00:00:12.50      7800   4241 M
20801A0A AIDA_MGNT_LCLS  HIB     6     6609  0 00:00:30.76      7885   4196 M
20801A13 AIDA_MODL_LCLS  HIB     6     6401  0 00:00:28.90      7802   4091 M
20801A1F AIDA_UTIL_LCLS  HIB     6     6367  0 00:00:28.60      7780   4247 M
20801A24 AIDA_SLC_LCLS   HIB     6     6472  0 00:00:32.08      8595   4868 M
20802B19 AIDA_KLYS_LCLS  HIB     6     6456  0 00:00:13.34      7830   4062 M
Which AIDAPROD processes are running on MCC?
There should be these 8 :
MCC::SLCSHR> sho sys /process=AIDA_DP*
OpenVMS V8.3 on node MCC 1-FEB-2010 16:30:47.88 Uptime 11 00:56:56
Pid      Process Name    State Pri      I/O  CPU           Page flts  Pages
20802C93 AIDA_DPSLCBPM   HIB     1     7021  0 00:00:14.76      8398   4775 M
20802C9C AIDA_DPSLCKLYS  HIB     6     6561  0 00:00:12.82      7818   4107 M
20802CA1 AIDA_DPSLCMGNT  HIB     6     6599  0 00:00:14.20      9672   6089 M
20802CAC AIDA_DPSLCUTIL  HIB     6     6538  0 00:00:12.89      7876   4126 M
20802CB4 AIDA_DPSLC      HIB     5     6682  0 00:00:13.18      8584   4920 M
20800540 AIDA_DPSLCMODEL HIB     6    17132  0 00:01:29.28      9755   5991 M
20800549 AIDA_DPSLCMOSC  HIB     6     6363  0 00:01:09.72      7809   4222 M
208005D3 AIDA_DPSLCHIST  HIB     6  3851681  0 09:30:19.40     16458  11675 M
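Comparing a listing against the expected process names by eye is error-prone, so a small helper can do it. This is a sketch that assumes the "sho sys" output has been saved or pasted into a file on a Unix host; the function name missing_processes and the file name sho_sys.txt are hypothetical. Names are matched as whole fields, so AIDA_DPSLC does not match AIDA_DPSLCBPM.

```shell
#!/bin/sh
# Sketch: report which expected process names are absent from
# "sho sys" output supplied on stdin.
missing_processes() {
    listing=$(cat)
    missing=""
    for name in "$@"; do
        # Match the name as a whole whitespace-delimited field.
        if ! printf '%s\n' "$listing" | awk -v n="$name" \
            '{ for (i = 1; i <= NF; i++) if ($i == n) found = 1 }
             END { exit !found }'; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all present"
}

# Intended use for the 7 AIDALCLS processes:
#   missing_processes AIDA_MOSC_LCLS AIDA_HIST_LCLS AIDA_MGNT_LCLS \
#       AIDA_MODL_LCLS AIDA_UTIL_LCLS AIDA_SLC_LCLS AIDA_KLYS_LCLS < sho_sys.txt
```

The same function works for the 8 AIDAPROD processes by passing the AIDA_DP* names instead.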
IMPORTANT AIDALCLS RUNTIME FILES
ON LCLS CA PRODUCTION:
+ executable, scripts, persistent config data
/usr/local/lcls/physics/package/aida
+ runtime data (IOR)
/u1/lcls/package/aida/ior (NameServerLCLS.ior)
There is also "/u1/lcls/tools/aida/ior". It contains some files, but is probably not useful and can be removed.
+ orbacus
/u1/lcls/tools/orbacus (Event Service conf file)
/usr/local/lcls/package/iona/orbacus/prod/JOB/lib/OB.jar etc.
ON VMS:
+ Runtime data (IOR)
data_disk_mcc:[slc.aida]NameServerLCLS.ior
+ Server startup .COM files and .SUBMIT files
SLCCOM:AIDA_*_LCLS.SUBMIT (The batch submit, as used by warmslcx).
SLCCOM:STARTDP*_LCLS.COM (The meat of the startup process, with the java command)