SLAC CPE Software Engineering Group
Stanford Linear Accelerator Center
System Admin

How to restart some VMS processes

The need to restart AIDA processes (most often AIDA_SLCBPM and AIDA_SLCKLYS):

A restart is generally needed when a process appears to stop responding to requests. OPS noticed this problem and found that a restart corrected it. I looked at the "signature" of the apparently non-responding process and implemented a new process, MONAIDAINVH, to check for this signature (in ALL AIDA_SLC* processes) and force a restart. Currently MONAIDAINVH performs the large majority of these restarts. It typically takes a few to several minutes for the signature to be satisfied. Occasionally OPS restarts first (or, redundantly, shortly after MONAIDAINVH restarts). MONAIDAINVH does have some false positives and false negatives, but they appear to be uncommon.

I see no reason that OPS should not do an AIDA restart if they suspect a problem.

The need to restart SLCCAS:

I assume that OPS takes this action when they see they are not getting PV access to MCC.   

The MONSLCCAS process was implemented long ago to recognize a signature that had been seen occasionally: very high CPU usage with little I/O. That pattern has NOT been seen since 2014.
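As an illustration only, the "high CPU, little I/O" signature can be expressed as a comparison of two samples of a process's cumulative counters. This is a minimal Python sketch and not the MONSLCCAS implementation: the `Sample` type, the function name, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One reading of a process's cumulative counters (hypothetical layout)."""
    cpu_seconds: float  # total CPU time consumed so far
    io_count: int       # total I/O operations so far


def matches_signature(prev, curr, elapsed_seconds,
                      min_cpu_fraction=0.9, max_io_ops=10):
    """Return True if the process was CPU-bound with almost no I/O
    between the two samples. Both thresholds are assumed values."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    cpu_fraction = (curr.cpu_seconds - prev.cpu_seconds) / elapsed_seconds
    io_ops = curr.io_count - prev.io_count
    return cpu_fraction >= min_cpu_fraction and io_ops <= max_io_ops
```

A real monitor would presumably require the condition to hold over several consecutive samples before forcing a restart, to limit false positives.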

From what I can see (CPU usage of SLCCAS and SLCCAS02), SLCCAS02 is NOT being used at all. SLCCAS is heavily used (average 60% of one CPU, where 100% is its ABSOLUTE maximum). It may be beneficial to change some of its clients to target SLCCAS02 rather than SLCCAS. I would be surprised if restarting SLCCAS02 currently had ANY effect at all.

 

 



To Restart processes on MCC

Log in to MCC as slcshr (get the password from the EOIC):

 

ssh -l slcshr mcc

You will see the screen below:


MCC::SLCSHR>

$
$ SLCUPDN           :== @SLCCOM:SLCUPDN
$ WARMSLCX          :== @SLCCOM:WARMSLCX
$ EPICS_FWD_RESTART :== @SLCCOM:EPICS_FWD_RESTART ! New!!
$ SHOSLCJOBS        :== @SLCCOM:SHO_SLCSHR_JOBS ! List all SLCSHR jobs
$ MONSLCJOB         :== @SLCCOM:MONSLCJOB ! List missing SLCSHR jobs (if any)
$ EOIC_Menu         :== @SLCCOM:EOIC_MENU
$ TELE              :== SEARCH SLC_ALLPHONE
$ FLUSH_ERR_INT_MB  :== @SLCCOM:FLUSH_ERR_INT_MB
$ SHOW_DB_IMAGES    :== @SLCCOM:SHOW_DB_IMAGES
$
$ ACCESS :== RUN/NODEB SLCIMAGE:ACCESS
$

 

To list jobs/processes:

 

MCC::SLCSHR> shoslcjobs

To search jobs/processes by name (SEARCH does a case-insensitive string match, not a regular expression):

MCC::SLCSHR> pipe shoslcjobs | search sys$input aida
2180789E AIDA_SLCBPM     HIB 4/ 4 772016 2146 19294 161888 det
21800520 AIDA_SLCBPMBUFF HIB 4/ 4 22259935 24613 936538 174080 det
21807522 AIDA_SLCUTIL    HIB 4/ 4 3852222 5113 17178 152512 det
21800525 AIDA_SLCMAGNET  HIB 4/ 4 15310745 14560 17821 156336 det
21800934 MONAIDAINVH     LEF 0/ 0 10930335 46125 2177787 2368 det
2180053A MONAIDAPAGF     LEF 4/ 4 1460690 3083 12893 2624 det
2180595D AIDA_SLCDB      HIB 5/ 4 27879388 90429 551804 174080 det
21806D85 AIDA_SLCKLYS    HIB 4/ 4 1716591 3816 69315 174080 det

 

MCC::SLCSHR> pipe shoslcjobs | search sys$input slccas
21806852 SLCCAS         CUR 5/ 4 26467903 13545 9337 75168 det
2180785E SLCCAS02       LEF 6/ 4 6578 5 5698 27136 det
21800533 SLCCAS_GETDIAG LEF 6/ 4 1323391 1221 801556 2576 det
21800538 MONSLCCAS      LEF 4/ 4 162414 204 4074 2000 det
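If a listing like the ones above is saved to a file or pasted into a script, it can be checked offline. The Python sketch below is an assumption-laden helper and not part of the SLC tooling: it interprets only the first three columns (process ID, process name, scheduling state) and keeps the remaining columns as raw strings, because this page does not label them. The names `Job`, `parse_jobs`, and `missing` are hypothetical.

```python
from typing import NamedTuple

HEX_DIGITS = set("0123456789ABCDEFabcdef")

# Sample text copied from the SLCCAS listing on this page.
SAMPLE = """\
MCC::SLCSHR> pipe shoslcjobs | search sys$input slccas
21806852 SLCCAS         CUR 5/ 4 26467903 13545 9337 75168 det
2180785E SLCCAS02       LEF 6/ 4 6578 5 5698 27136 det
21800533 SLCCAS_GETDIAG LEF 6/ 4 1323391 1221 801556 2576 det
21800538 MONSLCCAS      LEF 4/ 4 162414 204 4074 2000 det
"""


class Job(NamedTuple):
    pid: str      # 8 hex digits
    name: str     # process name
    state: str    # scheduling state, e.g. HIB, LEF, CUR
    rest: tuple   # remaining columns, uninterpreted


def parse_jobs(text):
    """Parse job lines; skip prompts, blanks, and anything else."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        pid = fields[0]
        if len(pid) != 8 or not set(pid) <= HEX_DIGITS:
            continue
        jobs.append(Job(pid, fields[1], fields[2], tuple(fields[3:])))
    return jobs


def missing(jobs, expected_names):
    """Return the expected process names that do not appear in jobs."""
    running = {job.name for job in jobs}
    return [name for name in expected_names if name not in running]
```

For example, `missing(parse_jobs(SAMPLE), ["SLCCAS", "SLCCAS02"])` returns an empty list when both processes are present in the listing.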

 

How to restart a process:

MCC::SLCSHR> warmslcx

Command file to start up SLC Control Processes, optionally just kill
the processes, or kill and startup a new process.

Syntax: $ WARMSLC process-spec [optional qualifiers]

Parameters

 

process-spec - Process(es) to affect (can be a comma separated list, and
               entries can be logical names, that translate to a list).
             - if * then affect all SLC Control Processes
               ("SLC_$PROCESS_LIST,SLC_$USER_PROCESS")

Qualifiers

 

    /Kill    - Just kill the process(es) (in reverse order).
or  /Restart - Kill and start the process(es).
or           - default is to simply start the process(es)--unless
               a process is already running.

    /Oldsoft  - Start up process(es) in oldsoft.
    /Log      - Set verify before running the process-specific command file.
    /Trailoff - Don't prompt for a TRAIL message

 

Normally by Controls Admins or EOIC:

NOTE: use WARMSLC * following a SLCUPDN UP /MINSYS to
      start all remaining standalones AND to allow
      non-control room SCP's to be started.

 

Example:

To restart SLCCAS :

MCC::SLCSHR> warmslc slccas /restart

You will be prompted: Who and Why?
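For anyone writing a checklist or log entry for several processes, the command line can be composed from the syntax shown in the help text. The Python sketch below only builds the command string; it does not log in or run anything on MCC. The function name and the rule for which characters are accepted in a process name are assumptions made for this example.

```python
import re

# Assumed set of characters for a process or logical name (hypothetical rule).
_NAME = re.compile(r"^[A-Za-z0-9_$]+$")

# Qualifiers taken from the WARMSLC help text; the default action is "start".
_ACTIONS = {"start": "", "kill": "/kill", "restart": "/restart"}


def warmslc_command(names, action="restart"):
    """Build a WARMSLC command line for a list of process names.

    names  - list of process names, or ["*"] for all SLC control processes
    action - "start", "kill", or "restart"
    """
    if action not in _ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not names:
        raise ValueError("at least one process name is required")
    for name in names:
        if name != "*" and not _NAME.match(name):
            raise ValueError(f"unexpected process name: {name!r}")
    spec = ",".join(names)  # process-spec may be a comma-separated list
    parts = ["warmslc", spec, _ACTIONS[action]]
    return " ".join(part for part in parts if part)
```

Here `warmslc_command(["slccas"])` produces the same line as the example above, and passing two names produces a single comma-separated process-spec.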

 




 


 

Modified: 17-Nov-2023
Created by: Ken Brobeck Nov 16 2023