To Gracefully Kill a Run That is Processing (or Stuck)

Normal run


Here's the recipe:

  1. Set NameServiceIOR from the IOR file of the nameserver you want to look at,
    e.g. for PC1 (alias nameserver1):
    setenv NameServiceIOR `cat /nfs/oprserv01/u1/TAO/TAO_NS.ior`
  2. Make sure you have run srtpath (newish release).
  3. Use TaoNSDumper to look at the nameserver:
    > TaoNSDumper
    Name service ior:
    IOR:010000002b00000049444c3a6f6d672e6f72672f436f73 ... 
    
    Name Graph
         0:  PC1: context
               LM: context
    Here no LM is registered.
    With a registered LM you will see:
    > TaoNSDumper
    Name service ior:
    IOR:010000002b00000049444c3a6f6d672e6f72672f436f734 ... 
    
    Name Graph
         0:  ER6: context
           0:  LM: context
             0:  44279-1: reference
  4. Use OprLMInterrupter to kill the LM:
    > OprLMInterrupter ER6/LM/44279-1
  5. The elven will start to finish, and the run should finish slowly (on the order of 5 minutes).
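
Steps 3 and 4 can be tied together with a small helper that reads a TaoNSDumper dump and lists the paths to hand to OprLMInterrupter. This is only a sketch: the line format is inferred from the two sample dumps above (an optional "N:" index, a name, then "context" or "reference", nested by indentation), and the function name registered_references is made up for illustration. It does not call TaoNSDumper itself; paste or pipe the dump text in.

```python
import re

# Assumed line shape, inferred from the sample dumps in this recipe:
# optional "N:" index, then a name, then its kind.
ENTRY = re.compile(r"^(\s*)(?:\d+:\s+)?(\S+):\s+(context|reference)\s*$")


def registered_references(dump_text):
    """Return slash-joined paths of every 'reference' under 'Name Graph'."""
    paths = []
    stack = []  # (indent, name) of the contexts enclosing the current line
    in_graph = False
    for line in dump_text.splitlines():
        if line.strip() == "Name Graph":
            in_graph = True
            continue
        if not in_graph:
            continue  # skip the IOR header lines
        match = ENTRY.match(line)
        if not match:
            continue
        indent, name, kind = len(match.group(1)), match.group(2), match.group(3)
        # Leave any context that is not a parent of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if kind == "context":
            stack.append((indent, name))
        else:
            paths.append("/".join([n for _, n in stack] + [name]))
    return paths


SAMPLE = """Name Graph
     0:  ER6: context
       0:  LM: context
         0:  44279-1: reference
"""

if __name__ == "__main__":
    for path in registered_references(SAMPLE):
        print("OprLMInterrupter", path)
```

An empty result corresponds to the first dump above, where no LM is registered and there is nothing to interrupt.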

Other Ways to Kill a Run

FarmManager:
REQUESTING_XTC

RP not started

-> the run can be reaped and scratched.
Check first that the run has not started:
- look for the logdir
- look in oprruns for status=Processing
- check that no elven are running:
findElven ER1

Otherwise you want the farm to be asleep
before reaping, and reaped before scratching,
except in emergency cases.

If the RP is running, you want to put it to sleep.
You can tell it is running when:

- there is a logdir
- oprruns shows status=Processing

Then:

- look in the logdir to see what is happening
- look for a running LM
- look for elven: findElven ER1

to kill nicely:
> OprLMInterrupter ERx/LM/run#-1

to kill meanly:
killElven ER1

if all elven are dead or not yet started, and
you know the processing is bad:

rpsetstate ER1 SetFailedProcessing (only for ER farms)

If you do this while elven or the LM are running, they
will not be killed and the run will continue, but CS will
no longer monitor it.

If the RP is stuck in MonitorProcessing (i.e. some NPs do not return), you can do:
rpsetstate ER# RecoverProcessing (works on ER and PC farms)
This allows the run to finish; it will automatically be marked done or failed as appropriate.
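
The checks in this section can be summarised as a small lookup from what you observe to the commands worth considering. This is a rough sketch, not an operations tool: the RunState fields and the options function are hypothetical names, the "reap, then scratch" entry is a reminder rather than a command, and requiring both elven and LM to be gone before offering SetFailedProcessing is a conservative reading of the notes above. It only builds the command strings; it runs nothing.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    farm: str                       # e.g. "ER1" or "PC1"
    run: int                        # run number
    has_logdir: bool                # a logdir exists for the run
    status_processing: bool         # oprruns shows status=Processing
    elven_running: bool             # findElven reports live elven
    lm_running: bool                # an LM is registered for the run
    known_bad: bool = False         # operator has judged the processing bad
    stuck_in_monitor: bool = False  # RP is stuck in MonitorProcessing


def options(state):
    """List the actions from this page that apply to the observed state."""
    started = state.has_logdir or state.status_processing or state.elven_running
    if not started:
        return ["reap, then scratch"]
    found = []
    if state.lm_running:
        # The nice way: interrupt the LM and let the elven finish.
        found.append(f"OprLMInterrupter {state.farm}/LM/{state.run}-1")
    if state.elven_running:
        # The mean way.
        found.append(f"killElven {state.farm}")
    nothing_running = not state.elven_running and not state.lm_running
    if nothing_running and state.known_bad and state.farm.startswith("ER"):
        # Only for ER farms; does not kill anything still running.
        found.append(f"rpsetstate {state.farm} SetFailedProcessing")
    if state.stuck_in_monitor:
        # Works on ER and PC farms.
        found.append(f"rpsetstate {state.farm} RecoverProcessing")
    return found


if __name__ == "__main__":
    for action in options(RunState("ER6", 44279, True, True, True, True)):
        print(action)
```

The list is ordered from gentlest to harshest for a running RP, so the first entry is normally the one to try.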



Run with Nodes in Bad Shape (with Never-Started Elven)

If there is a set of nodes that are listed (in farminfo) as Started but have no elven associated with them and cannot be killed with the procedure above, do the following:
There are also some more comments on this subject in the OprSOS hypernews.
Last modified: Thu Apr 21 10:46:09 CEST 2005