SLAC CPE Software Engineering Group
 

UNIX Watchdog

UWD

 

 

The UWD (UNIX Watchdog) is a tool that monitors vital functions of our computer systems.


              

PC Users: 

Currently the UWD does not work properly with Xwin32 V9.1. To close a window, you must close the main UWD button.

 



Privileges:

There are two levels of privilege when running the UWD:

  1. General User
    1. Only monitor
  2. Privileged User
    1. Restart processes and acknowledge alarms
    2. Only the EOIC desk can perform this function

Start the UWD:

  1. Log in to lcls-srv01 or lcls-srv02
    1. Type uwd. The following runtime window will appear, with the button showing the color of the highest alarm:

 

Clicking on the "Controls_UNIX_Watchdog" button will bring up the following display:

The Alarm Handler Main Window is divided into three parts:

  • Menu bar
    • Pull-down menu items that perform all the functions of the Alarm Handler
  • Alarm configuration display area   (2 parts)
    • Alarm configuration tree structure display (Left)
    • Alarm Group contents display (Right)
      • Contents of the currently selected Alarm Group from the alarm configuration tree structure
  • Message area

Alarms: (Color is used to show alarm severity)

           
Color              Code   Severity        Description
WHITE              E      Error State     Usually a bad PV
WHITE              V      Invalid Alarm   Good PV, but no data written to the PV yet
RED                R      Major Alarm     High limits reached
YELLOW             Y      Minor Alarm     Minor limits reached
Background Color   -      No Alarm        No alarm
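
As a minimal sketch of how the severity list above could be represented in code, the following Python enum ranks the severities so that the "highest alarm" can be picked with max(). The numeric ranks, the ordering of Invalid and Error above Major, and all names here are assumptions for illustration; they are not taken from the UWD configuration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Alarm severities, ranked lowest to highest (assumed ordering)."""
    NO_ALARM = 0
    MINOR = 1
    MAJOR = 2
    INVALID = 3
    ERROR = 4


# (letter code, display color) for each severity, following the list above.
DISPLAY = {
    Severity.NO_ALARM: ("", "background"),
    Severity.MINOR: ("Y", "yellow"),
    Severity.MAJOR: ("R", "red"),
    Severity.INVALID: ("V", "white"),
    Severity.ERROR: ("E", "white"),
}


def highest(severities):
    """Return the highest severity in a collection, or NO_ALARM if it is empty."""
    return max(severities, default=Severity.NO_ALARM)


# Example: a group containing one minor and one major alarm shows as major (red).
code, color = DISPLAY[highest([Severity.MINOR, Severity.MAJOR])]
```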

 

Controls_UNIX_Watchdog:

  • Controls_Servers
    • Any of our servers that do not directly support a single program, plus AIDA checks
  • LCLS
    • LCLS servers and workstations

 


What we Monitor:

  • CPU
    • Monitors CPU usage
  • DISK
    • Monitors Disk Space
  • MEMORY
    • Monitors Memory usage
  • PING
    • Monitors network connectivity
    • lcls-uwd performs this monitoring
  • PROC
    • Monitors specific processes
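
To illustrate what a DISK-style check involves, here is a minimal Python sketch that measures disk usage and classifies it against minor and major limits. The function names and the limit values (80% and 95%) are hypothetical and are not taken from the UWD configuration; the actual UWD checks may work differently.

```python
import shutil


def classify(value, minor_limit, major_limit):
    """Map a measured value to an alarm severity name using two limits."""
    if value >= major_limit:
        return "MAJOR"
    if value >= minor_limit:
        return "MINOR"
    return "NO_ALARM"


def disk_percent_used(path="."):
    """Return the percentage of disk space used on the filesystem holding path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


# Hypothetical limits: minor alarm at 80% full, major alarm at 95% full.
severity = classify(disk_percent_used("."), minor_limit=80.0, major_limit=95.0)
```

The same classify() helper would apply to CPU or memory readings, since it only compares a number against two limits.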

 

How to move around in the UWD:

Clicking on the expand/collapse control next to a branch will expand or collapse that branch.

 

(Close up of a window)

    

                          

Acknowledge Button:

  • New alarms cause the top window (Controls_UNIX_Watchdog) to blink. After the alarm is acknowledged, the blinking stops. The Acknowledge button clears the alarm and removes the severity color from the button and from each level up the branch, but leaves the alarm indicator next to the button. This allows a user to see which PV is in error even after it has been acknowledged.
  • For example, if a process is in error on lcls-daemon2, the error severity is shown at these levels:
    • Controls_UNIX_Watchdog
      • LCLS
        • lcls-daemon2
          • PROC
  • When the error on PROC is acknowledged, that specific error is cleared at every level up the branch, all the way to Controls_UNIX_Watchdog.
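
The behavior described above can be sketched as a small tree model in Python. This is only an illustration under stated assumptions: the Node class, its field names, and the three-level severity ranking are hypothetical and do not describe how the Alarm Handler is actually implemented. Each node keeps the current alarm (which stays visible next to the button) separate from the unacknowledged severity (which colors the button and every level above it).

```python
from dataclasses import dataclass, field

# Assumed ranking, lowest to highest.
RANK = {"NO_ALARM": 0, "MINOR": 1, "MAJOR": 2}


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    current: str = "NO_ALARM"   # alarm still shown next to the button
    unacked: str = "NO_ALARM"   # severity awaiting acknowledgement

    def add(self, child):
        """Attach a child node and return it, so branches can be built step by step."""
        self.children.append(child)
        return child

    def raise_alarm(self, severity):
        """Record a new alarm; keep the highest unacknowledged severity."""
        self.current = severity
        if RANK[severity] > RANK[self.unacked]:
            self.unacked = severity

    def button_severity(self):
        """Highest unacknowledged severity at this node or anywhere below it."""
        candidates = [self.unacked] + [c.button_severity() for c in self.children]
        return max(candidates, key=RANK.__getitem__)

    def acknowledge(self):
        """Clear this node's unacknowledged severity; the current alarm stays."""
        self.unacked = "NO_ALARM"


# Build the branch from the example: Controls_UNIX_Watchdog > LCLS > lcls-daemon2 > PROC.
root = Node("Controls_UNIX_Watchdog")
lcls = root.add(Node("LCLS"))
host = lcls.add(Node("lcls-daemon2"))
proc = host.add(Node("PROC"))

proc.raise_alarm("MAJOR")
before = root.button_severity()   # severity is visible at the top level
proc.acknowledge()
after = root.button_severity()    # cleared all the way up the branch
```

Because each level computes its severity from the nodes below it, acknowledging the alarm on PROC is enough to clear the color on lcls-daemon2, LCLS, and Controls_UNIX_Watchdog, while proc.current still records which PV was in error.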

 

Guidance Button:

  • The Guidance button provides information about that PV, either in a pop-up window or by opening a browser to an HTML page about the PV. Many guidance buttons are not yet functional, but will be in the future.

 

Process Button:

  • The Process button appears only on the Privileged UWD (currently only the EOIC).
  • Pushing this button executes a command. Usually this is used to restart a process.
*NOTE: There is no confirmation prompt. When you push the button, the command is executed.


To view more information on a specific PV:
  1. Select a button
  2. Click "VIEW" on the menu bar
    1. Click on "Group/Channel Properties Window" (the last item in the menu)

 

 

 

 



 


Author: Ken Brobeck, 10-Aug-2007
Last updated by Jingchen Zhou, 29-Jan-2009