Troubleshooting Hints

The following is a list of problems we have run into, their likely causes, and possible fixes.

  • Main Results Page:

    • The summary table has red timestamps even though they are current - This may be caused by the TOOLSPECS record parameter "runInterval" for the given test not being set, or being set to too small a value. If (the current time) - (the last run time) > 2*runInterval, the date and time are shown in red.
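The red-timestamp rule above can be sketched as follows. This is only an illustration, not the actual IEPM-BW code: the function name `is_stale`, the use of seconds as the unit for runInterval, and treating an unset runInterval as zero are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_stale(now: datetime, last_run: datetime,
             run_interval_s: Optional[float]) -> bool:
    """Return True when the timestamp would be shown in red.

    Rule from the text: (current time) - (last run time) > 2 * runInterval.
    Assumptions: runInterval is in seconds, and an unset value acts like 0.
    """
    interval = run_interval_s or 0.0
    return (now - last_run) > timedelta(seconds=2 * interval)


now = datetime(2024, 1, 1, 12, 0, 0)
last_run = now - timedelta(minutes=90)

print(is_stale(now, last_run, 3600))  # False: 90 min is within 2 * 1 h
print(is_stale(now, last_run, 600))   # True: 90 min exceeds 2 * 10 min
print(is_stale(now, last_run, None))  # True: unset interval, any age is red
```

Under these assumptions, a test that last ran 90 minutes ago is flagged whenever runInterval is under 45 minutes, which is why a missing or undersized value turns current timestamps red.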
  • ABWE Failures:

    • Check that the node IP address is correct in the NODES table.
    • Run "create-abw-list" to create an updated configuration file.
    • Kill abwed.
    • It should auto restart in about 15 minutes.
  • Troubleshooting Probes:

    • The daemon that loads the probe data into the database is "load-datad". It can work so fast that you never see the data appear in the data directory before it is loaded. The first step is therefore to kill "load-datad".
    • "cd" to the data directory in the mysql directory (ex. if the mysql directory is /u1/mysql, cd to /u1/mysql/data)
    • Do an "ls". If there are any data files, they will appear. The data type is included in the file name. You can sit and watch this directory, or you can force probes to run and write to it.
    • You can force probes to be scheduled (and run) by executing "schedule-load -T probetype", (ex. "schedule-load -T traceroute"). They will be loaded into the scheduler and run in due time.
    • Watch the data directory mentioned above for the probe data to appear. When it does, you can force it to be loaded into the database by calling the appropriate load routine (ex. "load-trace-data").
    • Note: there are also various logs which can help you see what has been happening. In "/u1/mysql/logs" there are logs for the various operations that are done. The ones ending in "today" are the current ones.
      bw-synchd.today
      keepalive.today
      load-abwe-datad.today
      load-iperf-data.today
      load-pathchirp-data.today
      load-pathload-data.today
      load-ping-datad.today
      load-schedule.today
      load-tlaytcp-data.today
      load-trace-datad.today
      pathchirpd.today
      pingd.today
      traced.today
    • Check the logs for the probe you are investigating to see whether it is reporting errors, loading its data, and so on.
    • The load-datad daemon will automatically restart after a period of time, so if you need more time, you might have to kill it again.
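The log check described above could be scripted along these lines. This is a rough sketch under stated assumptions: the helper `find_error_lines` is hypothetical and not part of IEPM-BW, "/u1/mysql/logs" is the example path used above, and matching on the word "error" is a guess, since the actual wording of the log messages is not documented here.

```python
from pathlib import Path


def find_error_lines(log_dir, probe="", keyword="error"):
    """Map each current log (*.today) whose name contains `probe`
    to the lines in it that contain `keyword` (case-insensitive).

    Assumption: problems are reported on lines containing `keyword`.
    """
    hits = {}
    directory = Path(log_dir)
    if not directory.is_dir():
        return hits  # nothing to scan on this host
    for log in sorted(directory.glob("*.today")):
        if probe not in log.name:
            continue
        with log.open(errors="replace") as fh:
            lines = [ln.rstrip("\n") for ln in fh
                     if keyword.lower() in ln.lower()]
        if lines:
            hits[log.name] = lines
    return hits


# Example: scan the traceroute-related logs (traced.today, load-trace-datad.today).
for name, lines in find_error_lines("/u1/mysql/logs", probe="trace").items():
    print(name)
    for line in lines:
        print("   ", line)
```

Leaving `probe` empty scans every log ending in "today"; a different `keyword` can be passed once you know how a given daemon words its failures.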
  • Killing Servers:

    • kill-all-servers: kills all active iepm-bw servers and daemons. kill-all-servers reads the $iepmSrcDir/config/servers.alive file and kills all the servers listed there. It also reads the $mysqldb/keepalives directory and kills all the daemons that have keepalives there.
  • Restarting Servers:

    • restart-all-servers: restarts all the iepm-bw servers and daemons. restart-all-servers reads the $iepmSrcDir/config/servers.alive file and restarts all the servers listed there. It also reads the $mysqldb/keepalives directory and restarts all the daemons that have keepalives there.
  • Update Code:

    • update-code: kills all the servers and daemons on a monitoring host, untars the newtar file, and restarts all the servers and daemons. It is located in the $iepmSrcDir/ directory. If at all possible, do not copy code to the monitoring hosts without using it.
  • Updates to Monitoring Hosts:

    • update-monhosts: date- and time-stamps the VERSION_DATE file and creates a new tar file of the code distribution. It then uses scp to copy the newtar file to each monitoring host and calls update-code there via ssh. For details on moving your IEPM-BW monitoring to a new system, see the "Moving IEPM-BW" page.
