SLAC's Network Management Features

Connie Logg, October 1993

Current NMS:

We have a DECstation 5000 on which we run DEC MSU. DEC has recently announced that it is no longer supporting DEC MSU, so we are going to replace it with NETVIEW 6000 running on an RS/6000 37T.

The features of DEC MSU which we currently use are:

Ethermeter Utilization

We have attempted to place an NAT ethermeter on EVERY segment of cable in our network. Several pieces of code have been developed in house to probe these ethermeters on a regular basis (currently once an hour) and to display their MIB variables in a readable form (the latter program is called natlookup).
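The hourly probes described above reduce to comparing two samples of cumulative counters. The sketch below is a minimal illustration, assuming the ethermeter exposes an SNMP Counter32 octet counter (such as RMON's etherStatsOctets) on a 10 Mb/s segment; the actual NAT ethermeter MIB variables and the in-house code are not described here, and the function names are hypothetical.

```python
# Sketch: hourly ethermeter utilization from two samples of a cumulative
# octet counter. Assumes the counter is an SNMP Counter32 (e.g. RMON
# etherStatsOctets); the NAT ethermeter's actual MIB variable is an assumption.

COUNTER32_MODULUS = 2 ** 32  # Counter32 wraps to 0 after 2**32 - 1
ETHERNET_BPS = 10_000_000    # assumed 10 Mb/s Ethernet segment


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two counter samples, allowing for at most one wrap."""
    return (curr - prev) % modulus


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, link_bps: int = ETHERNET_BPS) -> float:
    """Average utilization over the interval (ignores preamble and inter-frame gap)."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * link_bps)


if __name__ == "__main__":
    # Two hypothetical hourly samples; the counter wrapped between them.
    print(f"{utilization_percent(4_294_000_000, 1_000_000, 3600):.4f} %")
```

One caveat on the single-wrap assumption: a saturated 10 Mb/s segment carries about 1.25 MB/s, so a 32-bit octet counter can wrap in roughly 57 minutes (2^32 / 1.25e6 ≈ 3436 s). With hourly polling, the delta is only trustworthy while the average load over the hour stays below about 95%.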

  1. Data and error statistics are collected hourly, 24 hours a day, 7 days a week, from all the ethermeters (currently 44). Plots of network activity are available on demand. These plots include:
  2. We make extensive use of WWW to provide plots to the user community. Once an hour, the data for "today" is plotted and the PostScript file is made available via WWW. Once a day (early in the morning), plots of yesterday's data and of the past week's data are created, and their PostScript files are made available via WWW.

    If a plot of any other time period is desired, the plotting program can be invoked directly by the user.

  3. Other reports generated from this data include:

Bridge Monitoring

Data is collected hourly from the NAT bridges in the SLAC network. The data includes the number of good packets, CRC and alignment errors, collisions, and multicasts.
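The raw bridge counts become easier to compare across bridges once they are turned into rates. This is a minimal sketch under stated assumptions: the field names are hypothetical, the counts are taken to be per-hour deltas, and multicasts are assumed to be counted among the good packets. The memo does not say how (or whether) SLAC derives such rates.

```python
# Sketch: derive rates from one hour of NAT bridge counts.
# Field names are hypothetical; the counts are assumed to be per-interval
# deltas, and multicasts are assumed to be a subset of good packets.
from dataclasses import dataclass


@dataclass
class BridgeSample:
    good_packets: int
    crc_errors: int
    alignment_errors: int
    collisions: int
    multicasts: int


def _pct(numerator: int, denominator: int) -> float:
    """Percentage, returning 0.0 for an idle interval instead of dividing by zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def summarize(s: BridgeSample) -> dict:
    """Error, collision, and multicast rates for one bridge over one interval."""
    errors = s.crc_errors + s.alignment_errors
    return {
        "error_pct": _pct(errors, s.good_packets + errors),
        "collisions_per_100_good": _pct(s.collisions, s.good_packets),
        "multicast_pct": _pct(s.multicasts, s.good_packets),
    }


if __name__ == "__main__":
    print(summarize(BridgeSample(990, 6, 4, 50, 99)))
```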

Routers

Data is currently collected hourly from the routers in the network. A daily report of this data is generated, but no analysis is done; the report is simply eyeballed as needed.

Network Timing

We have made an attempt to "time" our network by issuing pings to the ethermeters and critical servers. Four times an hour, each ethermeter and critical server is pinged a total of 10 times: 5 times with a 100-byte packet and 5 times with a 1000-byte packet. For each set of 5 pings, the ping command returns the minimum, maximum, and average round-trip time. These results are stored in a flat file and plotted daily.

The pings capture only a snapshot of the network at a specific time, but they do allow us to compare the responsiveness of nodes.
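The timing procedure above (5 small and 5 large pings per node, summarized as minimum, maximum, and average) can be sketched as a function that formats one flat-file record per batch. This is an illustration only: the record layout, host names, and file name are hypothetical, and invoking ping itself is omitted because its summary output differs between systems.

```python
# Sketch: turn one batch of ping round-trip times into a flat-file record.
# Host names, file name, and record layout are hypothetical; running ping
# itself is left out because its summary format differs between systems.
from statistics import mean
from typing import Sequence


def ping_record(timestamp: str, host: str, size_bytes: int,
                rtts_ms: Sequence[float]) -> str:
    """One line: timestamp host size min max avg (ms), or dashes if every ping was lost."""
    if not rtts_ms:
        return f"{timestamp} {host} {size_bytes} - - -"
    return (f"{timestamp} {host} {size_bytes} "
            f"{min(rtts_ms):.1f} {max(rtts_ms):.1f} {mean(rtts_ms):.1f}")


if __name__ == "__main__":
    # Each probe sends 5 small and 5 large pings; these RTTs are illustrative values.
    batches = {100: [2.1, 2.0, 2.4, 2.2, 2.3], 1000: [5.0, 5.6, 5.2, 6.1, 5.1]}
    with open("ping_times.txt", "a") as flat_file:
        for size, rtts in batches.items():
            flat_file.write(ping_record("1993-10-01T10:15", "em01", size, rtts) + "\n")
```

Keeping one whitespace-separated line per batch makes the daily plotting step a simple matter of splitting lines and grouping by host and packet size.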

System Connectivity Tracking

As mentioned in Section I, we track network connectivity (as seen from the network management station) via the NMS. The NMS polls each node every two minutes; when it sees a change in a node's response, it generates a trap, and the event is recorded in a flat file. This flat file is processed daily by two programs, which generate the following reports:

Enterprise Wide Network Database

We have an Oracle database, known affectionately as CANDO (currently hosted on a VAX 9000), which contains a wealth of information about our network. Much of the data collection detailed above is driven by lists of servers, bridges, ethermeters, and routers that are extracted automatically from CANDO each day. The person who maintains and updates this database also maintains the MSU network map described in the first section.
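Driving the collectors from daily database extracts means the polling lists never drift from the inventory. The sketch below shows the idea with Python's built-in sqlite3 module standing in for the Oracle CANDO database; the devices(name, type) table is a hypothetical schema chosen for illustration, not CANDO's actual layout.

```python
# Sketch: build the daily polling lists from an inventory database.
# sqlite3 stands in for the Oracle CANDO database, and the devices(name, type)
# table is a hypothetical schema chosen only for illustration.
import sqlite3
from collections import defaultdict
from typing import Dict, List


def polling_lists(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map each device type (e.g. bridge, ethermeter, router) to its sorted names."""
    lists: Dict[str, List[str]] = defaultdict(list)
    for dev_type, name in conn.execute(
            "SELECT type, name FROM devices ORDER BY type, name"):
        lists[dev_type].append(name)
    return dict(lists)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE devices (name TEXT, type TEXT)")
    conn.executemany("INSERT INTO devices VALUES (?, ?)",
                     [("em02", "ethermeter"), ("em01", "ethermeter"),
                      ("br01", "bridge"), ("rt01", "router")])
    for dev_type, names in polling_lists(conn).items():
        print(dev_type, " ".join(names))
```

In practice each list might be written to a file that the hourly collectors read, so a database outage does not stop data collection for the day.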

Problem Tracking

Our network problem tracking is currently done with a system developed years ago for VM problem tracking. Every problem, change, or other action affecting the network is registered in this system. It will probably be replaced within the next year by a new system that taps into our new network management system and the CANDO database.

Summary

I have tried to summarize some of the components of our network management strategy. There are several other monitored areas that I have not covered (for example, AppleTalk, the Micom switch, and IHEP traffic).

If you would like more information, please feel free to contact me by phone: 415-926-2879 or email: CAL@SLAC.STANFORD.EDU.