SLAC CPE Software Engineering Group
Stanford Linear Accelerator Center
System Admin

Managing the SLCNET Gateways



SLAC contact:   Mike Harms x3220

Contractor: Eric Siskind  516 759-0707 (developed the cards)

Gateway 0 (61 MHz) is the only gateway in the first routing table. If it is down, nothing will come up.
Gateway 1 (67 MHz) is the other half of the network. It also has Timing.

Gateway 2 (61 MHz) is the PEP2 network.

The Master Pattern Generator (MP00) is on GTW1; Siskind moved it from GTW0. It does not matter which gateway it is on. If there is a "chattering micro" on the 67 MHz channel, MP00 must be removed from the 67 MHz channel in order to kill the chattering micro.



To check gateways:

  • Log in to MCC and type:   @slccom:check_slcnet_gws
  • Example:

mcc::brobeck>@slccom:check_slcnet_gws

Checking SLCNET gateways
Checking gateway 0...
Checking gateway 1...
Checking gateway 2...

SLCNET
Gateway  No. of micro slots     Loopback check status
-------  ---------------------  -------------------------------
0        207 (default gateway)  test completed: loopback passed
1        35                     test completed: loopback passed
2        12                     test completed: loopback passed

If all three gateways show "loopback passed", leave QSUD0 and QPUA0 alone.
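As an illustration only, the "all passed" check can be sketched in Python. The row format is inferred from the single sample run shown above and is an assumption, not a documented output format; the function names are hypothetical and nothing here is part of the SLC software.

```python
import re

# Rows copied from the example run above. The parsing rules are an
# assumption based on that one sample, not a documented output format.
SAMPLE = """\
0 207 (default gateway) test completed: loopback passed
1 35 test completed: loopback passed
2 12 test completed: loopback passed
"""

# gateway number, slot count, optional "(default gateway)" tag, status text
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(?:\(default gateway\)\s+)?(.*\S)\s*$")


def parse_gateway_rows(text):
    """Return {gateway: (micro_slots, status)} for each table row found."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows[int(match.group(1))] = (int(match.group(2)), match.group(3))
    return rows


def all_loopbacks_passed(text, expected=(0, 1, 2)):
    """True only if every expected gateway has a row ending in 'loopback passed'."""
    rows = parse_gateway_rows(text)
    return all(
        gw in rows and rows[gw][1].endswith("loopback passed") for gw in expected
    )


if __name__ == "__main__":
    print(all_loopbacks_passed(SAMPLE))
```

Lines that do not start with a gateway number (the "Checking gateway..." messages, the header, and the dashed rule) are simply ignored by the pattern.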

 


 

 


If the gateways do not come up:

If GTW0 is not passing its loopback tests, you can bypass the external loopback test by unplugging the RF cable.

Have the EOIC check the BIVSC VSC Stats page on the SCP to see whether any micro shows a large number of errors.

If a micro shows a large number of errors and there is a chattering micro, restart that micro (power cycle).

If the gateway checks do not respond, SLCUPDN UP /MINSYS might show that QSUD0 is not responding.

If QSUD0 is not coming up, it is probably failing its external loopback test because of a chattering micro.  You can bypass this test by unplugging the RF cable at the back of GTW0.  The loopback test will then pass and the control system can be brought up.


To load routing tables:

This is dynamic and can be run without bouncing the control system.
  • run sys$common:[drivers]loadroute.exe
    • At the prompt:  qsud0
  • Routing table: SYS$COMMON:[DRIVERS]QSUD0.RTE

The default routing for any micro is via gateway 0, so qsud0.rte only contains entries for micros which are NOT accessed via gateway 0.  As you might guess, the value is 8001 for gateway 1 and 8002 for gateway 2.

Therefore, if you’re switching MC00 from gateway 0 to gateway 1 you’d add a line with 8001 for MC00 to qsud0.rte, switch the frequency at MC00’s modem, and then run sys$drivers:loadroute (from a process with PHYIO privilege) and enter “qsud0” (without the quotes) at the prompt.

Note that the routing table is a dynamic property of the system, so there’s absolutely no need to bounce the control system in order to make such a change!  You just reload the routing table while the control system is up and running.

Usually after switching over a micro from one gateway to another, we run disp86 to see if the micro is visible via the new gateway.  Assuming that’s okay, you just IPL it, and away you go.
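The "default unless listed" rule can be sketched as a lookup. This is a minimal Python illustration, not the loadroute program: the dictionary stands in for the exception entries in qsud0.rte, whose actual file syntax is not shown here, and the function and variable names are hypothetical.

```python
DEFAULT_GATEWAY = 0

# Values quoted in the text: 8001 selects gateway 1, 8002 selects gateway 2.
# They are kept as strings because the text does not state their encoding.
ROUTE_VALUE_TO_GATEWAY = {"8001": 1, "8002": 2}


def gateway_for(micro, route_entries):
    """Return the gateway used to reach `micro`.

    `route_entries` maps micro name -> route value and holds exceptions
    only; any micro that is not listed goes through gateway 0.
    """
    value = route_entries.get(micro)
    if value is None:
        return DEFAULT_GATEWAY
    return ROUTE_VALUE_TO_GATEWAY[value]


# Hypothetical table contents for illustration (not the real qsud0.rte).
entries = {"MC00": "8001"}

if __name__ == "__main__":
    print(gateway_for("MC00", entries))   # listed, so routed via gateway 1
    print(gateway_for("XX99", entries))   # made-up name, not listed: gateway 0
```

Moving a micro back to gateway 0 corresponds to removing its entry rather than adding one, which is why the table only ever lists exceptions.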

 


To reset the card without loading it:

Running this while the control system is up will disable interrupts, and the control system will need to be bounced to re-enable them.

run sys$common:[drivers]resetbipi.exe

At the prompt:  qpua0

 


To load the card:

Running this while the control system is up will disable interrupts, and the control system will need to be bounced to re-enable them.

run sys$common:[drivers]loadpcil.exe

At the prompt:  qpua0

 


To bring up or take down the control system:

On MCC as SLCSHR:

slcupdn down /fast     (brings down the control system)

slcupdn up             (brings up the control system)

This takes about 2 minutes.

Loads:

  • Routing tables
  • Interrupts
  • Polling lists

 


Known issues:

  • Chattering micro on the gateways: a screaming micro on one of the frequencies (61 or 67 MHz)
    • An error has occurred on either MCC or MCCDEV

  • If an error has occurred on QPUA0, then:
    • Type: show error
      • QPUA0 will show more than 2 errors (2 errors are normal)
        • QPUA0 is the port driver for SLC
        • The control system must be taken down

If none of the above has worked:

  • run sys$common:[drivers]resetbackbone.exe
  • or run sys$common:[drivers]resetreset.exe
  • After running either or both of these:
    • run sys$common:[drivers]loadroute.exe
      • At the prompt:  qsud0

Running the resets will page Eric Siskind.



INFO:

  • Monitor file on VMS:
    • CVS: common$root:[com.gen]qpua0_monitor.submit
    • slccom:qpua0_monitor.submit
      • Sends out pages and emails

  • To check poll rates:
    • Open an SCP on MCC
    • Click on Network Micro Index
    • Click on BIVSC poll panel
    • Click on display vsc stats
      • You can page down or up

  • From an SCP, click on "spawn gateway check" to check on the gateways
    • Same as @slccom:check_slcnet_gws
    • Also click on "display errors"


Troubleshooting SLCNET:

From Eric Siskind:

After each power cycle, I did the following:

  • Ran resetbipi.
    • run sys$common:[drivers]resetbipi.exe
      • At the prompt:  qpua0
  • Waited for qpua0: to come back online. 
    • show dev/ful qpua0          from a DCL $ prompt.
  • Waited for qsud0: to lose the “host unavailable” status designation. 
    • show dev/ful qsud0          from a DCL $ prompt.
  • Tested to see whether or not I could talk to gateway 0.  You can try virtually any $QIO function which actually talks to a gateway for this.  In my case, I used a program which dumps the poll list from a gateway. 
    • I have a stand-alone program dumpgatepoll.exe in user_disk_slc:[siskind.bivsc.axp] for this purpose.  Many of the programs in that directory talk to device qsua0 rather than qsud0, so you need to issue an “assign qsud0 qsua0” from a DCL $ prompt to use them.
      •  assign qsud0 qsua0
      •  run user_disk_slc:[siskind.bivsc.axp]dumpgatepoll.exe
  • Assuming that you could successfully dump the poll list from gateway 0,
    • use loadroute to load the routing table.  This lets you talk to the other gateways.
  • After loading the routing table, dump the poll list from all 3 gateways.
    • run user_disk_slc:[siskind.bivsc.axp]dumpgatepoll.exe        -gateway 0
    • run user_disk_slc:[siskind.bivsc.axp]dumpgatepoll.exe1       -gateway 1
    • run user_disk_slc:[siskind.bivsc.axp]dumpgatepoll.exe2       -gateway 2.
  • If you get this far, then the backbone is working properly and all 3 gateways have passed their selftest.  The remaining thing to test is whether the modem is working and the receive side of the modem can hear the transmit side after the outgoing signal (transmitted on 61 MHz for gateway 0 or 67 MHz for gateways 1 and 2) goes all the way to the head end of the respective CCTV cable, gets upshifted in frequency by 156.25 MHz, and comes back from the head end to the receiver at 217.25 MHz for gateway 0 or 223.25 MHz for gateways 1 and 2.  This is what I believe the check_slcnet_gws command file does.  However, I have a set of stand-alones which are approximately equivalent to the slcimage:SLCNET_LOOPBACK program which is run from that command file. 
    • run user_disk_slc:[siskind.bivsc.axp]externalloopback       checks the modem and path to the head end for gateway 0,
    • run user_disk_slc:[siskind.bivsc.axp]externalloopback1     does the same for gateway 1
    • run user_disk_slc:[siskind.bivsc.axp]externalloopback2     does the same for gateway 2.
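The receive frequencies quoted above follow directly from the 156.25 MHz head-end upshift. A small Python check of that arithmetic, using the transmit frequencies as stated in this description (the names below are illustrative only):

```python
# Upshift applied at the head end of the CCTV cable, as quoted in the text.
HEAD_END_UPSHIFT_MHZ = 156.25

# Transmit frequencies as quoted in this description of the loopback path.
TRANSMIT_MHZ = {0: 61.0, 1: 67.0, 2: 67.0}


def receive_frequency_mhz(gateway):
    """Frequency at which the gateway's modem should hear its own signal."""
    return TRANSMIT_MHZ[gateway] + HEAD_END_UPSHIFT_MHZ


if __name__ == "__main__":
    for gateway in sorted(TRANSMIT_MHZ):
        tx = TRANSMIT_MHZ[gateway]
        rx = receive_frequency_mhz(gateway)
        print(f"gateway {gateway}: transmit {tx} MHz, receive {rx} MHz")
```

This gives 217.25 MHz for a 61 MHz transmitter and 223.25 MHz for a 67 MHz transmitter, matching the figures in the text.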


It’s also necessary to understand the requirements which must be met at each stage of putting things together:

 


  • A gateway will not respond to commands from the PCIL board in the Alpha PCI cage unless it has successfully passed its selftest program.  That selftest currently includes an *internal* loopback test but not an *external* loopback test.  However, if there is a modem attached to the gateway’s control card, and if that modem is connected to a CCTV system which contains a chattering micro on the same RF frequency as the modem, then even the *internal* loopback test will fail.  I wrote new code for the gateways to eliminate the internal loopback because of this, but deploying this code requires changing proms in each gateway.  This was never done.  Therefore, if you suspect that a chattering micro is preventing a gateway from passing its *internal* loopback test, you can simply disconnect the modem’s RF cable to permit the gateway to pass selftest.  The indication that the gateway has passed selftest is that its red LEDs have all gone bright, then turned off, then counted up a bit, and then returned to all off.  If the LED pattern stops changing with any LED still lit, then the selftest has failed.
  • The PCIL board will not respond to any of the handshakes with the qpua0 port driver after initialization unless the PCIL board has successfully passed its selftest program.  That selftest includes sending a memory check command to gateway 0.  By virtue of the previous point, this command will never complete if gateway 0 can’t pass *its* selftest.
  • The qpua0 driver will not come online until it exchanges several handshakes with the code running in the PCIL.  Therefore, if the PCIL fails selftest (which can happen if gateway 0 fails selftest), the qpua0 driver will not come online.  The error count for qpua0 always increments when the driver comes online, but it also increments the first time the PCIL fails to respond to a driver handshake.  After that first increment, the qpua0 driver will wait 10 seconds, reboot the PCIL processor, and try to establish the handshakes again.  After each 5 failures, the driver waits an additional 10 seconds.  The loop continues forever until the PCIL responds to a handshake or else the Alpha is rebooted.
  • If the qpua0 driver is not online, then you can’t run resetbipi or loadpcil.  Any $QIO attempt will fail with SS$_DEVOFFLINE status.
  • The qsud0 driver will come online even if qpua0 is offline, but will be in the “host unavailable” state.  However, the qsud0 driver will continually retry connecting to qpua0 driver every 10-20 seconds until it succeeds.
  • If the qsud0 driver is in the “host unavailable” state, any $QIO operation attempted will get pushed into a lookaside queue and will not be executed.  This prevents any code which is attempting to talk to either the PCIL or the gateways (or through the gateways to the network) from functioning.  Things like loadroute, dumpgatepoll, externalloopback, and such will simply hang.
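Read together, these requirements form a chain in which each stage depends on the one before it. A minimal Python sketch of working down that chain to find the first unmet requirement follows; the stage names and the `first_blocker` helper are hypothetical, and the status values would have to come from your own observations (LED patterns, show dev/ful output), not from any existing tool.

```python
# Stages in dependency order, each with the consequence described in the
# text when that stage is not satisfied.
STAGES = [
    ("gateway0_selftest",
     "gateway 0 will not respond to commands from the PCIL board"),
    ("pcil_selftest",
     "the PCIL will not answer handshakes from the qpua0 port driver"),
    ("qpua0_online",
     "resetbipi and loadpcil cannot run; $QIO fails with SS$_DEVOFFLINE"),
    ("qsud0_host_available",
     "$QIO requests are queued, so loadroute, dumpgatepoll, and "
     "externalloopback hang"),
]


def first_blocker(status):
    """Return (stage, consequence) for the first unmet stage, or None.

    `status` maps stage name -> bool; a missing stage counts as unmet.
    """
    for name, consequence in STAGES:
        if not status.get(name, False):
            return name, consequence
    return None


if __name__ == "__main__":
    observed = {
        "gateway0_selftest": True,
        "pcil_selftest": True,
        "qpua0_online": False,
        "qsud0_host_available": False,
    }
    print(first_blocker(observed))
```

Checking the stages in this order reflects the point made above: a failure late in the chain (qsud0 "host unavailable") is often only a symptom of an earlier one (gateway 0 failing selftest).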


Outside of that, you simply have to apply logic.



Troubleshooting DEVNET:


Check the connection from the modem to the devnet cable.  Also, make sure that the head end of the devnet cable is working properly.  If you can run dumpgatepoll, then the problem is in the modem, the network, or the ribbon cable connecting the controller board to the modem board in the gateway.  Also, make sure that there’s power to the modem board (for instance, that 2 of the 4 green LEDs on the front are lit).


If you want to run another test, try bringing up the control system on MCCDEV (after permitting it to come up even though the gateway is failing the loopback test).  Then type BLAST at the DCL prompt and observe whether one or two additional green LEDs on the front panel of the modem turn on.  It should be both of the pair which are normally off.  If only one of them comes on (the one next to the pair which are usually on) then the modem is transmitting but not receiving.  That makes it clear that the problem is in the network.
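The LED reading from the BLAST test can be summarized as a small lookup. This Python sketch only restates the cases given above; the function name is hypothetical, it assumes a single lit LED is the one next to the normally-on pair, and the case where no additional LED lights is not described in the text, so it is reported as such.

```python
def interpret_blast_leds(additional_leds_lit):
    """Interpret how many normally-off modem LEDs turn on during BLAST.

    Assumes that when exactly one LED lights, it is the one next to the
    pair that is usually on, as described in the text.
    """
    if additional_leds_lit == 2:
        return "expected: both normally-off LEDs came on"
    if additional_leds_lit == 1:
        return "modem is transmitting but not receiving: problem is in the network"
    return "case not covered by the text"


if __name__ == "__main__":
    for count in (2, 1, 0):
        print(count, "->", interpret_blast_leds(count))
```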


Past issues:


Mike cycled the power on gateway 1 - this seemed to cure the problem, for now.


The underlying cause was that gateway 1 thought that it was observing a carrier
on the cable when it wanted to transmit. Mike claims that
he observed the spectrum analyzer during one of the error periods and there was no
chattering micro, in fact there was nothing at all from either the
gateway or the micros on the 67 MHz band. This leads me to suspect that either
the modem in that gateway is generating a "carrier detect" output when there really
is no carrier present, or else the logic path through one of the FPGAs in the gateway
digital logic board was corrupted, leading the processor on that board to sense that
modem's carrier detect output when it wasn't really present. Cycling the power
certainly causes the FPGA's configuration memory to be reloaded from its PROM.


(If the processor in the gateway senses that a carrier is present, it won't transmit.
This is why Mike would observe no activity on the spectrum analyzer if the processor is
incorrectly sensing that a carrier is present. Also, if one gateway starts to continually
sense that a carrier is present, it generates a flood of Poll0 interrupts to the BIVSC
processor to report these carrier sense errors on each successive poll. The
reporting of these errors drops the poll rate for that gateway. However, if another gateway
senses a legitimate interrupt in response to a poll, it has very little available bandwidth
remaining to report that interrupt to its destination. Since each copy of the poller process
in the gateway polls until it gets a positive poll response, and then stops polling until
it reports that interrupt response to the Alpha, a swarm of interrupts from one gateway
can effectively drop the poll rate on the other gateways.)


If the problem recurs, my first suggestion to Mike is to change the modem card in
gateway 1. If this fails to correct the problem, the gateway's digital logic card
should be changed.



Chattering Micro

 




Modified: 01-Sep-2010
Created by: Ken Brobeck Aug 17, 2009