EMC Monitoring Troubleshooting

V00-04 September 19 2000
A list of current problems and solutions
1) Single low voltage alarm
This causes the EMC to go "NON RUNNABLE". Here is the symptom observed on the EPICS panel:

[Figure: Low voltage alarm]

Check for possible data corruption using the Live Fast monitoring plots. If no damage is found, bypass the channel from the runnable panel, which is accessible from the main EPICS panel. Don't forget to notify the EMC on-call expert and to record what you did in the logbook.

2) Power supply off or tripped
This causes the EMC to go "NON RUNNABLE". Here is how it shows up in the monitoring:

[Figure: Low voltage, power supply off]

Go to the electronics house and find out which power supply is off. The Barrel Backward power supplies are located in rack B620B-02, the Barrel Forward ones in rack B620B-01 and the End Cap ones in rack B620B-03. If the power supply is off, check with the EMC on-call expert before switching it on. If it is tripped (red LED on), cycle the power switch. If the trip does not clear, contact the expert to have the power supply swapped, or replace it yourself.
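The sector-to-rack lookup and the off/tripped decision above can be summarised in a small Python sketch. The names `LV_PS_RACK` and `lv_action` are hypothetical illustrations, not part of any existing monitoring software; only the rack numbers and the actions come from the procedure.

```python
# Hypothetical lookup: EMC sector -> rack holding its low voltage power supplies,
# as listed in the procedure above.
LV_PS_RACK = {
    "BB": "B620B-02",  # Barrel Backward
    "BF": "B620B-01",  # Barrel Forward
    "EC": "B620B-03",  # End Cap
}


def lv_action(sector: str, tripped: bool) -> str:
    """Return the recommended action for a power supply that is off or tripped."""
    rack = LV_PS_RACK[sector]
    if tripped:
        # Red LED on: cycling the power switch is allowed without asking first.
        return f"cycle power switch in rack {rack}; if the trip persists, contact the expert"
    # Simply off: the on-call expert must agree before it is switched on.
    return f"call EMC on-call expert before switching on the PS in rack {rack}"


print(lv_action("EC", tripped=True))
```

Keeping the rack numbers in one table means a future rack move only has to be changed in one place.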

3) Single crystal temperature alarm
Here is how it shows up in the monitoring:

[Figure: Crystal temperature alarm]

This is a readout problem. It does not affect the ability of the detector to take data, nor is it a safety issue. Notify the monitoring expert and mask the channel in the alarm handler.

4) Diode bias high-current alarm

This usually happens on channel 9 of the bias voltage. The alarm goes away after a few seconds. It is known to be caused either by a bad connection on the bias voltage distribution panel or by the CAEN module itself. Typically the current for the channel goes above 1000 units for one measurement and then returns to normal. This can cause a few seconds of data taking downtime. No action is necessary. If this occurs more than 3 times per shift, an expert should be informed.
High-current alarms or warnings are also observed during the PEPII injection phase. These are not artefacts: they are caused by high beam backgrounds. If such an alarm occurs, check the other background monitors and inform the PEPII operators or the liaison.
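As an illustration of the "more than 3 times per shift" rule for channel 9, here is a minimal Python sketch that counts single-measurement excursions above 1000 units in one shift's readings. It assumes the readings are available as a plain list of numbers in the displayed units; the function names are hypothetical. Sustained high currents are deliberately not counted, since the text only describes one-measurement spikes for this case.

```python
SPIKE_THRESHOLD = 1000    # "more than 1000 units", in the units shown on the display
MAX_SPIKES_PER_SHIFT = 3  # inform an expert when this count is exceeded


def count_spikes(readings):
    """Count readings above SPIKE_THRESHOLD whose neighbours are both at or below it.

    A reading at the start or end of the list only needs its one existing
    neighbour to be normal.
    """
    spikes = 0
    for i, value in enumerate(readings):
        if value <= SPIKE_THRESHOLD:
            continue
        prev_ok = i == 0 or readings[i - 1] <= SPIKE_THRESHOLD
        next_ok = i == len(readings) - 1 or readings[i + 1] <= SPIKE_THRESHOLD
        if prev_ok and next_ok:  # isolated, one-measurement excursion
            spikes += 1
    return spikes


def should_inform_expert(readings):
    """True if the shift saw more isolated spikes than the procedure tolerates."""
    return count_spikes(readings) > MAX_SPIKES_PER_SHIFT
```

For example, `count_spikes([10, 1500, 12, 11, 1200, 10])` finds two spikes, while `[10, 1500, 1600, 10]` counts none because the excursion lasts two measurements.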

A list of older problems and their solutions

1) The display goes all white like this:

[Figure: Main display not connected]

Congratulations, the monitoring just crashed or the network went down: the IOC hangs and no data is coming back. If the problem is due to a network outage, wait until the network is back. If there is no network problem, the IOC has probably crashed. Go to rack B620B-04 in the electronics house, second crate from the top, and press the red reset button on the IOC CPU called emc-mon (see the picture of the IOC crate); it is in the first slot on the left. Wait a few minutes until scanning has restarted. You can tell it has restarted when the green LEDs on the CAN module start blinking again. The CAN module is the one in the middle of the crate with the 4 flat ribbon cables plugged into it (see picture).

2) Monitoring of low voltage and IOB temperatures is lost for an entire sector of the EMC (BB, BF or EC):

This is obviously a CAN bus problem. Use the alarm handler to check that none of the displays for this part of the detector are updating. You can also check in the electronics house whether the LEDs on the CAN bus module have stopped blinking. The CAN bus module is located in rack B620B-04, second crate from the top, in the middle of the crate. It has 4 ribbon cables connected to it and 4 green LEDs, one for each bus (CAN0 top left, CAN1 top right, CAN2 bottom left, CAN3 bottom right). If one of the LEDs is not flashing, the corresponding CAN bus is stuck in a dominant state.

To reset the bus, open rack B620B-03. In its lower part is a chassis with red switches labelled CAN0 through CAN3. Switch off the one corresponding to the CAN bus you want to reset, wait 10 seconds and switch it on again. The LED should start flashing again. If it does not and the bus is CAN0 or CAN2, cycle the power with the switch on the right of the chassis labelled RadFet.

If this doesn't help either, then we're in trouble: it is certainly a problem with an EMB. You can't cycle the power on those without stopping data acquisition! So check that we are not taking data, or wait until data taking stops, then cycle the TRB power by turning the individual power supplies in the bottom crate of rack B620B-03 OFF and ON as follows:
- PS 0 and 1 if CAN2 (End Cap) hangs
- PS 2, 3, 4, 5 if CAN1 (Barrel Forward) hangs
- PS 6, 7, 8, 9 if CAN0 (Barrel Backward) hangs
You will also have to cycle the individual IOB power supplies in racks B620B-01, B620B-02 and B620B-03 to re-establish the connection between the TRB and the IOB.
If this doesn't work, call the monitoring expert.
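The CAN bus recovery sequence, with the LED position and TRB supplies belonging to each bus, can be kept as a lookup table. This Python sketch is a hypothetical reference aid: the names are invented, and CAN3 has no TRB entry because the procedure does not list one. Each returned step is meant to be tried only if the LED is still not flashing after the previous one.

```python
# LED position on the CAN bus module (rack B620B-04) for each bus.
LED_POSITION = {
    "CAN0": "top left",
    "CAN1": "top right",
    "CAN2": "bottom left",
    "CAN3": "bottom right",
}

# TRB power supplies (rack B620B-03, bottom crate) to cycle when a bus hangs.
TRB_SUPPLIES = {
    "CAN0": ("Barrel Backward", [6, 7, 8, 9]),
    "CAN1": ("Barrel Forward", [2, 3, 4, 5]),
    "CAN2": ("End Cap", [0, 1]),
}

# Buses for which cycling the RadFet switch is listed as an extra step.
RADFET_BUSES = {"CAN0", "CAN2"}


def recovery_steps(bus: str) -> list[str]:
    """Ordered recovery steps for a CAN bus whose LED has stopped flashing."""
    steps = [
        f"confirm LED {LED_POSITION[bus]} on CAN module (rack B620B-04) is not flashing",
        f"switch {bus} off in rack B620B-03 lower chassis, wait 10 s, switch on",
    ]
    if bus in RADFET_BUSES:
        steps.append("cycle the RadFet switch on the right of the chassis")
    if bus in TRB_SUPPLIES:
        sector, supplies = TRB_SUPPLIES[bus]
        ps = ", ".join(str(n) for n in supplies)
        steps.append(f"ONLY when not taking data: cycle TRB PS {ps} ({sector}), rack B620B-03 bottom crate")
        steps.append("cycle the IOB power supplies in racks B620B-01, B620B-02 and B620B-03")
    steps.append("call the monitoring expert")
    return steps


for step in recovery_steps("CAN2"):
    print(step)
```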

3) A set of about 32 channels suddenly stopped updating and turned white on the display:

Contact the monitoring expert as soon as possible and describe what you see on the display. If it is the middle of the night and no data taking is planned, wait until a decent hour before waking up the expert.

4) A single temperature channel displays -273.1 °C.

A sensor connection is broken. Mask the channel in the alarm handler, contact the expert and write an entry in the logbook.
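If temperature readings can be exported as channel/value pairs, broken connections can be spotted programmatically. This is a minimal sketch assuming a dictionary of readings in °C keyed by hypothetical channel names; -273.1 °C is essentially absolute zero (-273.15 °C), which a working sensor cannot report, so the match uses a small tolerance rather than exact float equality.

```python
BROKEN_SENSOR_C = -273.1  # value displayed when a sensor connection is broken


def channels_to_mask(readings: dict[str, float], tol: float = 0.05) -> list[str]:
    """Return, sorted, the channels whose reading matches the broken-sensor value."""
    return sorted(
        name for name, temp in readings.items()
        if abs(temp - BROKEN_SENSOR_C) <= tol
    )


# Hypothetical channel names, for illustration only.
print(channels_to_mask({"T01": 21.4, "T02": -273.1, "T03": 22.0}))
```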

5) A channel starts oscillating dramatically.

This is most likely a GMB or sensor problem. Mask the channel in the alarm handler, contact the expert and write an entry in the logbook.

6) A group of 8 channels starts oscillating.

Analog multiplexers have a temperature dependence, and the problem can usually be solved by cycling the power of the board. Find out which bus the monitoring board is connected to (see table). Go to rack B620B-03 in the electronics house (EH) and find the lower crate, labelled "GMB power supply". Switch off the power corresponding to the bus you want to clear, wait 20 seconds and switch it on again. If this does not solve the problem, contact the expert and write an entry in the logbook.

Last modified P-A Fischer February 24 2000