EMC Monitoring Troubleshooting
V00-04 September 19 2000
A List of current problems and solutions
1) Single Low voltage alarm
This causes the EMC to go "NON RUNNABLE". Here is the symptom observed on the EPICS panel:
Check for any possible data corruption by using the Live Fast monitoring plots. If no damage is found, bypass the channel from the runnable panel, which is accessible from the main EPICS panel.
Don't forget to notify the EMC on call expert and report what you did in the logbook.
2) Power supply off or tripped
This causes the EMC to go "NON RUNNABLE". Here is how it shows up in the monitoring:
Go to the electronics house and find out which power supply is off. Barrel Backward PS are located in rack B620B-02, Barrel Forward in rack B620B-01 and End Cap in rack B620B-03. If the power supply is off, check with the EMC on call expert first before switching it on. If it is tripped (red LED on), cycle the power switch. If the trip does not clear, contact the expert to swap the power supply, or replace it yourself.
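The rack locations and the off/tripped decision above can be summarized in a small lookup, shown here as a minimal sketch. The state labels ("off", "tripped", "ok") and the function name are hypothetical and only illustrate the procedure; they are not part of any monitoring software.

```python
# Rack locations copied from the text above.
PS_RACKS = {
    "Barrel Backward": "B620B-02",
    "Barrel Forward": "B620B-01",
    "End Cap": "B620B-03",
}


def next_action(state, already_cycled=False):
    """Return the shifter's next step for a power supply.

    `state` is a hypothetical label: "off", "tripped" (red LED on) or "ok".
    `already_cycled` says whether the power switch was already cycled once.
    """
    if state == "off":
        return "check with the EMC on call expert before switching it on"
    if state == "tripped":
        if already_cycled:
            return "contact the expert to swap the power supply"
        return "cycle the power switch"
    return "no action"
```

For example, `next_action("tripped")` gives the first step for a tripped supply, and `next_action("tripped", already_cycled=True)` gives the escalation step.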
3) Single crystal temperature alarm
Here is how it shows up in the monitoring:
This is a readout problem. It does not affect the ability of the detector to take data, nor is it a safety issue. Notify the monitoring expert and mask the channel in the alarm handler.
4) Diode Bias, High current alarm
This usually happens for channel 9 of the bias voltage. The alarm goes away after a few seconds. We know it is due to a bad connection on the bias voltage distribution panel or to the CAEN module itself. Typically the current for the channel goes up to more than 1000 units for one measurement and then goes back to normal. This can cause a few seconds of data taking downtime. No action is necessary. If this occurs more than 3 times per shift, an expert should be informed.
High Current alarms or warnings are also observed during the PEPII injection phase. These are not an artefact; they are due to high beam backgrounds. If such an alarm occurs, check the other background monitors and inform the PEPII operators or liaison.
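The "more than 3 times per shift" rule for the channel 9 spikes can be written down as a short check. This is only a sketch: it assumes the shifter has a list of current readings for the shift, and the names `count_spikes` and `should_inform_expert` are hypothetical. It does not separate injection-related alarms from the bad-connection spikes.

```python
SPIKE_THRESHOLD = 1000     # "more than 1000 units", as stated in the text
MAX_SPIKES_PER_SHIFT = 3   # more than this many spikes: inform an expert


def count_spikes(readings):
    """Count single measurements above the spike threshold."""
    return sum(1 for value in readings if value > SPIKE_THRESHOLD)


def should_inform_expert(readings):
    """True if the spikes in one shift exceed the allowed number."""
    return count_spikes(readings) > MAX_SPIKES_PER_SHIFT
```

With three spikes in a shift the check stays false; the fourth spike makes it true.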
A List of older problems and their solutions
1) The display goes all white like this:
Congratulations, monitoring just crashed or the network went down.
The IOC hung and no data is coming back. If the problem is due to a network outage, wait until the network is back.
If there is no network problem, the IOC probably crashed. Go to the electronics house, rack B620B-04, second crate from the top, and press the red reset button on the IOC CPU called emc-mon (see the picture of the IOC crate). It is located in the first slot on the left. Wait a few minutes until the scanning has restarted. You can see that it restarted when the green LEDs on the CAN module start blinking again. The CAN module is the one in the middle of the crate with the 4 flat ribbon cables plugged into it, as you can see in the picture.
2) I lost the monitoring for Low Voltage and IOB temperatures in an entire sector of the EMC (BB, BF, EC):
This is obviously a CAN bus problem. Check that none of the displays connected to this part of the detector are being updated (use the alarm handler). You can also check in the electronics house that the CAN bus module LEDs are no longer all blinking. The CAN bus module is located in rack B620B-04, second crate from the top, in the middle of the crate. It has 4 green LEDs, one for each bus (CAN0 top left, CAN1 top right, CAN2 bottom left, CAN3 bottom right), and 4 ribbon cables connected to it. If one of the LEDs is not flashing, the corresponding CAN bus is in a dominant state.
Open rack B620B-03. In its lower part is a chassis with red switches labelled CAN0 through CAN3. Switch off the one corresponding to the CAN bus you want to reset, wait 10 seconds, and switch it back on. The LED should start flashing again. If it does not and the bus is CAN0 or CAN2, cycle the power with the switch on the right of the chassis labelled RadFet.
If this doesn't help either, then we're in trouble. It is certainly a problem with an EMB. You can't cycle the power on those without stopping the Data Acquisition! So check that we are not taking data, or wait until data taking stops, then cycle the TRB power by turning OFF and ON the individual power supplies in rack B620B-03, bottom crate, this way:
- PS 0 and 1 if CAN2 (End Cap) hangs
- PS 2,3,4,5 if CAN1 (Barrel Forward) hangs
- PS 6,7,8,9 if CAN0 (Barrel Backward) hangs
You will also have to cycle the individual IOB power supplies in racks
B620B-01, 02 and 03. This is to re-establish the connection between TRB and
If this doesn't work, call the monitoring expert.
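The bus-to-sector assignments and the escalation order described in this entry can be collected into one table, sketched below. The dictionary layout and the `recovery_steps` helper are hypothetical illustrations; the text gives no sector or TRB power supplies for CAN3, so that entry is left empty rather than guessed.

```python
# LED positions, sectors and TRB power supply numbers as given in the text.
CAN_BUSES = {
    "CAN0": {"sector": "Barrel Backward", "led": "top left", "trb_ps": [6, 7, 8, 9]},
    "CAN1": {"sector": "Barrel Forward", "led": "top right", "trb_ps": [2, 3, 4, 5]},
    "CAN2": {"sector": "End Cap", "led": "bottom left", "trb_ps": [0, 1]},
    # The text lists no sector or TRB power supplies for CAN3.
    "CAN3": {"sector": None, "led": "bottom right", "trb_ps": []},
}

# Only these buses have the extra RadFet power-cycle step.
RADFET_BUSES = {"CAN0", "CAN2"}


def recovery_steps(bus):
    """List the recovery steps for a hung CAN bus, in escalation order."""
    info = CAN_BUSES[bus]
    steps = [
        f"Switch off the red {bus} switch in rack B620B-03, "
        "wait 10 seconds, switch it on"
    ]
    if bus in RADFET_BUSES:
        steps.append("Cycle the RadFet switch on the right of the chassis")
    if info["trb_ps"]:
        ps = ", ".join(str(n) for n in info["trb_ps"])
        steps.append(
            f"With data taking stopped, cycle TRB PS {ps} "
            "in rack B620B-03 bottom crate"
        )
        steps.append("Cycle the individual IOB power supplies")
    steps.append("Call the monitoring expert")
    return steps
```

Each step is only needed if the previous one did not bring the LED back to flashing, so the list is meant to be read top to bottom and stopped at the first success.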
3) A set of ~32 channels just stopped being updated and became white on the display:
Contact the monitoring expert as soon as possible and describe what
you see on the display. If it's the middle of the night and there's
not going to be any data taking, wait for a decent hour before waking up the
expert.
4) A single temperature channel displays -273.1 C.
A sensor connection broke. Mask it in the alarm handler, contact the
expert and write something in the logbook.
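A reading of -273.1 C is easy to pick out automatically. The sketch below assumes a hypothetical dictionary of channel names and temperatures in degrees Celsius; the channel names, the tolerance and the function names are illustrative only.

```python
BROKEN_SENSOR_READING_C = -273.1  # value displayed when a sensor connection broke


def is_broken_sensor(temperature_c, tolerance=0.05):
    """True if the reading matches the broken-connection value."""
    return abs(temperature_c - BROKEN_SENSOR_READING_C) <= tolerance


def broken_channels(readings):
    """Return the sorted names of channels that should be masked."""
    return sorted(name for name, t in readings.items() if is_broken_sensor(t))
```

Calling `broken_channels({"T01": 21.4, "T02": -273.1, "T03": 19.8})` returns only the channel with the broken connection, which is the one to mask and report.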
5) A channel starts oscillating dramatically.
This is most likely a GMB problem or sensor problem. Mask it in the
alarm handler, contact the expert and write something in the logbook.
6) A group of 8 channels starts oscillating.
Analog multiplexers have a temperature dependence. Usually the problem
can be solved by cycling the power of the board. Find out which bus
the monitoring board is connected to (see table). Go to EH rack B620B-03, lower
crate, labelled "GMB power supply". Switch off the power corresponding to
the bus you want to clear, wait 20 seconds and switch it on again. If this
does not solve the problem, contact the expert and write something in the
logbook.
Last modified P-A Fischer February 24 2000