| Symptom | Therapy |
| LV Power Supply (PS) shows negative temperature readings
|
If the data from the corresponding IOB look OK, and the -5V voltage and currents
look OK as well, try the following sequence until the problem is fixed:
- powercycle the PS
- take PS out, plug it back in
- replace PS
If replacing the PS does not fix the temperature reading, the problem is most
likely on the backplane and cannot be fixed right away. Arrange for the alarm limits
to be adjusted (ask the monitoring expert), or for the alarm to be masked in the alarm handler.
Warning: any of the three steps above will certainly require run control to be reconfigured. Pay attention to the data taken right after the power supply was touched, and look especially for missing groups of 72 channels.
|
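The escalation sequence for negative temperature readings can be sketched as a small Python function. This is a minimal sketch, assuming the shifter knows how many steps have already been tried and the current reading in degrees Celsius; the names next_action, ESCALATION and BACKPLANE_FALLBACK are hypothetical and not part of any BaBar software.

```python
from typing import Optional

# Hypothetical encoding of the escalation ladder from the table; the step
# wording is paraphrased and not the output of any real tool.
ESCALATION = [
    "powercycle the PS",
    "take the PS out and plug it back in",
    "replace the PS",
]
BACKPLANE_FALLBACK = (
    "likely a backplane problem: ask the monitoring expert to adjust "
    "the alarm limits, or mask the alarm in the alarm handler"
)


def next_action(steps_tried: int, reading_celsius: float) -> Optional[str]:
    """Return the next step to try, or None once the reading is no longer negative.

    Every step in ESCALATION touches the PS, so run control must be
    reconfigured afterwards and the following data checked for missing
    groups of 72 channels.
    """
    if reading_celsius >= 0.0:
        return None  # reading looks sane again: stop escalating
    if steps_tried < len(ESCALATION):
        return ESCALATION[steps_tried]
    return BACKPLANE_FALLBACK  # all three steps exhausted
```

For example, next_action(0, -3.0) suggests a powercycle, while next_action(3, -3.0) points at the backplane.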
| Yellow or Red EPICS alarm
|
- Make sure the alarm is real.
- Here are some known symptoms causing alarms, to help you identify the problem.
- If the alarm is related to an LV power supply, powercycle it. If the problem persists,
replace it.
|
| White EPICS alarm
| Communication has been lost.
- Wait. The channels often come back after a few minutes.
- If the problem persists, follow the instructions on the
Monitoring troubleshooting page.
|
| EPICS display does not work (crash or white boxes on the screen)
|
- Are you running EPICS from one of the machines on the IR2 subnet (bbr-devXX)? If not, your EPICS client (display) does not have access to the channel database, and your display will show white boxes.
Log on to bbr-dev03 and restart your EPICS display.
- Is Netscape running on the same terminal? If yes, kill Netscape and restart it with netscape -install &. Then restart EPICS with
source ~babaremc/monitoring/monitorSetup .
|
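The first check for a broken EPICS display (are you on an IR2 machine?) can be automated with a hostname test. This is a minimal sketch, assuming IR2 hosts are named bbr-dev followed by exactly two digits, as the placeholder bbr-devXX suggests; the function name on_ir2_subnet is hypothetical, and a matching name does not prove the machine is actually on the subnet.

```python
import re
import socket
from typing import Optional

# Assumption: IR2 hosts are named "bbr-devXX" with two digits, optionally
# followed by a domain suffix. This checks the name only, not the network.
IR2_HOST = re.compile(r"bbr-dev\d{2}(\.|$)")


def on_ir2_subnet(hostname: Optional[str] = None) -> bool:
    """Return True if hostname (default: this machine) looks like bbr-devXX."""
    name = hostname if hostname is not None else socket.gethostname()
    return IR2_HOST.match(name) is not None
```

If this returns False, log on to bbr-dev03 before starting the display.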
| Dead channels in OEP histogram
| Look for a pattern within the dead channels:
- Group of 720 channels (430 in the endcap): a whole crate is missing.
- Check the corresponding TRB power supply (see the Floor plan of E-Hut).
Replace it if broken.
If that is not the problem, check points 2-4 with the DAQ shifter:
- Was the crate in the partition at all? Check crate mask for that run.
- If yes, did it fail to reboot?
- If not, did it run full?
- If a DAQ failure can be excluded, call the electronics expert.
- Group of 72 adjacent channels in the barrel (theta index 9-32
or 33-56, spanning three phi indices) or 43 channels in the endcap (appearing as three
squares in the theta-phi plane):
one complete ROM is missing. Possible reasons:
- The ROM did not reboot. Check whether the green i960 LED is on. If not,
push the reboot button.
- The power supply failed. Check the corresponding EPICS channels, or
the PS itself. If the red LED is on, try powercycling it first. If
problem persists, replace the power supply.
- If none of the above is true, the problem is either the IOB or the
UPC (ROM).
- Group of 24 adjacent channels: one fiber is missing.
- Check the corresponding LED on the ROM. If it is steady or flashing red,
swap the fiber with the next one on the same ROM.
- If the LED moves with the fiber, the problem is either the
fiber itself or the frontend (IOB). There is nothing you can do about it;
fill out an electronics fault report form.
- If the same LED remains red no matter where you move the fiber,
the problem is with the UPC. Notify the Odf expert.
- Contact the electronics expert.
- Group of 12 adjacent channels: one ADB is missing. This is a frontend
problem.
Report it.
- Group of 4 adjacent channels: Probably a frontend problem.
Report it.
- Single channels missing: frontend problem.
Report it.
|
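The dead-channel patterns above amount to a lookup from group size to a likely cause, plus a two-way decision after the fiber swap. This is a minimal sketch, assuming the group size and detector region have already been read off the OEP histogram; the function names are hypothetical, and sizes the table does not list are reported as unrecognised rather than guessed.

```python
# Group sizes from the dead-channel table. The crate and ROM sizes depend on
# the region; the fiber, ADB and single-channel sizes are listed once and are
# applied to both regions here (an assumption).
CRATE_SIZE = {"barrel": 720, "endcap": 430}
ROM_SIZE = {"barrel": 72, "endcap": 43}


def classify_dead_group(n_channels: int, region: str) -> str:
    """Map a contiguous group of dead channels to the likely cause."""
    if region not in CRATE_SIZE:
        raise ValueError("region must be 'barrel' or 'endcap'")
    if n_channels == CRATE_SIZE[region]:
        return "whole crate missing: check TRB power supply, then crate mask/reboot/full with DAQ"
    if n_channels == ROM_SIZE[region]:
        return "one ROM missing: check i960 LED, ROM power supply, then IOB or UPC"
    if n_channels == 24:
        return "one fiber missing: check the ROM LED and swap the fiber"
    if n_channels == 12:
        return "one ADB missing: frontend problem, report it"
    if n_channels in (4, 1):
        return "frontend problem, report it"
    return "unrecognised pattern"


def diagnose_fiber_swap(led_followed_fiber: bool) -> str:
    """Interpret the ROM LED after swapping a fiber with its neighbour."""
    if led_followed_fiber:
        return "fiber or frontend (IOB): fill out an electronics fault report"
    return "UPC: notify the Odf expert"
```

For example, classify_dead_group(43, "endcap") points at a missing ROM, while the same count in the barrel falls through to "unrecognised pattern".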
| Noisy channels in OEP histogram
| Look at the Hit Maps (page 7).
|
| EMT reports hot towers or Level 3 reports hot channels
| Look at the OEP histograms (page 7, hitmaps for 0.01 GeV and higher
thresholds) and try to correlate noisy channels with alleged hot towers.
If the problem can in fact be traced back to the EMC, and it causes a serious
problem for taking data (i.e. high deadtime), contact the electronics expert.
|
| OutOfSynch errors
|
- In all slots: If you find red bars all across page 11 ('Emc TC Damage'), this is most
probably because the DAQ was rebooted just before that particular run was taken.
This is normal, since at that point everything is in fact out of synch. You can
double-check by looking at the next and previous runs.
- In one slot: Some UPC modules are known to produce "out of Synch" error messages at a low rate.
The cause is unknown. As long as the errors occur in fewer than 10%
of the events, they can be ignored. You can always check with the DAQ shifter on the "cmlog" screen whether the error is still there and how frequent it is.
|
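The two OutOfSynch cases can be told apart from per-slot error counts. This is a minimal sketch, assuming the counts per slot and the number of events in the run are available (for example, read off the cmlog screen); the slot numbering 0 to n_slots-1 and the function names are hypothetical.

```python
from typing import Dict, List

IGNORE_FRACTION = 0.10  # from the table: fewer than 10% of events can be ignored


def all_slots_affected(errors_per_slot: Dict[int, int], n_slots: int) -> bool:
    """True if every slot 0..n_slots-1 reports at least one OutOfSynch error.

    This pattern usually means the DAQ was rebooted just before the run.
    """
    return all(errors_per_slot.get(s, 0) > 0 for s in range(n_slots))


def slots_to_follow_up(errors_per_slot: Dict[int, int], n_events: int) -> List[int]:
    """Slots whose OutOfSynch rate reaches the 10% threshold."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return sorted(
        s for s, n in errors_per_slot.items() if n / n_events >= IGNORE_FRACTION
    )
```

A slot with 5 errors in 1000 events stays below the threshold; one with 200 errors in 1000 events is flagged.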
| Crates go full
| Check if the problem starts in a particular ROM.
The DAQ shifter should be able to tell from the EPICS display. If not, watch the yellow 'full' LEDs light up
while the DAQ shifter starts a run. If one particular ROM goes full first, see below.
|
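Finding the ROM that goes full first is a matter of picking the earliest time among the 'full' indications. This is a minimal sketch, assuming the time at which each ROM went full has been noted (from the EPICS display or by watching the LEDs); the ROM names, the tolerance parameter and the function name are hypothetical.

```python
from typing import Dict, Optional


def first_rom_full(full_since: Dict[str, float], tolerance: float = 0.0) -> Optional[str]:
    """Return the ROM that went full first, or None if there is no single culprit.

    full_since maps a ROM name to the time (in seconds) at which it went full.
    If the runner-up went full within `tolerance` of the first ROM, the
    crates are treated as having gone full together.
    """
    if not full_since:
        return None
    ordered = sorted(full_since.items(), key=lambda kv: kv[1])
    first_rom, t0 = ordered[0]
    if len(ordered) > 1 and ordered[1][1] - t0 <= tolerance:
        return None  # several ROMs went full at (nearly) the same time
    return first_rom
```

If a single ROM is returned, continue with the "ROMS go full" row.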
| ROMS go full or produce frequent errors
| Xyplex into the UPC and look for error messages. Record typical output by pasting it into a text file. Notify the Odf expert.
|
| Chiller alarm | The BaBar shift normally has
to respond to EMC chiller alarms; the instructions for putting the EMC in a
safe state are given in the Care and
Feeding Manual. The manual also tells the shifter to call
the EMC shifter and/or the chiller expert. Switching ON a
chiller MUST be done by a trained electrician using the
correct PPE.
You should also look at the description
of the interlocks the chillers use to shut down the EMC
electronics in an emergency. If a chiller has been
turned off, make sure that the corresponding frontend power supplies
have been turned off as well. This should happen
automatically via the interlock system, but it is always better to
check. |
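The final check in the chiller row (a chiller is off, so its frontend power supplies must be off too) can be expressed as a cross-check of two state tables. This is a minimal sketch, assuming the on/off state of each chiller and power supply, and which chiller cools which supply, are known; the names and the mapping are hypothetical, and this does not replace the hardware interlock.

```python
from typing import Dict, List


def supplies_left_on(
    chiller_on: Dict[str, bool],
    ps_on: Dict[str, bool],
    ps_to_chiller: Dict[str, str],
) -> List[str]:
    """Frontend power supplies still on although their chiller is off.

    Raises KeyError if a supply has no chiller mapping, so that an
    incomplete mapping is noticed instead of silently passing.
    """
    return sorted(
        ps for ps, on in ps_on.items()
        if on and not chiller_on[ps_to_chiller[ps]]
    )
```

An empty result means the interlock did its job; any supply listed should be turned off by hand.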
| Light pulser
|
Refer to the lightpulser troubleshooting page.
|