(1) Stability problems: we can start taking data, but the data flow in the ROD stops after a small number of events (usually < 1000). When this happens, it is associated with at least one of three different fatal errors:

(a) DX Fault (see below for acronyms and a brief system overview) - this is reported through hardware channels. It has been difficult for me to trace the source of this fault; making progress requires some study of the firmware (written in Verilog). The Verilog code is well commented, but I am reaching my limits here.

(b) RPU Fault - reported by one or both of the RPUs (DSPs). We understand which condition leads to the fault; we are trying to find out what causes that condition. In brief, the RPU expects data in its input buffer which is not there. Our goal this week is to understand the underlying code (C++, on the DSP) and find or rule out a bug in it.

(c) RPU stall - one of the two RPUs stops processing events at some point. Same approach as in (b).

The described problems are observed at L1A rates of 50 Hz and 1 kHz alike, but not when triggers are generated by the ROD itself (in that case it is auto-throttled and the effective processing rate is low). There is one more:

(d) One of the 10 SPUs fails to process the first (and any subsequent) event. This happens at the beginning of a run and requires a reboot. There is no pattern as to which SPU fails.

(2) Rate problems: we cannot sustain the design rate. A couple of months ago we could barely write at 80 Hz. This was in part due to the HPU code, which was optimized not for speed but for system tests. I am now working on new code which can already process one event in less than 4 ms, and I believe further refinement can push this to 1 ms soon. And then comes the fine tuning... this is where my limited expertise with this kind of pipelined processing could benefit from advice and discussion, which I have been lacking since our engineer left.
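To make the difference between (b) and (c) concrete, here is a minimal sketch of the kind of input-buffer check I believe the RPU code performs. All names (InputFifo, wait_for_fragment) and the timeout mechanism are hypothetical illustrations, not the actual DSP code.

```cpp
#include <cstdint>

// Hypothetical sketch only: names are invented, not the real DSP code.
struct InputFifo {
    volatile uint32_t word_count;  // would be a memory-mapped FPGA register
};

enum RpuStatus { RPU_OK, RPU_FAULT, RPU_STALL };

// Wait until the SPU data for the next event is present in the input buffer.
// (b) RPU Fault: the buffer ends up holding fewer words than the event needs.
// (c) RPU stall: no data arrives at all, so without a timeout the RPU
//     would spin here forever.
RpuStatus wait_for_fragment(const InputFifo* fifo, uint32_t expected_words,
                            unsigned timeout_polls) {
    for (unsigned i = 0; i < timeout_polls; ++i) {
        if (fifo->word_count >= expected_words) return RPU_OK;
    }
    // Distinguish "partial data" (fault-like) from "no data" (stall-like).
    return (fifo->word_count > 0) ? RPU_FAULT : RPU_STALL;
}
```

In the real system the open question is why the expected data never arrives in the buffer, not how the wait itself is written.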
One concrete bottleneck: a particular subsystem call - a simple read operation targeting all SPUs and RPUs - was taking 2 ms. This read operation has to be performed for every event, so by design it should never take this long. In fact, the same operation was used in our 100 kHz test years ago to transfer the data along the same path - a much larger data volume! If we succeeded in reading out at 100 kHz back then, this operation should not take more than 10 us. We managed to factorize the problem a bit and partially bypass it, but the driver routine should still run faster. The drivers are implemented in the DSP code, but the actual read/write process is implemented in the FPGAs.

Now, for a very brief introduction to the system. One ROD processes the data from two chambers (960 channels each) at 20 or 40 MHz sampling rate. The data comes, as far as the ROD is concerned, from 5 frontend electronics boards per chamber. The frontend readout control, deserializing of the incoming data, communication with the TIM (TTC interface module in the ROD crate), and the connection to the ROL via the S-Link (HOLA) mezzanine board are all implemented entirely on the CSC transition module (CTM), which acts together with the ROD as a unit. The ROD consists of the motherboard and 13 mezzanine boards (GPUs) which are identical in hardware (one TI 6203 DSP, two Xilinx Spartan FPGAs, one 2 Mword SDRAM) but have different functions: 10 act as SPUs (Sparsifying Processing Units), which receive the incoming data via the expansion bus (XB) and perform zero suppression and cluster identification; 2 act as RPUs (Rejection Processing Units, for neutron background rejection), which build the event fragment for each chamber out of the SPUs' output and optionally perform further noise suppression by matching clusters across the chamber layers; one acts as the HPU (Host PU), which orchestrates the whole ensemble and also adds the event header and trailer information to the fragment. The data flow between the SPUs, RPUs, and CTM/S-Link is handled by a bus system we call the Data Exchange (DX).
The bus-GPU interface is implemented as a set of FIFOs in FPGAs (one per GPU). Communication between the HPU and the SPUs/RPUs (also referred to as DPUs) is handled by the DPU Control (DC) subsystem, also implemented in an FPGA.

The available documentation on the ROD system is posted at http://positron.ps.uci.edu/~pier/csc/CSCElectronics.html and the most relevant documents to start with on this page are:

http://positron.ps.uci.edu/~pier/csc/CSC_ROD_FDR_1.pdf
http://positron.ps.uci.edu/~pier/csc/IRODBlockDiagram9.pdf (Overview)
http://positron.ps.uci.edu/~pier/csc/IROD_Subsystems/DX_Notes22.pdf (Data Exchange)
http://positron.ps.uci.edu/~pier/csc/IROD_Subsystems/DC_Notes25.pdf (DPU Control)
http://positron.ps.uci.edu/~pier/csc/CTM/CTM_ReferenceManual_01.pdf (CTM)
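Finally, to put the slow read operation mentioned above in perspective, the per-event time budget can be written down explicitly. The rates and the 2 ms figure are from the text; the even split over 12 DPUs (10 SPUs + 2 RPUs) is my own assumption about how the per-event read is shared.

```cpp
// Back-of-envelope check of the timing argument; assumptions noted above.
constexpr double kMicrosecondsPerSecond = 1e6;

// Per-event time budget at a given L1A rate: 100 kHz -> 10 us.
constexpr double per_event_budget_us(double l1a_rate_hz) {
    return kMicrosecondsPerSecond / l1a_rate_hz;
}

// Factor by which an observed per-event operation overshoots the budget:
// a 2000 us read against a 10 us budget is 200x too slow.
constexpr double overshoot_factor(double observed_us, double l1a_rate_hz) {
    return observed_us / per_event_budget_us(l1a_rate_hz);
}
```

Even ignoring all other per-event work, the 2 ms read alone caps the rate at 500 Hz, which is consistent with the rate problems described under (2).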