A LONGITUDINAL MULTI-BUNCH FEEDBACK SYSTEM USING PARALLEL DIGITAL SIGNAL PROCESSORS

L. Sapozhnikov, J. D. Fox, J. J. Olsen, G. Oxoby, I. Linscott
Stanford Linear Accelerator Center, Stanford, CA 94309
A. Drago, M. Serio
INFN Laboratori Nazionali, Frascati, Italy

ABSTRACT

A programmable longitudinal feedback system based on four AT&T 1610 digital signal processors has been developed as a component of the PEP-II R&D program. This longitudinal quick prototype is a proof of concept for the PEP-II system and implements full-speed bunch-by-bunch signal processing for storage rings with bunch spacing of 4 ns. The design incorporates a phase-detector-based front end that digitizes the oscillation phases of bunches at the 250-MHz crossing rate, four programmable signal processors that compute correction signals, and a 250-MHz hold buffer/kicker driver stage that applies the correction signals to the beam. The design implements a general-purpose, table-driven downsampler that allows the system to be operated at several accelerator facilities. The hardware architecture of the signal processing is described, and the software algorithms used in the feedback signal computation are discussed. The system configuration used for tests at the LBL Advanced Light Source is presented.

INTRODUCTION

Figure 1 presents the components of the prototype longitudinal damping system as installed at the LBL Advanced Light Source. The system uses a phase-detection technique to process beam signals from four button-type pickups. The detection frequency is at the sixth harmonic of the ring rf (6 × 499 MHz, or 2998 MHz), and a periodic microwave coupler circuit is used to generate a coherent tone burst from the pickup signals. The processing bandwidth of the detection process is limited to 400 MHz, allowing each bunch's synchrotron motion to be measured independently for bunch spacings of 4 ns. The detected oscillation signal is digitized at the bunch-crossing rate. A digital signal-processing block computes bunch-by-bunch correction signals and applies them to the beam using a fast D/A, an output modulator, a power amplifier, and a beam kicker.\(^{1,2}\)

DIGITAL PROCESSING ARCHITECTURE

The digital signal-processing section of the system consists of a fast 8-bit A/D stage, a programmable downsampler that selects bunches for processing, a computation stage composed of four AT&T 1610 processors, a hold buffer stage, a fast 8-bit D/A, and an output modulator stage. Figure 2 shows the signal flow in the digital processing components of the system. The front and back ends of this system operate at bunch-crossing rates up to 250 MHz. The system is clocked by the ring rf clock at 500 MHz, while a fiducial signal is used to synchronize the feedback process to a particular rf bucket.

*Work supported by Department of Energy contract DE-AC03-76SF00515.

Presented at the Beam Instrumentation Workshop, Santa Fe, New Mexico, October 20-23, 1993
Fig. 1. Block diagram of the longitudinal quick prototype as installed at the ALS.

All data processing in the digital system is performed on a long word composed of four 8-bit samples from consecutive bunches (a so-called group of four). The front and back-end bunch rates of 250 MHz, in conjunction with the 4-bunch parallelism of the processing, reduce the basic data transfer rate in the front end and hold buffer stages to 62.5 MHz. This parallelism simplifies the circuit design and timing, as only the very front and back-end circuits must operate at the full 250 MHz rate.
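
As a rough illustration of the group-of-four data format, the following Python sketch packs four consecutive 8-bit bunch samples into one 32-bit long word and unpacks them again. The byte order (first bunch in the most significant byte) and the function names are assumptions made for this example; the text does not specify them.

```python
def pack_group_of_four(samples):
    """Pack four 8-bit samples (ints 0..255) into one 32-bit word.

    Assumed byte order: the first bunch occupies the most significant byte.
    """
    if len(samples) != 4:
        raise ValueError("a group of four needs exactly four samples")
    word = 0
    for s in samples:
        if not 0 <= s <= 0xFF:
            raise ValueError("samples must fit in 8 bits")
        word = (word << 8) | s
    return word


def unpack_group_of_four(word):
    """Recover the four 8-bit samples from a 32-bit word."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


# Word rate behind the front end: 250 MHz bunch rate / 4 bunches per word.
BUNCH_RATE_MHZ = 250.0
word_rate_mhz = BUNCH_RATE_MHZ / 4  # 62.5 MHz, the rate quoted in the text
```

Only the packing and unpacking steps would have to run at the full bunch rate; everything between them sees one word per four bunches.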

The feedback processing is a downsampled system, in which the oscillation coordinate of a bunch is sampled only once every $n$ turns, where $n$ is the downsampling factor. In the ALS, nominal synchrotron frequencies are in the 8–10 kHz range, while the revolution frequency is 1.5 MHz. For the system as configured at the ALS, $n=24$ is used, which allows the processing block to run with roughly six samples per synchrotron oscillation period. This downsampling further reduces the data transfer rate into the DSPs.

The downsampling function of the quick prototype is implemented as a table-driven system. A memory look-up table is read for each group-of-four crossing. The address into the table is composed of a turn count and a group-of-four count. The memory table value contains two fields. A single TAG bit specifies if the A/D data should be processed for this group of four, while a HOLD_BUFFER_ADDRESS field specifies the address in the hold buffer where output data from the DSP should be placed. The quick prototype uses a $2K \times 9$ downsampler memory that allows downsampling factors of 1–31 and up to 64 groups of four (256 bunches). This approach is flexible and implements the downsampler in only 3 integrated circuits.
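
The table-driven downsampler described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the address layout (turn count in the upper bits), the bit position of TAG (bit 8), the rule that group g is processed on turn g mod n, and the fixed `offset` between input group and hold-buffer address are all hypothetical choices, not details taken from the hardware.

```python
N_TURNS = 32        # 5-bit turn count; 32 x 64 = 2K table entries
N_GROUPS = 64       # 6-bit group-of-four count (up to 256 bunches)
TAG_BIT = 1 << 8    # assumed layout: bit 8 = TAG, bits 7..0 = HOLD_BUFFER_ADDRESS


def build_downsampler_table(n_groups, downsample, offset=0):
    """Return a 2K-entry list of 9-bit downsampler words.

    Hypothetical policy: group g is tagged on turn (g mod downsample), and its
    result is written `offset` groups later in the hold buffer. The turn
    counter is assumed to wrap after `downsample` turns.
    """
    if not 1 <= n_groups <= N_GROUPS:
        raise ValueError("n_groups must be 1..64")
    if not 1 <= downsample <= N_TURNS - 1:   # the text quotes factors of 1-31
        raise ValueError("downsample must be 1..31")
    table = [0] * (N_TURNS * N_GROUPS)
    for turn in range(downsample):
        for group in range(n_groups):
            address = turn * N_GROUPS + group      # assumed address layout
            entry = (group + offset) % n_groups    # HOLD_BUFFER_ADDRESS field
            if group % downsample == turn:
                entry |= TAG_BIT                   # process this group on this turn
            table[address] = entry
    return table


# Hypothetical small ring: 8 groups of four, downsampling factor 4, offset 3.
table = build_downsampler_table(n_groups=8, downsample=4, offset=3)
tagged = sum(1 for entry in table if entry & TAG_BIT)
```

With this policy every group is tagged exactly once per downsampling period, so the processing load is spread evenly over the turns. Arbitrary sampling patterns would simply correspond to a different rule for setting the TAG bit.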

The heart of the signal processing is a computational block composed of four AT&T 1610 digital signal processors. These 16-bit single-chip processors are general-purpose programmable elements, each with 16 Kbyte of internal dual-port memory and a 25-ns multiply-accumulate time for cached instructions. The feedback signal computation is implemented in these processors and specified by a program and filter coefficients. A bit I/O interface is used to allow external signals (such as an injection trigger) to pass information to the feedback process on an exception basis.

The data interface to the DSPs is implemented using the parallel input/output (PIO) ports of the 1610. The data from the front-end A/D is transferred under downsampler control to the PIO input port, and the feedback computation result is taken from the PIO output port. The PIO ports are run in passive mode, controlled by a PAL-based state machine which sequences the input and output transfers as controlled by the downsampler TAG bit. As the data samples are 8 bits wide, the input data is presented together with the 8 hold-buffer address bits from the downsampler to create a 16-bit PIO word. The address bits allow the DSP program to identify which bunch generated the data sample in the PIO transaction.

The hold buffer is a 32-bit-wide fast memory block which maintains an image of the kick signal for each bunch. The hold buffer is addressed by the group-of-four counter. Each hold-buffer read cycle is followed by a potential hidden write cycle, in which a new DSP result can be transparently written into the hold buffer if enabled by the downsampler TAG bit. As the HOLD_BUFFER_ADDRESS field of the downsampler memory specifies the location into which new DSP results are placed, it is possible to maintain a fixed offset between input and output data as the bunches circulate. This offset is necessary to allow the kicker system to be placed at any point in the ring, as well as to compensate for cable and signal transit delays between pickup, processing, and kicker stages.

Fig. 3. Oscilloscope record of the base-band kick signal and the resulting modulated kicker signal. The figure shows kicks for sequential bunches 4 ns apart. The first kick is positive, the second negative and of lesser amplitude, the third zero, and the fourth and fifth positive. The kicker signal phase inverts to achieve the negative kick on the second bunch. The wide-band nature of these signals can be seen in the 380 ps rise time of the base-band kick signal.
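
The read cycle followed by a hidden write can be modeled with a small Python class. This is a toy model under assumptions: the class name, method signature, buffer size, and the offset of three groups are hypothetical and chosen only to show how a result computed for one group appears at a different hold-buffer address.

```python
class HoldBuffer:
    """Toy model of the 32-bit-wide hold buffer (one word per group of four)."""

    def __init__(self, n_groups):
        self.words = [0] * n_groups

    def cycle(self, group, tag=False, hold_addr=0, dsp_result=0):
        """One group-of-four crossing: read for the D/A, then a hidden write."""
        kick_word = self.words[group]        # read cycle, addressed by the group counter
        if tag:                              # hidden write only when TAG is set
            self.words[hold_addr] = dsp_result & 0xFFFFFFFF
        return kick_word


N_GROUPS = 8
OFFSET = 3  # hypothetical pickup-to-kicker offset, in groups of four

buf = HoldBuffer(N_GROUPS)
# A result derived from group 2 is stored at address (2 + OFFSET).
before = buf.cycle(group=2, tag=True,
                   hold_addr=(2 + OFFSET) % N_GROUPS, dsp_result=0x11223344)
first = buf.cycle(group=5)   # the new kick word is read out at group 5
again = buf.cycle(group=5)   # and is held until a later result overwrites it
```

Because the stored word is re-read on every turn, the kick stays constant between updates, which is what allows the downsampled processing to drive a kicker running at the full bunch rate.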

The output stages of the system comprise a fast 8-bit D/A stage running at the 250 MHz beam-crossing rate and an output modulator. The calculated DSP output is a base-band correction signal, while the longitudinal kicker structure is designed to operate in the 1.00–1.25 GHz range. An output-modulator function is provided that transfers the base-band D/A signal into modulation on a kicker oscillator signal. The kicker signal originates as a carrier at 2.25 times the rf frequency (1125 MHz, obtained by dividing the rf signal by 4 and multiplying the resulting synchronous 125 MHz by 9 in a step-recovery diode and band-pass filter). The 1125 MHz carrier is quadrature phase-shift keyed (QPSK) at the 500-MHz bucket-crossing rate, which results in a strong 1 GHz component as well as higher side bands. The QPSK signal is then amplitude modulated by the output D/A, resulting in a kicker signal which can span the 1.00–1.25 GHz range of the longitudinal kicker.
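
One way to see why phase stepping at the bucket rate moves energy from 1125 MHz to 1 GHz is the numerical sketch below. It assumes a specific modulation pattern that the text does not spell out: the carrier phase is stepped back by 90 degrees at every 2-ns bucket. The simulation resolution and function names are likewise assumptions, and the D/A amplitude modulation is not included.

```python
import cmath
import math

F_CARRIER_MHZ = 1125.0      # 9 x (500 MHz / 4), as in the text
BUCKET_NS = 2.0             # 500 MHz bucket spacing
SAMPLES_PER_BUCKET = 128    # simulation resolution (assumption)
N_BUCKETS = 4               # four 90-degree steps = 8 ns, one full pattern


def qpsk_waveform():
    """Carrier whose phase steps back by 90 degrees at every bucket (assumed)."""
    dt_ns = BUCKET_NS / SAMPLES_PER_BUCKET
    wave = []
    for n in range(N_BUCKETS * SAMPLES_PER_BUCKET):
        bucket = n // SAMPLES_PER_BUCKET
        carrier_phase = 2 * math.pi * F_CARRIER_MHZ * 1e-3 * n * dt_ns
        wave.append(math.cos(carrier_phase - bucket * math.pi / 2))
    return wave, dt_ns


def amplitude_at(wave, dt_ns, f_mhz):
    """Fourier amplitude at one frequency, for a waveform periodic over its length."""
    acc = sum(x * cmath.exp(-2j * math.pi * f_mhz * 1e-3 * n * dt_ns)
              for n, x in enumerate(wave))
    return 2 * abs(acc) / len(wave)


wave, dt_ns = qpsk_waveform()
a1000 = amplitude_at(wave, dt_ns, 1000.0)   # shifted line, dominant in this model
a1125 = amplitude_at(wave, dt_ns, 1125.0)   # original carrier line, suppressed
a1500 = amplitude_at(wave, dt_ns, 1500.0)   # first higher side band
```

In this model a 90-degree step every 2 ns is equivalent to a 125 MHz frequency shift, so the dominant line sits at 1125 - 125 = 1000 MHz, with weaker side bands spaced by 500 MHz. Stepping the phase in the opposite direction would instead favor 1250 MHz.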

The power amplifier stage is a 500-watt commercial unit. The longitudinal kicker is a wide-band drift-tube structure comprising two drift tubes and associated delay lines. It was designed by the Beam Electrodynamics group at LBL.\(^5\)

Table 1 contains a summary of the system design and performance specifications as configured for the ALS installation. Figure 3 is an oscilloscope photo of the base-band kick signal and kicker-drive signal for five consecutive bunches 4 ns apart.

Table 1
PHASE DETECTOR RESOLUTION

<table>
<thead>
<tr>
<th>PARAMETERS</th>
<th>VALUE</th>
<th>UNITS</th>
<th>COMMENTS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Front-end detection frequency</td>
<td>2998</td>
<td>MHz</td>
<td>6 × rf</td>
</tr>
<tr>
<td>Resolution</td>
<td>0.25</td>
<td>Deg</td>
<td>at 500 MHz</td>
</tr>
<tr>
<td>Sampling rate</td>
<td>DC-250</td>
<td>MHz</td>
<td></td>
</tr>
<tr>
<td>Bunch-to-bunch isolation</td>
<td>~30</td>
<td>dB</td>
<td>250 MHz sampling rate</td>
</tr>
<tr>
<td>System dynamic range</td>
<td>48</td>
<td>dB</td>
<td></td>
</tr>
<tr>
<td>Number of bunches</td>
<td>1–256</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Downsampling factor</td>
<td>1–31</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Nominal filter execution</td>
<td>900</td>
<td>ns</td>
<td>4 tap FIR, no options</td>
</tr>
<tr>
<td>Bunches controlled at ALS</td>
<td>1–92</td>
<td></td>
<td>Depends on downsampling factor and filter complexity</td>
</tr>
</tbody>
</table>

OPERATIONAL AND SIGNAL PROCESSING SOFTWARE

The processors of the quick prototype are controlled by a JTAG serial test port connected to an external PC which runs the AT&T multi-processor development system. With this configuration feedback algorithms may be coded and downloaded to the DSPs and data may be sent in either direction over the JTAG link, providing a general-purpose user interface to configure and run the system.

The processors manage all basic high-level operations under user control via the JTAG port. These operations include loading the downsampler lookup table, loading the bunch and turn-counter preload values, and configuring programmable elements of the system. The sequence is outlined in the flowchart of Fig. 4. The DSP clears the memories of the lookup table and hold buffer, loads the lookup table, and loads registers with preload values for the group and turn counters. Once the general state of the board is set, the AT&T 1610 processors compute correction signals via a feedback filter algorithm.

The initialization process uses several external C-language programs to create accelerator-specific download programs. One program uses the number of rf buckets in the ring and a suggested downsampling factor to create the lookup table for the downsampler. The user can specify arbitrary sampling patterns or control all bunches within the timing constraints of the feedback program. With this flexible design the prototype hardware can run at many accelerator facilities, using software to specify the operating conditions.

A second set of user C programs calculates feedback filter characteristics based on the desired gain and phase-shift characteristics of the feedback path. Most feedback algorithms we have used to date are finite impulse response (FIR) filters of 2–6 taps in length. Such a filter is computed using a general-purpose FIR feedback program, with filter coefficients tailored to the accelerator system. It is possible to have several filters or coefficient sets in the DSP memory at once, and to use the bit I/O signals to specify a feedback filter on a bunch-by-bunch basis. Such flexibility can allow special injection filters, or the sweeping of a filter pass band to match a moving synchrotron tune if required.

Fig. 4. Flowchart of the DSP filter program. Execution of this program takes 41 25-ns cycles (1025 ns total). The program implements a 5-tap FIR filter, and tests the tag bits of every data word.

As these feedback filters process information on a bunch-by-bunch basis, the feedback task requires synchronization of input and output data. Figure 5 illustrates data flow in a DSP filter program. A 16-bit input word is read from the parallel port. The eight most significant bits of this word are the data sample; the rest of the word is a code that is unique for each group of bunches. Using this bunch information, the program builds a data structure that keeps a set of pointers associated with each bunch. This structure includes pointers to the end and the beginning of the data storage, a pointer to the oldest data in the storage, and a pointer to the set of filter coefficients for each bunch. The program computes a filter output with each input sample, writes this value to the PIO output buffer, and then waits for the next interrupt to process data for the next bunch.
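
The per-bunch bookkeeping described above can be sketched in Python as follows. This is not the DSP program itself: Python lists and a dictionary stand in for the pointer structure, the class and function names are hypothetical, and treating the 8-bit sample as two's-complement signed is an assumption.

```python
class BunchFilter:
    """Per-bunch FIR state: a circular sample store plus a coefficient set."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.history = [0] * len(self.coeffs)   # circular data storage
        self.oldest = 0                         # index of the oldest sample

    def step(self, sample):
        """Overwrite the oldest sample, then return the FIR output."""
        n = len(self.history)
        newest = self.oldest
        self.history[newest] = sample
        self.oldest = (self.oldest + 1) % n
        # coeffs[0] weights the newest sample, coeffs[1] the one before it, ...
        return sum(c * self.history[(newest - k) % n]
                   for k, c in enumerate(self.coeffs))


def split_pio_word(word):
    """Split a 16-bit PIO word: upper byte = data, lower byte = bunch code."""
    data = (word >> 8) & 0xFF
    if data >= 0x80:            # assumption: samples are two's-complement signed
        data -= 0x100
    return data, word & 0xFF


def process(word, filters, coeffs):
    """Handle one PIO input word and return the filter output for its bunch."""
    data, code = split_pio_word(word)
    if code not in filters:
        filters[code] = BunchFilter(coeffs)
    return filters[code].step(data)


filters = {}
diff = [1, -1]                                   # simple two-tap example filter
out_a = process((10 << 8) | 5, filters, diff)    # bunch code 5, sample 10
out_b = process((3 << 8) | 7, filters, diff)     # bunch code 7, sample 3
out_c = process((4 << 8) | 5, filters, diff)     # bunch code 5 again, sample 4
```

Samples from different bunches arrive interleaved, but each output depends only on the history stored under the matching bunch code, which is the synchronization the text refers to.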

The DSP program execution time determines the total number of bunches the system controls, as the program must complete before the next data is selected by the downsampler. We can estimate the maximum number of controlled bunches at the ALS from the equation \( N_B = DS \times \frac{4 \tau_{rev}}{\tau_{ex}} \), where \( N_B \) is the number of controlled bunches; \( DS \) is the downsampling factor; \( 4 \) is the number of parallel DSP processors; \( \tau_{rev} \) is the revolution time; and \( \tau_{ex} \) is the execution time. As a four-tap FIR filter takes 900 ns to execute, a downsampling factor of 31 allows control of up to 92 bunches at the ALS.

The execution time of 900 ns/loop includes 150 ns for the filter output calculation itself plus overhead. This overhead includes waiting for an interrupt, identifying a bunch, and updating the oldest value of data. Each additional filter tap requires only one more cached 25-ns multiply-accumulate instruction.
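
The bunch-count estimate and the cost of extra taps can be checked with a short Python calculation. The revolution time is left as an input; the value of 667 ns used below is an assumption, chosen because it reproduces the quoted figure of 92 bunches. The linear timing model applies only to the no-options filter and is a simplification of the statement that each added tap costs 25 ns.

```python
N_DSP = 4               # parallel processors
BASE_TAPS = 4
BASE_EXEC_NS = 900.0    # 4-tap FIR, no options (Table 1)
EXTRA_TAP_NS = 25.0     # one more cached multiply-accumulate per added tap


def execution_time_ns(taps):
    """Loop time for a no-options FIR filter with at least four taps."""
    if taps < BASE_TAPS:
        raise ValueError("this simple model only covers 4 or more taps")
    return BASE_EXEC_NS + EXTRA_TAP_NS * (taps - BASE_TAPS)


def max_bunches(downsample, t_rev_ns, t_exec_ns):
    """N_B = DS * 4 * tau_rev / tau_ex, returned without rounding."""
    return downsample * N_DSP * t_rev_ns / t_exec_ns


T_REV_NS = 667.0   # assumed revolution time, not a value taken from the text

n_4tap = max_bunches(31, T_REV_NS, execution_time_ns(4))   # about 91.9
n_6tap = max_bunches(31, T_REV_NS, execution_time_ns(6))   # about 87.1
```

Under these assumptions two extra taps cost only about five controllable bunches, since the per-loop overhead dominates the execution time.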

The general-purpose nature of the software-based feedback allows the user to select the best filter for each particular feedback task.\(^6\) The selection of filters we have developed includes pure delay, differentiator, band-pass FIR, IIR, and linear quadratic Gaussian (LQG) filters. Programmable coefficients of these filters allow the adjustment of the filter phase shift so that the overall feedback loop will have the nominal 180° phase shift at the synchrotron frequency.
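
How a coefficient set translates into gain and phase at the synchrotron frequency can be illustrated by evaluating the FIR frequency response at a single point. The sketch below is a generic textbook calculation, not one of the authors' filter-design programs; the function names are hypothetical, and the two-tap difference filter and the figure of six samples per synchrotron period are used only as an example.

```python
import cmath
import math


def fir_response(coeffs, samples_per_period):
    """Complex gain of an FIR filter at one oscillation frequency.

    `samples_per_period` is the number of (downsampled) samples taken per
    oscillation period; coeffs[k] multiplies the sample delayed by k steps.
    """
    omega = 2 * math.pi / samples_per_period     # radians per sample
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(coeffs))


def gain_and_phase_deg(coeffs, samples_per_period):
    """Return (magnitude, phase in degrees) of the filter response."""
    h = fir_response(coeffs, samples_per_period)
    return abs(h), math.degrees(cmath.phase(h))


# Two-tap difference filter evaluated at six samples per oscillation period.
gain, phase = gain_and_phase_deg([1, -1], 6)     # approximately (1.0, 60.0)
```

Changing the coefficients moves this phase, which is the adjustment mechanism the text describes; the phase contributed by the rest of the loop has to be accounted for separately.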

The general-purpose structure of DSP signal processing allows the quick prototype to perform a wide range of research and diagnostic algorithms that can take long time records reflecting the behavior of individual bunches in a ring, excite the beams with a particular excitation pattern, or flip the feedback from negative to positive for selected bunches. Such flexibility is useful for machine physics and diagnostic purposes. Figure 6 shows an open-loop time record of 8 bunches from the ALS. The phase-oscillation data is taken from 8 bunches with a 4-ns bunch spacing for 1150 samples per bunch (19 ms). This algorithm uses the DSP memory as a multichannel oscilloscope that can be triggered on any event by using one of the bit input/output signals. The time data can be processed off-line to provide frequency data, such as power spectra of bunch motion or bunch-to-bunch transfer functions.\(^{8,9}\)
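
Off-line processing of such a time record might look like the following Python sketch, which computes a power spectrum with a direct DFT and locates the strongest line. The record used here is synthetic, and the sample period, record length, and function names are assumptions for illustration only; no measured ALS data is reproduced.

```python
import cmath
import math


def power_spectrum(record):
    """Naive DFT power spectrum of one bunch's time record (mean removed)."""
    n = len(record)
    mean = sum(record) / n
    centered = [x - mean for x in record]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centered))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def peak_frequency(record, sample_period_s):
    """Frequency in Hz of the strongest spectral line, excluding DC."""
    spectrum = power_spectrum(record)
    k_peak = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return k_peak / (len(record) * sample_period_s)


# Synthetic record: a constant offset plus one tone at DFT bin 12.
N_SAMPLES = 120
SAMPLE_PERIOD_S = 16e-6     # assumed per-bunch sampling period
record = [5.0 + 3.0 * math.sin(2 * math.pi * 12 * i / N_SAMPLES)
          for i in range(N_SAMPLES)]
f_peak = peak_frequency(record, SAMPLE_PERIOD_S)   # 12 / (120 * 16e-6) Hz
```

A direct DFT is adequate for records of about a thousand samples; a fast Fourier transform would be the natural choice for longer records or for many bunches.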

SUMMARY

A general-purpose DSP-based longitudinal feedback system has been developed as a component of the PEP-II R&D program. The prototype contains all of the essential components required for the PEP-II system. This system is installed at the Advanced Light Source at Lawrence Berkeley Laboratory and is used to develop techniques to control longitudinal coupled-bunch instabilities. The signal processing and software are complete, and the ALS feedback power amplifier and longitudinal kicker will be installed during the fall of 1993. The quick prototype has been used to study open-loop longitudinal motion of the ALS beam and has demonstrated real-time computation of feedback signals. Closed-loop operation of the system is expected in early 1994.

Fig. 5. DSP data flow. The 16-bit input word is made of two parts: the upper byte is the data itself, and the lower byte is the downsampler tag. Tag bits define an offset from a base address where pointers for bunch information are stored.

Fig. 6. Time record of 8 consecutive bunches in the ALS, showing multi-bunch synchrotron oscillations.

ACKNOWLEDGMENTS

The authors thank the ALS staff of LBL for their hospitality and interest in this hardware development program, and thank the SLAC PEP-II Group and the Technical Division for their support. The authors particularly appreciate the help of Bill Roster and Greg Dalit in the design of the mechanical packaging and cooling of the quick prototype system, and the help of Paul VaVra with the QPSK circuit development.

REFERENCES