SLAC-PUB-3503 November 1984 (T/E)

# MICROSTORE - THE STANFORD ANALOG MEMORY UNIT

J. T. WALKER, SOO-IK CHAE

Stanford Electronics Laboratory, Stanford University, Stanford, California 94305

S. SHAPIRO, R. S. LARSEN

Stanford Linear Accelerator Center, Stanford, California, 94305

## Abstract

An NMOS device has been developed which provides high speed analog signal storage and readout for time expansion of transient signals. This device takes advantage of HMOS-I VLSI technology to implement an array of 256 storage cells. Sequential samples of an input waveform can be taken every 5 ns while providing an effective sampling aperture time of less than 1 ns. The design signal-to-noise ratio is 1 part in 2000. Digital control circuitry is provided on the chip for controlling the read-in and read-out processes. A reference circuit is incorporated in the chip for first order compensation of leakage drifts, sampling pedestals, and temperature effects.

## 1. Introduction

The development of the SLAC Microstore chip, a high speed waveform sampling device, as an adjunct to the use of drift chambers in particle physics was motivated by a) a desire to store multiple arrivals on a signal wire throughout the drift time, thereby improving the pulse pair resolution, b) a desire to improve the accuracy of "leading edge" timing by digitising many times during the pulse waveform, c) a desire to measure the position along the drift chamber wire by charge division without having to resort to separate ADC measurements, thus saving the cost of additional electronics, d) a desire to achieve a better dynamic range and resolution than is presently available with flash ADCs and better speed than is available with CCDs, and e) a desire to lower costs and improve packaging density, thereby making possible the implementation of large systems.

By completely digitising the charge waveform, one improves the pulse pair resolution of the drift chamber. This is essential in the high multiplicity environment at colliding beam machines. Preservation of the analog information allows measurement of the spatial coordinate along the wire via charge division and, in special situations, a measurement of dE/dX along the particle trajectory.

In the limit of infinitely fast electronics, capable of digitising the arrival time of every arriving electron, one can achieve a $1/\sqrt{N}$ improvement factor in the position of the center of gravity compared to first electron timing, where N is the number of electrons contributing. Monte Carlo studies have shown<sup>1</sup> that, while this improvement does not in fact materialise because of ionization fluctuations, timing equal to or slightly better than first electron timing can be achieved with realizable electronics and fine time sampling at a lower overall charge gain. This has clear implications for longer device lifetimes.


Recently, high energy physics experiments have been proposed, and some already exist, in which many tens of thousands of channels of analog information must be recorded. This increase has been compounded by the application of charge division techniques in wire chambers for the measurement of spatial coordinates. To address this problem, three primary techniques have been employed: flash ADCs, CCDs, and fast sampling with discrete components.

Flash ADCs<sup>2</sup> now exist which digitise data at 50-100 MHz sampling rates with 6-7 bit accuracy. In conjunction with fast buffer memories, these devices have been developed into large, albeit expensive, systems.<sup>3,4</sup> LEP experiments containing up to 75,000 channels of FADCs having 7 bits at 20 MHz, using bilinear conversion to achieve 9 bit resolution, are under investigation.<sup>5</sup>

CCDs have been used in the past<sup>6</sup> and will continue to be used in the future<sup>7</sup> to perform the same function. The latest commercial devices have 9 bit amplitude resolution at 50 MHz, with higher sampling rates available, at proportionately higher cost, by the use of device multiplexing.

Analog systems based on the storage of charge on discrete capacitors have been used successfully for large systems,<sup>8,9</sup> and accuracies of up to 12 bits at 20 MHz with a storage capability of 8 hits per channel have been achieved. These systems, in use at SLAC and PETRA, are packaged in CAMAC and achieve a density of about 600 channels per CAMAC crate. These techniques would be impractical to extend to the many tens of thousands of channels envisioned as needed at the SLC.

Encouraged by the success of the SLAC Microplex chip,<sup>10</sup> and using many of the same techniques, the Stanford Integrated Circuit Laboratory in collaboration with SLAC has produced a device which meets or exceeds the design goals set forth below. A 200 MHz analog storage device with an anticipated resolution of 11 bits, having 256 storage elements per channel, has been designed using NMOS integrated circuit technology and will be described in this paper.
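The headline figures in this introduction are linked by simple arithmetic. The short Python sketch below works it through; the variable names are illustrative and do not come from the paper, and the inputs are only the numbers stated in the text (200 MHz, 256 elements, 11 bits).

```python
# Illustrative arithmetic only: names are hypothetical, inputs are from the text.
SAMPLING_RATE_HZ = 200e6   # 200 MHz sampling rate
CELLS_PER_CHANNEL = 256    # storage elements per channel
RESOLUTION_BITS = 11       # anticipated resolution

sample_spacing_s = 1.0 / SAMPLING_RATE_HZ                # time between samples
record_length_s = CELLS_PER_CHANNEL * sample_spacing_s   # span covered by one chip
levels = 2 ** RESOLUTION_BITS                            # distinguishable levels

if __name__ == "__main__":
    print(f"sample spacing: {sample_spacing_s * 1e9:.1f} ns")
    print(f"record length:  {record_length_s * 1e6:.2f} us")
    print(f"levels:         {levels}")
```

A single chip therefore covers roughly 1.28 µs of signal at 5 ns spacing, and 11 bits corresponds to 1 part in 2048, which is consistent with the design signal-to-noise ratio of 1 part in 2000 quoted in the abstract.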

## 2. Design Goals

The SLAC Microstore chip is an analog storage device for very high speed sampling of analog pulse information. It has a high level of accuracy over a wide dynamic range. It is compact, uses little power, and is low in cost. The device provides for the cascading of a number of chips so that sampling can be extended to larger time periods, consistent with the charge storage time of the device. Finally, a reference cell is provided to eliminate, at least to first order, the effects of temperature, leakage drifts, and sampling pedestals.

Presented at the Nuclear Science Symposium, Orlando, Florida, October 31 - November 2, 1984.

<sup>\*</sup> Work supported in part by the Department of Energy, contract DE-AC03-76SF00515, DARPA contract MDA-903-84-K-0062, and SRC contract 84-01-046.

The detailed design specifications are presented here:

| Parameter | Specification |
|---|---|
| Signal inputs: | 2 per package, 128 storage elements per input, which can be combined to form 1 input with 256 elements. |
| Input Signal Range: | 0 to 2 V (or higher with special drivers and reduced linearity). |
| Bandwidth: | The equivalent RC following time constant of the channel with the sample gates open is 1 ns ($C \simeq 1.0$ pF, $R_g \simeq 1\ k\Omega$). Typical signal input bandwidth is 100 MHz. |
| Input load: | 20 pF capacitive for all 256 cells. |
| Signal to Noise: | |
| Write Gate Width: | |
| Write Gate Risetime: | ... resultant fall time must be consistent with aperture time spec. |
| Sample Point Uncertainty: | Aperture $\leq 1$ ns. Jitter plus differential propagation delays in the fast gate $< 0.2$ ns. |
| Storage Time: | 10 ms with $< 10\%$ droop. |
| Signal Outputs: | 1 per 256 element channel, serial analog multiplexed. |
| Overall Gain: | +0.5 typical. |
| Output Level: | 0 to 1.2 V nominal differential. |
| Readout Clock Rate: | 1 MHz nominal. |
| Reset: | Analog reset provided for all cells. |
| Differential Readout Buffer: | Each cell has a differential transistor connected to a reference level to minimize threshold variations on readout. |
| Dummy Reference Cell: | Provided – one device. |
| Control Signal Levels: | 0 and +3 V (TTL compatible). |
| Cascading: | Input and Output capability is provided. |
| Power: | +5 V DC at about 40 mA. |
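Several entries in the specification table can be cross-checked against one another. The following Python sketch does this under a single-pole RC assumption, which is a simplification and not something the paper states; all names are illustrative, and the 5 ns sample spacing is taken from the abstract.

```python
import math

# Values from the specification table; the single-pole model is an assumption.
C_STORE_F = 1.0e-12        # storage capacitance, about 1.0 pF
R_GATE_OHM = 1.0e3         # series gate resistance, about 1 kOhm
READ_CLOCK_HZ = 1.0e6      # nominal readout clock rate
CELLS = 256                # cells read out per channel
STORAGE_TIME_S = 10e-3     # storage time quoted for < 10% droop
SAMPLE_SPACING_S = 5e-9    # write-in sample spacing (from the abstract)

tau_s = R_GATE_OHM * C_STORE_F                         # RC following time constant
f_3db_hz = 1.0 / (2.0 * math.pi * tau_s)               # single-pole -3 dB frequency
readout_time_s = CELLS / READ_CLOCK_HZ                 # time to read every cell once
expansion = (1.0 / READ_CLOCK_HZ) / SAMPLE_SPACING_S   # time-expansion factor

if __name__ == "__main__":
    print(f"tau = {tau_s * 1e9:.2f} ns, f_3dB = {f_3db_hz / 1e6:.0f} MHz")
    print(f"readout = {readout_time_s * 1e6:.0f} us, expansion = {expansion:.0f}x")
```

The 1 ns time constant follows directly from the quoted R and C. The single-pole estimate of about 159 MHz describes the cell alone and sits above the 100 MHz typical input bandwidth in the table; the sketch does not model whatever else limits the signal path. Reading all 256 cells at 1 MHz takes 256 µs, well inside the 10 ms storage time.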

## 3. Chip Design

The chip is implemented using primarily a 5 $\mu$ line width HMOS-I VLSI<sup>11</sup> process, as used for fabricating 5 V NMOS logic, to form an array of 256 storage cells. The cells are arranged to form a 16 × 16 matrix. The rows and columns are each addressed by a separate row and column clock. The column clock operates at one sixteenth of the sampling speed, while the row clock operates at the device sampling speed. Figure 1 is a block diagram of the Microstore chip. All of the read-in and read-out logic for operation of the chip is contained on chip, with the exception of the fast write clock (row clock). The circuitry is designed using standard two phase dynamic shift register techniques and incorporates features permitting indefinite cascading of both the input and output processes. Therefore, the number of samples taken during a transient event may be extended across several chips, and similarly, the readout process may employ one moderate speed, high precision ADC for a large number of data channels without additional circuitry. This permits easy hybridization of the chips for increased channel density and short lead length.

The logic structure of an individual storage cell is shown in Fig. 2. Each storage capacitor C is connected to the common input signal via two FET switches  $Q_1$  and  $Q_2$ , one controlled by the row clock, the other by the column clock. Data is stored on C only on coincidence of the two clocks. Utilizing this arrangement, the row (fast) clock may be continuously running, while the second clock (column) sequence may be initiated by a separate start pulse upon detection of the input signal, or by a known time marker or trigger, synchronous to the arrival of the data.

An important feature of the memory cell arrangement is the use of metal lines to carry the fast clock signals to each cell with a minimum of dispersion and delay. We are, therefore, forced to carry the slow clock to the cells on polysilicon lines, which have longer delays. Because of the high speed of the row clock, each column needs to be addressed for $16 \times 5$ ns, or 80 ns. To effect this, and at the same time have stable column clock waveforms, each column is split in two, with each half driven by its own clock signal and its own clock driver. Figure 3 illustrates the relative timing of the two column-clock pulses to the row pulses. Thus, in the read-in sequence for rows 1-8 and 9-16, rows 1-8 will be in coincidence with $\phi_{c1}$, giving $\phi_{c2}$ time to turn on, stabilize, and be present for rows 9-16. We see, therefore, that the column-clock shift register has 32 half-cells alternately connected to the first or second group of row signals.
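The write-in addressing described above can be summarised as a mapping from sample number to cell. The Python sketch below is a behavioural illustration only: the function and field names are hypothetical, and it assumes that writing starts at row 1 of column 1 and that samples step through the 16 rows of a column before the column advances.

```python
# Behavioural sketch of write-in addressing; names and start point are assumptions.
ROWS = 16
COLUMNS = 16
SAMPLE_SPACING_NS = 5

def cell_for_sample(n):
    """Map the n-th sample (0-based) to the cell that stores it (1-based indices)."""
    if not 0 <= n < ROWS * COLUMNS:
        raise ValueError("sample index outside one chip")
    column = n // ROWS                    # column clock advances once per 16 samples
    row = n % ROWS                        # row (fast) clock advances every sample
    half = 1 if row < ROWS // 2 else 2    # phi_c1 covers rows 1-8, phi_c2 rows 9-16
    half_cell = n // (ROWS // 2)          # index into the 32 half-cells of the register
    return {"column": column + 1, "row": row + 1, "half": half, "half_cell": half_cell}

# Time for which one full column stays addressed.
column_dwell_ns = ROWS * SAMPLE_SPACING_NS

if __name__ == "__main__":
    for n in (0, 7, 8, 15, 16, 255):
        print(n, cell_for_sample(n))
```

In this model each of the 256 samples lands in a distinct cell, a column is addressed for 80 ns, and the half-cell index runs from 0 to 31, matching the 32 half-cells of the column-clock shift register.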

Control signals to the column clock shift register are:

INHS: If low, turns off the column clock drivers so that no cell can be turned on, preventing undesired data storage.

$\phi_{1S}$, $\phi_{2S}$: The two phase column clock for driving the two sets of half shift registers.

RESET: For initialization of the column shift register.

START: When low, puts a high level into the first cell of the shift register.

INHRI: Low for triggered writing or cascading of chips, high for continuous writing.

END: If chips are cascaded, this output signal occurs at the proper time to start data storage in the next chip.
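The interplay of these control signals can be illustrated with a small behavioural model in Python. This is a sketch under stated assumptions, not the chip's logic: the class and method names are hypothetical, the register is modelled as a one-hot token moving through 32 half-cells, clock phases and waveform timing are ignored, and the recirculation used for continuous writing (INHRI high) is an assumption about how that mode might behave.

```python
class ColumnShiftRegister:
    """One-hot token register with 32 half-cells (behavioural model only)."""

    def __init__(self, n_half_cells=32):
        self.cells = [0] * n_half_cells
        self.inhs = 1    # 1 = column drivers enabled, 0 = INHS low (drivers off)
        self.inhri = 0   # 0 = triggered or cascaded writing, 1 = continuous writing

    def reset(self):
        """RESET: clear every half-cell."""
        self.cells = [0] * len(self.cells)

    def start(self):
        """START going low: load a high level into the first half-cell."""
        self.cells[0] = 1

    def clock(self):
        """Advance by one half-cell; return True when the token leaves (END)."""
        end = self.cells[-1] == 1
        self.cells = [0] + self.cells[:-1]
        if end and self.inhri:
            self.cells[0] = 1   # assumed recirculation in continuous mode
        return end

    def active_half_cell(self):
        """Index of the driven half-column, or None if idle or inhibited."""
        if not self.inhs or 1 not in self.cells:
            return None
        return self.cells.index(1)


def run_cascade(n_chips, n_clocks):
    """Chain several registers: END of one chip starts the next."""
    chips = [ColumnShiftRegister() for _ in range(n_chips)]
    for chip in chips:
        chip.reset()
    chips[0].start()
    history = []
    for _ in range(n_clocks):
        history.append([chip.active_half_cell() for chip in chips])
        ends = [chip.clock() for chip in chips]
        for i, end in enumerate(ends[:-1]):
            if end:
                chips[i + 1].start()
    return history


if __name__ == "__main__":
    trace = run_cascade(n_chips=2, n_clocks=64)
    print(trace[31], trace[32])
```

With two chips in the chain, the token occupies half-cells 0 to 31 of the first chip and then passes to half-cell 0 of the second, which is the sense in which the END output extends the sampling window across chips.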

Figure 4 is a simplified schematic of the standard two phase dynamic shift register using enhancement-depletion logic for the inverter stages. Figure 5 is a block diagram of the slow clock (column) input shift register. Figure 6 is a schematic diagram of the "Readout" shift register logic.

Two further transistors $Q_3$ and $Q_4$ (Fig. 2) are provided in each cell to further enhance the accuracy of the device. $\phi_{reset}$, applied to the gate of $Q_4$, allows the stored charge on C to be drained after data readout. $\phi_{inh}$ is applied to the gate of $Q_3$ to ground the intermediate node between $Q_1$ and $Q_2$ so that any signal feedthrough from $Q_1$ does not affect the readout of C. In this case, $\phi_F$ and $\phi_S$ must be off during the readout.



Fig. 1. A block diagram of the Microstore chip.

As shown in Fig. 2, each cell includes, in addition to the input section, a linear output stage. The depletion FET $Q_5$ functions as a source follower which isolates C from the voltage transients which occur as a result of the multiplexer operation. $Q_7$ acts as a current sink to bias $Q_5$. $Q_9$ acts as a source follower which provides signal current to feed the multiplexers. $Q_{11}$ and $Q_{13}$ are the row and column switches respectively. Load resistors external to the Microstore chip set the bias current for $Q_9$. $Q_5$ and $Q_6$ are a carefully matched transistor pair, as are $Q_7$ and $Q_8$, $Q_9$ and $Q_{10}$, $Q_{11}$ and $Q_{12}$, and $Q_{13}$ and $Q_{14}$. The circuit formed by $Q_7$ and $Q_8$ is frequently called a current mirror. If they are truly identical, the current drawn by $Q_8$ (that amount necessary to turn $Q_8$ on hard enough to sink all of the current provided by $Q_6$) will be matched by $Q_7$. The use of a fully differential readout amplifier provides first order compensation for process variations.

In order to provide for temperature compensation and compensation for leakage drifts and sampling pedestals, a reference cell is provided, once per chip, which is identical in design to the normal storage cell in the array. The cell is pulsed by INHS after all the data cells have received the data which they are to store. The resultant voltage, $V_{ref}$, is applied to the gate of transistor $Q_6$ and provides a differential reference level which is coupled to the $\overline{\mathrm{OUT}}$ line through $Q_8$, $Q_{10}$, $Q_{12}$ and $Q_{14}$. Thus, the differential output read between OUT and $\overline{\mathrm{OUT}}$ will accurately represent the analog pulse being sampled.
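The first order cancellation provided by the reference cell can be illustrated numerically. The Python sketch below is a simplified model, not the circuit: it assumes that the reference cell stores a zero-level baseline, that it sees the same pedestal and droop as the data cell, and that both halves of the output buffer have identical gain. The function names and the example pedestal and droop values are hypothetical; only the gain of 0.5 is taken from the specification table.

```python
GAIN = 0.5   # overall gain, "+0.5 typical" in the specification table

def stored_voltage(signal_v, pedestal_v, droop_v):
    """Voltage on a storage capacitor: signal plus common error terms (assumed model)."""
    return signal_v + pedestal_v - droop_v

def differential_output(cell_v, ref_v, gain=GAIN):
    """Data-side output minus reference-side output, assuming matched gains."""
    return gain * cell_v - gain * ref_v

if __name__ == "__main__":
    # Hypothetical error terms, chosen only to show the cancellation.
    cell = stored_voltage(signal_v=1.0, pedestal_v=0.03, droop_v=0.01)
    ref = stored_voltage(signal_v=0.0, pedestal_v=0.03, droop_v=0.01)
    print(f"single-ended: {GAIN * cell:.3f} V, differential: {differential_output(cell, ref):.3f} V")
```

In this model any error term common to both cells drops out of the difference. In the real device the cancellation is only to first order, since the reference cell is written at a different time from the data cells and the matching of the transistor pairs is finite.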

The successful implementation of the circuit described above requires that careful attention be paid to the details of the cell layout. In particular, the chip layout must be arranged so that the fast clocks, the input signal line, the supply voltages, and the ground line are all metal traces, and so that these traces run predominantly in the horizontal direction (parallel to the rows). The slower clocks and the other control signal lines are run on polysilicon interconnect lines, which run predominantly vertically (parallel to the columns). Figure 7 is an Applicon plot of the cell layout wherein many of the details discussed in this section are easily seen.

Stray coupling capacitance between the signal input, the storage capacitor, and the signal output is minimized by keeping lines short, and preventing crossovers.

In the design of the output buffer and amplifier, the two halves must be well matched. This is accomplished by choosing a layout technique which will ensure their matching independent of mask alignment accuracy. This is effected by maintaining a constant area of intersection of the polysilicon and the diffusion.

To minimize the size of the cell, the storage capacitor is laid out as a polysilicon layer with grounded metal above it, so that the polysilicon has capacitance both to the substrate and the ground metalization.

The parasitic source and drain resistances and capacitances of the input series pass transistors are minimized by making the layout as compact as possible. To minimize leakage currents, and thereby maximize charge storage time on the storage capacitor, the total diffusion area (of  $Q_2$ ) connected to the storage capacitor is minimized.

For reliability and high production yield, a 5  $\mu$  minimum feature size design was adhered to, with the exception of the channel (gate) length of the two FET switches  $Q_1$  and  $Q_2$ . These are 3  $\mu$  by 100  $\mu$  and 3  $\mu$  by 50  $\mu$  respectively. The 3  $\mu$  dimension is responsible for the low channel-on resistance (about 500  $\Omega$ ) of these FETs.



Fig. 2. Schematic diagram of an individual storage cell.



Fig. 3. Timing diagram showing the relative timing of the two column-clock pulses and the row- clock pulses.

## 4. Conclusion

The integrated circuit design described in this paper provides a number of significant advantages in the field of high speed analog data sampling and recording. These advantages are particularly useful for the readout of the large mass of information from high energy physics multiwire detectors. The use of NMOS integrated circuit technology allows an order of magnitude improvement in density, and makes practical the consideration of systems of very large size because of the availability, low power consumption and low cost of the integrated circuit. The accuracy of the analog samples recorded will be significantly improved because of the close matching between cells in the memory cell array. The power requirements will be significantly lower than flash ADC systems, and any circuitry required for drive purposes can be shared by many channels. The dynamic range, the sampling aperture and the accuracy are much improved over any previously discussed methods, e.g., an 11 bit resolution goal for this device compared to 6 bits for the FADC. The cost per channel in large scale systems will be significantly reduced. The figure of merit (speed $\times$ dynamic range divided by cost) will yield an improvement of at least 100:1 compared to the current alternatives.

Fig. 4. A simplified schematic of the standard two phase shift register, with the inverting and noninverting drivers.

Fig. 5. A block diagram of the slow-clock input shift register.

In a companion paper presented at this conference, results of actual measurements on the Microstore chip will be discussed. Simulation programs show that this device can be made to work at sample spacings on the order of 3 ns. Should one wish to push the design to even better performance by the use of standard fine line techniques, a slight redesign of the chip would be in order. The slow clock on the polysilicon lines now takes about 40 ns to stabilize. One can substantially decrease this time by placing the slow clock shift register and its drivers in the center of the array instead of at the bottom. The polysilicon lines would now be half as long and the resultant delay would be approximately one quarter of its former value. This pattern is routinely used in the design of dynamic memory chips.
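The factor of one quarter follows from how the delay of a resistive, capacitive line scales with its length. The Python sketch below illustrates that scaling; the function name is hypothetical, and the model assumes a uniform distributed RC line whose total resistance and total capacitance are both proportional to length.

```python
def scaled_delay_ns(delay_ns, length_ratio):
    """Delay of a uniform distributed RC line after scaling its length.

    Total resistance and total capacitance both grow linearly with length,
    so the delay grows with the square of the length.
    """
    return delay_ns * length_ratio ** 2

current_delay_ns = 40.0                                       # "about 40 ns" in the text
centered_delay_ns = scaled_delay_ns(current_delay_ns, 0.5)    # lines half as long

if __name__ == "__main__":
    print(f"{current_delay_ns:.0f} ns -> {centered_delay_ns:.0f} ns")
```

Under this assumption, halving the polysilicon line length would reduce the stabilization time from about 40 ns to about 10 ns.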

## Acknowledgements

The authors would like to acknowledge the helpful assistance of Dr. John Shott and the Stanford University Integrated Circuits Laboratory for the fabrication of this device.








Fig. 7. An Applicon plot of the cell layout.

## References

- 1. J. Va'vra, Search for the Best Timing Strategy in High Precision Drift Chambers, SLAC-PUB-3131 (1983); J. Va'vra, High Resolution Drift Chambers, CERN EF/84, October 16, 1984.
- 2. S. Dhawan and K. Kondo, New Developments in Flash ADCs, IEEE Nuclear Science Symposium, <u>NS31</u> (1983) 821.
- 3. M. Calvetti et al., NIM, <u>176</u> (1980) 255-262.
- 4. G. Hanson, A New Drift Chamber for the Mark II at SLC, SLAC-PUB-3317 (1984).
- 5. S. Dhawan, Report on the Minisession: "New Developments in Flash ADC Integrated Circuits," IEEE Nuclear Science Symposium, <u>NS31</u> (1983) 826.
- 6. R. Jared, D. Landis, and F. Goulding, Analog Signal Processing for the Time Projections Chamber, IEEE Nuclear Science Symposium, <u>NS29</u> (1982) 57.
- 7. LeCroy Research Model 1861 64 Channel Image Chamber Analyzer.
- 8. D. Bernstein, MSHAM-A Multihit Sample and Hold Multiplexer, IEEE Nuclear Science Symposium, <u>NS28</u>, 1 (1981) 359.
- 9. W. Farr and J. Heintze, Drift Chamber Electronics for Time and Pulse Height Measurements With Multiple Hit Capacity, NIM, <u>156</u> (1978) 301-309.
- 10. T. Walker, B. Hyams, S. Parker, and S. Shapiro, Development of High Density Readout for Silicon Strip Detectors, presented at the 3rd European Symposium on Semiconductor Detectors, "New Developments in Silicon Detectors," Munich, West Germany, November 14-16, 1983.
- 11. HMOS-I VLSI is understood to be that version of the N-MOS Technology which employs 700 Å thick gate oxide (compared to 1000 Å for normal NMOS), and 3 $\mu$ minimum channel length.