Data Acquisition and Control for the LCLS Pixel Array Detector

Marianne S. Hromalik, Hugh T. Philipp, Lucas J. Koerner, Mark W. Tate and Sol M. Gruner

Abstract—A Data Acquisition (DAQ) and control system is being developed for a pixel array detector that will be used for a single-particle scattering experiment at the Linac Coherent Light Source (LCLS). The experiment requires that sub-picosecond pulses of 8 keV X-rays be scattered off single particles 120 times a second. The scattered X-rays are converted to charge in a 2-dimensional pixelated diode array. The charge is integrated on the 760x760 pixel detector readout chip and digitized in-pixel. The full detector is composed of tiled 185x192 pixel readout chips. The DAQ and control system provides low-level control of the integration and read-out processes, sets the detector mode of operation, addresses the pixels and transfers the data to high-speed local storage. Low-level data processing such as re-ordering, frame formatting and data averaging may also be required before data transfer. The DAQ and control system is designed in a hierarchical and modular manner using a Xilinx XC4V100FX FPGA-based DAQ board, and is controlled and monitored by the user in software using master/slave command handshaking across the PCI Express bus. This allows for a highly customized yet interactive and flexible system while sustaining a data throughput to disk in excess of 1 Gbps. Full-speed detector control and data acquisition have been achieved for a single-module CMOS ASIC (192x185 pixels) using the system described. Simultaneously with data acquisition, the DAQ and control system will also provide low-latency data transfer to a remote massive storage system over 10 GbE.

I. INTRODUCTION

X-ray diffraction has long been used to determine the structure of proteins in crystalline form [1]. The Single Particle Scattering Experiment due to be conducted at the LCLS XFEL (X-ray Free Electron Laser), however, seeks to record diffraction images of single particles by injecting them into the path of intense X-ray pulses. This eliminates the sometimes difficult step of protein crystallization. The experiment is conducted in a UHV environment where 10-200 fs, 8 keV X-ray pulses are fired at the particle 120 times a second. The X-rays are diffracted before the sample is destroyed; it is this diffracted image that is detected [1,2]. The short duration of the diffracted pulse necessitates the use of a charge-integrating detector, pixelated over a large enough area to allow 3D reconstruction of the sample.

The LCLS PAD is specified to be a 760x760 pixel integrating detector subdivided into a 4 x 4 tiled sub-unit array, with a pixel size of 110 μm × 110 μm, a full-well depth of >2000 photons and the ability to differentiate between 0 and 1 photons per pixel [3].

The front-end of each pixel may be individually programmed using in-pixel memory to have one of two gain (voltage/charge) levels. Since the envelope of the scattering data is expected to be largely reproducible, this effectively allows for an adaptive imaging system with a tailored dynamic range capable of handling areas of high-intensity flux and areas where the average detected flux may be less than 1 photon/pixel. The output of this front-end is digitized in-pixel with 14-bit resolution.

At the 120 Hz operating frequency, the detector produces a sustained data stream of 1.14 Gbps to be read off the detector and onto disk storage. The reconstruction algorithms used to extract a particle's 3D structure require the recording of up to several million 2D diffraction images. At 120 Hz, this is equivalent to several hours and several terabytes of data for each experiment.

### Table 1: Detector Modes, Purposes and Requirements

<table>
<thead>
<tr>
<th>Detector Mode</th>
<th>Purpose</th>
<th>Requirements</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reset</td>
<td>Re-position detector</td>
<td>User access to stored readings / Detector shut off</td>
</tr>
<tr>
<td>Setup</td>
<td>Set detector defaults</td>
<td>User able to upload Gain Map / User able to change on-board reference voltages/currents</td>
</tr>
<tr>
<td>Charge Injection</td>
<td>Repeatable in-pixel gain testing</td>
<td>Supply digital timed input pulses to charge injection mechanism / Read out and store status image frame</td>
</tr>
<tr>
<td>Dark Image Acquisition</td>
<td>Determine pixel offsets</td>
<td>Reading taken with no input signal and read out as status image frame / Should not interrupt normal beam operation</td>
</tr>
<tr>
<td>Gain Data Acquisition</td>
<td>To determine gain/pixel</td>
<td>Reading taken with external “flat-field” signal and read out as status frame</td>
</tr>
<tr>
<td>Data Acquisition</td>
<td>Record diffraction image</td>
<td>Reading taken 1 ms after event trigger received from beamline and read out as data frame</td>
</tr>
</tbody>
</table>

The detector also requires calibration controls to determine and track pixel gain and offset values. In-pixel charge-injection circuitry also facilitates a repeatable experiment to monitor pixel integrity over time. The DAQ and control system enables the detector to take these measurements, as required, and integrate them with the appropriate image data. Low-frequency access to live processed diffraction images is required, either at the local control station or remotely, as well as the ability to change on-board reference voltages, gain mappings and integration duration settings. The DAQ and control system can put these controls into effect within one frame acquisition time of the user command. This facilitates live control of the experiment by allowing the user to monitor the data being collected and make adjustments in real time.

The DAQ and control requirements for this detector are summarized in Tables 1 and 2.

The design and implementation of the Data Acquisition and Control system for the LCLS Pixel Array Detector is presented here. The paper describes a self-contained, flexible, user-friendly system for collecting, processing and storing data; controlling the particulars of detector operation; and maintaining detector calibration. It is designed as a networked system with transparent high-speed control and storage to suit the requirements of the Single Particle Scattering Experiment.

### Table 2: Speed and Storage Requirements for DAQ and Control System

<table>
<thead>
<tr>
<th>Property</th>
<th>Requirements</th>
</tr>
</thead>
<tbody>
<tr>
<td>Image Frame Rate</td>
<td>120 Hz</td>
</tr>
<tr>
<td>Frame Size</td>
<td>760 * 760 * 14 bits = 8.1 Mb</td>
</tr>
<tr>
<td>Viewable Image Rate</td>
<td>5 Hz</td>
</tr>
<tr>
<td>Local Storage</td>
<td>14 hrs DAQ – 6TB</td>
</tr>
<tr>
<td>User Control Latency</td>
<td>8.3 ms</td>
</tr>
</tbody>
</table>
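
The frame size, control latency and local storage entries in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below uses only values quoted in the paper and counts raw 14-bit pixel payload only, so it does not attempt to reproduce the sustained stream rates quoted in the text; the variable names are illustrative.

```python
# Cross-check of Table 2 using values quoted in the paper.
# Assumption: only raw 14-bit pixel payload is counted (no headers or padding).
PIXELS_PER_SIDE = 760
BITS_PER_PIXEL = 14
FRAME_RATE_HZ = 120
DAQ_HOURS = 14

# Bits in one full detector frame.
frame_bits = PIXELS_PER_SIDE * PIXELS_PER_SIDE * BITS_PER_PIXEL

# Time between consecutive frames, which is also the user control latency.
frame_period_ms = 1000.0 / FRAME_RATE_HZ

# Raw payload written per second, and the total over a 14-hour run.
payload_bytes_per_s = frame_bits * FRAME_RATE_HZ / 8
storage_tb = payload_bytes_per_s * DAQ_HOURS * 3600 / 1e12

print(f"frame size   : {frame_bits / 1e6:.1f} Mb")
print(f"frame period : {frame_period_ms:.1f} ms")
print(f"14 h storage : {storage_tb:.1f} TB")
```

Under these assumptions the frame is about 8.1 Mb, the frame period is about 8.3 ms and a 14-hour run needs roughly 6.1 TB, in line with the table.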

II. THE PIXEL ARRAY DETECTOR

The LCLS Pixel Array Detector is a two-layer device consisting of a fully depleted, high-resistivity silicon diode detector layer, solder bump-bonded to a pixelated CMOS ASIC. X-rays of 8 keV, scattered by a particle exposed to a focused X-ray laser pulse of 10-200 fs in duration, are absorbed in the diode layer of each pixel. The absorbed X-ray is converted to charge in the diode and integrated in the CMOS ASIC for a duration determined by user control. The pixel then ‘holds’ the integrated voltage using a sample-and-hold mechanism.

The resulting analog voltage is then digitized within each pixel simultaneously by means of a comparator and an externally supplied ramping voltage (Fig. 1).

Fig. 1 Pixel Architecture of LCLS PAD.

The slope of the ramp is synchronized to the controller-supplied Analog to Digital Clock which clocks $2^{14}$ times to cover the entire integrated analog voltage range. The digitized voltage is read out serially from the counter when the pixel is addressed.
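
The ramp-and-comparator digitization described above can be illustrated with a small behavioural model. This is a minimal sketch, not the pixel circuit itself: the voltage range (`v_min`, `v_max`), the rising ramp direction and the function name `ramp_digitize` are assumptions made for illustration.

```python
N_BITS = 14
N_STEPS = 2 ** N_BITS  # ADC clock ticks per conversion, as stated in the text


def ramp_digitize(v_pixel, v_min=0.0, v_max=1.0):
    """Return the counter value at which a rising ramp reaches v_pixel.

    v_min and v_max are hypothetical ramp limits; the ramp advances by one
    step per ADC clock tick and the counter stops when the comparator trips.
    """
    step = (v_max - v_min) / N_STEPS
    for tick in range(N_STEPS):
        ramp = v_min + tick * step
        if ramp >= v_pixel:      # comparator trips: latch the counter value
            return tick
    return N_STEPS - 1           # ramp never reached v_pixel: saturated code


if __name__ == "__main__":
    for v in (0.0, 0.25, 0.5, 1.0):
        print(v, ramp_digitize(v))
```

With the assumed 0 to 1 V range, a held voltage at mid-scale (0.5) yields code 8192, and any voltage at or above the top of the ramp saturates at 16383.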

III. CONTROL AND DAQ LOGICAL ARCHITECTURE

The Data Acquisition and Control System comprises the FPGA design implemented on a PCI-SYS development board and controlling software running on a PC platform. These communicate across the PCI Express bus, with two distinct links used in different master/slave configurations for control and DAQ. As control and DAQ are designed to operate in parallel with and independently of each other, they will be described separately.

---

**Fig 2.** A block diagram of the logical architecture of the LCLS PAD Control and DAQ system. Viewed from top to bottom the control and DAQ software communicates with the FPGA across the PCIe bus. On the left, the software acts as master in the configuration of the controls to be sent to the PAD while on the right, the FPGA DAQ system acts as master to store the PAD data to disk.

A. LCLS PAD Control

The control system is designed to allow the user easy and comprehensive access to the versatility of the PAD. Control options can be adapted to meet experimental needs and can be adjusted on the fly during an experiment either directly through the Graphical User Interface (GUI) or through a controlling script. The script is automatically generated by the software in response to user options within the GUI.

Examples of the user control options are shown in a screenshot of the GUI (Fig. 3). The underlying C++ software interprets these settings and translates them to a series of bits that the FPGA uses as control flags. These ‘flags’, along with associated values such as integration period and number of required rows for read-out, are written to the FPGA control register across the PCIe bus. As shown in Fig. 4, the software also converts the chosen reference voltages into appropriate hexadecimal values and uploads the user-determined gain mapping file. These are then transferred to the appropriate registers within the FPGA memory. A copy of all the settings for a given DAQ cycle is stored on the controlling machine in a settings file with a unique file ID. This ID is also written to the FPGA control register to be bundled with the data acquired for that cycle. In this manner, the data can be correlated with the appropriate settings file during post-analysis.

In addition, the user has the option of viewing the data collected by the PAD DAQ system as images as they are received. This display of data images after basic post-processing is conducted at low frequency (< 5 Hz) but allows the user to view the status of the experiment in real time. The software control system (Fig. 4) is responsible for this process. Based on a software timer and in synchronization with interrupts received from the DAQ system, the control software queries the data in RAM, performs basic image processing and displays it to the viewer.

The FPGA control memory structure therefore consists of three separate Base Address Registers (BARs): the Control Register that holds control flags and associated values, the Gain Mapping Register and the Reference Voltage/Bias Current Register. The software writes to these registers asynchronously with respect to any FPGA activity. It sends information to the appropriate register and ‘latches’ this information by asserting and de-asserting a bit within the Control Register flags to inform the FPGA that the control memory structure contains new data. In this way, user control in no way interferes with either the FPGA low-level control or the DAQ process.
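
The write-then-latch handshake described above can be sketched in software as follows. This is an illustrative model only: the `ControlMemory` class, the position of `LATCH_BIT` and the register contents are hypothetical, and a real implementation would write to memory-mapped PCIe registers rather than Python attributes.

```python
LATCH_BIT = 1 << 31  # hypothetical position of the 'new data' latch flag


class ControlMemory:
    """Stand-in for the three FPGA register regions named in the text."""

    def __init__(self):
        self.control = 0        # control flags and associated values
        self.gain_map = []      # per-pixel gain selection
        self.reference = {}     # reference voltage / bias current codes
        self.latch_pulses = 0   # completed assert/de-assert cycles

    def write_control(self, value):
        # Count a pulse when the latch bit goes from set to clear.
        was_set = bool(self.control & LATCH_BIT)
        self.control = value
        if was_set and not (value & LATCH_BIT):
            self.latch_pulses += 1


def send_settings(mem, flags, gain_map, reference):
    """Write all settings first, then pulse the latch bit once."""
    mem.gain_map = list(gain_map)
    mem.reference = dict(reference)
    mem.write_control(flags & ~LATCH_BIT)   # flags written, latch clear
    mem.write_control(flags | LATCH_BIT)    # assert: new data is ready
    mem.write_control(flags & ~LATCH_BIT)   # de-assert: handshake complete


if __name__ == "__main__":
    mem = ControlMemory()
    send_settings(mem, 0b101, [0, 1, 1, 0], {"vref": 0x3FF})
    print(mem.latch_pulses, bin(mem.control))
```

Writing the payload before pulsing the latch means the FPGA side only ever samples a complete set of settings, which is the property the text relies on to keep user control from disturbing the running DAQ process.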

The PAD control must also be synchronized with the arrival of the beam. A trigger supplied by the LCLS is therefore used to synchronize the software and hardware within the control process. When the trigger arrives (nominally 1 ms before the arrival of the beam), the FPGA control decoder module (Fig. 2) decodes the flags within the control register to determine which options have changed. These are used as inputs to the FPGA controlling state machine and to program the on-chip timers. This allows the state machine to send the appropriate digital signals to the PAD at the correct time. All control options, including changes to the PAD operating mode, are put into operation upon the arrival of the following trigger and before the next data acquisition sequence. Programming the on-board potentiometers and the PAD gain mapping are exceptions to this. These are executed after the following trigger but only after the next DAQ sequence has completed, as these processes take longer than 1 ms. A timing diagram of the FPGA control process is shown in Fig. 5.

![Fig. 3 Screenshots of the Graphical User Interface allowing the user easy control over experimental settings.](image)

![Fig. 4 A flowchart of the software control system used to translate and transfer user options to the FPGA control system.](image)

B. LCLS Data Acquisition

All digitization of converted X-rays occurs in parallel and within each pixel. Upon completion of this process, each of the 562500 pixels contains 14 bits of data. To reduce digital noise during the X-ray detection process, it is desirable that the high-speed read-out clock not be operational during the analog processes of detection, integration and digitization. As such, all the digital data must be read out before the following beam pulse arrives. To reduce the required read-out clock speed, 8 parallel lines are read out from each PAD tile, resulting in 144 data lines input to the FPGA.

As shown in Fig. 6, the PAD is addressed by row and column and the data read out serially from the selected pixel. To allow for 8-bit parallel read-out, each PAD tile is subdivided into banks of columns 26 pixels in width. The same row and column are addressed in each of the banks in each of the PAD tiles simultaneously and the data from the 144 addressed pixels is then read-out.

For the single-module design that is being used to test the system, the data is held on the FPGA itself or in on-board RAM as a buffer during read-out. The pixel data is then re-ordered into a coherent frame, bundled with the appropriate settings file ID, a file ID and a time stamp, and written across the PCIe bus to the data storage system.
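
The re-ordering step for a single tile can be sketched as below. The sketch assumes that the stream advances bank by bank within each addressed column, then column by column, then row by row; the paper does not state this ordering, and the function names and the small test dimensions are illustrative only.

```python
def banked_readout(frame, bank_width, n_banks=8):
    """Model the read-out stream: the same row and column in every bank
    are addressed together, giving n_banks pixels per address."""
    stream = []
    for row in frame:
        for col in range(bank_width):
            for bank in range(n_banks):
                stream.append(row[bank * bank_width + col])
    return stream


def reorder_banked(stream, rows, bank_width, n_banks=8):
    """Place each streamed pixel back at its position in a coherent frame."""
    frame = [[0] * (bank_width * n_banks) for _ in range(rows)]
    i = 0
    for row in range(rows):
        for col in range(bank_width):
            for bank in range(n_banks):
                frame[row][bank * bank_width + col] = stream[i]
                i += 1
    return frame


if __name__ == "__main__":
    # Tiny hypothetical tile: 2 rows, 2 banks of 3 columns each.
    tile = [[r * 100 + c for c in range(6)] for r in range(2)]
    stream = banked_readout(tile, bank_width=3, n_banks=2)
    print(stream[:6])
    print(reorder_banked(stream, rows=2, bank_width=3, n_banks=2) == tile)
```

The same index calculation, `bank * bank_width + col`, is what a scatter-style write would use to compute the destination offset of each pixel if the re-ordering were moved out of the FPGA.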

Buffering is required for two reasons. First, data is read off the PAD in 3.1 ms to allow time for control setup, X-ray detection and digitization to occur without noise contributions from the high-speed read-out clock. For the tiled array, this results in a read-out data throughput in excess of 3.6 Gb/s. As the DAQ and storage system is remote from the PAD, transfer to storage can occur while X-ray detection and digitization of the following frame are taking place. As such, 8.3 ms are available for data transfer to storage, reducing the required disk write speed to 1.33 Gb/s. Second, optional dark image offsets may be subtracted from the acquired frame before transfer to storage, and the user may choose to average many acquired frames before transfer.
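
The two optional processing steps that motivate the buffer, dark offset subtraction and frame averaging, can be sketched as follows. This is a software illustration under stated assumptions: frames are plain nested lists, negative results after subtraction are kept rather than clipped, and the average is a floating-point mean, whereas an FPGA implementation would likely use fixed-point arithmetic.

```python
def subtract_dark(frame, dark):
    """Subtract a per-pixel dark offset image from one acquired frame."""
    return [[p - d for p, d in zip(frame_row, dark_row)]
            for frame_row, dark_row in zip(frame, dark)]


def average_frames(frames):
    """Return the per-pixel mean of a list of equally sized frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    dark = [[10, 10], [10, 10]]
    acquired = [
        [[12, 10], [11, 14]],
        [[14, 10], [13, 10]],
    ]
    corrected = [subtract_dark(f, dark) for f in acquired]
    print(average_frames(corrected))
```

Both operations need at least one full frame held in memory, which is why they are grouped with buffering in the text.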

---

**Fig 5.** Timing diagram showing stages of FPGA low-level control of the PAD with time running from left to right. Control at this level is synchronized with the external trigger pulse that nominally arrives 1 ms before the beam. From left to right, the FPGA first interprets the controls sent by software and sets appropriate inputs to its State Machine and timers. It then controls the integration and digitization processes upon arrival of the beam. After this the data is read off at maximum transfer speed of 3.2 Gb/s and finally the Gain and Reference voltages are programmed into the PAD.

**Fig 6.** Banked column read-out addressing system of the LCLS PAD. 8 columns are addressed simultaneously along with one row. The 8 pixels thus addressed are then clocked in parallel to retrieve their stored data.

In the final design, however, reordering of the data into a coherent frame will not occur within the FPGA. Instead, the FPGA will use Scatter/Gather Direct Memory Access (DMA) to the Storage PC RAM to write the acquired data to the correct position in RAM so that it can be transferred to disk as a sequentially occurring frame.

In both the present case and the final design, the buffered data is transferred simultaneously at 1.3 Gb/s to the Storage PC RAM via DMA writes across the PCIe bus (Fig. 2). In direct contrast to the control architecture, the FPGA firmware acts as master to the DAQ software process. It writes the acquired and processed frames asynchronously to PC memory and interrupts the running software DAQ process to inform it when a full image frame has been transferred to RAM.

The software Interrupt Service Routine (ISR) is then responsible for transferring the data from RAM to storage disk. In the single-module test system, this is operating-system-controlled, as data throughputs are relatively low for a single PAD tile. In the final design, however, the ISR will consist of function calls to a custom disk-drive controller that allows for data transfer to disk at the required sustained 150 MB/s sequential write speed. In this manner, there is a constant stream of PAD image data to disk.
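
The division of labour between the FPGA (bus master writing frames into PC RAM) and the software ISR (moving completed frames to disk) follows a producer/consumer pattern. The sketch below models it with a thread and a queue; the dictionary standing in for RAM, the list standing in for disk and the `None` end-of-run marker are all assumptions made for illustration and do not represent the actual driver.

```python
import queue
import threading


def fpga_dma_writer(ram, irq, n_frames, frame_words=4):
    """Stand-in for the FPGA: write each frame to RAM, then raise an interrupt."""
    for frame_id in range(n_frames):
        ram[frame_id] = [frame_id] * frame_words  # 'DMA write' of a full frame
        irq.put(frame_id)                         # 'interrupt': frame is in RAM
    irq.put(None)                                 # end-of-run marker (sketch only)


def isr_loop(ram, irq, disk):
    """Stand-in for the ISR: on each interrupt, move that frame to disk."""
    while True:
        frame_id = irq.get()
        if frame_id is None:
            break
        disk.append(ram.pop(frame_id))


def run_acquisition(n_frames):
    ram, disk, irq = {}, [], queue.Queue()
    consumer = threading.Thread(target=isr_loop, args=(ram, irq, disk))
    consumer.start()
    fpga_dma_writer(ram, irq, n_frames)
    consumer.join()
    return disk


if __name__ == "__main__":
    print(run_acquisition(3))
```

Because the interrupt is only raised after a frame has been written in full, the consumer never sees a partial frame, and the writer is free to begin the next transfer without waiting for the disk.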

Note that, unlike many other data-intensive DAQ systems for detectors [4,5], there is no data reduction via triggering or any other mechanism in the LCLS Single Particle Experiment. All the data is required for post-analysis, so the entire frame is transferred from the PAD, processed, stored to memory and then transferred to disk. As the experiment is designed to be continuous, this results in a maximum data throughput of 3.6 Gb/s from the PAD to the DAQ card and a minimum throughput of 1.33 Gb/s from the DAQ to memory and from memory to storage.
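
One way to relate the two throughput figures is sketched below. It assumes each of the 144 data lines delivers one bit per cycle of the 25 MHz read-out clock quoted in the testing section, and that the data read in the 3.1 ms burst is spread evenly over the 120 Hz frame period. This is an interpretation of the quoted numbers, not a calculation given in the paper.

```python
# Values quoted in the paper.
READOUT_CLOCK_HZ = 25e6   # full read-out clock speed
DATA_LINES = 144          # parallel data lines into the FPGA
READOUT_TIME_S = 3.1e-3   # time allowed to read one frame off the PAD
FRAME_RATE_HZ = 120

# Assumption: one bit per data line per read-out clock cycle.
burst_bps = READOUT_CLOCK_HZ * DATA_LINES

# Bits moved during one read-out burst, then averaged over a full second.
bits_per_burst = burst_bps * READOUT_TIME_S
sustained_bps = bits_per_burst * FRAME_RATE_HZ

print(f"burst rate     : {burst_bps / 1e9:.2f} Gb/s")
print(f"sustained rate : {sustained_bps / 1e9:.2f} Gb/s")
```

Under these assumptions the burst rate is 3.6 Gb/s and the sustained rate is about 1.34 Gb/s, close to the maximum and minimum throughputs stated above.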

IV. IMPLEMENTATION

A. Control and DAQ

The LCLS pixel array detector, with all its supporting electronics, control and DAQ systems, is currently being developed and fabricated. A block diagram of the physical implementation of a single-module detection system (i.e. a single PAD tile), with attendant electronics, control and DAQ, is shown in Fig. 7.

The single bump-bonded tile with both detector and CMOS ASIC layers is wire-bonded onto a daughter PCB connected to a ‘motherboard’ from which it receives power, grounding, control and voltage references. The daughterboard contains a single buffer chip responsible for buffering the 8 data lines leaving the PAD and connects to the backside of the motherboard via a raised SAMTEC SAM160QTH connector.

Digital potentiometers on the motherboard set reference voltages and bias currents for the PAD and are programmed by the FPGA control system. A ramp generation circuit is also located on the motherboard and provides a synchronized ramp during digitization. At present, the PAD communicates with the remote control and DAQ card via a SAMTEC QSE-060 connector across 3 m of high-speed parallel cable.

![Fig 7 LCLS PAD DAQ and control implementation. Here the PAD and its supporting circuitry exist as a unit in High Vacuum within the experimental hutch. The control software, DAQ, control board and storage all reside remotely on a networked PC-based workstation. This communicates with the PAD across LVDS cabling and to massive storage via a 10 GE connection](image)

In the fully tiled system, one motherboard will connect to 8 daughterboards on which there are 16 fully bonded PAD tiles. It will comprise a ramp generator circuit and several programmable potentiometers for each detector tile. The FPGA-based control and DAQ card will communicate with this board via high-speed multiplexed LVDS signaling. This not only greatly reduces the number of wires connecting the two remote devices but also improves noise immunity along the cable length. This allows for greater physical separation between the control system and the experimental hutch.

The PAD tiles and their attendant electronic support boards (daughterboards and motherboards) will be placed in a high-vacuum chamber where the single particle scattering experiment will take place. The controller, however, is remote from this hutch and is a PC-based system housing the FPGA-based DAQ and control PCIeSYS100FX card.

An FPGA was used as the heart of the DAQ and control system because it offers true parallelism of operation and flexibility of design [6]. The control and DAQ processes can run at high speed simultaneously and independently of one another. Its high I/O count also facilitates parallel data output from the PAD, thus reducing the required read-out clock speed. The PCIeSYS100FX PCI Express card contains a Xilinx Virtex IV XC4V100FX FPGA. This FPGA has the added functionality of built-in RocketIO pins for communication with the external trigger system, as well as a PCI Express core that enables the board to communicate freely with its PC-based host.

At present, the PCIeSYS100FX card sends control data and receives data through a built-in SAMTEC QSE-060 high-speed parallel connector interface. An add-on board converting the outgoing and incoming LVDS signals into LVCMOS digital signals is under development for the fully tiled system. The DAQ and control card maintains independent communication with the control and DAQ software across the PCI Express bus. The independent operation of each of the transmit/receive links on the PCIe fabric allows this architecture without any loss of speed.

At present a custom-made C++ software GUI accepts user input and communicates with the FPGA in order to control the PAD and receive data from it. In later versions this will be upgraded to an EPICS based software GUI to allow easy network interfacing and be fully compliant with other LCLS computing systems.

The local storage system itself is a PC-based Conduant BigRiver Rx00 data recording system capable of sustained simultaneous reads and writes to disk at 175 MB/s. It contains 12 TB of local storage, sufficient for almost 24 hours of continuous full-speed data acquisition. The system is also designed to stream data to massive long-term storage via a 10 Gb Ethernet connection.

B. The Event Trigger

The external trigger synchronizes all controls to the PAD. It will be supplied by LCLS beamline control to arrive 1 ms before the beam with a jitter of less than 1 μs. It is provided using the Event Generator Board EVG-TREF-004, which transmits events as 8-bit event code words with accompanying 8-bit timestamps. These are 8b/10b encoded and transmitted using a Xilinx Virtex FPGA-based transfer protocol across fiber cabling.

The PCI-SYS100FX board can receive data under this protocol through its SFP connectors onto its RocketIO inputs. There are also built-in cores within the resident XC4V100FX FPGA to ‘lock’ onto the incoming signal, extract the event trigger code and decode it. The event code will then be used to internally generate a trigger pulse, while the timestamp will be bundled with the data to be stored. The event receiver module that accepts and decodes the inputs received from the EVG-TREF-004 is under development.
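
The final stage of the event receiver, after the link has been locked and 8b/10b decoding has recovered the two bytes, can be sketched as below. The trigger code value `TRIGGER_EVENT_CODE` and the function `decode_event` are hypothetical; the actual event codes are defined by the LCLS timing system and are not given in this paper.

```python
TRIGGER_EVENT_CODE = 0x2A  # hypothetical event code that marks a beam trigger


def decode_event(code_byte, timestamp_byte, trigger_code=TRIGGER_EVENT_CODE):
    """Interpret one decoded event: an 8-bit code plus an 8-bit timestamp.

    Returns whether a trigger pulse should be generated, together with the
    timestamp to bundle with the stored frame.
    """
    for value in (code_byte, timestamp_byte):
        if not 0 <= value <= 0xFF:
            raise ValueError("event fields must fit in 8 bits")
    return {"trigger": code_byte == trigger_code, "timestamp": timestamp_byte}


if __name__ == "__main__":
    print(decode_event(0x2A, 0x10))  # matching code: generate a trigger
    print(decode_event(0x01, 0x11))  # other event: no trigger
```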

V. TESTING AND RESULTS

A full-sized ASIC without a bump-bonded diode detector layer has been mounted onto a daughterboard that connects to a motherboard for testing. These systems communicate with the PCI-SYS100FX board, resident within a high-end desktop, across 3 m of high-speed SAMTEC EQCD cabling. The FPGA on the controller board communicates fully with the control and DAQ software running locally on the desktop in a Visual C++ environment under 32-bit Windows XP. The PAD ASIC tile was tested at the full 25 MHz read-out speed and framed at 120 Hz. 8-bit parallel data was successfully read off under these conditions.

The software and hardware controls were also successfully tested under stand-alone conditions and data was streamed to PC RAM at full 3.2 Gbps in bursts. Data was streamed to disk at a much lower rate of 14 MB/s due to the standard desktop configuration used without the custom StreamStor disk drive controller. The data rate however was sufficient for the rate at which data was supplied by a single PAD tile.

Further development and testing of a networked version of the control software, as well as testing of a fully bonded PAD tile under vacuum conditions, are currently underway. Further features such as full disk-write speed, EPICS-based control interfaces, software data transfer to storage across 10 GbE and LVDS digital communication with the PAD module will be implemented in the DAQ and control of the fully tiled LCLS PAD.

REFERENCES


IEEE NUCLEAR SCIENCE SYMPOSIUM and MEDICAL IMAGING CONFERENCE

Hawaii 2007
October 27 - November 3
Honolulu, Hawaii, USA
Hilton Hawaiian Village Beach Resort & Spa

N25: Data Acquisition and Analysis Systems II

Wednesday, Oct. 31 13:30-15:30; in Coral I

Session Chair: Alberto Aloisio, University of Naples 'Federico II' and INFN

N25-5: (14:30) Data Acquisition and Control for a Pixel Array Detector for Single Particle Scattering at the Linac Coherent Light Source

M. S. Hromalik$^1$, H. T. Philipp$^1$, L. J. Koerner$^1$, M. W. Tate$^1$, S. M. Gruner$^{1,2}$

$^1$Department of Physics, Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY, USA
$^2$Cornell High Energy Synchrotron Source (CHESS), Wilson Laboratory, Cornell University, Ithaca, NY, USA

The Data Acquisition (DAQ) and Control System for a Pixel Array Detector (PAD) for Single Particle Scattering at the Linac Coherent Light Source (LCLS) is currently under development. The experiment will use femtosecond pulses of 8 keV X-rays to probe single particles 120 times a second. The scattered x-rays convert to charge in a 500 micron thick high-resistivity silicon diode layer. This is electrically coupled to a 760x760 pixel, CMOS-based, custom ASIC that integrates the charge and digitizes the result in-pixel. The DAQ and control system sets the detector mode of operation, provides low-level control of the image acquisition and read-out processes and transfers the data to high-speed local storage. Low-level data processing such as re-ordering, frame formatting and lossless compression may also be required before data transfer. The Control and DAQ system is designed in a hierarchical and modular manner using a Xilinx XC4V100FX FPGA based development board while master/slave command handshaking across the PCI Express bus allows user control and monitoring in software. Such a design makes a customized yet interactive and flexible system possible while sustaining a data throughput to disk in excess of 1.1Gb per second on a single PC-based system. Full speed detector control and data acquisition have been achieved for prototypes of the final PAD using the system described. As the system is designed to operate both as a stand-alone system and to be compatible with the standard LCLS detector interface, the DAQ and control system will also provide low-latency data transfer to a remote massive storage system on 10 Gb Ethernet.
