CERN is the world’s largest laboratory for particle physics, where the fundamental structure of matter is studied. Over the last 40 years, CERN has built a number of accelerators, enabling the study of particle collisions at ever-increasing energies. From 2005 onwards, CERN expects a new accelerator to be available for experiments: the Large Hadron Collider (LHC), which has a circumference of 27 kilometres. CERN is currently designing and constructing the experiments for this accelerator, and ATLAS is one of the four approved LHC experiments. The ATLAS detector produces 40 TeraBytes/s of data. This rate must be reduced to 100 MBytes/s, which a mass storage system can feasibly absorb. Since only a fraction of all data is interesting, a computer system called the trigger selects the interesting data through real-time data analysis. The trigger consists of three successive filtering levels: LVL1, LVL2, and LVL3. LVL1 will be implemented using special-purpose hardware; LVL2 and LVL3 will be implemented using a Network Of Workstations (NOW).

This designer’s project focuses on the ATLAS LVL2 trigger, a system that reduces the data rate from 100 GigaBytes/s to 1 GigaByte/s. Aiming at a cost-effective solution, the design is based on a NOW built from commodity products: PCs interconnected by switched Fast and Gigabit Ethernet. A major problem is making efficient use of the computing power available in each workstation. The trigger programs are intrinsically fine-grained: typically, each workstation has to communicate once every 4000 instructions. If the trigger were built on the communication and scheduling facilities of a standard operating system, each workstation would have less than 30% of its time available for computation. Avoiding this inefficient use of computing power requires more efficient communication and scheduling facilities.
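The data-rate and efficiency figures above can be checked with some back-of-envelope arithmetic. The Python sketch below computes the overall and LVL2 reduction factors, and shows what the "less than 30%" figure implies under a simple model that is an assumption of this example, not of the thesis: computation and communication do not overlap, and the per-communication cost of a standard operating system is expressed in instruction-equivalent time. Under that model the compute fraction is W/(W+C), with W = 4000 instructions of work per communication, so a fraction below 0.3 means the overhead C exceeds roughly 9,300 instruction-times. The function names are illustrative.

```python
"""Back-of-envelope arithmetic for the ATLAS trigger figures.

Assumptions of this sketch (not stated in the thesis): decimal units
(1 TB = 10**12 bytes), and a model in which computation and communication
overhead do not overlap, with overhead measured in instruction-times.
"""

TB = 10**12
GB = 10**9
MB = 10**6


def reduction_factor(rate_in: float, rate_out: float) -> float:
    """How many times smaller the output data rate is than the input rate."""
    return rate_in / rate_out


def compute_fraction(work: float, overhead: float) -> float:
    """Fraction of time spent computing when every `work` instructions are
    followed by `overhead` instruction-times of communication/scheduling."""
    return work / (work + overhead)


def max_overhead(work: float, target_fraction: float) -> float:
    """Largest per-communication overhead that still leaves at least
    `target_fraction` of the time for computation."""
    return work * (1.0 / target_fraction - 1.0)


if __name__ == "__main__":
    print(f"Whole trigger: {reduction_factor(40 * TB, 100 * MB):,.0f}x")
    print(f"LVL2 alone:    {reduction_factor(100 * GB, 1 * GB):,.0f}x")
    # Communicating once per 4000 instructions: a compute share below 30%
    # means each communication costs more than this many instruction-times.
    print(f"Break-even overhead: {max_overhead(4000, 0.30):,.0f}")
```

Run as a script, it reports a reduction factor of 400,000 for the whole trigger, 100 for LVL2, and a break-even overhead of about 9,333 instruction-times per communication.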
The major contribution of this designer’s project is an infrastructure named MESH, which enables CERN to implement the LVL2 trigger cost-effectively. Furthermore, because it uses commodity technology, MESH allows the LVL2 trigger to be upgraded and supported cost-effectively during its 20-year lifecycle. MESH facilitates efficient parallel processing on PCs interconnected by Ethernet. Efficient Input/Output (I/O) for workstation clusters aimed at parallel computing has received much attention over the years. In that context, fault-tolerance issues, such as fault confinement and failure behaviour, have either been ignored or given very little consideration. Since these fault-tolerance issues are of major importance to the ATLAS trigger, the software systems resulting from such research could not be used. Individual techniques from that research, on the other hand, have proven essential. This work combines the latest techniques for equipping workstations with efficient I/O, and extends this research from specialised hardware to commodity hardware. The I/O system is tightly integrated with an efficient special-purpose scheduler, for which some of the latest scheduling techniques developed in the context of parallel computing have been adapted and improved. In addition, the developed system addresses the trigger’s fault-tolerance requirements while maintaining an application interface with a high level of abstraction. CERN considers an Ethernet-based solution for LVL2 very promising and has therefore developed an Ethernet-based prototype of the ATLAS LVL2 trigger, which uses MESH as its communication and scheduling infrastructure. CERN has recognised the importance of MESH for LVL2: research to further exploit MESH has already been initiated.
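The idea of tightly integrating I/O with a special-purpose scheduler can be illustrated with a generic technique from the parallel-computing literature: a cooperative user-level scheduler that checks for incoming messages (polling) at every scheduling decision, so that blocking on and waking up for a message never involves the operating system. The Python sketch below is a minimal illustration under that assumption. The class `Scheduler`, its methods, the port names, and the `network` queue standing in for a network interface are all hypothetical; they do not reproduce the MESH API or its implementation, and fault tolerance is not modelled.

```python
"""Minimal sketch of a user-level scheduler with polled message delivery.

Hypothetical illustration only: these names are not the MESH API, and a
Python generator stands in for a lightweight user-level thread.
"""
from collections import deque


class Scheduler:
    """Cooperative scheduler. A thread is a generator that yields the name
    of the port it wants to receive from and is resumed with the message."""

    def __init__(self):
        self.ready = deque()    # (thread, value to resume it with)
        self.waiting = {}       # port -> deque of threads blocked on it
        self.mailboxes = {}     # port -> deque of messages nobody awaited yet
        self.network = deque()  # stand-in for a NIC receive queue: (port, msg)

    def spawn(self, thread):
        self.ready.append((thread, None))

    def _poll(self):
        """Drain the simulated receive queue, waking any blocked receiver."""
        while self.network:
            port, msg = self.network.popleft()
            blocked = self.waiting.get(port)
            if blocked:
                self.ready.append((blocked.popleft(), msg))
            else:
                self.mailboxes.setdefault(port, deque()).append(msg)

    def _receive(self, thread, port):
        """Resume at once if a message is buffered, otherwise block."""
        box = self.mailboxes.get(port)
        if box:
            self.ready.append((thread, box.popleft()))
        else:
            self.waiting.setdefault(port, deque()).append(thread)

    def run(self):
        """Run until no thread is runnable. Blocked threads stay registered;
        a real system would keep polling instead of returning."""
        while True:
            self._poll()  # check for I/O at every scheduling decision
            if not self.ready:
                return
            thread, value = self.ready.popleft()
            try:
                port = thread.send(value)  # run until the thread blocks
            except StopIteration:
                continue  # thread finished
            self._receive(thread, port)


def demo():
    """A thread blocks on port 'input', then an arriving message wakes it."""
    log = []

    def worker(name, port):
        msg = yield port  # block until a message arrives on `port`
        log.append(f"{name} got {msg}")

    sched = Scheduler()
    sched.spawn(worker("A", "input"))
    sched.run()                         # A blocks: no message has arrived
    before = list(log)
    sched.network.append(("input", 7))  # a message arrives from the "network"
    sched.run()                         # polling delivers it and resumes A
    return before, log


if __name__ == "__main__":
    print(demo())  # ([], ['A got 7'])
```

In this sketch a blocked thread is only an entry in a queue, and switching threads is a generator resume inside one process rather than a kernel context switch; that per-communication saving is the kind that matters when communication happens every few thousand instructions.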
| Field | Value |
|---|---|
| Qualification | Doctor of Philosophy |
| Award date | 11 Feb 2003 |
| Place of publication | Eindhoven |
| Publication status | Published, 2003 |