Ug_book ALICESoftware Manual
ALICE Internal Note/DAQ
ALICE-INT-2010-001

ALICE DAQ and ECS Manual

December 2010

ALICE DAQ Project
ALICE ECS Project

Preface

ALICE [1] is a general-purpose detector designed to study the physics of strongly interacting matter and the quark-gluon plasma in nucleus-nucleus collisions at the CERN Large Hadron Collider (LHC).

ALICE will operate in several running modes with significantly different characteristics. The experiment has been primarily designed to run with heavy-ion beams, which are characterized by relatively low rates (interaction rates <= 10 kHz for Pb-Pb beams at the design luminosity of L = 10^27 cm^-2 s^-1), relatively short running time (of the order of a few weeks per year), but very high multiplicity and correspondingly large event size. The requirements on the low-level trigger selectivity are therefore relatively modest, whereas the trigger complexity is considerable and requires partial or full reconstruction in the high-level trigger. In addition, a large-bandwidth DAQ together with efficient data selection and/or data compression in the High-Level Trigger (HLT) is required to collect sufficient statistics in the short running time available.

In proton-proton or proton-ion running mode, the interaction rates are much higher than in heavy-ion runs (up to 200 kHz, limited by pile-up in the TPC detector), whereas the event size is small and the running time is typically several months per year in pp mode. Therefore, the requirements on trigger selectivity are increased, while the requirements on trigger complexity and bandwidth are much reduced.

The ALICE data-acquisition system has been designed to run efficiently in these different modes and to balance its capacity to record the very large events (several tens of MBytes) resulting from central Pb-Pb collisions with an ability to trigger on and acquire rare cross-section processes.
These requirements result in a readout capability of up to 40 GByte/s, an aggregate event-building bandwidth above 2.5 GByte/s and a storage capability of up to 1.25 GByte/s to mass storage.

The software framework of the ALICE DAQ is called DATE (ALICE Data Acquisition and Test Environment) and consists of a set of software packages described in Part 1 of this guide. The global control of the experiment is ensured by the Experiment Control System (ECS) and the ALICE Configuration Tool (ACT), which are described in Part 2. The standalone software developed for the Detector Data Link (DDL) via the DAQ Read-Out Receiver Card (D-RORC) is documented in Part 3. Part 4 describes the Detector Algorithm framework (DA). Data quality monitoring is performed with the AMORE software package, documented in Part 5. The monitoring of the DAQ system itself is performed by the Lemon package, described in Part 6. Part 7 is dedicated to a description of the electronic logbook.

This User’s Guide can be found in the ALICE EDMS: https://edms.cern.ch/document/1056364/ and on the ALICE DAQ web site [19].

The Authors

F. Carena, W. Carena, S. Chapeland, V. Chibante Barroso, F. Costa, E. Dénes, R. Divià, U. Fuchs, G. Simonetti, C. Soós, A. Telesca, P. Vande Vyvre, B. Von Haller.

Important note on software versions

This user’s guide describes the behavior of the software versions listed in Table 0.1. When using different versions of the packages, it is recommended to read the associated release notes.

Table 0.1 Software versions corresponding to this guide

Package         Version
DATE            7.00
ECS             4.00
RORC library    5.3.8
AMORE           1.24
LEMON           2.15.0

Contents

Preface

Part I DATE Reference Manual

Chapter 1 DATE overview
1.1 ALICE data-acquisition architecture
1.2 DATE overview
1.2.1 Parametrization of the hardware configuration
1.2.2 Interactive setting up of the data-acquisition parameters
1.2.3 Run control
1.2.4 Load balancing
1.2.5 Event monitoring
1.2.6 Information reporting
1.2.7 Electronic Logbook
1.2.8 Performance monitoring system
1.2.9 Detector algorithms
1.2.10 Data Quality Monitoring
1.3 DATE architectural strategies
1.3.1 Protocol-less push-down strategy
1.3.2 Detector readout via a standard handler
1.3.3 Light-weight multi-process synchronization strategy
1.3.4 Common data-acquisition services
1.3.5 Detectors integration
1.3.6 DATE installation

Chapter 2 DATE configuration parameters
2.1 DATE site parameters
2.2 Base configuration
2.3 Use of hostnames vs. IP addresses

Chapter 3 Data format
3.1 Conventions
3.2 Base header and header extension
3.3 Streamlined and paged events
3.3.1 Streamlined events
3.3.2 Paged events
3.4 Collider and fixed target modes
3.5 The base event header
3.5.1 eventSize
3.5.2 eventMagic
3.5.3 eventHeadSize
3.5.4 eventVersion
3.5.5 eventType
3.5.6 eventId
3.5.7 eventTriggerPattern
3.5.8 eventDetectorPattern
3.5.9 eventTypeAttribute
3.5.10 eventLdcId and eventGdcId
3.5.11 eventTimestampSec and eventTimestampUsec
3.6 The super event format
3.7 The complete file format
3.8 Decoding and monitoring on different platforms
3.9 The Common Data Header
3.9.1 Common Data Header version
3.9.2 Status and Error bits
3.10 The equipment header
3.10.1 equipmentSize
3.10.2 equipmentType/equipmentId
3.10.3 equipmentTypeAttribute
3.10.4 equipmentBasicElementSize
3.11 Paged events and DATE vectors
3.12 Data pools

Chapter 4 Configuration databases
4.1 Overview
4.2 Information schema
4.3 The static databases
4.3.1 Terminology and assumptions
4.3.2 The roles database
4.3.3 The trigger database
4.3.4 The detectors database
4.3.5 The event-building control database
4.3.6 The banks database
4.4 Other centrally stored parameters
4.4.1 DATE globals
4.4.2 DATE sockets
4.4.3 DATE detector codes
4.4.4 DATE Environment
4.4.5 DATE Files
4.4.6 DATE Detector Files
4.4.7 DATE readout equipment tables
4.5 The database editor
4.6 Example of a DAQ system
4.7 The programming interface

Chapter 5 The monitoring package
5.1 Monitoring in DATE
5.2 Online monitoring and role name
5.3 Monitoring and Analysis in C/C++
5.3.1 Some simple examples
5.3.2 The monitoring package files
5.3.3 Error codes
5.3.4 The monitoring callable library
5.4 Monitoring by detector
5.5 Monitoring from ROOT
5.5.1 The ROOT system
5.5.2 Direct monitoring in ROOT
5.6 The “eventDump” utility program
5.7 Monitoring of the online monitoring scheme
5.7.1 The monitorClients utility
5.7.2 The monitorSpy utility
5.8 Monitoring configuration
5.8.1 Creation of configuration files

Chapter 6 The readout program
6.1 The readout process
6.1.1 Start of run phases
6.1.2 Main event loop
6.1.3 End of run phases
6.1.4 Log messages
6.2 The generic readList concept
6.3 Using the generic readList
6.4 The equipmentList library
6.4.1 Synopsis of the equipment routines
6.4.2 Accessing the parameters
6.4.3 The function references

Chapter 7 The RORC readout software
7.1 Introduction to the RORC equipment
7.2 Internals of the RORC equipment
7.2.1 Event Identification
7.2.2 Data transfer mechanism of the RORC device
7.2.3 Elements to handle the RORC device
7.2.4 Equipments to handle the RORC device
7.2.4.1 Equipment RorcData
7.2.4.2 Equipment RorcTrigger
7.2.4.3 Equipment RorcSplitter
7.2.4.4 Configuring the RorcData equipment
7.2.4.5 Configuring the RorcTrigger equipment
7.2.4.6 Configuring the RorcSplitter equipment
7.2.5 Data flow for multiple RORC devices
7.2.6 Pseudo code of the RORC equipment routines
7.3 Introduction to the UDP equipment
7.4 Internals of the UDP equipment
7.4.1 Data transfer mechanism of the UDP equipment
7.4.2 The back-pressure algorithm
7.4.3 Equipments to handle the Ethernet port
7.4.3.1 Equipment RorcDataUDP
7.4.3.2 Equipment RorcTriggerUDP
7.4.4 Data flow for multiple UDP equipments

Chapter 8 The trigger system
8.1 The trigger system
8.1.1 The Central Trigger Processor (CTP)
8.2 LDC synchronization via the equipments

Chapter 9 COLE - COnfigurable LDC Emulator
9.1 Introduction
9.2 Delayed mode vs. free-running mode
9.3 System requirements and configuration
9.4 COLE as an Equipment
9.5 Basic Design
9.5.1 ArmHw()
9.5.2 EventArrived()
9.5.3 ReadEvent()
9.5.4 DisArmHw()
9.6 The colecheck utility

Chapter 10 Data recording
10.1 Introduction
10.2 Common data recording procedures
10.3 Recording from the LDC
10.4 Recording from the eventBuilder
10.4.1 Direct recording
10.4.2 Online recording
10.5 Recording with the Multiple Stream Recorder
10.5.1 Overview
10.5.2 MSR configuration file
10.5.2.1 Configuration file: naming and handling
10.5.2.2 Configuration examples
10.5.2.3 File names
10.5.2.4 The configuration file syntax: tags and attributes
10.5.2.5 The configuration file structure
10.5.2.6 Scopes of attributes and rules of precedence
10.5.2.7 Summary
10.5.3 Description of the MSR configuration attributes
10.5.4 How to build and run MSR

Chapter 11 The infoLogger system
11.1 Introduction
11.2 infoLogger configuration
11.3 The infoLogger processes
11.3.1 infoLoggerReader
11.3.2 infoLoggerServer
11.3.3 infoBrowser
11.4 Log messages repository
11.4.1 MySQL database
11.4.2 Archiving
11.4.3 Retrieving messages from repository
11.5 Injection of messages
11.5.1 Logging from the command line
11.5.2 Logging with the C API
11.5.3 Logging with the Tcl API

Chapter 12 The eventBuilder
12.1 Overview
12.2 The event-builder architecture
12.2.1 The data transfer from the LDC to the GDC
12.2.2 The communication protocol between the LDC and the GDC
12.2.3 The communication protocol between the eventBuilder and the edm
12.2.4 The event-building process
12.2.5 SOR/EOR records, files and scripts
12.3 Data buffers
12.4 Consistency checks on the data
12.5 ALICE events emulation mode
12.6 The control of the eventBuilder
12.7 Information and error reporting
12.7.1 Usage of the infoLogger
12.7.2 Run statistics update
12.7.3 End-of-run messages
12.8 Configuration

Chapter 13 The event distribution manager
13.1 Overview
13.2 The EDM architecture
13.2.1 The edm process
13.2.2 The edmClient process
13.2.3 The edmAgent process
13.3 The synchronization with the run control
13.4 Information and error reporting

Chapter 14 The runControl
14.1 Introduction
14.2 Architecture
14.3 The runControl process
14.4 The runControl interface
14.5 The runControl Human Interface
14.6 The Logic Engine
14.7 The rcServers
14.8 The RCS interface
14.9 Run parameters
14.10 Run-time variables
14.11 Control of the log messages
14.12 Log Files

Chapter 15 The physmem package
15.1 Introduction
15.2 Installation of the physmem driver
15.2.1 Configuring the boot loader
15.2.2 Setting up the physmem driver
15.2.3 Testing the physmem driver
15.3 Utility programs for physmem
15.4 Internals of the physmem driver
15.5 Physmem application library

Chapter 16 Utility packages
16.1 The banks manager package
16.1.1 Introduction
16.1.2 Architecture
16.1.3 Entries and symbols
16.1.4 Internals
16.2 The bufferManager package
16.2.1 Introduction
16.2.2 Architecture
16.2.3 Common entries
16.2.4 Producer entries
16.2.5 Consumer entries
16.2.6 Internals
16.3 The simpleFifo package
16.3.1 Introduction
16.3.2 Architecture
16.3.3 Common entries
16.3.4 Producer entries
16.3.5 Consumer entries
16.3.6 Internals
16.4 The recording library package
16.4.1 Introduction
16.4.2 The low-level recording library
16.4.2.1 The callable interface
16.4.3 The high-level recording library
16.4.3.1 The callable interface
16.4.4 Internals

Chapter 17 Interfaces
17.1 Interface with the Trigger System
17.2 Interface to the High-Level Trigger
17.2.1 DAQ-HLT interface
17.2.2 HLT-DAQ interface
17.2.3 Installation and operation
17.2.4 Synchronization between hltAgents
17.3 Interface to AliEn and the Grid
17.3.1 Transfer to PDS
17.4 File Exchange Server
17.5 Interface to the Shuttle

Part II ALICE Experiment Control System Reference Manual

Preface

Chapter 18 ECS Overview
18.1 Introduction
18.2 Partitions
18.3 Stand-alone detectors
18.4 ECS architecture
18.5 Detector Control Agent (DCA)
18.6 The DCA Human Interface
18.7 Partition Control Agent (PCA)
18.8 The PCA Human Interface
18.9 ECS/DCS Interface
18.10 ECS/DAQ Interface
18.11 ECS/TRG Interface
18.12 ECS/HLT Interface
18.13 logFiles
18.14 Database
18.15 Interactions with other systems
18.16 Auxiliary processes

Chapter 19 ALICE Configuration Tool
19.1 Architecture
19.1.1 Overview
19.1.2 Taxonomy
19.1.2.1 Items Locking
19.1.2.2 Items Status Mismatch
19.1.2.3 Items Activation Status
19.1.3 ACT Update Request Server
19.1.4 Interfaces
19.1.4.1 ACT-ECS interface
19.1.4.2 ACT-DAQ interface
19.1.4.3 ACT-HLT interface
19.1.4.4 ACT-CTP interface
19.1.4.5 ACT-Detector interface
19.1.5 Workflow
19.2 Database
19.2.1 Overview
19.2.2 Table description
19.2.2.1 ACTsystems table
19.2.2.2 ACTitems table
19.2.2.3 ACTinstances table
19.2.2.4 ACTlockedItems table
19.2.2.5 ACTconfigurations table
19.2.2.6 ACTconfigurationsContent table
19.2.2.7 ACTinfo table
19.3 Application Programming Interface
19.3.1 Overview
19.3.2 Environment variables
19.3.3 Data types
19.3.4 Database connection functions
19.3.5 API cleanup functions
19.3.6 ACT READ access functions
19.3.7 ACT WRITE functions
19.4 Tools
19.5 Graphical User Interface
19.5.1 Overview
19.5.2 Authentication and Authorization
19.5.3 Expert Mode
19.5.3.1 Actions
19.5.3.2 Status
19.5.4 Run Coordination Mode
19.5.4.1 Partitions
19.5.4.2 Detectors
19.5.4.3 CTP

Part III DDL and D-RORC software Reference Manual

Chapter 20 DDL and D-RORC stand-alone software
20.1 Introduction
20.2 Test programs for the RORC, DIU and SIU
20.3 Front-end Control and Configuration (FeC2) program
20.3.1 General description of the FeC2 program
20.3.2 Syntax of script files for the FeC2 program
20.3.2.1 FeC2 instructions related to the DDL
20.3.2.2 FeC2 instructions related to the program flow
20.3.2.3 Example of an FeC2 script
20.4 DDL Data Generator (DDG) program
20.4.1 General description of the DDG program
20.4.2 Behavior of the DDG program
20.4.3 Syntax of the DDG configuration file
20.4.3.1 Channel independent keywords
20.4.3.2 Channel dependent keywords
20.4.3.3 Common data header keywords
20.4.3.4 Example of a DDG configuration file
20.4.4 Syntax of the DDG data files
20.5 Stand-alone installation

Chapter 21 RORC Application Library
21.1 Introduction
21.2 Header files
21.3 The rorc_driver
21.4 Description of the routines and functions
21.5 Installation

Part IV Detector Algorithms Framework

Chapter 22 Detector Algorithms Framework
22.1 Introduction
22.2 The Detector Algorithms (DAs)
22.3 DA framework architecture
22.4 DA framework implementation
22.4.1 DA interface API
22.4.2 DA control mechanisms
22.4.2.1 Runtime parameters
22.4.2.2 LDC DA launching
22.4.2.3 MON DA launching

Part V Data Quality Monitoring

Chapter 23 Automatic MOnitoRing Environment (AMORE)
23.1 Architecture
23.1.1 Overview
23.1.2 MonitorObjects
23.1.3 AMORE taxonomy
23.1.4 Publishers
23.1.5 Clients
23.2 Database
23.2.1 Overview
23.2.2 Archives
23.2.3 Tables descriptions
23.3 Application flow
23.3.1 Agents and clients Finite State Machines
23.3.2 Initialization
23.3.3 Agents and clients inheritance and methods calls sequences
23.4 Features details
23.4.1 Quality
23.4.2 Expert/Shifter MonitorObjects
23.4.3 Archiver and FIFO
23.4.3.1 Purpose
23.4.3.2 Implementation of the archiver
23.4.3.3 Implementation of the FIFO
23.4.3.4 Access to the archives
23.4.4 Access Rights
23.4.5 ECS-AMORE interaction
23.4.5.1 Motivation
23.4.5.2 Implementation
23.4.6 Logbook usage
23.4.6.1 Motivation
23.4.6.2 Usages
23.4.7 Multi thread image production
23.5 Application Programming Interface (API)
23.5.1 Core
23.5.1.1 MonitorObject
23.5.1.2 Run
23.5.1.3 ConfigFile
23.5.2 Publisher
23.5.2.1 PublisherModule
23.5.2.2 PublicationManager
23.5.3 Subscriber
23.5.3.1 SubscriptionManager
23.5.4 User Interface (UI)
23.5.4.1 VisualModule
23.5.5 Detector Algorithms (DA) library
23.5.6 Archiver
23.5.6.1 ArchiverModule
23.6 Tools

Part VI The ALICE electronic logbook

Chapter 24 The ALICE Electronic Logbook
24.1 Architecture
24.1.1 Overview
24.2 Database
24.2.1 Overview
24.2.2 Table description
24.2.2.1 logbook table
24.2.2.2 logbook_detectors table
24.2.2.3 logbook_stats_LDC table
24.2.2.4 logbook_stats_LDC_trgCluster table
24.2.2.5 logbook_stats_GDC table
24.2.2.6 logbook_stats_files table
24.2.2.7 logbook_daq_active_components table
24.2.2.8 logbook_shuttle table
24.2.2.9 logbook_DA table
24.2.2.10 logbook_AMORE_agents table
24.2.2.11 logbook_trigger_clusters table
24.2.2.12 logbook_trigger_classes table
24.2.2.13 logbook_trigger_inputs table
24.2.2.14 logbook_trigger_config table
24.2.2.15 logbook_stats_HLT table
24.2.2.16 logbook_stats_HLT_LDC table
24.2.2.17 logbook_comments table
24.2.2.18 logbook_comments_interventions table
24.2.2.19 logbook_files table
24.2.2.20 logbook_threads table
24.2.2.21 logbook_subsystems table
24.2.2.22 logbook_comments_subsystems table
24.2.2.23 logbook_users table
24.2.2.24 logbook_users_privileges table
24.2.2.25 logbook_users_profiles table
24.2.2.26 logbook_filters table
24.2.2.27 DETECTOR_CODES table
24.2.2.28 TRIGGER_CLASSES table
24.2.2.29 logbook_config table
24.2.3 Stored Procedures
24.2.4 Events
24.3 Application Programming Interface
24.3.1 Overview
24.3.2 Environment variables
24.3.3 Database connection functions
24.3.4 Logging functions
24.3.5 eLogbook READ access functions
24.3.6 eLogbook WRITE functions
24.4 Logbook Daemon
24.5 Tools
24.6 Graphical User Interface
24.6.1 Overview
24.6.2 Authentication and Authorization
24.6.3 Features
24.6.3.1 Run Statistics
24.6.3.2 Run Details
24.6.3.3 Log Entries
24.6.3.4 Announcements
24.6.3.5 Automatic Email Notification
24.6.3.6 Search Filters
24.6.3.7 Export Run Statistics

Chapter 25 LHC machine monitoring
25.1 DATA INTERCHANGE PROTOCOL (DIP)
25.1.1 The DIP architecture
25.1.2 Setting up development environment . . . . . . 25.1.2.1 DIP installation for C++ user under Linux . 25.2 LHC beam info: DIP client/DIM server . . . . . . . . . 25.3 LHC beam info: off-line cross-check . . . . . . . . . . . . . . . . . . . . . . . . 488 . 488 . 488 . 488 . 489 . 491 492 . 492 . 495 . 495 . 496 . 498 . . Part VII The Transient Data Storage Chapter 26 The Transient Data Storage . . . . . . . . . . . . . . 26.1 Introduction . . . . . . . . . . . . . . . . 26.2 The Transient Data Storage architecture . . . . . . 26.3 The TDSM . . . . . . . . . . . . . . . . . 26.3.1 The TDSM and the DAQ . . . . . . . . 26.3.2 Size of the output files . . . . . . . . . 26.3.3 Links within the TDS and TDSM components 26.3.4 The AliEn spooler . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . List of Figures . . . . . . . . . . . . . . . . . . . List of Listings . . . . . . . . . . . . . . . . . . . List of Tables . . . . . . . . . . . . . . . . . . . List of Acronyms . . . . . . . . . . . . . . . . . . ALICE DAQ and ECS manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 504 . 504 . 504 . 506 . 506 . 507 . 507 . . 511 513 . 515 . 517 . 519 . . xviii Contents ALICE DAQ and ECS manual Part I DATE Reference Manual November 2010 ALICE DAQ Project DATE V7 DATE overview 1 This chapter gives an overview of the architecture of the ALICE DAQ system and of its software framework called DATE. The features of the system are described, with the components that implement such features. For each component, a brief explanation of the underlying mechanism is given. ALICE DAQ and ECS manual 1.1 ALICE data-acquisition architecture . . . . . . . . . . . . . . . .2 1.2 DATE overview . . . . . . . . . . . . . . . . . . . . . . . . . . .3 1.3 DATE architectural strategies . . . . . . . . . . . . . . . . . . . 

1.1 ALICE data-acquisition architecture

A broad view of the ALICE data-acquisition architecture is illustrated in Figure 1.1.

Figure 1.1 DAQ architecture overview.

The detectors receive the trigger signals and the associated information from the Central Trigger Processor (CTP), through a dedicated Local Trigger Unit (LTU) interfaced to a Timing, Trigger and Control (TTC) system. The readout electronics of all the detectors is interfaced to the ALICE standard Detector Data Links (DDL). The data produced by the detectors (event fragments) are injected on the DDLs.

At the receiving side of the DDLs there are PCI-X or PCI-e boards, called DAQ Read-Out Receiver Cards (D-RORC). The D-RORCs are hosted by PCs, the Local Data Concentrators (LDCs). Each LDC can handle one or more D-RORCs. The D-RORCs perform concurrent and autonomous DMA transfers into the PC's memory, with minimal software supervision.

In the LDCs, the event fragments originated by the various D-RORCs are logically assembled into sub-events. The role of an LDC is twofold: it can either take data in isolation from the global system, for a test or a calibration run, or ship the sub-events to a farm of PCs called Global Data Collectors (GDCs), where the whole events are built (from all the sub-events pertaining to the same trigger).

Each D-RORC includes two DDL channels, which can be used in two different ways: either both as inputs from the detector, or one as input and the other as output to the High-Level Trigger (HLT).
In the latter case, the data shipped by the detector are copied to the HLT for software triggering or data compression.

Besides having a DDL common to all the sub-detectors, the other major architectural feature of the ALICE data acquisition is the event builder, which is based upon an event-building network. The sub-event distribution is performed by the LDCs, which decide the destination of each sub-event. This decision is taken by each LDC independently from the others (no communication between the LDCs is necessary); the synchronization is obtained using a data-driven algorithm. The algorithm is designed to share the load fairly among the GDCs. The event-building network does not take part in the decision about the destination; it is a standard communication network supporting the TCP/IP protocol. The event-building network is also used to distribute the HLT decisions from the HLT LDCs to the detector LDCs, where the decisions to accept or reject sub-events are applied.

The role of the GDCs is to collect the sub-events, assemble them into whole events, and record them to the Transient Data Storage (TDS) located at the experimental area. The data files recorded on the TDS are migrated by the TDS Managers (TDSM) onto Permanent Data Storage (PDS) in the computing centre.

The services needed by the DAQ system itself, such as the control or the database, are provided by the DAQ Services Servers (DSS). Additional servers are used to run the Detector Algorithms (DA) or the Data Quality Monitoring (DQM). All these servers are connected to the event-building network to exchange commands, status and data messages with the other nodes of the system.

1.2 DATE overview

DATE (Data Acquisition and Test Environment) is a software system that performs data-acquisition activities in a multi-processor distributed environment.
DATE fulfills the requirements of the ALICE data acquisition; it has therefore been designed with scalability features that make it suitable for large systems involving hundreds of computers. Nevertheless, DATE can cope with a large variety of configurations; in particular, it is also well adapted to small laboratory systems, where only a few machines are used, or even just one. In that case, the DATE system may be based on one single processor, which then performs all the functions (LDC, GDC, run control, monitoring, etc.).

The basic dataflow is organized along parallel data streams working independently and concurrently, followed by a stage of event builders where data are merged and eventually recorded as a complete event.

The conditions imposed on the hardware architecture in order to support DATE are minimal:

a. The processors must be of the IA32 or IA64 families.
b. The operating system of all the processors must be Linux.
c. All the processors must be linked to a network supporting the TCP/IP stack and the socket library.

The readout program contains a piece of code that deals with the devices to be read. This piece of code can be tailored to read any type of device. ALICE, though, has currently standardized its detector readout channel and uses the DDL and the D-RORC; the software to handle this type of device is available and remains the same for all the detectors using it. In view of the ALICE upgrade, new types of readout links will be supported. For example, support for Ethernet coupled with the UDP protocol has been added to the DATE readout.

The event triggering is performed via the TTC. The readout program collects all the data from the DDLs, and the data structure imposed by the DDL makes it possible to identify the original blocks belonging to an event.

Besides the data-flow function, the DATE system provides many other features, such as the ones described in the following paragraphs.

1.2.1 Parametrization of the hardware configuration

The hardware configuration of the system is described by declaring all the available machines and assigning a role to them (LDC, GDC, run control, monitoring, etc.). DATE uses a database repository to obtain this information. The repository is made of records in an SQL database containing the description of all the entities and their relationships. The hardware configuration is set up by editing the records of the database.

1.2.2 Interactive setting up of the data-acquisition parameters

The running conditions of the system are described by selecting the machines involved in the data acquisition (which may be a subset of the available machines) and assigning the parameters associated with a given mode of operation. This information is stored on disk and may be changed interactively.

1.2.3 Run control

An interactive program lets the operator centrally control the operations of all the machines involved in the data acquisition. The activities of all the machines in the system proceed through pre-defined sequences with synchronization check-points. Various hooks are provided to perform calibration procedures and to submit foreign data into the event stream.

1.2.4 Load balancing

Large configurations, involving a farm of many GDCs, may need to smooth the distribution of events to the various machines, so that busy machines do not slow down the system. A module called event-distribution manager (EDM) checks the occupancy of each GDC and instructs the LDCs to dispatch the events to the machines that are not crowded.

1.2.5 Event monitoring

Analysis programs can receive online events while the data acquisition is active. A monitoring server makes copies of the requested events and dispatches them to the client analysis process. The analysis process does not need to run on the machine where the data are generated.
In fact, the analysis can run on any remote non-DATE machine, i.e. one not declared as a node of the DATE system. The same routine calls that provide the online events may be used to read offline events that have been previously recorded.

1.2.6 Information reporting

All the information messages generated by the processes involved in the data acquisition are centrally handled and made available to the operator via an interactive browser.

1.2.7 Electronic Logbook

All the information relevant to the runs (used to keep track, run by run, of the running conditions) may be generated by any process involved in the data acquisition. It is centrally handled and made available to the operator via a Web browser. The electronic logbook can also be used to archive comments or observations made by the people working at the experimental area.

1.2.8 Performance monitoring system

The performance of large systems should be closely monitored. The ALICE DAQ uses the LEMON package to collect performance measurements, handle them centrally, and visualize them in a Web browser.

1.2.9 Detector algorithms

A framework has been developed to support the execution, within the DAQ system, of detector algorithms using monitored or recorded data.

1.2.10 Data Quality Monitoring

The AMORE framework has been developed to allow the execution of Data Quality Monitoring (DQM) programs. These programs monitor physics data during the physics run and accumulate plots that can be inspected asynchronously. The DQM framework also archives the plots at various stages of their lifetime, in order to ease their inspection and any investigation related to their evolution.

1.3 DATE architectural strategies

Some of the leading ideas that have determined the DATE architecture are described in the next paragraphs.

1.3.1 Protocol-less push-down strategy

Data are pushed down as soon as available.
All the actors of the data acquisition, from the detector electronics to the data storage, send the data through open channels to the next processing stage as soon as they have finished their own processing. The DDL and TCP/IP provide the flow control. A back-pressure mechanism (x-on/x-off style) protects the system from congestion. This strategy avoids synchronization overheads and maximizes the throughput.

1.3.2 Detector readout via a standard handler

Standardizing the transmission medium (presently the DDL) and the data structure makes it possible to use the same piece of code (equipment code) for all the detectors using the same medium. The readout system can be adapted to changes of the hardware configuration without modifying the code. The addition of a new medium, such as Ethernet coupled with the UDP protocol, has been made possible by the development of a new readout equipment.

1.3.3 Light-weight multi-process synchronization strategy

Wherever process synchronization is required within a PC, no system services such as semaphores or message queues are used. An original technology has been developed to allow the use of much faster shared-memory mechanisms. This technology is applicable whenever the synchronization involves one single data producer and one single data consumer.

1.3.4 Common data-acquisition services

DATE provides a set of services, such as run control, event delivery to monitoring programs, message logging, run bookkeeping, load balancing, performance measurement and data quality monitoring. These services are common throughout all the components of the system and are available to any additional piece of software cooperating with DATE.

1.3.5 Detectors integration

The detector developers usually provide the code dealing with the various operation phases, such as calibration, initialization, run-down and readout. This code can be fully integrated in DATE and can make use of all the services mentioned above.
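
The manual does not detail the technology mentioned in Section 1.3.3. As a purely illustrative sketch, and not the DATE implementation, the following C11 code shows one generic textbook way for a single producer and a single consumer to synchronize through memory alone, without semaphores or message queues. All names (spsc_ring, ring_push, ring_pop, RING_SLOTS) are hypothetical; the sketch assumes the structure lives in memory visible to both sides, which for two processes would be a shared-memory segment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* hypothetical capacity; must be a power of two */

/* Hypothetical single-producer/single-consumer ring, not a DATE structure. */
struct spsc_ring {
    _Atomic uint32_t head;      /* next slot to write; stored only by the producer */
    _Atomic uint32_t tail;      /* next slot to read; stored only by the consumer  */
    uint32_t slot[RING_SLOTS];
};

/* Producer side. Returns 1 on success, 0 when the ring is full; a full
 * ring is how the consumer exerts back-pressure on the producer. */
int ring_push(struct spsc_ring *r, uint32_t value)
{
    assert(r != NULL);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return 0;                               /* no free slot */
    r->slot[head % RING_SLOTS] = value;
    /* Release: the slot content becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side. Returns 1 and fills *value, or 0 when the ring is empty. */
int ring_pop(struct spsc_ring *r, uint32_t *value)
{
    assert(r != NULL && value != NULL);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return 0;                               /* nothing to read */
    *value = r->slot[tail % RING_SLOTS];
    /* Release: the slot is handed back to the producer after the read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

Because each index is written by one side only, no lock is needed; this property holds only for exactly one producer and one consumer, which matches the restriction stated in Section 1.3.3.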

1.3.6 DATE installation

A user-friendly DATE installation procedure, based on RPMs, produces a turn-key system readily available to the user.

Chapter 2 DATE configuration parameters

This chapter gives an overview of the configurable parameters for a DATE system.

2.1 DATE site parameters
2.2 Base configuration
2.3 Use of hostnames vs. IP addresses.

2.1 DATE site parameters

Since DATE v6, only MySQL is used to store the DATE configuration parameters. As a consequence, a single local configuration file is needed to run DATE (it stores the database access parameters). All the other configuration items are stored in MySQL, and edited with editDb (see Section 4.5) or with other package-specific human interfaces. The configuration files are retrieved locally when necessary.

The only item that must still be put manually on each host running DATE is the file ${DATE_SITE_PARAMS}. It contains a sequence of lines defining environment variables. Every line contains the name of an environment variable followed by the associated value. Lines starting with the character # (followed by a space) are comments and are not taken into account. The following variables must be defined so that the configuration database is accessible:

• DATE_DB_MYSQL_HOST : IP name of the host where the MySQL server runs.
• DATE_DB_MYSQL_DB : MySQL database name.
• DATE_DB_MYSQL_USER : user name to access the MySQL database.
• DATE_DB_MYSQL_PWD : password to access the MySQL database.

2.2 Base configuration

To set up an initial, minimal working DATE system, it is recommended to call the interactive script newDateSite.sh.
Answer the questions according to your local system and you will have a basic DATE environment running. This script includes the creation of the DATE_SITE_PARAMS file described in Section 2.1 and the setup of some services like the infoLogger system (Chapter 11) and the logbook (Chapter 24). These settings are in principle final and do not need to be changed afterwards. The script also creates a minimal setup with a random software readout in a 1 LDC + 1 GDC configuration on the same machine. It can then be extended according to your needs.

Please note that for an initial DATE installation, some local system services (DIM DNS, firewall, database engine, xinetd, ...) need to be configured. One may run the script newMySQL.sh to initially create databases and accounts for DATE. One can also use the script dateLocalConfig to configure local services. Finally, some DATE daemons may be started with dateSiteDaemons and runControl/start_daqDomains.sh.

Extensive installation instructions are available in separate guides:

• ALICE DAQ and ECS installation and configuration (hardware and software) at Point 2 (available in the ALICE DAQ WIKI pages);
• ALICE DAQ and ECS installation and configuration guide for external sites (available on the ALICE DAQ Web server).

Configuration information for roles, detectors, event-building rules, memory banks, triggers and readout equipment is required to operate DATE. The DATE utility editDb should be used to populate or edit configurations. Chapter 4 describes the configuration of roles, detectors, banks, triggers, and event-building rules. Equipment-specific parameters are described in the corresponding hardware chapters.

Additional package-specific configuration files or environment variables may be stored in the database FILES and ENVIRONMENT sections. The files and variables are described in the documentation of the relevant packages.
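
The ${DATE_SITE_PARAMS} file created by newDateSite.sh follows the line format given in Section 2.1: a variable name followed by its value, with # marking comment lines. The sketch below shows how one such line could be split in C. It is an illustration only, not DATE code: the function name parse_site_param is hypothetical, the name and value are assumed to be separated by spaces or tabs, and any line whose first non-blank character is # is treated as a comment, which is slightly more permissive than the rule stated in Section 2.1.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of DATE.
 * Splits one line of a site-parameters file into name and value.
 * Returns 1 when both were found and fit the given buffers,
 * 0 for comments, blank lines and malformed lines. */
int parse_site_param(const char *line,
                     char *name, size_t name_len,
                     char *value, size_t value_len)
{
    assert(line != NULL && name != NULL && value != NULL);

    /* Skip leading white space (assumption: indentation is tolerated). */
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#')
        return 0;                       /* blank line or comment */

    /* The variable name runs up to the first blank. */
    size_t n = strcspn(line, " \t\r\n");
    if (n == 0 || n >= name_len)
        return 0;
    memcpy(name, line, n);
    name[n] = '\0';
    line += n;

    /* Skip the separator, then take the rest of the line as the value. */
    while (*line == ' ' || *line == '\t')
        line++;
    size_t v = strcspn(line, "\r\n");
    while (v > 0 && (line[v - 1] == ' ' || line[v - 1] == '\t'))
        v--;                            /* trim trailing blanks */
    if (v == 0 || v >= value_len)
        return 0;                       /* name without a value */
    memcpy(value, line, v);
    value[v] = '\0';
    return 1;
}
```

A caller would typically read the file line by line with fgets and pass each accepted pair to setenv; the address used in any test line (for example 10.0.0.5) is of course site-specific.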

Some persistent DATE parameters are also stored in the database but are not directly accessible by users from editDb. This is, for example, the case of the runControl parameters edited using the runControlHI human interface, as described in Chapter 14.

After a basic DATE_SITE system setup, the first parameters usually modified are:

• The ROLES database, in order to add new machines or roles to the DATE system. This is described in Chapter 4.
• The EQUIPMENTS configuration, to add and configure hardware readout equipment. This is done in editDb, and the parameters are described in the relevant hardware chapters, e.g. Section 7.2.4.4 for the RORC parameters.
• The runControl parameters, which define the behavior of the DATE processes at run time: e.g. maximum number of events, enabling CDH checks, enabling monitoring, etc. Note that the LOGLEVEL controlling the verbosity of some DATE processes is also one of these settings. The global run options (e.g. recording mode) may also be saved. This is described in Chapter 14.
• The monitoring configuration, e.g. to define adequate buffer sizes. These settings are discussed in Section 5.8.
• The mStreamRecorder configuration to record ROOT files, as described in Section 10.5.

Other features, like the Transient Data Storage (Chapter 26) and the Detector Algorithms (Chapter 22), are usually deployed only at the production area, and therefore rarely need to be configured by end users.

2.3 Use of hostnames vs. IP addresses.

During the configuration of DATE, it is necessary to identify several hosts (DIM DNS, database server, infoLogger host). These can be specified either by hostname or by IP address. The two methods are in principle equivalent. However, they behave differently at run time, which may have an impact on the operation of a DATE site.
When hosts are specified by hostname, one or more calls to the Internet Name Domain server (named) are made at run time in order to associate the name with the IP address of the machine. We recommend, whenever possible, using IP addresses rather than hostnames during the configuration of a DATE site, to minimize queries to the IP name server and to avoid problems if it is unavailable or slow.

Chapter 3 Data format

This chapter describes the different event types used in DATE, the format of the data produced in the LDCs (events and sub-events), and the format of the super events which are built in the GDCs.

3.1 Conventions
3.2 Base header and header extension
3.3 Streamlined and paged events
3.4 Collider and fixed target modes
3.5 The base event header
3.6 The super event format
3.7 The complete file format
3.8 Decoding and monitoring on different platforms
3.9 The Common Data Header
3.10 The equipment header
3.11 Paged events and DATE vectors
3.12 Data pools

3.1 Conventions

All symbols referred to in the following pages are defined in the DATE include file ${DATE_COMMON_DEFS}/event.h. Programs using this file are also supposed to be compiled using the DATE makefile rules appropriate to the host architecture.
The shell command date-config can be used, on all architectures compatible with the full DATE kit, to get a list of the options required to compile programs making use of the event.h definition file.

Several of the macros defined in the DATE environment perform some simple run-time checks. These checks can be disabled by defining the compilation symbol NDEBUG. When the checks are enabled, they may cause an early termination of the process, with an appropriate error message and the creation (if possible) of a core dump file, as soon as the basic correctness conditions are not met.

Three concepts used in this chapter are those of IDs, patterns and masks. An ID is a number, belonging to a fixed range, that identifies one and only one entity in a given set (e.g. a trigger class or a detector). A pattern is a sequence of bits with one bit for each ID of a given set: it can have zero or more bits asserted. A mask is a pattern with one and only one bit asserted. These concepts are, for example, used in the definition of the eventTriggerPattern given in Section 3.5.7.

All sizes given in this chapter are expressed in bytes.

3.2 Base header and header extension

All DATE events are prefixed by an event header. This header is made of a first mandatory part (the base header) and of an optional header extension.

The base header is completely defined by the eventHeaderStruct structure. All the fields of the base header are initialized by DATE to predefined values. The base header includes the size of the complete event, a unique pattern (the DATE event magic number), the version of the eventHeaderStruct structure, all the fields needed to identify the event (in type, origin, trigger set and detector set) and other fields used for rule-driven criteria. A 64-bit pattern is also available to specify user-defined attributes associated with the event.
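
The ID, pattern and mask concepts of Section 3.1 can be sketched on a single 64-bit word, such as the one used for user-defined attributes. The code below is a minimal illustration under stated assumptions: the type pattern64 and the functions mask_of_id, pattern_set, pattern_test and is_mask are hypothetical names, not the types and macros defined in event.h, and IDs are assumed to run from 0 to 63. The assert call mirrors the kind of run-time check that is compiled out when NDEBUG is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit pattern type; not the DATE definition from event.h. */
typedef uint64_t pattern64;

/* Mask for one ID: a pattern with exactly one bit asserted. */
pattern64 mask_of_id(unsigned id)
{
    assert(id < 64);    /* run-time check, removed when NDEBUG is defined */
    return (pattern64)1 << id;
}

/* Return the pattern p with the bit of the given ID asserted. */
pattern64 pattern_set(pattern64 p, unsigned id)
{
    return p | mask_of_id(id);
}

/* Non-zero when the bit of the given ID is asserted in p. */
int pattern_test(pattern64 p, unsigned id)
{
    return (p & mask_of_id(id)) != 0;
}

/* Non-zero when p has one and only one bit asserted, i.e. p is a mask. */
int is_mask(pattern64 p)
{
    return p != 0 && (p & (p - 1)) == 0;
}
```

For example, asserting IDs 3 and 40 in an empty pattern yields a pattern for which pattern_test succeeds on both IDs and is_mask fails, since two bits are set.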
A header extension can be appended to the base header: the size and the format of the header extension are left to whoever is responsible for the data-acquisition system.

3.3 Streamlined and paged events

LDCs must be able to handle synchronous and asynchronous equipments, with either fixed or variable event sizes and with segmented or paged payloads. Two different schemes have been made available: the streamlined events scheme and the paged events scheme. Streamlined events mainly support serial synchronous channels to be read in strict sequence. Paged events are better suited for asynchronous equipments to be read in parallel. One should select the scheme that better fulfills the requirements of the DATE site.

Within the same run, each machine will handle events of the same type, i.e. only streamlined or only paged events. It is forbidden to switch between modes within one stream of events (online or offline). However, it is possible to have a DAQ system where some LDCs handle paged events and some other LDCs handle streamlined events.

DATE events at LDC level always have their payload partitioned into equipments. The equipment is a logical entity controlling a (set of) physical input channel(s). All equipments prefix their data with a standard equipmentHeaderStruct structure designed to identify the equipment itself, to provide some standard description of the channel and to associate some attributes with the payload.

GDCs do not produce paged events. Their format is based on UNIX I/O vectors (as in the writev system call). GDCs do not support the concept of equipment. All they do is receive sub-events from the LDCs, possibly perform event-building functions (according to the event-building rules defined in the configuration of the DATE site), add a super event header and send the resulting event to the recording stage (see Section 3.6 for more details on this process).
The format of the events at the level of the GDCs is not described here.

3.3.1 Streamlined events

Streamlined events are made of a consecutive sequence of bytes, starting with the base event header, followed by the header extension (if present) and by the equipments, one after the other. These events are designed to be read sequentially, typically equipment after equipment. This is the natural approach for network channels or non-shared channels (such as RS232).

Figure 3.1 Streamlined unextended event format: the base event header (struct eventHeaderStruct) ends at offset EVENT_HEAD_BASE_SIZE; the payload (equipment-based format) follows and ends at offset eventHeaderStruct.eventSize.

Figure 3.2 Streamlined extended event format: the base event header (struct eventHeaderStruct) ends at offset EVENT_HEAD_BASE_SIZE; the event header extension (user-defined format) follows and ends at offset eventHeaderStruct.eventHeadSize; the payload (equipment-based format) follows and ends at offset eventHeaderStruct.eventSize.

Multiple sequential channels can also be read in pure streamlined mode. The easiest approach is to read the channels one after the other, in strict synchronous sequence. This requires the pre-allocation of an event buffer big enough to store the data coming from all the equipments, possibly resized at the end of the readout procedure. However, if the amount of data sent over each channel is known in advance, it is possible to allocate the buffer needed for the full event at once, calculate the offset of each equipment's payload and start a parallel, asynchronous readout from all the channels into the right locations. Another option is to read all channels in parallel into separate buffers and then combine them into one single streamlined event using a follow-and-copy process.

Streamlined events may be quite difficult to modify.

3.3.2 Paged events

Paged events are made of multiple data segments or pages, containing (part of) the payloads coming from the input channel(s).
This class of events is very efficient for data-driven input channels (such as the DDL) and for parallel, asynchronous input channel schemes, e.g. multiple serial lines. The logical organization of a paged event, starting from the first-level vector used to represent it, is described in Figure 3.3.

Figure 3.3 Paged event logical format: the first-level vector holds the event header, the payload descriptor and the equipment descriptors; these point to the extended header and to the equipment payloads, either directly or through a second-level vector listing the equipment data pages (page 1 to page N).

Paged events are made of a first-level vector including the event header, a payload descriptor and a set of one or more equipment descriptors. The pages with the actual data (extended header and payload) are described by various fields of the first-level vector. Note that both the extended header and the equipment payloads are optional and may not be present in all paged events. A first-level vector must have at least one equipment descriptor.

The number of components of the first-level vector (and therefore its size) depends on the number of equipments instantiated on the LDC. Its format is described in Figure 3.4.

Figure 3.4 Paged event first-level vector format: the base event header (struct eventHeaderStruct) ends at offset EVENT_HEAD_BASE_SIZE; the payload descriptor (struct vectorPayloadDescriptorStruct) follows and ends at offset EVENT_HEAD_BASE_SIZE + sizeof( struct vectorPayloadDescriptorStruct ); the equipment descriptor(s) (struct equipmentDescriptorStruct) follow and end at offset PAGED_EVENT_SIZE( eventHeaderStruct ).

The actual payload (extended header and equipment payload) is only described by the first-level vector. As a matter of fact, the entity pointed to by the first-level vector can itself be a vector, called second-level vector, used to represent paged payloads (such as the output from the DDL).
In the example given in Figure 3.3, the first equipment has created a paged event made of N pages and described by a second-level vector. Equipments creating segmented payloads can avoid the use of the second-level vector and let the first-level vector point directly to the payload. The representation could be extended beyond two levels of vectors (if this need arises).

Paged events can be converted into streamlined events by a follow-and-copy algorithm. This is the approach followed by the DATE recording library, where paged events are recorded (to file, pipes or over the network) in a strict sequential manner (although data is not explicitly copied in the process).

Paged events allow easy manipulation. It is sufficient to update the pointer(s) to the appropriate page(s) to modify selected portions of the payload. This avoids a lengthy follow-and-copy process as in the case of streamlined events.

3.4 Collider and fixed target modes

DATE events can be identified in two different modes: COLLIDER and FIXED TARGET. The main difference between the two modes is the way the event ID field of the base event header is loaded and handled.

In COLLIDER mode, events are identified as described in Figure 3.5. The components of the event ID are the period counter (software controlled), the orbit counter (from the machine/trigger systems) and the bunch crossing number (from the machine/trigger systems). The bunch crossing number comes directly from the particle accelerator, while the orbit counter, although still driven by the machine, can from time to time be reset under software control. When this happens, a new run period is started: this is identified by the period counter.

Figure 3.5 Collider mode event identification: period counter (28 bits), orbit counter (24 bits), bunch crossing (12 bits).

In FIXED TARGET mode the event ID is under full software control. The format is described in Figure 3.6.
This mode is compatible with both fixed-target and stand-alone setups. In the first case, burst number and number in burst can be included in the event ID. For stand-alone setups, only the number in run field should be set: the burst number and number in burst fields can be zero.

Figure 3.6 Fixed target mode event identification: number in run (32 bits), burst number (12 bits), number in burst (20 bits).

Within the same stream of events it is not allowed to switch between different modes. The ATTR_ORBIT_BC system attribute bit can be used to select the encoding of the eventId (collider if set, fixed target if unset). Different macros are available to manipulate (initialize, load, compare, increment) event IDs of equivalent type.

3.5 The base event header

A DATE event is always prefixed by a base event header, described by the eventHeaderStruct structure. This structure includes several fields that are standard to all events, such as IDs, event type and system attributes, plus some more static information used to identify the base header itself and its representation. The structure of the base event header is described in Table 3.1.

Table 3.1 Base event header structure

Name | Type | Content | Set by
eventSize | eventSizeType | total size of the event | readout, eventBuilder
eventMagic | eventMagicType | unique DATE event signature | readout, eventBuilder
eventHeadSize | eventHeadSizeType | size of the header (base + extension) | readout, eventBuilder
eventVersion | eventVersionType | base event header structure version | readout, eventBuilder
eventType | eventTypeType | type of event | readout, eventBuilder
eventRunNb | eventRunNbType | number of the run associated to the event | readout, eventBuilder
eventId | eventIdType | unique event identification | readout, eventBuilder
eventTriggerPattern | eventTriggerPatternType | level 2 trigger pattern associated to the event | readout, eventBuilder
eventDetectorPattern | eventDetectorPatternType | detector pattern associated to the event | readout, eventBuilder
eventTypeAttribute | eventTypeAttributeType | attributes associated to the event | readout, eventBuilder
eventLdcId | eventLdcIdType | ID of the LDC source of the event | readout, eventBuilder
eventGdcId | eventGdcIdType | ID of the GDC source or destination of the event | readout, eventBuilder
eventTimestampSec | eventTimestampSecType | timestamp at the creation of the event (seconds) | readout, eventBuilder, monitoring
eventTimestampUsec | eventTimestampUsecType | timestamp at the creation of the event (microseconds) | readout, eventBuilder, monitoring

A program is included in the DATE distribution kit to dump the base header of any event written in DATE format. This tool is available in the monitoring package and is called eventDump (see Section 5.6).

We will now describe the fields of the base event header and their associated symbols and macros.

3.5.1 eventSize

It contains the total size of the event (base header, extended header, equipment header(s) and payload(s)) in bytes. The size must be a multiple of 32 bits.
For paged events, this field shall contain the same value as the eventSize field of the streamlined version of the same event.

3.5.2 eventMagic

The eventMagic field contains a “magic” signature used for two purposes:
1. establish the correctness of the eventHeaderStruct, possibly re-synchronizing a corrupted data stream,
2. determine the endianness of the event (header and payload) when this is received over a binary data channel, possibly originating from an architecture with different endianness (network, disk).

The two symbols EVENT_MAGIC_NUMBER and EVENT_MAGIC_NUMBER_SWAPPED can be used to detect at run-time the need to apply endianness-correction algorithms.

3.5.3 eventHeadSize

The eventHeadSize field contains the length in bytes of the event header (base + extension). This length should be greater than or equal to EVENT_HEAD_BASE_SIZE. For paged events it is always equal to EVENT_HEAD_BASE_SIZE (paged events’ headers can be extended only via a pointer from the payload descriptor, as shown in Figure 3.3). The size of the event header must be a multiple of 32 bits.

3.5.4 eventVersion

The eventVersion field provides the version ID of the base event header used to create the event itself. The symbol EVENT_CURRENT_VERSION is available to identify the event header structure as defined at compile time.

3.5.5 eventType

All DATE events have an associated type used to identify the content of the payload. The possible event types are:
• START_OF_RUN
• START_OF_RUN_FILES
• START_OF_BURST
• PHYSICS_EVENT
• CALIBRATION_EVENT
• START_OF_DATA
• END_OF_DATA
• SYSTEM_SOFTWARE_TRIGGER_EVENT
• DETECTOR_SOFTWARE_TRIGGER_EVENT
• END_OF_BURST
• END_OF_RUN_FILES
• END_OF_RUN
• EVENT_FORMAT_ERROR

The primary use of the event type field is to identify each type of event or record and determine the type of processing to be applied.
The event type is used, for example, by the eventBuilder to determine the policy to be applied to a given event (build, partial build or no-build).

The symbols EVENT_TYPE_MIN and EVENT_TYPE_MAX are defined to support arrays and enumerated types. Arrays can be defined with [ EVENT_TYPE_MAX - EVENT_TYPE_MIN + 1 ] range and addressed as [ eventHeaderStruct.eventType - 1 ].

The macro EVENT_TYPE_OK can be used to test a possible event type, e.g. EVENT_TYPE_OK( eventHeaderStruct.eventType ) will return TRUE if the content of the eventType field can be associated to one of the event types above.

The START_OF_RUN and END_OF_RUN events can (and should) have the system attributes (ATTR_P_START and ATTR_P_END) set to point to the start and to the end of each phase (see Section 3.7 for more information on this subject).

3.5.6 eventId

The eventId field must be handled according to the identification system in use (COLLIDER or FIXED TARGET). The system attribute ATTR_ORBIT_BC shall be set for COLLIDER mode and cleared for FIXED TARGET mode.

Macros are provided to handle the eventId field. Some macros apply only to one particular mode while other macros can be used for any type of event.
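As an illustration of what the mode-specific macros have to do, the following minimal sketch packs and unpacks a FIXED TARGET eventId using the field widths of Figure 3.6 (32, 12 and 20 bits). It is not the DATE implementation: the two-word storage, the position of each field within the words, the type sketchEventId and all sketch_* names are assumptions made for this example only. Production code should rely on the DATE macros described in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for eventIdType: two 32-bit words (assumption). */
typedef uint32_t sketchEventId[2];

/* Field widths taken from Figure 3.6; the bit positions are assumed. */
#define SKETCH_NB_IN_BURST_BITS 20u
#define SKETCH_BURST_NB_BITS    12u
#define SKETCH_NB_IN_BURST_MASK ((UINT32_C(1) << SKETCH_NB_IN_BURST_BITS) - 1u)
#define SKETCH_BURST_NB_MASK    ((UINT32_C(1) << SKETCH_BURST_NB_BITS) - 1u)

/* Pack number in run, burst number and number in burst into the two words. */
static void sketch_load_raw_event_id(uint32_t *id, uint32_t nbInRun,
                                     uint32_t burstNb, uint32_t nbInBurst)
{
    assert(burstNb <= SKETCH_BURST_NB_MASK);       /* must fit in 12 bits */
    assert(nbInBurst <= SKETCH_NB_IN_BURST_MASK);  /* must fit in 20 bits */
    id[0] = nbInRun;
    id[1] = (burstNb << SKETCH_NB_IN_BURST_BITS) | nbInBurst;
}

/* Extract the 32-bit number in run. */
static uint32_t sketch_get_nb_in_run(const uint32_t *id)
{
    return id[0];
}

/* Extract the 12-bit burst number. */
static uint32_t sketch_get_burst_nb(const uint32_t *id)
{
    return (id[1] >> SKETCH_NB_IN_BURST_BITS) & SKETCH_BURST_NB_MASK;
}

/* Extract the 20-bit number in burst. */
static uint32_t sketch_get_nb_in_burst(const uint32_t *id)
{
    return id[1] & SKETCH_NB_IN_BURST_MASK;
}
```

For a stand-alone setup, the same sketch would be called with burstNb and nbInBurst both set to zero, so that only the number in run field carries information.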
If the ID is encoded in COLLIDER mode (ATTR_ORBIT_BC set), the following macros can be used:
• LOAD_EVENT_ID( eventHeaderStruct.eventId, period, orbit, bunchCrossing ): load the given eventId with the given period, orbit and bunch crossing
• EVENT_ID_SET_BUNCH_CROSSING( eventHeaderStruct.eventId, bunchCrossing ): set the bunch crossing field of the given eventId to the given value
• EVENT_ID_SET_ORBIT( eventHeaderStruct.eventId, orbit ): set the orbit field of the given eventId to the given value
• EVENT_ID_SET_PERIOD( eventHeaderStruct.eventId, period ): set the period field of the given eventId to the given value
• EVENT_ID_GET_BUNCH_CROSSING( eventHeaderStruct.eventId ): get the bunch crossing field of the given eventId
• EVENT_ID_GET_ORBIT( eventHeaderStruct.eventId ): get the orbit field of the given eventId
• EVENT_ID_GET_PERIOD( eventHeaderStruct.eventId ): get the period field of the given eventId

If the ID is encoded in FIXED TARGET mode (ATTR_ORBIT_BC cleared), the following macros can be used:
• LOAD_RAW_EVENT_ID( eventHeaderStruct.eventId, numberInRun, burstNumber, numberInBurst ): load the given eventId with the given number in run, burst number and number in burst
• EVENT_ID_SET_NB_IN_RUN( eventHeaderStruct.eventId, numberInRun ): set the number in run field of the given eventId to the given value
• EVENT_ID_SET_BURST_NB( eventHeaderStruct.eventId, burstNumber ): set the burst number field of the given eventId to the given value
• EVENT_ID_SET_NB_IN_BURST( eventHeaderStruct.eventId, numberInBurst ): set the number in burst field of the given eventId to the given value
• EVENT_ID_GET_NB_IN_RUN( eventHeaderStruct.eventId ): get the number in run field of the given eventId
• EVENT_ID_GET_BURST_NB( eventHeaderStruct.eventId ): get the burst number field of the given eventId
• EVENT_ID_GET_NB_IN_BURST( eventHeaderStruct.eventId ): get the number in burst field of the given eventId

The following macros can be
used for all encoding schemes:
• EQ_EVENT_ID( eventHeaderStruct.eventIdA, eventHeaderStruct.eventIdB ): TRUE if eventIdA is equal to eventIdB
• LT_EVENT_ID( eventHeaderStruct.eventIdA, eventHeaderStruct.eventIdB ): TRUE if eventIdA is smaller (older) than eventIdB
• GT_EVENT_ID( eventHeaderStruct.eventIdA, eventHeaderStruct.eventIdB ): TRUE if eventIdA is greater (more recent) than eventIdB
• LE_EVENT_ID( eventHeaderStruct.eventIdA, eventHeaderStruct.eventIdB ): TRUE if eventIdA is smaller (older) than or equal to eventIdB
• GE_EVENT_ID( eventHeaderStruct.eventIdA, eventHeaderStruct.eventIdB ): TRUE if eventIdA is greater (more recent) than or equal to eventIdB
• COPY_EVENT_ID( eventHeaderStruct.eventIdFrom, eventHeaderStruct.eventIdTo ): copy eventIdFrom into eventIdTo
• ZERO_EVENT_ID( eventHeaderStruct.eventId ): clear the given eventId by setting all fields to zero
• ADD_EVENT_ID( eventHeaderStruct.eventIdA, eventHeaderStruct.eventIdB ): load eventIdA with the sum of eventIdA and eventIdB. This macro should be used with an eventIdB that is all zero except for one field (it makes little sense to use it with complex patterns, although the macro will still perform the requested operation)
• SUB_EVENT_ID( eventHeaderStruct.eventIdA, eventHeaderStruct.eventIdB ): load eventIdA with the difference between eventIdA and eventIdB. This macro should be used with an eventIdB that is all zero except for one field (it makes little sense to use it with complex patterns, although the macro will still perform the requested operation)

3.5.7 eventTriggerPattern

The eventTriggerPattern field contains the level 2 trigger pattern as published by the trigger system (referred to as L2Class[ 50 .. 1 ]). Its size is given by the symbols EVENT_TRIGGER_PATTERN_BYTES (number of 8 bit entities) and EVENT_TRIGGER_PATTERN_WORDS (number of 32 bit entities).

The trigger pattern can be either validated or invalidated.
When validated, it is assumed to contain a valid pattern and can be used to activate trigger-based rules, such as event building rules or monitoring selection criteria. When invalidated, it cannot be used to activate trigger-based rules, although its content can still be loaded according to requirements. In both cases an arbitrary number of trigger classes (from none to all of them) can be set in a trigger pattern.

The eventTriggerPattern can be used to store trigger IDs (here referenced as triggerId) in the range [ EVENT_TRIGGER_ID_MIN .. EVENT_TRIGGER_ID_MAX ] (currently [ 0 .. 49 ], which corresponds to the number of trigger classes that can be used in ALICE). For each trigger ID there is one and only one trigger mask (a set of bits with one and only one bit set, where each bit corresponds to one trigger class) that can be used to create, handle and test trigger patterns (sets of bits with zero or more bits set, where each bit corresponds to one trigger class).

The following macros are available to handle trigger patterns:
• ZERO_TRIGGER_PATTERN( eventHeaderStruct.eventTriggerPattern ): clear and invalidate the given trigger pattern
• COPY_TRIGGER_PATTERN( eventHeaderStruct.eventTriggerPatternFrom, eventHeaderStruct.eventTriggerPatternTo ): copy the “from” pattern into the “to” pattern and, if the “from” pattern is validated, validate the “to” pattern
• SET_TRIGGER_IN_PATTERN( eventHeaderStruct.eventTriggerPattern, triggerId ): set the bit corresponding to the given triggerId in the given trigger pattern
• CLEAR_TRIGGER_IN_PATTERN( eventHeaderStruct.eventTriggerPattern, triggerId ): clear the bit corresponding to the given triggerId in the given trigger pattern
• FLIP_TRIGGER_IN_PATTERN( eventHeaderStruct.eventTriggerPattern, triggerId ): flip (xor) the status of the bit corresponding to the given triggerId in the given trigger pattern
• TEST_TRIGGER_IN_PATTERN( eventHeaderStruct.eventTriggerPattern, triggerId ): TRUE if
the bit corresponding to the given triggerId is set in the given trigger pattern
• VALIDATE_TRIGGER_PATTERN( eventHeaderStruct.eventTriggerPattern ): validate the given trigger pattern
• INVALIDATE_TRIGGER_PATTERN( eventHeaderStruct.eventTriggerPattern ): invalidate the given trigger pattern
• TRIGGER_PATTERN_VALID( eventHeaderStruct.eventTriggerPattern ): TRUE if the given trigger pattern has been validated
• TRIGGER_PATTERN_OK( eventHeaderStruct.eventTriggerPattern ): TRUE if the given trigger pattern is syntactically correct

3.5.8 eventDetectorPattern

The eventDetectorPattern field contains information based upon the L2a message, published by the ALICE trigger system, associated to the given event. For physics events it contains the detector pattern corresponding to the L2Cluster[ 6..1 ] field. For software triggers (calibration, detector software trigger and system software trigger events) it contains the L2Detector[ 24..1 ] field. The size of the eventDetectorPattern field is given by the symbols EVENT_DETECTOR_PATTERN_BYTES (number of 8 bit entities) and EVENT_DETECTOR_PATTERN_WORDS (number of 32 bit entities).

The detector pattern can be either validated or invalidated. When validated, it is assumed to contain a valid pattern and can be used to activate detectorId-based rules, such as event building rules. When invalidated, it cannot be used to activate detectorId-based rules, although its content can still be loaded according to requirements. In both cases an arbitrary number of detectors (from none to all of them) can be set in a detector pattern.

The pattern is a set of bits, each corresponding to one and only one detectorId (one detector ID for each detector). Detector IDs belong to the range [ EVENT_DETECTOR_ID_MIN .. EVENT_DETECTOR_ID_MAX ] (currently [ 0 .. 30 ]). The range [ EVENT_DETECTOR_ID_MIN ..
EVENT_DETECTOR_HW_ID_MAX ] is reserved for HW detectors (to be specified in the Common Data Header) while the range ( EVENT_DETECTOR_HW_ID_MAX .. EVENT_DETECTOR_ID_MAX ] is reserved for SW detectors.

A detectorPattern is a set of bits with one and only one bit for each detectorId; each bit can be either one (TRUE) or zero (FALSE). A detectorMask is a detectorPattern with one and only one bit set.

The following macros are available to handle detector patterns:
• ZERO_DETECTOR_PATTERN( eventHeaderStruct.eventDetectorPattern ): clear and invalidate the given detector pattern
• COPY_DETECTOR_PATTERN( eventHeaderStruct.eventDetectorPatternFrom, eventHeaderStruct.eventDetectorPatternTo ): copy the “from” detector pattern into the “to” detector pattern and, if the “from” detector pattern is validated, validate the “to” detector pattern
• SET_DETECTOR_IN_PATTERN( eventHeaderStruct.eventDetectorPattern, detectorId ): set the bit corresponding to the given detectorId in the given detector pattern
• CLEAR_DETECTOR_IN_PATTERN( eventHeaderStruct.eventDetectorPattern, detectorId ): clear the bit corresponding to the given detectorId in the given detector pattern
• FLIP_DETECTOR_IN_PATTERN( eventHeaderStruct.eventDetectorPattern, detectorId ): flip (xor) the status of the bit corresponding to the given detectorId in the given detector pattern
• TEST_DETECTOR_IN_PATTERN( eventHeaderStruct.eventDetectorPattern, detectorId ): TRUE if the bit corresponding to the given detectorId is set in the given detector pattern
• VALIDATE_DETECTOR_PATTERN( eventHeaderStruct.eventDetectorPattern ): validate the given detector pattern
• INVALIDATE_DETECTOR_PATTERN( eventHeaderStruct.eventDetectorPattern ): invalidate the given detector pattern
• DETECTOR_PATTERN_VALID( eventHeaderStruct.eventDetectorPattern ): TRUE if the given detector pattern has been validated
• DETECTOR_PATTERN_OK( eventHeaderStruct.eventDetectorPattern ): TRUE if the given detector pattern
is syntactically correct

The following compilation symbols are defined:
• EVENT_DETECTOR_ID_MIN: set to 0
• EVENT_DETECTOR_ID_MAX: set to 30

3.5.9 eventTypeAttribute

Every event has two sets of attributes available: the system attributes and the user attributes. The system attributes are common to all events and are usually set by the standard DATE software. The user attributes are specific to a data-acquisition system, to an LDC or to an equipment: their definition is left to the person responsible for the DAQ system. All attributes can be used to select events for monitoring purposes.

The standard DATE symbols include three sets of macros and symbols. One set is dedicated to system attributes. The second set is for user attributes. A third set manipulates all attributes at once: this can be useful for global operations, such as the reset of a pattern, but should be used with care for other types of operations. The third set of macros sees the system attributes as an extension of the user attributes, as if they physically extended the user attributes beyond their boundaries.

Every attribute is identified by an attributeId, a unique number defining one of the allowed attributes. An attribute pattern, defined by the eventTypeAttribute data type, is a set of bits with zero or more bits (each corresponding to one and only one attributeId) asserted. System attributes are defined by special DATE symbols while user attributes are, at the base, defined by their positional number (they can be re-defined as site-dependent symbols if the need arises).
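The bit-pattern model described above can be sketched as follows. This is only an illustration under stated assumptions, not the DATE code: the word counts (two user words followed by one system word), the type skAttributes and all sk_* names are invented for the example. It shows how a single attributeId can address either user or system attributes when the system attributes are treated as an extension of the user attributes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes, for illustration only. */
#define SK_USER_WORDS   2u
#define SK_SYSTEM_WORDS 1u
#define SK_ALL_WORDS    (SK_USER_WORDS + SK_SYSTEM_WORDS)
#define SK_USER_BITS    (32u * SK_USER_WORDS)  /* first "system" attributeId */
#define SK_ALL_BITS     (32u * SK_ALL_WORDS)

/* Hypothetical attribute pattern: user words first, system words after. */
typedef uint32_t skAttributes[SK_ALL_WORDS];

/* Clear all attributes (user and system). */
static void sk_reset(uint32_t *p)
{
    memset(p, 0, SK_ALL_WORDS * sizeof(uint32_t));
}

/* Clear only the user attributes, leaving the system attributes untouched. */
static void sk_reset_user(uint32_t *p)
{
    memset(p, 0, SK_USER_WORDS * sizeof(uint32_t));
}

/* Set the bit selected by id (user or system). */
static void sk_set_any(uint32_t *p, unsigned id)
{
    assert(id < SK_ALL_BITS);
    p[id >> 5] |= UINT32_C(1) << (id & 31u);
}

/* Clear the bit selected by id (user or system). */
static void sk_clear_any(uint32_t *p, unsigned id)
{
    assert(id < SK_ALL_BITS);
    p[id >> 5] &= ~(UINT32_C(1) << (id & 31u));
}

/* Flip (xor) the bit selected by id (user or system). */
static void sk_flip_any(uint32_t *p, unsigned id)
{
    assert(id < SK_ALL_BITS);
    p[id >> 5] ^= UINT32_C(1) << (id & 31u);
}

/* Return 1 if the bit selected by id is set, 0 otherwise. */
static int sk_test_any(const uint32_t *p, unsigned id)
{
    assert(id < SK_ALL_BITS);
    return (int)((p[id >> 5] >> (id & 31u)) & 1u);
}
```

The same word-and-bit addressing applies to any pattern made of one bit per ID, which is why the trigger, detector and attribute macro families share the same set/clear/flip/test vocabulary.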
The following symbols and macros are available:
• SYSTEM_ATTRIBUTES_BYTES/SYSTEM_ATTRIBUTES_WORDS: number of bytes (8 bits) and words (32 bits) allocated to system attributes
• USER_ATTRIBUTES_BYTES/USER_ATTRIBUTES_WORDS: number of bytes (8 bits) and words (32 bits) allocated to user attributes
• ALL_ATTRIBUTES_BYTES/ALL_ATTRIBUTES_WORDS: number of bytes (8 bits) and words (32 bits) allocated to all attributes (system and user)
• RESET_ATTRIBUTES( eventHeaderStruct.eventTypeAttribute ): reset (clear) all attributes (system and user) of the given attribute pattern
• SET_ANY_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): set the bit corresponding to the given attributeId (system or user) in the given attribute pattern
• CLEAR_ANY_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): clear the bit corresponding to the given attributeId (system or user) in the given attribute pattern
• FLIP_ANY_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): flip (xor) the bit corresponding to the given attributeId (system or user) of the given attribute pattern
• TEST_ANY_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): return TRUE if the bit corresponding to the given attributeId (system or user) of the given attribute pattern is set
• COPY_ALL_ATTRIBUTES( eventHeaderStruct.eventTypeAttributeFrom, eventHeaderStruct.eventTypeAttributeTo ): copy the “from” attribute pattern to the “to” attribute pattern
• RESET_SYSTEM_ATTRIBUTES( eventHeaderStruct.eventTypeAttribute ): reset (clear) the system attributes of the given attribute pattern, leaving the user attributes unmodified
• SET_SYSTEM_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): set the bit corresponding to the given attributeId (system) in the given attribute pattern
• CLEAR_SYSTEM_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): clear the bit corresponding to the given attributeId (system) in the given
attribute pattern
• FLIP_SYSTEM_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): flip (xor) the bit corresponding to the given attributeId (system) of the given attribute pattern
• TEST_SYSTEM_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): return TRUE if the bit corresponding to the given attributeId (system) is set in the given attribute pattern
• COPY_SYSTEM_ATTRIBUTES( eventHeaderStruct.eventTypeAttributeFrom, eventHeaderStruct.eventTypeAttributeTo ): copy the “from” system attributes to the “to” system attributes, leaving the user attributes unmodified
• SYSTEM_ATTRIBUTES_OK( eventHeaderStruct.eventTypeAttribute ): check the validity of the system attributes of the given attribute pattern
• RESET_USER_ATTRIBUTES( eventHeaderStruct.eventTypeAttribute ): reset (clear) the user attributes of the given attribute pattern, leaving the system attributes untouched
• SET_USER_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): set the bit corresponding to the given attributeId (user) in the given attribute pattern
• CLEAR_USER_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): clear the bit corresponding to the given attributeId (user) in the given attribute pattern
• FLIP_USER_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): flip (xor) the bit corresponding to the given attributeId (user) in the given attribute pattern
• TEST_USER_ATTRIBUTE( eventHeaderStruct.eventTypeAttribute, attributeId ): return TRUE if the bit corresponding to the given attributeId (user) is set in the given attribute pattern
• COPY_USER_ATTRIBUTES( eventHeaderStruct.eventTypeAttributeFrom, eventHeaderStruct.eventTypeAttributeTo ): copy the user field of the “from” attribute pattern into the user field of the “to” attribute pattern, leaving the system attributes unmodified

The following system attributes are currently defined at the DATE level:
• ATTR_P_START: phase start, used for
START_OF_RUN and END_OF_RUN events
• ATTR_P_END: phase end, used for START_OF_RUN and END_OF_RUN events
• ATTR_START_OF_RUN_START: synonym for ATTR_P_START
• ATTR_START_OF_RUN_END: synonym for ATTR_P_END
• ATTR_END_OF_RUN_START: synonym for ATTR_P_START
• ATTR_END_OF_RUN_END: synonym for ATTR_P_END
• ATTR_EVENT_SWAPPED: set when the base header of the given event has been swapped (different endianness). The header extension and payload of the event have not been swapped
• ATTR_EVENT_PAGED: set for paged events, unset for streamlined events
• ATTR_SUPER_EVENT: set for events created on GDCs
• ATTR_ORBIT_BC: set when the eventId follows the COLLIDER mode encoding, not set for FIXED TARGET mode encoding
• ATTR_KEEP_PAGES: set when the data pages (carrying the payload) of the event are not to be disposed of after the event is recorded
• ATTR_HLT_DECISION: set when the payload of the event starts with an HLT Decision record
• ATTR_BY_DETECTOR_EVENT: set when the event has been created via a “monitoring by detector” scheme
• ATTR_EVENT_DATA_TRUNCATED: set when the payload of the event has been truncated due to insufficient buffer space
• ATTR_EVENT_ERROR: set if the base header of the given event is syntactically incorrect

3.5.10 eventLdcId and eventGdcId

The eventLdcId field contains the ID (according to the DATE role database) of the LDC source of the event. The field is loaded with VOID_ID if the event has not been created on an LDC.

The eventGdcId field contains the ID (according to the DATE role database) of the GDC source of the event or of the GDC destination of the event.

The symbols HOST_ID_MIN and HOST_ID_MAX are available, as well as the symbol VOID_ID. No LDC or GDC can be assigned the ID VOID_ID.
3.5.11 eventTimestampSec and eventTimestampUsec

The eventTimestampSec and eventTimestampUsec fields contain the host system time taken at the moment the event is created (trigger arrived on the LDC, first sub-event received on the GDC, event ready for monitoring by detector), split in two 32-bit parts: seconds (eventTimestampSec) and microseconds (eventTimestampUsec). The eventTimestampSec field may have been truncated to 32 bits (if the size of the “time_t” unit on the generating host is larger than 32 bits) and must be assigned to a native time_t entity before it is used (see below).

For more system-specific details concerning these two fields, check the definition of the system call “gettimeofday” and of the system structure “timeval”. These fields can also be used with the standard Unix system library for printing and for handling (see the definition of the system call “time”). For portability across different platforms, the eventTimestampSec field must be copied into a variable of type “time_t”, as defined by the system include file, before using it. Failure to do so may give unexpected results and may terminate the calling process. This procedure takes care of issues such as the size and signedness of the two fields.

The content of these fields may be inaccurate due to clock drifts, system clock adjustments, and latencies (hardware and software), both within the same machine and across different machines. If a more accurate timestamp is required, we recommend using the LHC clock instead (as available in the eventId field).

3.6 The super event format

The output of a DATE system is a stream of events. These can be created either by an LDC or by a GDC. In the second case, the events are marked as super events. Super events have the same structure as events or sub-events; their payload, however, is guaranteed to contain a series of one or more sub-events.
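Since the payload of a super event is a sequence of sub-events, each carrying its own header, a decoder can walk the payload by hopping from one eventSize to the next. The sketch below illustrates the idea under simplifying assumptions: struct skHeader is a hypothetical, heavily reduced stand-in for eventHeaderStruct (the real structure has more fields and a different layout), all fields are taken to be 32 bits wide and in host byte order, and sk_list_sub_events is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced header, for illustration only. */
struct skHeader {
    uint32_t eventSize;      /* total size in bytes, header included */
    uint32_t eventHeadSize;  /* size of the header in bytes */
    uint32_t eventLdcId;     /* ID of the LDC that produced the sub-event */
};

/* Walk the sub-events stored in the payload of a super event.
   Stores up to maxIds LDC IDs in ldcIds and returns the number of
   sub-events found, or -1 if the sizes are inconsistent. */
static int sk_list_sub_events(const unsigned char *superEvent,
                              uint32_t *ldcIds, int maxIds)
{
    struct skHeader super, sub;
    uint32_t offset;
    int n = 0;

    assert(superEvent != NULL);
    memcpy(&super, superEvent, sizeof super);
    if (super.eventHeadSize < sizeof super ||
        super.eventHeadSize > super.eventSize)
        return -1;

    /* The first sub-event starts right after the super event header. */
    offset = super.eventHeadSize;
    while (offset < super.eventSize) {
        if (super.eventSize - offset < sizeof sub)
            return -1;                       /* truncated sub-event header */
        memcpy(&sub, superEvent + offset, sizeof sub);
        if (sub.eventSize < sizeof sub ||
            sub.eventSize > super.eventSize - offset)
            return -1;                       /* sub-event overruns the super event */
        if (n < maxIds)
            ldcIds[n] = sub.eventLdcId;
        n++;
        offset += sub.eventSize;             /* hop to the next sub-event */
    }
    return n;
}
```

The headers are copied with memcpy rather than accessed through a cast pointer so that the sketch does not depend on the alignment of the input buffer. A real decoder would also have to handle byte swapping, as discussed in Section 3.8.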
The data format structure described before applies to sub-events and to super events. Each event will include a header and a data block. In the case of a super event assembled by the eventBuilder, the data block is itself subdivided into sub-events. Each sub-event will include a header and a data block. The eventBuilder assembles the sub-events pertaining to the same event and prepends one header relative to the complete event.

An example of this representation, with two LDCs (IDs 5 and 7) merging on one GDC (ID 1), is shown in Figure 3.7. The sub-event refers here to the data read out by one LDC and assembled later on by one GDC. The super event refers here to the full set of data collected by a DAQ system for an event uniquely identified by an eventType-eventId pair.

Events and super events can be decoded using the same algorithm. For super events, the same algorithm can also be applied to the payload, which splits into blocks of one sub-event each.

[Figure 3.7: The full event format. The figure shows two sub-events and the super event built from them.
Sub-event from LDC 5: eventSize: 400, eventType: PHYSICS, eventId: 0/0/1, eventTriggerPattern: 1, eventDetectorPattern: 1+2, eventLdcId: 5, eventGdcId: 2, followed by PayloadA.
Sub-event from LDC 7: eventSize: 120, eventType: PHYSICS, eventId: 0/0/1, eventTriggerPattern: 1, eventDetectorPattern: 1+2, eventLdcId: 7, eventGdcId: 2, followed by PayloadB.
Super event: eventSize: 400 + 120 + EVENT_HEAD_BASE_SIZE, eventHeadSize: EVENT_HEAD_BASE_SIZE, eventType: PHYSICS, eventId: 0/0/1, eventTriggerPattern: 1, eventDetectorPattern: 1+2, eventTypeAttribute: ATTR_SUPER_EVENT, eventLdcId: VOID_ID, eventGdcId: 2, followed by the two sub-events above.]

3.7 The complete file format

Table 3.2 shows the sequence of records constituting a complete DATE raw data file.
Table 3.2 The successive list of records in a data file generated by DATE

Event type | Event attribute | Comments
START_OF_RUN | ATTR_P_START | Unique
START_OF_RUN_FILES | - | Zero or more records
START_OF_RUN | ATTR_P_END | Unique
START_OF_BURST | - | Optional, unique per each burst
START_OF_DATA | - | Zero or one record
PHYSICS_EVENT | - | Zero or more records
CALIBRATION_EVENT | - | Zero or more records
SYSTEM_SOFTWARE_TRIGGER_EVENT | - | Zero or more records
DETECTOR_SOFTWARE_TRIGGER_EVENT | - | Zero or more records
END_OF_DATA | - | Zero or one record
END_OF_BURST | - | Optional, unique per each burst
END_OF_RUN | ATTR_P_START | Unique
END_OF_RUN_FILES | - | Zero or more records
END_OF_RUN | ATTR_P_END | Unique

Of the above events, only the START_OF_RUN* (note 1) and END_OF_RUN* (note 2) records are to be found in all runs. It is possible to have empty runs (without START_OF_DATA, END_OF_DATA, PHYSICS, CALIBRATION, SYSTEM_SOFTWARE_TRIGGER or DETECTOR_SOFTWARE_TRIGGER events). START_OF_BURST and END_OF_BURST events shall be used only when a burst-like beam structure is available (typical of fixed target installations).

Note 1: START_OF_RUN with ATTR_P_START and ATTR_P_END, and START_OF_RUN_FILES.
Note 2: END_OF_RUN with ATTR_P_START and ATTR_P_END, and END_OF_RUN_FILES.

3.8 Decoding and monitoring on different platforms

Information of all kinds is exchanged between computers’ memories. The way these computers order their memory may differ. Most of the time, they will follow either the Little-Endian scheme or the Big-Endian scheme [13]. Little-Endian (LE) computers assign bit 0 to the Least Significant Byte (LSB) of their word and the top bit to the Most Significant Byte (MSB) of their word. Big-Endian (BE) computers do just the opposite: bit 0 is in the MSB and the top bit is in the LSB. Exchanging data between LE and BE computers (either via network channels or through files, shared or on permanent media) can therefore create several problems when memory ordering becomes important.
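The effect of memory ordering can be made concrete with a few lines of C. The sketch below (all sk_* names are invented for this example) detects the byte order of the host at run time, reverses the bytes of a 32-bit word, and reads a 32-bit value written by a Little-Endian producer regardless of the byte order of the host. It relies only on the standard C library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 on a Little-Endian host, 0 on a Big-Endian host. */
static int sk_host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first;

    /* On a Little-Endian host the least significant byte comes first. */
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Reverse the order of the four bytes of a 32-bit value. */
static uint32_t sk_swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & UINT32_C(0x0000ff00)) |
           ((v << 8) & UINT32_C(0x00ff0000)) |
           (v << 24);
}

/* Interpret four bytes written by a Little-Endian producer. */
static uint32_t sk_read_le32(const unsigned char *bytes)
{
    uint32_t v;

    assert(bytes != NULL);
    memcpy(&v, bytes, sizeof v);
    return sk_host_is_little_endian() ? v : sk_swap32(v);
}
```

Applying sk_swap32 twice returns the original value, which is why swapping a stream once too often is as harmful as not swapping it at all.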
Due to efficiency and practical constraints, data acquisition systems based on DATE will have to handle the transfer of data between LE and BE computers on their own and on a case-by-case basis. The conversion is necessary every time an event is decoded on a platform whose endianness differs from that of the platform where the event was created. A short, non-exhaustive list of platforms that could take part in such a process is given in Table 3.3.

Table 3.3 Commonly used platforms and their endianness

Platform | Endianness type
Intel (x86, Pentium) | Little-Endian
COMPAQ (HP) ALPHA | Little-Endian
Motorola PPC | Big-Endian
Sun SPARC | Big-Endian
HP PA-RISC | Big-Endian
SGI IRIX | Big-Endian
IBM RS6000 | Big-Endian

Programs decoding raw events may have to detect the need for swapping. If portability is not an issue (programs will always run on the same type of platform and data will always be generated on the same type of platform), a rigid swapping policy can, if needed, be systematically applied (always swap events without checking). Programs that may run on different platforms (e.g. generic monitoring programs or roaming programs that may migrate from computer to computer) will have to check on the fly for the appropriate swapping policy.

When data is transferred via a DATE library (monitoring or eventBuilder), a check is always performed on the header of all the events. If the need for swapping is detected, the DATE library adjusts the event header and sets the ATTR_EVENT_SWAPPED bit of the type field accordingly. Please note that only the event header is “adjusted”: the data portion of the event remains in its original state. Monitoring programs can use the ATTR_EVENT_SWAPPED bit to trigger their internal swapping algorithm and correct the data portion of the event.

When data is fetched directly from a DATE stream (pipe, file or socket), programs should check the magic field of the event header.
When this field is equal to EVENT_MAGIC_NUMBER, no swapping is needed. When this field is equal to EVENT_MAGIC_NUMBER_SWAPPED, swapping of the event header as well as of the event data will be needed. Examples of these two checks can be found in Listing 3.1.

Listing 3.1 Detecting swapping of the event data

    /*** Examples of detection of different endianness data ***/
    #include <stdio.h>
    #include "event.h"

    struct eventHeaderStruct header;

    /* Load the header structure (not shown) */

    /* Data received through a DATE library: check the SWAPPED attribute */
    if (TEST_SYSTEM_ATTRIBUTE(header.eventTypeAttribute, ATTR_EVENT_SWAPPED))
        printf("Swapping needed (SWAPPED bit)\n");

    /* Data read directly from a DATE stream: check the magic field */
    if (header.eventMagic == EVENT_MAGIC_NUMBER_SWAPPED)
        printf("Swapping needed (MAGIC_SWAPPED)\n");

Special situations may arise when LDCs and GDCs are of different endianness. In this case, the eventBuilder process detects the need for swapping and adjusts the sub-event header (setting the ATTR_EVENT_SWAPPED bit of the sub-event header type field). As not all the LDCs may be of the same type, a different swapping policy may have to be applied on a sub-event by sub-event basis.

How to cope with the swapping of data depends on the content of the event itself. Due to the way events are transferred (via network channels, files or on permanent data storage), we must apply different treatment for 8 bit entities (characters, RS-232, small I/O channels), 16 bit entities (such as CAMAC data), 32 bit entities (DATE event headers, VMEbus, PCI, wide I/O channels) and 64 bit entities (wide PCI, very wide I/O channels). In our experience, 8 bit entities do not need any swapping; bigger data entities (16, 32, 64 bit) need some sort of conversion that depends on the internal structure of the data.

To facilitate the swapping process, the DATE monitoring library provides the monitorSetSwap entry. This entry will apply (if needed) the given swap policy to entire events, assuming they contain only and always 8, 16 or 32 bit entities. Events with non-uniform data content must be swapped with ad-hoc algorithms.
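For payloads made of entities of a single, uniform width, a swap policy amounts to reversing the bytes of each entity in place. The following minimal sketch shows this for 8, 16, 32 and 64 bit entities. It does not reproduce monitorSetSwap or any other DATE entry: the function name sk_swap_entities and its interface are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the byte order of every `width`-byte entity stored in buf.
   width must be 1, 2, 4 or 8 and len must be a multiple of width.
   Returns 0 on success, -1 if the arguments are inconsistent. */
static int sk_swap_entities(unsigned char *buf, size_t len, size_t width)
{
    size_t i, j;

    assert(buf != NULL);
    if ((width != 1 && width != 2 && width != 4 && width != 8) ||
        len % width != 0)
        return -1;

    for (i = 0; i < len; i += width) {
        /* Exchange the bytes of one entity pairwise, from the outside in.
           For width == 1 the inner loop does nothing: 8 bit entities
           need no swapping. */
        for (j = 0; j < width / 2; j++) {
            unsigned char tmp = buf[i + j];
            buf[i + j] = buf[i + width - 1 - j];
            buf[i + width - 1 - j] = tmp;
        }
    }
    return 0;
}
```

An event mixing entities of different widths cannot be handled by a single call of this kind: each region of the payload has to be swapped with the width that matches its content, which is what makes ad-hoc algorithms necessary.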
We suggest writing a small data file with a couple of good examples of events and debugging the decoding/monitoring swapping scheme using this file, then moving to the production platform. We also strongly recommend always checking for the need to swap data (using one or both of the methods illustrated in Listing 3.1), as the same stream may take different routes and therefore undergo the swapping process more than once.

3.9 The Common Data Header

All events sent over the ALICE DDL must be prepended by a Common Data Header as defined in [2] and refined in [15]. The main component of the definitions dedicated to the Common Data Header is the structure commonDataHeaderStruct, defined in Table 3.4.

Table 3.4 Common data header structure

Name | Type | Content
cdhBlockLength | unsigned:32 | Length of the block
cdhVersion | unsigned:8 | Version ID of the CDH
cdhL1TriggerMessage | unsigned:10 | Level 1 trigger message
cdhEventId1 | unsigned:12 | Bunch-crossing field of the event ID
cdhEventId2 | unsigned:24 | Orbit-number field of the event ID
cdhMiniEventId | unsigned:12 | BC counter at the moment of the L1 trigger signal
cdhBlockAttributes | unsigned:8 | Attributes of the block
cdhParticipatingSubDetectors | unsigned:24 | Pattern of sub-detectors
cdhStatusAndErrorBits | unsigned:16 | Status and error bits
cdhTriggerClassesHigh | unsigned:18 | Trigger classes (high 18 bits)
cdhTriggerClassesLow | unsigned:32 | Trigger classes (low 32 bits)
cdhRoiHigh | unsigned:32 | Region Of Interest (high 32 bits)
cdhRoiLow | unsigned:4 | Region Of Interest (low 4 bits)

All the definitions given in the table are relative to Little-Endian architectures. All fields can be directly handled by 32- and 64-bit CPUs, including handling of bit patterns and bit masks. All symbols ending with a _BIT suffix refer to a bit number (LSB:0).

The size of the common data header structure is defined in the compilation constant CDH_SIZE.

The structure contains several fields that must be set to zero.
Those fields are not specified in the above table but can be found in the definitions given by the DATE event.h include file. For compatibility with future versions of the Common Data Header, we recommend, for newly allocated structures, zeroing the whole structure and then setting or updating only the fields that need it. With this method, the Must Be Zero fields and any fields that are not handled will always be zeroed, regardless of their location and of their definition.

3.9.1 Common Data Header version

The version of the Common Data Header as defined during the compilation of the handling module is given in the constant CDH_VERSION. This constant is an incremental number and can be used for arithmetic comparisons. New versions of the Common Data Header will be marked with newer version IDs.

Code setting and/or using the common data header should always check the version ID found in a common data header against the version ID defined at compilation time. A mismatch must trigger either an error condition or (if possible) a translation between the two versions.

3.9.2 Status and Error bits

The status and error bits, given in the cdhStatusAndErrorBits field of the commonDataHeaderStruct structure, can carry, for Common Data Header version 1, the information given in Table 3.5.
Table 3.5 Common Data Header Status and Error bits

Name                                     Status/Error  Content
CDH_TRIGGER_OVERLAP_ERROR_BIT            Error         L1 received while processing another L1
CDH_TRIGGER_MISSING_ERROR_BIT            Error         L1 received when no L0 has been received
CDH_CONTROL_PARITY_ERROR_BIT             Error         Control parity error (instruction and/or address)
CDH_DATA_PARITY_ERROR_BIT                Error         Data parity error
CDH_FEE_ERROR_BIT                        Error         Front-end electronics error
CDH_TRIGGER_INFORMATION_UNAVAILABLE_BIT  Status        Trigger information unavailable
CDH_HLT_DECISION_BIT                     Status        HLT decision available in payload
CDH_HLT_PAYLOAD_BIT                      Status        HLT payload follows
CDH_DDG_PAYLOAD_BIT                      Status        DDG payload follows

The assertion of the CDH_HLT_DECISION_BIT implies the assertion of the CDH_HLT_PAYLOAD_BIT. Events whose Common Data Header has the CDH_HLT_DECISION_BIT status bit set while the CDH_HLT_PAYLOAD_BIT is not set are considered invalid and must be rejected.

3.10 The equipment header

An LDC can include one or more equipments. Each equipment is associated with one logical input channel, usually paired with one physical channel. All DATE events must describe the equipments contributing to the payload. This is done using the equipment header structure. The equipmentHeaderStruct structure is described in Table 3.6.

Table 3.6 Equipment header structure

Name                       Type                           Content
equipmentSize              equipmentSizeType              total size of the payload
equipmentType              equipmentTypeType              type of the equipment
equipmentId                equipmentIdType                ID of the equipment
equipmentTypeAttribute     equipmentTypeAttributeType     attributes of the payload
equipmentBasicElementSize  equipmentBasicElementSizeType  size of the basic element for the equipment

We will now review the individual fields of the equipment header structure.

3.10.1 equipmentSize

This field contains the combined size of the payload created by the equipment. This size does not include the equipment header, whose size is fixed.
The value should be aligned to a 32-bit boundary.

3.10.2 equipmentType/equipmentId

The type and ID of the equipment as defined in the DATE site configuration.

3.10.3 equipmentTypeAttribute

The type attributes associated with the equipment. The same rules, symbols and macros as for the eventTypeAttribute are applicable.

3.10.4 equipmentBasicElementSize

The size of the basic element accepted by the equipment itself. This field is mainly used to adjust the content of the payload when crossing endianness boundaries.

3.11 Paged events and DATE vectors

DATE paged events must provide two main capabilities:

a. support for multi-page payloads with multiple data pools
b. support for efficient exchange of events between different processes

To achieve these capabilities the eventVectorStruct entity has been defined as shown in Table 3.7.

Table 3.7 Event vector structure

Name                       Type                   Content
eventVectorBankId          eventVectorBankIdType  ID of the bank supporting the pointed entity
eventVectorPointsToVector  unsigned               type of the pointed entity
eventVectorSize            eventVectorSizeType    size of the pointed entity
eventVectorStartOffset     eventVectorOffsetType  start offset of the pointed entity

The eventVectorStruct is used to point to an entity, vector or payload. It fully describes the pointed entity. A NULL vector has eventVectorSize set to zero. Entities are pointed to by a bankId-startOffset pair: the bank ID is a unique identifier defined by the DATE database/banksManager packages according to the run-time configuration of the DATE site.

When the pointed entity is a vector, the eventVectorPointsToVector field of the pointer is TRUE and the eventVectorSize contains the number of entries of the pointed vector. When the pointed entity is a data page, the eventVectorPointsToVector of the pointer is FALSE and the eventVectorSize contains the size of the data page.
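The rules above can be illustrated with a short C sketch that walks a vector tree and sums the size of every data page it reaches. The structure below is a local mirror of Table 3.7 with assumed field types (the real definitions come from the DATE event.h include file), and both the banks array, which maps each bank ID to the base address of that bank in the current process, and the function sketch_payload_bytes are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of Table 3.7; the field types are assumptions. */
struct sketchVector {
    int32_t  bankId;          /* eventVectorBankId         */
    unsigned pointsToVector;  /* eventVectorPointsToVector */
    uint32_t size;            /* eventVectorSize           */
    uint32_t startOffset;     /* eventVectorStartOffset    */
};

/* Sum the bytes of all data pages reachable from v.
   banks[i] is the base address of bank i in this process. */
uint32_t sketch_payload_bytes(const struct sketchVector *v,
                              unsigned char *const banks[])
{
    if (v->size == 0)
        return 0;                 /* NULL vector */
    if (!v->pointsToVector)
        return v->size;           /* data page: size is in bytes */

    /* Vector: size is the number of entries stored at bank + offset. */
    const struct sketchVector *entries =
        (const struct sketchVector *)(banks[v->bankId] + v->startOffset);
    uint32_t total = 0;
    for (uint32_t i = 0; i < v->size; i++)
        total += sketch_payload_bytes(&entries[i], banks);
    return total;
}
```

Because every entry carries its own bank ID and offset, a process that has mapped the same banks can follow the same tree without copying any payload, which is the second capability listed above.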
An example of use of the above structure is given in Figure 3.8, where an event with 6 payloads spread over two banks is described. The event is made of PayloadA through PayloadF for a total of 1652 bytes.

Figure 3.8 Example of use of DATE event vectors

To complete the definition of paged events we must define the vectorPayloadDescriptorStruct. This structure defines all components of paged events following the base event header. Its format is described in Table 3.8.

Table 3.8 Payload descriptor structure

Name                  Type                            Content
eventNumEquipments    eventNumEquipmentsType          Number of equipments contributing to the payload
eventExtensionVector  Single entry eventVectorStruct  Pointer to the header extension

Including the above definitions, a complete paged event looks as follows:

Base event header           eventHeader
Payload descriptor          vectorPayloadDescriptorStruct
Extended header (optional)
Equipment header            equipmentHeaderStruct
Equipment vector            Single entry equipmentVectorStruct, pointing to the equipment payload or to a 2nd level vector
Equipment header            equipmentHeaderStruct
Equipment vector            Single entry equipmentVectorStruct, pointing to the equipment payload or to a 2nd level vector

Figure 3.9 Example of complete paged event

3.12 Data pools

A data pool is a contiguous block of memory reserved to a particular function, e.g. the data pages available to the readout process. Each pool must be used exclusively for one function. If paged events mode is adopted, separate pools must be allocated for first level vectors, for second level vectors and for data pages. DATE systems can (and usually do) have multiple data pools.

4 Configuration databases

All actors belonging to a DATE system need access to configuration parameters. The DATE databases package, described in this chapter, provides the relevant interfaces to the static DATE configuration information.

4.1 Overview
4.2 Information schema
4.3 The static databases
4.4 Other centrally stored parameters
4.5 The database editor
4.6 Example of a DAQ system
4.7 The programming interface

4.1 Overview

Every DATE system can be fully defined by several pieces of information, consisting of a static part (described in a MySQL database and editable with editDb, see Section 4.5) and a dynamic part (for run-specific configuration parameters, accessible via the runControl Human Interface, see Section 14.5).

Static information is mostly hardware related and valid across many consecutive runs: it includes definitions for available hosts (LDCs, GDCs, EDMs, etc.), readout links, detector and trigger setup, and DATE component parameters. Based on these static definitions, the operator can then choose a specific dynamic configuration (set of hosts and their run-time parameters) for a given run.
The DATE database package (dateDb) provides an access layer to the static configuration, whereas the dynamic part is handled internally by the relevant DATE packages.

4.2 Information schema

The DATE static configuration is stored in a MySQL database. A graphical editor is provided to enter data.

To create the required database structure in MySQL, in a database named DATE_CONFIG by default, you need to define the database access parameters (see Section 2.1 and Section 4.4.4), execute the DATE setup procedure, and then type:

> ${DATE_DB_BIN}/createtables

This utility creates the table structure. Any existing configuration stored in MySQL is destroyed. It is recommended to use instead the newDateSite.sh script, which creates a working DATE_SITE directory and the corresponding DATE_CONFIG database from scratch, populating it with a minimal running set of local roles. It is then easy to augment the configuration with more roles.

For convenience, the configuration database can be backed up using the command ${DATE_DB_DIR}/dbBackup.sh. It creates an SQL dump of the full content, which is handy when recovering from a hardware failure or from a wrong operation on the data. The dump can easily be reloaded into an empty database using the source filename.sql syntax from the mysql client command line.

Figure 4.1 describes the current database structure and the relations between the main tables. Details about the semantics of each table are given in the following sections.

Figure 4.1 DATE configuration database structure - main tables.

4.3 The static databases

The DATE static configuration items are grouped in different families, historically named ‘databases’. Information was originally stored in flat ASCII files, and it was moved to a MySQL database storage starting from DATE V5.
To follow the historical design conventions (and the vocabulary still in use in the older part of the DATE source code), what we describe below as ‘databases’ are the original categories of static parameters used to describe a DATE setup, despite the fact that all of them are now stored in the same and unique DATE_CONFIG database hosted on a MySQL server, mapping the information in the same way it was before in flat files. Newer categories of configurable items are described in Section 4.4.

The static databases include:

• the roles database: it contains the definitions of all the logical entities part of a given DATE_SITE: LDCs, GDCs, EDM hosts, detectors and trigger masks;
• the triggers database: it defines the detectors involved in each trigger mask;
• the detectors database: it defines the composition of each detector and/or sub-detector in terms of sub-detectors and/or LDCs;
• the event building control database: it defines the event-building strategies to be applied by the eventBuilder process;
• the banks database: it defines the memory banks to be provided on the hosts defined in the roles database.

The information stored in each database is, in principle, static, i.e. it evolves slowly according to hardware or experimental conditions changes.

The static databases always describe a superset of the actual run-time configuration. An entity must be defined in the appropriate database to be able to participate in a run. The actual list of run-time actors is selected from these databases.

The current content of the static databases can be retrieved using the command ${DATE_DB_BIN}/dumpDbs. The output of this utility looks as shown in Section 4.6. This tool also verifies the consistency of the database information (in particular, the references between tables) and may be useful for debugging purposes.
4.3.1 Terminology and assumptions

The following terms are used throughout this document:

• Role: role of the entity (LDC, GDC, EDM, etc.).
• Name: unique name of the entity defined by the given record. This name must be unique across all roles of the database. It is not case sensitive.
• ID: identifier associated with the entity. Defined in the roles database, it can be associated with the corresponding bit of a bit mask or of a bit pattern. Each ID is unique within its role. Entities of different roles may share the same ID. An ID can assume any value between its associated min ID (included) and its associated max ID (included). Not all the values in this range need to correspond to an entity: it is possible to have IDs with no associated entity.
• Bit mask: a set of bits with one and only one bit set: this bit corresponds to a given ID and it can be tested using the DB_TEST_BIT macro. A bit mask can be described by the same type and size of storage as the corresponding bit pattern.
• Bit pattern: a set of bits with any number of bits (including none) set. Each bit of a bit pattern corresponds to a given ID and can be tested using the DB_TEST_BIT macro. A bit pattern can be considered as the combination of zero or more bit masks (of equal semantics) and it is described by the same type and size of storage as the equivalent bit mask.
• Min ID and Max ID: minimum and maximum value assumed by IDs associated with a given role. For any given role, entities exist with ID equal to min ID and to max ID, and no entity exists with an ID smaller than min ID or greater than max ID. The max ID definition is dynamic and is, in principle, unlimited. However, three implicit limitations are given by:
  1. the architecture where the code is executed (see the constant INT_MAX defined in limits.h);
  2. the corresponding (if any) entity part of the event header (eventTriggerPattern, eventDetectorPattern, eventLdcId, eventGdcId);
  3. for all hosts, the HOST_ID_MIN and HOST_ID_MAX definitions given in ${DATE_COMMON_DEFS}/event.h (currently equal to 0 and 511).
• maskElementType: the basic type used to describe a bit mask or a bit pattern.

The DATE event header imposes the following rules on the ID ranges:

• trigger patterns must correspond to ID values in the [EVENT_TRIGGER_ID_MIN..EVENT_TRIGGER_ID_MAX] range;
• detector patterns must correspond to ID values in the [EVENT_DETECTOR_ID_MIN..EVENT_DETECTOR_ID_MAX] range;
• LDC IDs and GDC IDs must be in the [HOST_ID_MIN..HOST_ID_MAX] range.

The corresponding constants are defined in ${DATE_COMMON_DEFS}/event.h.

Each DATE site may define its IDs within the limits imposed by the DATE event header and in accordance with the rules described above. IDs are usually assigned automatically by editDb.

4.3.2 The roles database

The roles database is used to define all entities part of a DATE system. The definition is given using a unique role-ID scheme, where role can be one of LDC, GDC, EDM, TRIGGER_MASK, TRIGGER_HOST, DETECTOR, SUBDETECTOR, DDG, FILTER, MON and ID is a positive integer (corresponding to a bit in all the related masks and patterns). Once the definition is complete, each component can be uniquely identified either by its role plus ID or by the presence of the associated bit in a related mask or pattern.

The DDG, FILTER and MON roles are used by runControl to start some extra processes at run-time. See Chapter 14 for more information on these roles.

All the records of this database include a symbolic name, the associated ID and a textual description. Other optional or role-specific parameters are available.

All the other DATE databases use the definitions given in the roles database to reference DATE components by name rather than by absolute value. Therefore, to fully decode a record belonging to another database, a scan of the roles database is implicitly required.
This is also needed to size the variable-size entities, such as bit masks and bit patterns.

Entities that should be directly selectable from the runControl Human Interface can be given a topLevel attribute. When this attribute is set to “Y” the entity is under direct control of the DAQ operator. On the other hand, when the attribute is set to “N” (or not given at all) the operator has no direct control and can only indirectly act on the given entity. Let’s take an example of a DAQ system made of two detectors D1 and D2 and six LDCs L1 to L6. D1 is made of L1, L2, and L3, D2 is made of L4 and L5; L6 doesn’t belong to any detector. A possible definition of this DAQ system sets the topLevel attribute to “Y” for D1, D2 and L6 and to “N” for L1, L2, L3, L4, and L5.

Entities that correspond to physical hosts (e.g. LDCs, GDCs, EDMs and TRIGGER_HOSTs) may be given an optional TCP/IP hostname. This will associate the TCP/IP host with the appropriate DATE entity. We can therefore implement “virtual hosts” (e.g. detectorOneLdcOne) and associate them with the actual machine via the database. We can also have machines with multiple roles, e.g. GDC and EDM: same TCP/IP hostname and different role names. One can even have several LDC roles on the same machine if resources and readout equipment allow (e.g. Rand equipment for test setups).

LDCs may be assigned an HLT role, see Section 17.2 for details.

4.3.3 The trigger database

The trigger database is used to define the detectors active for each possible trigger mask. For each trigger mask, the list of detectors to be read out is given. This should be repeated for all the trigger masks defined in the roles database. Each event should have at least one trigger mask active.

The information contained in this database, combined with the runControl dynamic information, gives an exact description of all the possible triggering scenarios.
This corresponds to an exact list of all the detectors that transfer data over their DDL(s) for each level-2 accepted trigger.

When running with a real LTU, it is recommended to define a trigger role for all possible class bits (usually 50), and associate each mask with all detectors. One can use the command ${DATE_DB_DIR}/daqDB_createAllTriggerClassMasks for this purpose. Note that the role ID of a DATE trigger mask corresponds to the class number in the trigger class mask sent by the LTU.

4.3.4 The detectors database

The detectors database defines, on a detector by detector basis, the connections between the front-end equipment and the LDCs. The list of LDCs or sub-detectors belonging to each detector and sub-detector is stored there. A host can be statically defined as part of a detector even if it is physically disconnected from it. This is the case, for example, of progressive installations or run-time disconnections/replacements.

The information contained in this database, combined with the runControl dynamic information, shall return the exact set of LDCs connected to each detector during any given run.

The tree-like structure needs to be browsed recursively to know the top detector of a given role (if any). One can use the command ${DATE_DB_DIR}/getTopDetector.tcl to retrieve it.

4.3.5 The event-building control database

For each event, the trigger system activates all the front-end equipments involved. This may or may not correspond to all the detectors/sub-detectors which are part of the DAQ system. The data coming from the “active” detectors is then collected by the LDCs, which ship their events to one of the available GDCs. Here the event builder receives the sub-event(s) and acts on them following the directives given in the event building control database.

At this moment the DATE event builder can follow three different policies:

a. build: all the LDCs in the runControl dynamic database must contribute to a given event.
b. no-build: LDCs can create sub-events independently from the rest of the DAQ system and these sub-events will be recorded individually.
c. partial build/no-build: a well-defined set of LDCs will contribute to a given event that will be recorded either as a unique event or sub-event by sub-event.

The first case requires a sub-event from all LDCs participating in the run before building the event and delivering it to the recording channel. Typical use of this policy could be start-of-run or physics records. Please note that whenever the information stored in the event header, namely the trigger pattern and the detector pattern, is valid, the event builder will use it to establish the list of contributors to the event. A “build” may become a “partial build” whenever the detector pattern contains a subset of the LDCs which are active in the system.

The second case is very simple: LDCs may or may not create sub-events and these are recorded whenever the recording channel is ready. This policy could usually be applied to start-of-run-files and end-of-run-files.

The last case is the most complex and needs great care. Partial event building can be driven in three ways: by source, by detector set and by trigger mask. By source means that the list of expected sub-events for a given event can be derived from the source of the sub-event itself. For example, a calibration event coming from a given detector most likely covers only that detector. In the case of ALICE, all events have an associated detector mask and trigger mask specifying (indirectly) the LDCs that are expected to provide sub-events for the given event. Therefore it is possible to declare policies solely on a “detector set by detector set” or “trigger by trigger” basis.
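To make the idea of deriving the contributors from the detector pattern concrete, here is a minimal C sketch. It is not the eventBuilder algorithm: it simply ORs together the LDC sets of the detectors flagged in a pattern and keeps only the LDCs active in the run. The function name, the table ldcsOfDetector and the limits of 32 detectors and 64 LDCs are assumptions made for brevity (actual LDC IDs can go beyond 64).

```c
#include <stdint.h>

#define SKETCH_MAX_DETECTORS 32

/* ldcsOfDetector[d] has bit l set when the LDC with ID l belongs to
   the detector with ID d. activeLdcs has bit l set when LDC l takes
   part in the current run. Returns the LDCs expected to contribute. */
uint64_t sketch_expected_ldcs(uint32_t detectorPattern,
                              const uint64_t ldcsOfDetector[SKETCH_MAX_DETECTORS],
                              uint64_t activeLdcs)
{
    uint64_t expected = 0;
    for (int d = 0; d < SKETCH_MAX_DETECTORS; d++) {
        if (detectorPattern & (UINT32_C(1) << d))
            expected |= ldcsOfDetector[d];
    }
    return expected & activeLdcs;
}
```

Taking the example of Section 4.3.2 with D1 as detector 0 (LDCs 1 to 3) and D2 as detector 1 (LDCs 4 and 5), a detector pattern containing only D1 yields LDCs 1 to 3, so an event carrying that pattern would be a partial build even in a run where all six LDCs are active.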
In all cases (by source and by trigger mask) the associated policy can instruct either to build or not to build the given event, according to the requirements of the DAQ system. All the rules are driven by an associated event type. It is possible to have different rules for the same event type, e.g. calibration events coming from one detector must be built while calibration events coming from another detector shall not be built.

Each record in this database should specify the event type (SOR, EOR, etc.), the optional list of trigger masks, detectors or sub-detectors and the action (BUILD or NOBUILD). As many rules as needed can be given. If multiple rules can be activated by the same event, the first one in order of the associated PRIORITY value is used. A rule accepts only specifiers of the same type, i.e. only trigger masks or only detectors.

4.3.6 The banks database

The banks database is used to define the memory banks required by each DATE host/entity, their sizes and the support mechanism(s) with their details.

Each record of the banks database contains a description of the banks to be implemented and their characteristics: type, name, size and content. The same host can implement multiple banks: one set per role and several subsets for each set. A proper definition of the memory banks should define an optimal and safe usage of the memory resources for each node of a DATE system.

The available supports are:

• IPC: the key is the full path of a file to be used to map to the memory segment. This file is used to create a system-wide unique key used to identify the memory segment (see “man ftok” for more details). It is not necessary to specify it: when this field is empty, a unique name is assigned. The file is then created automatically at run-time in the ${DATE_SITE_CONFIG} directory, with access permissions allowing read operations from everybody and write operations from DATE_USER_ID.
  It is not recommended to manually specify a key file: great care must be taken so that the key is unique for each IPC bank, in case they are used on the same machine.
• PHYSMEM: the key is the device used by the physmem driver.
• BIGPHYS: the key is the device used by the bigphys driver.
• HEAP: the process heap will be used (no multi-process sharing is possible). The key is dummy and not used.

The size gives the amount of memory to be allocated in bytes. It may also be given in kilobytes or megabytes (e.g. 10K, 1M). If the block has to be used exclusively for the DATE control block, the size can be specified as “-1”: this creates a block of the exact size needed to store the DATE control block.

The elements that can be allocated are:

• control: the DATE control block (needed on all hosts part of the DAQ system).
• readout: all the resources needed by readout.
• readoutReadyFifo: the FIFO used to transfer events out of the readout process.
• readoutFirstLevelVectors: pool of first level vectors used to describe paged events.
• readoutSecondLevelVectors: pool of second level vectors used to describe the data pages of paged events.
• readoutDataPages: the pool for the payload of all types of events.
• edmReadyFifo: the FIFO used to transfer events out of the edmAgent.
• hltAgent: all the resources needed by the hltAgent.
• hltReadyFifo: the FIFO used to transfer events out of the hltAgent.
• hltSecondLevelVectors: the pool of second level vectors available to the hltAgent.
• hltDataPages: the pool for payloads created by the hltAgent.
• eventBuilder: all the resources needed by the eventBuilder.
• eventBuilderReadyFifo: the FIFO used to store events out of the eventBuilder.
• eventBuilderDataPages: the pool used by the eventBuilder to store the events received from the LDCs.

The readout, hltAgent and eventBuilder processes allocate all the resources they need (e.g. readout allocates readoutReadyFifo, readoutFirstLevelVectors, readoutSecondLevelVectors, readoutDataPages and edmReadyFifo). If needed, the specific resources can be tuned using the appropriate keyword, e.g.:

    ROLE_NAME=myldc,SUPPORT=physmem1,KEY_PATH=/dev/physmem1,SIZE=100M,PATTERN=readoutDataPages
    ROLE_NAME=myldc,SUPPORT=ipc,KEY_PATH=,PATTERN=readout,SIZE=1M

This allocates 100 MB using PHYSMEM (device /dev/physmem1) to store the readout data pages and 1 MB using IPC for all other resources needed by the LDC role named myldc.

If the same memory block has to be used for multiple purposes (e.g. ready FIFO and first level vectors), the block is split evenly between the two resources. If the same block also has to be used for data pages (readoutDataPages, hltDataPages, eventBuilderDataPages), a heuristic algorithm is used to distribute the memory between the non-data and data blocks. Data blocks are given much more space than non-data blocks. As this algorithm may not result in an appropriate partitioning, we suggest separating data from non-data blocks and explicitly sizing the two banks separately.

If the same DATE host is used, under different role names, for different roles (e.g. LDC and EDM), different keys must be used, one for each role. Failure to do so may produce unpredictable results. This is done automatically for the IPC type if no key is provided.

When the database defines resources that are not active at run-time (e.g. EDM when the EDM checkbutton in the runControl window is not selected), these are not allocated. However, when certain resources are first required and then no longer needed, DATE will not remove them. For example, if a DATE run at one given moment includes the EDM and a memory bank is allocated for the edmAgent ready FIFO, the block will remain available (unused), even if the EDM is subsequently disabled. If the removal of the block is needed, this must be done via external methods (Operating System reboot or specific procedures).
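The size notation used in the bank records (plain bytes, a K or M suffix, or “-1” for the control block) can be decoded as in the C sketch below. This is an illustration rather than the DATE parser: the function name and its error code are hypothetical, only upper-case suffixes are accepted, and K and M are assumed to be binary multiples (1024 and 1024*1024 bytes), which is consistent with the sizes reported in Listing 4.1.

```c
#include <stdlib.h>

/* Decode a bank size string.
   Returns the size in bytes, -1 for the special "-1" value (block of
   the exact size of the DATE control block), or -2 on a parse error. */
long long sketch_parse_bank_size(const char *text)
{
    char *end;
    long long value = strtoll(text, &end, 10);

    if (end == text)
        return -2;                      /* no digits at all */
    if (value == -1 && *end == '\0')
        return -1;                      /* control block marker */
    if (value < 0)
        return -2;                      /* other negative values */

    if (*end == 'K') {
        value *= 1024LL;
        end++;
    } else if (*end == 'M') {
        value *= 1024LL * 1024LL;
        end++;
    }
    return (*end == '\0') ? value : -2; /* reject trailing garbage */
}
```

With these assumptions, SIZE=1M decodes to 1048576 bytes and SIZE=100M to 104857600 bytes.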
The run-time configuration of the banks allocated by DATE can be dumped using the utility:

${DATE_BANKS_MANAGER_BIN}/dumpBanks

This tool can only run on hosts where DATE is currently running (or has run) and dumps the status of the various banks, their size and their addresses. The output of this utility reflects the run-time allocation of blocks and fifos according to the combination of static and dynamic information. The output from the utility is shown in Listing 4.1.

Listing 4.1 Example DATE banks dump

     1: > ${DATE_BANKS_MANAGER_BIN}/dumpBanks ldc
     2: rcShm: @0x40195000 offset:0 size:11088 bank:0
     3: readoutReady: @0x40197b50 offset:11088 size:1037488 bank:0
     4: readoutFirstLevel: @0x40295000 offset:0 size:1048576 bank:1
     5: readoutSecondLevel: @0x40395000 offset:0 size:1048576 bank:2
     6: readoutData: @0x40495000 offset:0 size:262144000 bank:3
     7: edmReady: NOT AVAILABLE
     8: hltReady: NOT AVAILABLE
     9: hltSecondLevel: NOT AVAILABLE
    10: hltData: NOT AVAILABLE
    11: eventBuilderReady: NOT AVAILABLE
    12: eventBuilderData: NOT AVAILABLE
    13: physmem: @0x10060000 bank:3
    14: FIFOs: readoutReadyFifo == recorderInputFifo
    15: edmAgent: disabled (0) hltAgent: disabled (0)

This example describes an LDC where the DDL is active. The first bank (bank number 0) is used for the DATE control block (line 2) and the readout ready FIFO (line 3). One bank is allocated for the first level vectors (line 4) and another bank for the second level vectors (line 5). Finally the PHYSMEM, declared as bank 3 (line 13), is used to store the readout data pages (line 6). EDM, HLT and event builder are not active on this node (lines 7-12 and 15). Finally there is one FIFO connecting the output of readout to the input of recorder (line 14).
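When the dump has to be processed by a script or a monitoring tool, the block lines of Listing 4.1 (lines 2 to 6) can be decoded with a single sscanf call, as in the C sketch below. The structure and function names are hypothetical, the input is assumed to be the raw utility output without the listing line numbers, and the layout is inferred from this one example only, so other DATE versions may print something different.

```c
#include <stdio.h>

/* One allocated block as printed by dumpBanks (layout assumed from
   Listing 4.1). */
struct sketchBankLine {
    char          name[64];
    unsigned long address;
    unsigned long offset;
    unsigned long size;
    int           bank;
};

/* Returns 1 when the line describes an allocated block, 0 for any
   other line (e.g. "edmReady: NOT AVAILABLE" or the physmem line). */
int sketch_parse_bank_line(const char *line, struct sketchBankLine *out)
{
    int fields = sscanf(line, " %63[^:]: @%lx offset:%lu size:%lu bank:%d",
                        out->name, &out->address, &out->offset,
                        &out->size, &out->bank);
    return fields == 5;
}
```

Applied to line 6 of the listing, the sketch reports a block named readoutData of 262144000 bytes in bank 3, that is 250 times 1048576 bytes.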
4.4 Other centrally stored parameters

In addition to the above ‘static databases’ describing the architecture and relations between the run-time entities, the numerous DATE distributed processes also need some common parameters centrally defined and accessible to all of them.

Items that can be modified by users are grouped in two families of DATE information: the Environment parameters and the Files. The details of the package-specific configuration items are not described here but in the chapters of the corresponding packages.

A third category, Detector Files, is devoted to storing files handled by the detector software (electronics initialization scripts, calibration procedures), but not used by DATE packages.

There are also a few tables meant to store internal DATE persistent parameters, which are not supposed to be modified directly. This is the case of the GLOBALS, SOCKETS, and DETECTOR_CODES tables.

Finally, the information related to readout equipments is saved in a set of tables specific to each kind of equipment: DDLin, DDLout, EQUIP_PARAM_...(one table per type of equipment), EQUIP_TYPES, EQUIP_TYPES_FORMAT. This information is accessible with editDb.

The runControl also uses a dedicated table to store its run parameters: this is the runControl_runParameters table, editable through the runControl Human Interface.

4.4.1 DATE globals

This table stores only a few hidden (i.e. not supposed to be modified) values:

• DB version : tags the database structure version. It is used to check that the installed version of DATE can run with this database. A mismatch will prevent DATE from running. Either the database should be updated (check the release notes, this is done with ${DATE_DB_DIR}/upgrade.tcl), or the correct version of DATE should be installed.
• LHC period : used by mStreamRecorder to store and register the files in the correct location to be retrieved by offline analysis.
  One can get the current value with ${DATE_DB_BIN}/getLHCperiod.sh
• Run Number: the latest run number used, incremented at each start of run.

There is no API to write to this table. Modifications are done directly with MySQL commands.

4.4.2 DATE sockets

TCP/IP sockets and ports are used to communicate between the DATE processes. To allow several roles on the same machine, the port numbers used for each type of service are not fixed but dynamically assigned when the roles are defined in the ROLES table. The sockets table is modified automatically by editDb when roles are added or removed. editDb calls ${DATE_DB_BIN}/daqDB_fillSocketTable, which assigns a port number to each service provided by a given DATE role on each machine. The port numbers are allocated within a fixed range (DATE_PORT_MIN and DATE_PORT_MAX defined in daqDB_fillSocketTable.c, typically between 6001 and 6100).

There is in principle no need to access (read or write) this information. The DATE services retrieve it at run-time using the dbGetPort() function defined in ${DATE_DB_DIR}/dateDbFile.h, or directly from the shell utility ${DATE_DB_BIN}/daqDB_getPort.

4.4.3 DATE detector codes

Depending on the context, a detector can be identified by a number (e.g. in a bit mask, or for a DATE role ID), by a name (for human interfaces, or for a DATE role name) or by a 3-letter code. The table used to convert from one form to another is fixed, and defined in ${DATE_DB_DIR}/detCodes.h. This header file also provides means to retrieve the information quickly from an in-memory table. However, for convenience in SQL queries, the same information is also stored in the database DETECTOR_CODES table.
Consistency between the two is ensured by two utilities: ${DATE_DB_BIN}/daqDB_fillDetectorCodes, which produces the statements to populate the DB at creation time, and ${DATE_DB_BIN}/daqDB_fillDetectorRoles, which outputs the statements needed to create the corresponding DATE roles. The output of both utilities is included in the DB creation script, which should be updated accordingly whenever the hardcoded list is modified.

It is very important that the detector code matches the role ID in the roles database (this is ensured by the initial DB populating script). When a detector is added manually, editDb tries to use the detector ID as role ID, if it is not already used elsewhere.

4.4.4 DATE Environment

The Environment table stores system environment variables that may be loaded at run-time. Each record consists of a name (the name of the environment variable), a value, a class telling in which context it is used (one of General, Database, Infologger, User), and a flag LOAD_BY_DEFAULT specifying whether it should be loaded by the global DATE setup procedure.

Default entries are populated when the configuration database is created. Further entries can be added manually, under the User class only. Example variables include access parameters to the databases (configuration, logging, logbook, AMORE), the DIM DNS node, the path to the File Exchange Server, etc.

For performance reasons, care should be taken not to extend the run-time environment with unnecessary variables loaded by default and used only by a few processes. Other methods exist to store configuration information related to a limited number of processes (see next section).

All variables needed in the DATE environment should be defined here. The only exception is the initial access parameters to the database, which need to be put in the file ${DATE_SITE_PARAMS}.
These parameters are needed by the DATE setup script to load all other environment variables (and are overwritten in this process by the ones defined in the database). This is the only information that needs to be distributed manually on all the DATE hosts; everything else is then available from the central database. ${DATE_SITE_PARAMS} should contain the definition (on each line, one variable name and its value separated by a space) of DATE_DB_MYSQL_USER, DATE_DB_MYSQL_PWD, DATE_DB_MYSQL_HOST and DATE_DB_MYSQL_DB. It is populated automatically by the script creating a new DATE site.

The variables with the LOAD_BY_DEFAULT flag set are loaded in the environment by the DATE setup, using the loadEnvDB.tcl tool. This script prints the commands necessary to load the corresponding variables in the environment for bash and csh, and can filter them by class. Use ${DATE_DB_DIR}/loadEnvDB.tcl -h for details on the available options.

4.4.5 DATE Files

The Files table stores any kind of binary content. It can be seen as a shared filesystem, available from all DATE components. Each entry is made of:

• a PATH to identify the file (usually with a directory-like structure to sort the information);
• an optional HOST (for a file specific to a given host or role; it can be empty if the file is of general use);
• a VALUE (it can be binary, but is usually textual for configuration files);
• a DESCRIPTION of the content;
• a CLASS (General for default resources, or User for the ones added later).

The unique key to access the data is the pair PATH - HOST.

There are two ways to access (read or write) the content of a file from a DATE process. The first is direct access to the MySQL table, issuing SQL queries that load the file in memory. The second uses the shell utility ${DATE_DB_DIR}/copyFileDB, which copies a file from the database to the local disk. Its content can then be read by classical means.
Files with a relative path (not starting with ‘/’) are loaded to/from ${DATE_SITE} (or ${DATE_SITE}/${DATE_HOSTNAME} for host-specific files). The script either takes a local file and stores it in the database, or copies a file from the database to the local disk. The -help command line option gives an exhaustive list of possible options. This tool is mostly used to retrieve files from the database, whereas editDb (Section 4.5) provides a user-friendly way to store and edit files in the database.

In the case of a file storing key/value pairs, the API provided by the header ${DATE_DB_DIR}/dateDbFile.h offers an easy way to load the file and access the values. A command line tool, ${DATE_DB_BIN}/dumpDbFile, is based on this interface and gives a listing of the parsed file contents.

4.4.6 DATE Detector Files

A table named DETECTOR_CFG_FILES stores all the files for each detector defined in the detector codes table. A view DETECTOR_CFG_XXX is also created to selectively access the files of a given detector, XXX being the detector code. This table structure is optional and not needed to run DATE. To create and remove it, the following utilities may be used: ${DATE_DB_DIR}/daqDetDB_create and ${DATE_DB_DIR}/daqDetDB_destroy.

A set of shell-like tools is provided to list/get/store/remove the files available: ${DATE_DB_DIR}/daqDetDB_ls, ${DATE_DB_DIR}/daqDetDB_get, ${DATE_DB_DIR}/daqDetDB_store, ${DATE_DB_DIR}/daqDetDB_remove. A graphical interface, ${DATE_DB_DIR}/daqDetDB_browser, is also provided to edit the files. To use some of the commands above, the environment variable $DATE_DETECTOR_CODE must be defined in order to access the files of the given detector.

4.4.7 DATE readout equipment tables

The readout equipment configuration defines the readout system on the LDCs. This item is not in the dateDb package, but is part of the static configuration stored in the database tables.
The details about the various parameters are described in the relevant hardware chapters.

The DDLin table holds the mapping between DDL IDs (e.g. used offline) and their space-optimized numbering used for the HLT to DAQ protocol (single bit mask reporting all the links). The static mapping used for the decoding of HLT decisions is defined in ${DATE_DB_DIR}/dbHLTmask.c and the corresponding SQL statements to populate the database are created by ${DATE_DB_BIN}/dbHLTmask. These statements are again included in the DB creation script every time the static definition changes (e.g. new detector or links). This usually goes together with an update in the HLTagent protocol.

In order to verify the consistency of the readout links and the correct cabling, the DDLin table has a field to register the remote SIU IDs. This is not used at run-time by DATE, but provides a convenient way to track changes in the cabling or hardware configuration, if needed. The command ${DATE_DB_DIR}/checkSIUs.tcl takes a snapshot of the current hardware setup, and can then check for changes. This is especially useful to notice cabling errors after detector shutdown periods. Note that for this procedure all the SIUs should be up and running, and the DDL must not be used by other processes (in particular, it does not work during a run or while an electronics configuration via DDL is in progress).

To communicate with the electronics through the DDL, one needs to know which RORC should be used on the client to access a given detector equipment ID. This information can be read from the database with the shell commands ${DATE_DB_DIR}/daqDB_getRorcFromEqId, ${DATE_DB_DIR}/daqDB_getRorcsFromLDC and ${DATE_DB_DIR}/getDdlLinks.sh. The necessary details to open the link (e.g. RORC serial number and channel) are returned by these tools, with different flavours of queries and filters.

4.5 The database editor

The configuration can be edited with the graphical user interface named editDb.
It is a Tcl/Tk application using SQL transactions to display and update the DATE database content (actually, only the subset accessible to users and directly related to the DATE static information). This tool relies on the table definitions and semantics of the database structure at the time it was developed, in order to provide high-level consistency checks and simple editing.

This section describes the features of the human interface. For a description of the configuration parameters, please consult Section 4.3.

To launch editDb, type:

    > editDb

All the menus have a Commit and a Rollback button. The configuration database is changed only after you click on the Commit button. You can edit the parameters, and then undo all the modifications made since the last Commit with the Rollback button.

The editDb interface starts with the roles configuration display, as shown in Figure 4.2. Use the buttons at the top of it to select a configuration item. The current one is highlighted in red. All the configuration displays share the function buttons at the bottom of the display; some configuration displays have extra controls. To exit editDb, click on the Quit button. You may quit only if all the changes to the database have been applied or canceled.

Figure 4.2 The initial editDb view.

To create a new role, click on the New button. This allows you to enter the details in the entry fields on the right-hand side of the display. Once you have finished entering the details for the new role, click on the Add button. The new role is then added to the database and appears in the roles list on the left of the display. You can now click on the Commit or Rollback button to apply or to undo the changes in the database.

To delete a role, first select it in the roles list on the left of the display, then click on the Delete button, and finally click on the Commit button to accept the changes.
You can clone LDC and GDC roles. From the roles configuration display, first select the GDC or LDC you want to clone, then click on the Clone Role button. A window will pop up with some options. For cloning a GDC role you need to enter a space-separated list of hostnames, as shown in Figure 4.3.

Figure 4.3 GDC cloning window.

For cloning an LDC role you need to enter the hostnames, choose whether to clone the LDC’s equipment and choose a detector if you want to add the cloned LDCs to it, as shown in Figure 4.4. For both LDC and GDC roles you can change the naming schema: occurrences of $host in the character string are replaced by the hostname.

Figure 4.4 LDC cloning window.

After creating a new LDC role you can add an equipment to it, either by selecting it in the roles list and clicking on the extra button View Equipment, or by clicking on the Equipment configuration display button and then selecting the LDC in the list on the left of the display. Clicking on the Add button lets you first choose what type of equipment to create by selecting it in the drop-down box, as shown in Figure 4.5. After selecting an equipment type, click on the Create button. You can now enter the equipment details. Once finished, click on the Add button to add the equipment to the database. Click on the Commit or Rollback button to accept or to undo the addition.

Figure 4.5 New equipment creation display.

Once you have added equipment to an LDC, you can see the details of each item when you select it in the equipment list, as shown in Figure 4.6. You can now edit any of the equipment fields. If you edit a field you have to click on the Commit or Rollback button before you can change to a different configuration display. Inactive equipment appears in red in the equipment listbox.

Figure 4.6 Equipment configuration display.
Once you have a Detector or Sub-detector role you can add components to it. Either select the detector in the roles configuration display and press the View Components button, or click on the detectors configuration display button, then select the detector in the detector list. You should see a list of components belonging to the detector in the Made Of list, and a list of available components in the Available Components list, as shown in Figure 4.7.

Figure 4.7 Detectors configuration display.

You can now add components by selecting them in the Available Components list and then clicking on the Add button. You can remove components by selecting them in the Made Of list and clicking on the Remove button. If you make any change, you need to click on the Commit or Rollback button.

Once you have a TriggerMask or TriggerHost role you can add components to it. Either select the trigger in the roles configuration display and press View Components, or click on the triggers configuration display button, then select the trigger in the trigger list. You should see a list of components belonging to the trigger in the Made Of list, and a list of available components in the Available Detectors list, as shown in Figure 4.8.

Figure 4.8 Triggers configuration display.

You can now add components by selecting them in the Available Detectors list and clicking on the Add button. You can also remove components by selecting them in the Made Of list and clicking on the Remove button. If you make any changes, you need to click on the Commit or Rollback button.

Clicking on the Membanks configuration display button shows the memory banks that are defined in the Membanks list, as shown in Figure 4.9. The details for the currently selected memory bank are displayed on the right. To add a new membank click on the New button. This clears the entry fields under the Membank Details label.
Enter the details of the new memory bank, and then click on the Add button. You can cancel the addition of a new memory bank by clicking on the Cancel button. You can also edit the details of the currently selected memory bank. You need to click on the Commit or Rollback button when you have finished the changes.

Figure 4.9 Membanks configuration display.

Clicking on the Event Building configuration display button shows the event building rules that are defined in the Rules list, as shown in Figure 4.10. The details for the currently selected rule are displayed on the right. To add a new rule click on the New button. This clears the entry fields under the Rule Details label. Enter the details of the new rule, and then click on the Add button. You can cancel the addition of a new rule by clicking on the Cancel button. You can also edit the details of the currently selected rule. Click on the Commit or Rollback button when you have finished the changes.

Figure 4.10 Event building rules configuration display.

Clicking on the Environment configuration display button shows you a dropdown list where you can choose the class of variables you want to see. In the list below you will see the variables defined for the current class. On the right of the display you see the details for the currently selected variable from the listbox. For all the classes except the User class you can only change the value field, as shown in Figure 4.11.

Figure 4.11 Environment variables configuration display.

If you select User from the dropdown list you are able to add and delete user defined variables. You can also edit the value and description fields for each variable, as shown in Figure 4.12. If the LOAD_BY_DEFAULT flag is set, the environment variable is loaded into the environment when calling the DATE setup procedure.
Figure 4.12 Environment variables configuration display showing user defined variables.

The DATE MySQL configuration system can store files (ASCII or binary). This avoids having to deploy a shared file system to distribute files on DATE hosts. Clicking on the Files configuration display button shows a list of file paths with the host they will be placed on, or a path alone, which means the file will be placed on all DATE hosts. On the right of the display are the details for the currently selected item in the listbox, as shown in Figure 4.13.

To create a new file entry click on the New button and fill in the entry fields. Clicking on the Get file button brings up a file selection display which lets you select the file you want to upload. Leaving the Host entry field blank is interpreted to mean all hosts. Click on the Add button to add the details and the file to the database. Click on the Commit or Rollback button when you have finished the changes.

Figure 4.13 Files configuration display.

To edit a file, click on the entry in the Files list, then click on the Edit file button. This will launch an editor where you may make changes to the file. Once you have made the changes, save them and exit the editor. Clicking on the Commit button will apply the changes to the database.

SOR/EOR commands/files are automatically copied onto the target host when readout or the eventBuilder starts. Other files can be copied locally with the copyFileDB.tcl script (see Section 4.4.5 for details).

4.6 Example of a DAQ system

We will now see how to define an example DAQ system, as described in Figure 4.14.

Figure 4.14 Example of a DAQ system (nodes shown: triggerHost1; DetOneSubOne, DetOneSubTwo, DetOneSubThree; DetOneLdc1, DetOneLdc2, DetOneLdc3, DetTwoLdc1, DetTwoLdc2, DetThreeLdc, AloneLdc; gdc1, gdc2).

This DAQ system is made of three detectors, attached to six LDCs.
One extra LDC has been allocated for non-detector related tasks (e.g. Trigger System). The three detectors are called DetOne, DetTwo and DetThree. The first of the three detectors (DetOne) has been partitioned into three sub-detectors: DetOneSubOne, DetOneSubTwo and DetOneSubThree. The experiment allows two triggers: one that activates all three detectors and a second trigger that activates DetOne alone. Similarly, for calibration events we want DetOne to receive a stand-alone calibration and a second calibration to go to all the detectors. The event builders must build all PHYSICS and CALIBRATION events.

This example is described in ASCII file format in order to give an idea of what parameters should be stored in the database with editDb. The files cannot be used directly, but give a realistic dump of the parameters to be defined in the database to implement such a DAQ system. These example files are available in the directory ${DATE_DB_DIR}/testConfig, and are shown in Listing 4.2.

Listing 4.2 Example of configuration files

    > cd ${DATE_DB_DIR}
    > ls -1 testConfig
    dateBanks.config
    dateRoles.config
    detectors.config
    eventBuildingControl.config
    triggers.config

The first database we examine is the roles database, stored in ${DATE_DB_DIR}/testConfig/dateRoles.config.
Listing 4.3 Example of roles database

    1: > cat ${DATE_DB_DIR}/testConfig/dateRoles.config
    2: >LDC
    3: DetOneLdc1 1 "DetOne LDC #1" hostname=host1
    4: DetOneLdc2 2 "DetOne LDC #2" hostname=host2
    5: DetOneLdc3 3 "DetOne LDC #3" hostname=host3
    6:
    7: DetTwoLdc1 10 "DetTwo LDC #1" hostname=host4
    8: DetTwoLdc2 11 "DetTwo LDC #2" hostname=host5
    9:
    10: DetThreeLdc 20 "DetThree LDC" hostname=host6
    11:
    12: AloneLdc 30 "Single LDC" hostname=host7 topLevel=Y
    13:
    14: >GDC
    15: gdc1 1 "GDC #1" hostname=host8 topLevel=Y
    16: gdc2 2 "GDC #2" hostname=host9 topLevel=Y
    17:
    18: >TRIGGER_HOST
    19: triggerHost1 1 "Trigger host 1" hostname=hostT
    20:
    21: >DETECTORS
    22: DetOne 1 "Detector 1" topLevel=Y
    23: DetTwo 2 "Detector 2" topLevel=Y
    24: DetThree 3 "Detector 3" topLevel=Y
    25:
    26: >SUBDETECTORS
    27: DetOneSubOne 1 "Detector 1 Sub-detector 1"
    28: DetOneSubTwo 2 "Detector 1 Sub-detector 2"
    29: DetOneSubThree 3 "Detector 1 Sub-detector 3"
    30:
    31: >TRIGGER_MASK
    32: TriggerMask1 1 "Trigger mask 1"
    33: TriggerMask2 2 "Trigger mask 2"

The LDCs declaration (lines 2-12) concerns the seven LDCs. For each machine we have a DATE name, the identifier, a short description and the hostname. Please note that most of the LDCs are not marked as “topLevel” and therefore cannot be directly selected from the runControl Human Interface. Only AloneLdc can be directly selected or deselected.

The GDCs are declared in lines 14-16. All GDCs can be directly selected or deselected via the runControl Human Interface.

Lines 18-19 declare the trigger host.

The three detectors and the three sub-detectors of the first detector are declared in lines 21-29. The three detectors can be (de)selected via the runControl Human Interface.

Finally lines 31-33 declare the two trigger masks available in this DAQ system.
The two trigger mask declarations are illustrated below:

Listing 4.4 Example of trigger configuration

    > cat ${DATE_DB_DIR}/testConfig/triggers.config
    >TRIGGER_MASK
    TriggerMask1 DetOne DetTwo DetThree
    TriggerMask2 DetOne

The first trigger mask (TriggerMask1) activates all the detectors while the second trigger mask (TriggerMask2) activates the first detector only.

The detectors are defined as follows:

Listing 4.5 Example of detectors configuration

    1: > cat ${DATE_DB_DIR}/testConfig/detectors.config
    2: >DETECTORS
    3: DetOne DetOneSubOne DetOneSubTwo DetOneSubThree
    4: DetTwo DetTwoLdc1 DetTwoLdc2
    5: DetThree DetThreeLdc
    6:
    7: >SUBDETECTORS
    8: DetOneSubOne DetOneLdc1
    9: DetOneSubTwo DetOneLdc2
    10: DetOneSubThree DetOneLdc3

The three detectors are defined in lines 2-5. The sub-detectors of the first detector are defined in lines 7-10.

The event building policies are defined as:

Listing 4.6 Example of event-building configuration

    1: > cat ${DATE_DB_DIR}/testConfig/eventBuildingControl.config
    2: >EVENT_BUILDING_CONTROL
    3: SOR nobuild
    4: SORF nobuild
    5: EOR nobuild
    6: EORF nobuild
    7:
    8: PHY TriggerMask1 build
    9: PHY TriggerMask2 build
    10: PHY build
    11:
    12: CAL DetOne build
    13: CAL DetOne DetTwo DetThree build

The above configuration example drives the event builder not to build SOR and EOR events (lines 3-6). Physics events triggered by TriggerMask1 will be built, as well as events triggered by TriggerMask2. Physics events without trigger mask will be built from all the LDCs. Calibration events involving DetOne alone will be built using this detector, while calibration events involving all detectors will be built using data coming from all LDCs. The detectors involved in each calibration trigger are extracted from the eventDetectorPattern field of the event header.
The DATE banks are defined as:

Listing 4.7 Example of banks configuration

    1: > cat ${DATE_DB_DIR}/testConfig/dateBanks.config
    2: >BANKS
    3:
    4: # --- LDCs ---
    5: DetOneLdc1 IPC ${DATE_SITE_CONFIG}/LDC.key 150000 control readout
    6: DetOneLdc2 IPC ${DATE_SITE_CONFIG}/LDC.key 150000 control readout
    7: DetOneLdc3 IPC ${DATE_SITE_CONFIG}/LDC.key 300000 control readout
    8:
    9: DetTwoLdc1 \
    10: PHYSMEM /dev/physmem1 5M readoutDataPages \
    11: IPC ${DATE_SITE_CONFIG}/LDC.key 1.5M readout \
    12: IPC ${DATE_SITE_CONFIG}/LDC.key1 * control
    13: DetTwoLdc2 \
    14: PHYSMEM /dev/physmem1 5M readoutDataPages \
    15: IPC ${DATE_SITE_CONFIG}/LDC.key 1.5M readout \
    16: IPC ${DATE_SITE_CONFIG}/LDC.key1 * control
    17:
    18: DetThreeLdc IPC ${DATE_SITE_CONFIG}/LDC.key 100K control readout
    19:
    20: AloneLdc IPC ${DATE_SITE_CONFIG}/LDC.key 10000 control readout
    21:
    22: # --- GDCs ---
    23: gdc1 IPC ${DATE_SITE_CONFIG}/GDC.key 10M control eventBuilder
    24: gdc2 IPC ${DATE_SITE_CONFIG}/GDC.key 10M control eventBuilder

The banks to be implemented in the three LDCs of the first detector are defined in lines 5-7. They all contain the resources needed for control and data flow. They are all implemented using IPC shared memory and their sizes are 150000 (LDCs 1 and 2) and 300000 (LDC 3) bytes. These blocks are partitioned into the separate regions needed to store the DATE control block, the data pages, the FIFOs and all other resources needed by readout.

The two LDCs of the second detector use PHYSMEM to allocate their readout data pages. This is the typical case for a DDL-based DAQ system. The other resources needed by the readout process are allocated using IPC via the given keys for a size of 1.5 MB. The DATE control segment is handled via IPC, using a separate block whose size is equal to the size of the DATE control block itself.

The LDC of the third detector has 100 KB handled via IPC. The same mechanism is used for the stand-alone LDC, with a size of 10000 bytes.
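The size fields of Listing 4.7 mix plain byte counts with K and M suffixes and a `*` placeholder. The C sketch below shows one way such a field could be converted to bytes. It is not the DATE parser: sketchParseBankSize() is a hypothetical helper, the suffixes are assumed to be binary multiples (the dumpDbs report for this example shows 5M as 5242880 and 1.5M as 1572864), and returning -1 for `*` simply mirrors the size:-1 printed by dumpDbs for the control bank.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of DATE: convert a bank size field
 * ("150000", "5M", "1.5M", "100K" or "*") to a number of bytes.
 * Returns -1 for "*" (size decided at run-time) and -2 for a field
 * that cannot be interpreted. */
long sketchParseBankSize(const char *field)
{
    assert(field != NULL);

    if (strcmp(field, "*") == 0)
        return -1;

    char *end;
    double value = strtod(field, &end);
    if (end == field || !(value >= 0.0))   /* no digits, negative or NaN */
        return -2;

    double multiplier = 1.0;
    if (*end == 'K') {                     /* assumed: 1K = 1024 bytes */
        multiplier = 1024.0;
        end++;
    } else if (*end == 'M') {              /* assumed: 1M = 1024*1024 bytes */
        multiplier = 1024.0 * 1024.0;
        end++;
    }
    if (*end != '\0')                      /* trailing garbage */
        return -2;

    double bytes = value * multiplier;
    if (bytes >= (double)LONG_MAX)         /* would not fit in a long */
        return -2;
    return (long)bytes;
}
```

With these assumptions, the 100K field of DetThreeLdc would correspond to 102400 bytes, which agrees with the 100 KB quoted in the text.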
The two GDCs use an identical configuration of a single 10 MB IPC segment for the DATE control block and for all the resources needed by the event builder.

Using the above example configuration to fill the database, here is the report from the dumpDbs utility:

Listing 4.8 Example of dumpDbs output

    1: > dumpDbs
    2: Roles DB:
    3: 0) id: 1 LDC DetOneLdc1 hostname:host1 "DetOne LDC #1" madeOf:Undefined bankDescriptor:0
    4: [...]
    5: 9) id: 1 Detector DetOne hostname:N/A "Detector 1" madeOf:Subdetector TOP-LEVEL
    6: [...]
    7: 12) id: 1 Subdetector DetOneSubOne hostname:N/A "Detector 1 Sub-detector 1" madeOf:LDC
    8: [...]
    9: 15) id: 1 Trigger-Host triggerHost1 hostname:hostT "TriggerHost 1" madeOf:Undefined
    10: 16) id: 1 Trigger-Mask TriggerMask1 hostname:N/A "Trigger mask 1" madeOf:Undefined
    11: 17) id: 2 Trigger-Mask TriggerMask2 hostname:N/A "Trigger mask 2" madeOf:Undefined
    12: Max LDC:30, Max GDC:2, Max Detector:3, Max Subdetector:3, Max Trigger-Host:1, Max Trigger-Mask:2
    13: .........................................................
    14: Trigger DB:
    15: 0) id: 1 TriggerMask1
    16: detectorPattern:0000000e = DetOne+DetTwo+DetThree =>
    17: ldcPattern:DetOneLdc1+DetOneLdc2+DetOneLdc3+DetTwoLdc1+DetTwoLdc2+DetThreeLdc (6 LDCs)
    18: 1) id: 2 TriggerMask2
    19: detectorPattern:00000002 = DetOne
    20: => ldcPattern:DetOneLdc1+DetOneLdc2+DetOneLdc3 (3 LDCs)
    21: .........................................................
    22: Detectors DB:
    23: 0) Detector id: 1 DetOne (made of:Subdetector)
    24: subdetectorPattern:DetOneSubOne+DetOneSubTwo+DetOneSubThree (3 subdetectors)
    25: => ldcPattern:DetOneLdc1+DetOneLdc2+DetOneLdc3 (3 LDCs)
    26: 1) Detector id: 2 DetTwo (made of:LDC)
    27: ldcPattern:DetTwoLdc1+DetTwoLdc2 (2 LDCs)
    28: => ldcPattern:DetTwoLdc1+DetTwoLdc2 (2 LDCs)
    29: [...]
    30: 3) Subdetector id: 1 DetOneSubOne (made of:LDC)
    31: ldcPattern:DetOneLdc1 (1 LDC)
    32: [...]
    33: .........................................................
    34: Banks DB:
    35: LDC DetOneLdc1 (host1): descriptor:0 1 bank(s)
    36: ipc "${DATE_SITE_CONFIG}/LDC.key" size:150000 => control readout readoutReadyFifo readoutFirstLevelVectors readoutSecondLevelVector readoutDataPages edmReadyFifo
    37: [...]
    38: LDC DetTwoLdc1 (host4): descriptor:3 3 bank(s)
    39: physmem "/dev/physmem.device" size:5242880 => readoutDataPages
    40: ipc "${DATE_SITE_CONFIG}/LDC.key" size:1572864 => readout readoutReadyFifo readoutFirstLevelVectors readoutSecondLevelVectors edmReadyFifo
    41: ipc "${DATE_SITE_CONFIG}/LDC.key1" size:-1 => control
    42: [...]
    43: GDC gdc1 (host8): descriptor:7 1 bank(s)
    44: ipc "${DATE_SITE_CONFIG}/GDC.key" size:10485760 => control eventBuilder eventBuilderReadyFifo eventBuilderDataPages
    45: [...]
    46: .........................................................
    47: Event building control DB:
    48: 0) eventType:StartOfRun all-events NO-BUILD
    49: [...]
    50: 4) eventType:Physics triggerPattern:00000000-00000002=1 BUILD
    51: 1:TriggerMask1
    52: 5) eventType:Physics triggerPattern:00000000-00000004=2 BUILD
    53: 2:TriggerMask2
    54: [...]
    55: .........................................................

The output of the utility has been edited for brevity and formatting purposes. The following declared roles are shown:

• LDCs (line 3).
• Detector and Subdetector (lines 5 and 7).
• Trigger host (line 9).
• Trigger masks (lines 10-11).

The two elements added dynamically by the database package to each role are the madeOf field (used to specify the components of a role) and the bankDescriptor ID pointing to the specific host role entry in the banks database. Note the TOP-LEVEL attribute that marks the entities that can be directly selected using the runControl Human Interface.

Line 12 reports the maximum ID declared in the database for each type of entity. These values are used by DATE to allocate global structures within a run.

The trigger database follows (lines 14-20).
The dumpDbs utility complements the static information retrieved from the databases with some derived information, such as the detector and LDC pattern corresponding to each trigger mask.

For the detectors database (lines 22-32) the dumpDbs utility appends the derived list of LDCs corresponding to each detector and subdetector.

The banks database is reported (lines 34-45) for each host with the list of all banks, their support, size and entities. The individual sizes are not given, since these can be computed only at run-time according to the actual configuration.

Finally, the event-building control is shown (lines 47-54) with some NO-BUILD rules (for start of run records) and some “by-trigger” rules.

4.7 The programming interface

The DATE database package provides a common interface to access some of the database content, in particular the data of the ‘static databases’ described in Section 4.3. It is not required to use this interface to operate DATE. The information in this section is intended for developers only, since the interface is mainly used by DATE actors.

The way to access data is the same for all information: the database is opened, loaded, and mapped onto the process address space. Access is provided via memory-mapped, read-only operations. After a successful mapping the following information is made available to the calling process:

a. a pointer to an array describing the database. The size (number of entries) of the database is given by an int and can also be derived from the array itself (the last entry of each DB has its ID set to DB_ID_INVALID).
b. a set of max*Id variables giving the maximum defined ID. This value can be used to size, at run-time, structures with one element for each ID. The value of the maximum ID is guaranteed to be less than or equal to the static maximum value as defined in ${DATE_COMMON_DEFS}/event.h.
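The two conventions above (an array closed by an entry whose ID is DB_ID_INVALID, and a maximum ID that can be used to size per-ID structures) can be sketched as follows. This is an illustration under stated assumptions, not the dateDb interface: SketchRecord, sketchIdType, the value chosen for SKETCH_ID_INVALID and both functions are hypothetical stand-ins for the definitions found in dateDb.h and event.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real types live in dateDb.h / event.h. */
typedef int sketchIdType;
#define SKETCH_ID_INVALID (-1)   /* assumed value, for illustration only */

typedef struct {
    sketchIdType id;             /* SKETCH_ID_INVALID marks the last entry */
    const char *name;
} SketchRecord;

/* Count the entries by walking the array up to the terminating record. */
int sketchCountRecords(const SketchRecord *db)
{
    assert(db != NULL);
    int n = 0;
    while (db[n].id != SKETCH_ID_INVALID)
        n++;
    return n;
}

/* Build a table with one slot per possible ID (0..maxId), so that a
 * record name can be looked up directly by ID. Unused slots are NULL.
 * The caller releases the table with free(). */
const char **sketchBuildNameTable(const SketchRecord *db, sketchIdType maxId)
{
    assert(db != NULL && maxId >= 0);
    const char **table = calloc((size_t)maxId + 1, sizeof *table);
    if (table == NULL)
        return NULL;
    for (int i = 0; db[i].id != SKETCH_ID_INVALID; i++) {
        if (db[i].id >= 0 && db[i].id <= maxId)
            table[db[i].id] = db[i].name;
    }
    return table;
}
```

Sizing the table from the maximum ID rather than from the number of entries matters when IDs are sparse, as in the example system where seven LDCs use IDs up to 30.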
Once a database is successfully mapped into the process address space, it can be reloaded only explicitly via an unload/load sequence. Consecutive load calls produce no effect.

All entries require the definitions given by the files ${DATE_COMMON_DEFS}/event.h and ${DATE_DB_DIR}/dateDb.h. These files can be included either by specifying the full path or via the "-I" C include directive.

In this section, a definition is given for the following entities:
• macros used to manipulate and test bit masks and bit patterns.
• base types used to represent the entities defined in the static databases.
• entries used to load, unload and perform other operations through the static databases.
• pointers and variables where the databases and their associated information are made available to the calling process.

Several other access methods to more specific data are defined and fully documented in the header file ${DATE_DB_DIR}/dateDb.h. They include means to list equipment, retrieve run parameters, parse configuration files, browse the detector ID/name/code mapping, etc.

Bit test macro

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    #define DB_TEST_BIT( bitMaskOrPattern, id )

Description
The DB_TEST_BIT macro can be used for all bit masks and bit patterns to test for the assertion of a given ID. It returns the boolean value TRUE if the id is set, FALSE otherwise. The macro can be used directly or indirectly in boolean-driven statements, e.g. the following lines of code:

    if ( DB_TEST_BIT(mask, id) ) idIsSet();
    if ( DB_TEST_BIT(mask, id) == TRUE ) idIsSet();

shall execute the function idIsSet() if id is set in mask.
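As a minimal sketch of how such a bit test might be used together with a max*Id value, the self-contained fragment below counts the IDs asserted in a pattern. The names SKETCH_TEST_BIT and sketchCountIds are hypothetical stand-ins and are not part of dateDb.h; the layout (one bit per ID, packed into 32-bit words) is an assumption made for illustration. Real code would include the DATE headers and call DB_TEST_BIT instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for DB_TEST_BIT: assumes one bit per ID,
   packed into an array of 32-bit words (not the real definition). */
#define SKETCH_TEST_BIT( pattern, id ) \
  ( ( ( (pattern)[ (id) / 32 ] >> ( (id) % 32 ) ) & 1u ) != 0 )

/* Count the IDs in the range 0..maxId that are asserted in the pattern.
   The caller must provide at least maxId/32 + 1 words. */
int sketchCountIds( const uint32_t *pattern, int maxId ) {
  int id, count = 0;

  assert( pattern != 0 && maxId >= 0 );
  for ( id = 0; id <= maxId; id++ )
    if ( SKETCH_TEST_BIT( pattern, id ) )
      count++;
  return count;
}
```

The loop bound plays the role of a max*Id variable: it limits the scan to the IDs that can actually be defined.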
dbIdType
DB_ID_INVALID
dbLdcPatternType
eventDetectorPatternType
eventTriggerPatternType

C Synopsis
    #include "event.h"
    #include "dateDb.h"

Description
The basic type dbIdType defines the data storage element used to represent all IDs used in the DATE static databases, being LDC, GDC, Trigger Host, Trigger Mask, Detector, Subdetector or EDM Host. Within the same role, different entities must have different IDs. The same ID can be used for entities of different roles. An ID equal to DB_ID_INVALID has either not been set or it is not applicable to a given record/entity.

DB_WORDS_IN_LDC_MASK

C Synopsis
    #include "event.h"
    #include "dateDb.h"

Description
Number of 32-bit words used to store a dbLdcPatternType. Can be used to scan and size an LDC pattern.

dbRoleType

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    typedef enum {
      dbRoleUndefined,
      dbRoleUnknown,
      dbRoleGdc,
      dbRoleLdc,
      dbRoleEdmHost,
      dbRoleHltProxy,
      dbRoleHltProducer,
      dbRoleHltRoot,
      dbRoleDetector,
      dbRoleSubdetector,
      dbRoleTriggerHost,
      dbRoleTriggerMask,
      dbRoleDdg,
      dbRoleFilter
    } dbRoleType;

Description
The dbRoleType enumeration type is used to define the role of a given record.

dbMemType

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    typedef enum {
      dbMemUndefined,
      dbMemIpc,
      dbMemHeap,
      dbMemBigphys,
      dbMemPhysmem
    } dbMemType;

Description
Define the method used to implement the control and (optionally) data buffers. This enumeration applies only to hosts where DATE actors need run-time memory support (LDC, GDC, EDM, Trigger Host). Where this is not applicable or has not been correctly defined, the dbMemUndefined value is used.

dbMemTypeNames

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    const char * const dbMemTypeNames[];

Description
Maps any memory type to a description string. Use as dbMemTypeNames[ dbMemType ].
dbRoleDescriptor

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    typedef struct {
      char          *name;
      char          *hostname;
      char          *description;
      dbIdType       id;
      dbRoleType     role;
      dbHltRoleType  hltRole;
      unsigned       topLevel : 1;
      unsigned       active : 1;
      dbRoleType     madeOf;
      int            bankDescriptor;
    } dbRoleDescriptor;

Description
This is the structure used to represent the records of the roles database. It includes the following fields:
• name of the entity.
• hostname of the entity (where applicable).
• description of the entity.
• ID of the entity.
• role of the entity.
• hltRole of the entity.
• flag topLevel to designate the entity as selectable from the runControl Human Interface (if TRUE).
• active flag to indicate whether the entity is active or not.
• the madeOf field to tell what the role is made of (e.g. a detector will be made of either subDetectors or LDCs).
• index of the bankDescriptor that defines the banks and supports to be made available on the given role (where applicable).

dbTriggerDescriptor

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    typedef struct {
      dbIdType                 id;
      eventDetectorPatternType detectorPattern;
    } dbTriggerDescriptor;

Description
Represent a record from the trigger static database. The record includes:
• ID of the trigger, as defined in the roles database.
• the detector pattern associated to the given trigger.

Each record of this type associates a trigger (identified by its ID) to the corresponding detector pattern where TEST_BIT( detectorPattern, detectorId ) returns TRUE if the detector with the given ID is active in the given trigger mask.

dbDetectorDescriptor

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    typedef struct {
      dbIdType         id;
      dbRoleType       role;
      dbLdcPatternType componentPattern;
    } dbDetectorDescriptor;

Description
Represent a record from the detectors static database.
The record includes:
• ID of the detector, as defined in the roles database.
• role (detector or subDetector) of the described entity.
• the LDC pattern or detector pattern associated to the given detector.

The structure dbDetectorDescriptor describes the sub-detectors or the LDCs that belong to the given detector (regardless of the status of their actual connection).

dbEventBuildingRule

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    typedef struct {
      eventTypeType eventType;
      unsigned      build : 1;
      enum {
        fullBuild,
        useDetectorPattern,
        useTriggerPattern
      } type;
      union {
        eventDetectorPatternType detectorPattern;
        eventTriggerPatternType  triggerPattern;
      } pattern;
    } dbEventBuildingRule;

Description
Represent a record from the event building database. The record includes:
• the type of the event associated to the rule.
• a build (TRUE)/no-build (FALSE) flag.
• the type of build: full, partial by detector pattern or partial by trigger pattern.
• the detector pattern or the trigger pattern used for partial event building (where applicable).

The structure dbEventBuildingRule describes the rules followed by the event builder. These rules can specify either a build or a no-build rule on a per-event type basis. Furthermore, the rule can result in a request for partial building.

dbBankType

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    typedef enum {
      dbBankControl,
      dbBankReadout,
      dbBankReadoutReadyFifo,
      dbBankReadoutFirstLevelVectors,
      dbBankReadoutSecondLevelVectors,
      dbBankReadoutDataPages,
      dbBankHltAgent,
      dbBankHltReadyFifo,
      dbBankHltSecondLevelVectors,
      dbBankHltDataPages,
      dbBankEventBuilder,
      dbBankEventBuilderReadyFifo,
      dbBankEventBuilderDataPages
    } dbBankType;

Description
Describes the various banks that can be made available on the hosts where DATE actors can run.

The dbBankControl bank contains the control section. It must be present on all DATE hosts.

The dbBankReadout bank contains all banks needed by an LDC.
This bank can be partitioned into the dbBankReadoutReadyFifo, dbBankReadoutFirstLevelVectors, dbBankReadoutSecondLevelVectors and dbBankReadoutDataPages banks.

Similarly, the dbBankHltAgent bank contains all banks needed by an HLT agent. This bank can be partitioned into the dbBankHltReadyFifo, the dbBankHltSecondLevelVectors and the dbBankHltDataPages banks.

The dbBankEventBuilder bank contains all entities needed on GDCs. It can be partitioned into the sub-entities dbBankEventBuilderReadyFifo and dbBankEventBuilderDataPages.

dbBankNames

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    const char * const dbBankNames[];

Description
Maps any memory bank to a description string. Use as dbBankNames[ dbBankType ].

dbBankPatternType

C Synopsis
    #include "event.h"
    #include "dateDb.h"

Description
Basic type used to describe a Memory Bank pattern.

dbBankComponents

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    const dbBankPatternType dbBankComponents[];

Description
Read-only array containing, for each DATE bank, the pattern of its components (if any). The information stored in this array can be used to find out what the sub-entities are. For example, the dbBankEventBuilder bank contains dbBankEventBuilderReadyFifo and dbBankEventBuilderDataPages, therefore dbBankComponents[ dbBankEventBuilder ] has the two bits dbBankEventBuilderReadyFifo and dbBankEventBuilderDataPages set.

dbSingleBankDescriptor
dbBankDescriptor

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    typedef struct {
      dbMemType          support;
      char              *name;
      int                size;
      dbBankPatternType  pattern;
    } dbSingleBankDescriptor;
    typedef struct {
      int                     numBanks;
      dbSingleBankDescriptor *banks;
    } dbBankDescriptor;

Description
Structures used to describe a single bank (within one DATE host) and all the banks (on one DATE host).
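To illustrate how a component table such as dbBankComponents might be consulted, the following self-contained sketch rebuilds a reduced version of it. Everything prefixed with sketch or SKETCH_ is hypothetical: only three of the bank types are reproduced, and the pattern is assumed to be an unsigned integer holding one bit per bank type. The real types and table are those declared in dateDb.h.

```c
#include <assert.h>

/* Hypothetical subset of the bank types, for illustration only. */
typedef enum {
  sketchBankEventBuilder,
  sketchBankEventBuilderReadyFifo,
  sketchBankEventBuilderDataPages,
  sketchBankCount
} sketchBankType;

/* Assumption: a bank pattern holds one bit per bank type. */
typedef unsigned int sketchBankPattern;

#define SKETCH_BANK_BIT( bank ) ( 1u << (bank) )

/* Component table: the event builder bank is made of its two sub-banks;
   the sub-banks themselves have no components. */
static const sketchBankPattern sketchBankComponents[ sketchBankCount ] = {
  SKETCH_BANK_BIT( sketchBankEventBuilderReadyFifo ) |
    SKETCH_BANK_BIT( sketchBankEventBuilderDataPages ),
  0u,
  0u
};

/* Return 1 if 'part' is a direct component of 'whole', 0 otherwise. */
int sketchIsComponent( sketchBankType whole, sketchBankType part ) {
  assert( whole < sketchBankCount && part < sketchBankCount );
  return ( sketchBankComponents[ whole ] & SKETCH_BANK_BIT( part ) ) != 0;
}
```

A caller could use such a test to decide whether a bank declared as a whole already covers a given sub-bank.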
The size field contains either an explicit total amount of memory (in bytes) to be used to store the bank or the value -1, meaning that the bank will be sized according to its content (if possible).

DB_LOAD_OK
DB_UNLOAD_OK
DB_LOAD_ERROR
DB_PARSE_ERROR
DB_INTERNAL_ERROR
DB_BAD_SIZING
DB_PAR_ERROR
DB_UNKNOWN_ID

C Synopsis
    #include "event.h"
    #include "dateDb.h"

Description
The above definitions cover all the status codes that can be returned while handling the static databases.

DB_LOAD_OK and DB_UNLOAD_OK report the successful load/unload of a database.

DB_LOAD_ERROR reports a problem loading a database due to the underlying file system. If this error is received, check for the presence of the file and for the access permissions.

DB_PARSE_ERROR reports a problem parsing a database. The line that generated the error can be retrieved using the dbGetLastLine call.

DB_INTERNAL_ERROR reports an unexpected error condition. More explanations may be found in the messages stored in the dateDb log facility. This error should be reported to the DATE support team.

DB_BAD_SIZING is the result of an ID out of range or a limit out of range.

DB_PAR_ERROR is returned when one or more input parameters have invalid values.

DB_UNKNOWN_ID corresponds to an ID that is within the valid values but has no corresponding DATE role associated.

All the above codes can be decoded and printed using the dbDecodeStatus call.

dbDecodeStatus

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    char *dbDecodeStatus( status );

Description
This routine maps the given status code - as returned by any of the dateDb routines - to a description string.

Returns
The description string if the error code is amongst the allowed choices, an error message otherwise. The pointer is to a static location, overwritten by subsequent calls to the routine.
dbGetLastLine

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    char *dbGetLastLine();

Description
Get the last line decoded by the last dbLoad*() call. This line can be used for diagnostic purposes, e.g. to understand parse errors. This call is relevant only when operating with file-based databases, not in MySQL mode.

Returns
Pointer to a char array containing the last line decoded. The pointer is to a static location, overwritten by subsequent parsing of the databases.

dbDecodeRole

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    char *dbDecodeRole( dbRoleType role );

Description
Returns a description string for the given role (or an error message if the input parameter is not correct).

Returns
Pointer to a read-only char array.

dbDecodeBankPattern

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    char *dbDecodeBankPattern( dbBankPatternType bankPattern );

Description
Returns a description string for the given bank pattern (or an error message if the input parameter is not correct).

Returns
Pointer to a read-only char array.

dbRolesDb
dbSizeRolesDb
dbMaxLdcId
dbMaxGdcId
dbMaxTriggerMaskId
dbMaxDetectorId
dbMaxSubdetectorId
dbMaxHltProxyId
dbMaxHltProducerId
dbMaxHltRootId
dbMaxTriggerHostId
dbMaxEdmHostId
dbMaxDdgId
dbMaxFilterId
dbLoadRoles
dbUnloadRoles

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    dbRoleDescriptor *dbRolesDb;
    int dbSizeRolesDb;
    dbIdType dbMaxLdcId;
    dbIdType dbMaxGdcId;
    dbIdType dbMaxTriggerMaskId;
    dbIdType dbMaxDetectorId;
    dbIdType dbMaxSubdetectorId;
    dbIdType dbMaxHltProxyId;
    dbIdType dbMaxHltProducerId;
    dbIdType dbMaxHltRootId;
    dbIdType dbMaxTriggerHostId;
    dbIdType dbMaxEdmHostId;
    dbIdType dbMaxDdgId;
    dbIdType dbMaxFilterId;
    int dbLoadRoles();
    int dbUnloadRoles();

Description
The roles database is fully described by the above entities that can be loaded using the dbLoadRoles call. All the IDs described by the database are limited by the values given in dbMax*Id.
The entry dbLoadRoles loads all the above entities when called the first time. Successive calls to dbLoadRoles do not force a reload of the entries. A reload can be forced by calling dbUnloadRoles followed by dbLoadRoles. On failure to load the database, the values given in the associated variables are undefined.

Returns
DB_LOAD_OK/DB_UNLOAD_OK in case of success, otherwise an error code.

dbRolesFind
dbRolesFindNext

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    int dbRolesFind( char *roleName, dbRoleType role );
    int dbRolesFindNext();

Description
These routines implement a fast-find algorithm on the roles database. The dbRolesFind entry initializes a find for the given entity with the given role (dbRoleUnknown will search for any entity with the given name). The search can be continued starting from the point of the last match using the dbRolesFindNext entry. For semantically-correct databases, whenever the role is different from dbRoleUnknown, at most one match should be returned and therefore dbRolesFindNext should never return a match.

Returns
-1 for no match, index to dbRolesDb if a match is found.

dbTriggersDb
dbSizeTriggersDb
dbLoadTriggers
dbUnloadTriggers

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    dbTriggerDescriptor *dbTriggersDb;
    int dbSizeTriggersDb;
    int dbLoadTriggers();
    int dbUnloadTriggers();

Description
The dbTriggersDb pointer can be loaded and unloaded using the dbLoadTriggers and dbUnloadTriggers routines. Consecutive calls to the dbLoadTriggers routine do not reload the database. To achieve this, use a dbUnloadTriggers/dbLoadTriggers sequence.

Returns
DB_LOAD_OK/DB_UNLOAD_OK in case of success, otherwise an error code.
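The find semantics described for dbRolesFind and dbRolesFindNext (a name plus an optional role filter, -1 for no match, continuation after the last match) can be mimicked on a small in-memory table. The sketch below is self-contained and purely illustrative: sketchRole, sketchRoleType, SKETCH_ID_INVALID and sketchRolesFind are hypothetical names, the search is a plain linear scan rather than the fast-find algorithm used by DATE, and the table is assumed to end with an entry whose ID is invalid, as described for the static databases.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_ID_INVALID ( -1 )  /* hypothetical stand-in for DB_ID_INVALID */

typedef enum { sketchRoleUnknown, sketchRoleGdc, sketchRoleLdc } sketchRoleType;

typedef struct {
  const char    *name;
  int            id;
  sketchRoleType role;
} sketchRole;

/* Linear search starting at index 'from', which must not be past the
   terminating entry. A role of sketchRoleUnknown matches any role.
   Returns the index of the match, or -1 if there is none. */
int sketchRolesFind( const sketchRole *db, int from,
                     const char *name, sketchRoleType role ) {
  int i;

  assert( db != 0 && name != 0 && from >= 0 );
  for ( i = from; db[ i ].id != SKETCH_ID_INVALID; i++ )
    if ( strcmp( db[ i ].name, name ) == 0 &&
         ( role == sketchRoleUnknown || db[ i ].role == role ) )
      return i;
  return -1;
}
```

Calling the function again with the previous match index plus one plays the part of a "find next" step.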
dbGetDetectorsInTriggerPattern

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    int dbGetDetectorsInTriggerPattern(
          eventTriggerPatternType triggerPat,
          eventDetectorPatternType detectorPat );

Description
The detectorPat is loaded with the detector pattern that corresponds to the given triggerPat. This information is static and needs to be combined with the dynamic run-time mask of active detectors to give the actual run-time pattern.

Returns
DB_LOAD_OK in case of success, otherwise an error code.

dbGetLdcsInTriggerPattern

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    int dbGetLdcsInTriggerPattern(
          eventTriggerPatternType triggerPat,
          dbLdcPatternType ldcPat );

Description
The ldcPat is loaded with the LDC pattern that corresponds to the given triggerPat. This information is static and needs to be combined with the dynamic run-time mask of active LDCs to give the actual run-time pattern.

Returns
DB_LOAD_OK in case of success, otherwise an error code.

dbDetectorsDb
dbSizeDetectorsDb
dbLoadDetectors
dbUnloadDetectors

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    dbDetectorDescriptor *dbDetectorsDb;
    int dbSizeDetectorsDb;
    int dbLoadDetectors();
    int dbUnloadDetectors();

Description
The dbDetectorsDb pointer can be loaded and unloaded using the dbLoadDetectors and dbUnloadDetectors routines. Consecutive calls to the dbLoadDetectors routine do not reload the database. To achieve this, use a dbUnloadDetectors/dbLoadDetectors sequence.

Returns
DB_LOAD_OK/DB_UNLOAD_OK in case of success, otherwise an error code.

dbGetLdcsInDetector

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    int dbGetLdcsInDetector( dbIdType detectorId, dbLdcPatternType ldcPat );

Description
The ldcPat is loaded with the LDC pattern that corresponds to the given detectorId.
This information is static and needs to be combined with the dynamic run-time mask of active LDCs to give the actual run-time pattern.

Returns
DB_LOAD_OK in case of success, otherwise an error code.

dbGetLdcsInDetectorPattern

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    int dbGetLdcsInDetectorPattern(
          eventDetectorPatternType detectorPat,
          dbLdcPatternType ldcPat );

Description
The ldcPat is loaded with the LDC pattern that corresponds to the given detectorPat. This information is static and needs to be combined with the dynamic run-time mask of active LDCs to give the actual run-time pattern.

Returns
DB_LOAD_OK in case of success, otherwise an error code.

dbEventBuildingControlDb
dbSizeEventBuildingControlDb
dbLoadEventBuildingControl
dbUnloadEventBuildingControl

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    eventBuildingRule *dbEventBuildingControlDb;
    int dbSizeEventBuildingControlDb;
    int dbLoadEventBuildingControl();
    int dbUnloadEventBuildingControl();

Description
The dbEventBuildingControlDb pointer can be loaded and unloaded using the dbLoadEventBuildingControl and dbUnloadEventBuildingControl routines. Consecutive calls to the dbLoadEventBuildingControl routine do not reload the database. To achieve this, use a dbUnloadEventBuildingControl/dbLoadEventBuildingControl sequence.

Returns
DB_LOAD_OK/DB_UNLOAD_OK in case of success, otherwise an error code.

dbBanksDb
dbSizeBanksDb
dbLoadBanks
dbUnloadBanks

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    dbBanksDescriptor *dbBanksDb;
    int dbSizeBanksDb;
    int dbLoadBanks();
    int dbUnloadBanks();

Description
The dbBanksDb pointer can be loaded and unloaded using the dbLoadBanks and dbUnloadBanks routines. Consecutive calls to the dbLoadBanks routine do not reload the database. To achieve this, use a dbUnloadBanks/dbLoadBanks sequence.

Returns
DB_LOAD_OK/DB_UNLOAD_OK in case of success, otherwise an error code.
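Several of the dbGet*In* routines above return a static pattern that still has to be combined with the run-time mask of active LDCs or detectors. The manual does not spell out the operation; the sketch below assumes it is a word-by-word bitwise AND, which is the usual way to restrict a pattern to the active entities. The type sketchPattern, the constant SKETCH_WORDS_IN_PATTERN (compare DB_WORDS_IN_LDC_MASK) and the function sketchApplyActiveMask are hypothetical and do not belong to the DATE headers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_WORDS_IN_PATTERN 4  /* hypothetical; cf. DB_WORDS_IN_LDC_MASK */

typedef uint32_t sketchPattern[ SKETCH_WORDS_IN_PATTERN ];

/* result = staticPat AND activeMask, word by word (assumed combination).
   Returns the number of IDs left asserted in the result. */
int sketchApplyActiveMask( const sketchPattern staticPat,
                           const sketchPattern activeMask,
                           sketchPattern result ) {
  int w, count = 0;

  assert( staticPat != 0 && activeMask != 0 && result != 0 );
  for ( w = 0; w < SKETCH_WORDS_IN_PATTERN; w++ ) {
    uint32_t bits = staticPat[ w ] & activeMask[ w ];

    result[ w ] = bits;
    for ( ; bits != 0; bits &= bits - 1 )  /* clear the lowest set bit */
      count++;
  }
  return count;
}
```

A return value of zero would mean that none of the statically declared entities is active in the current run.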
dbUnloadAll

C Synopsis
    #include "event.h"
    #include "dateDb.h"
    int dbUnloadAll();

Description
Unload all the static databases from the memory of the calling process. If the operation fails, the final result is unpredictable.

Returns
DB_UNLOAD_OK if the operation succeeds, error from individual unload routines otherwise.

5 The monitoring package

This chapter describes how to write a monitoring program. After a brief introduction to the monitoring in DATE, the monitoring library is explained and its use from all the most commonly used programming languages is shown.

5.1 Monitoring in DATE
5.2 Online monitoring and role name
5.3 Monitoring and Analysis in C/C++
5.4 Monitoring by detector
5.5 Monitoring from ROOT
5.6 The "eventDump" utility program
5.7 Monitoring of the online monitoring scheme
5.8 Monitoring configuration

5.1 Monitoring in DATE

A data-acquisition system requires monitoring of experimental data (online and offline data, on online and offline hosts). Some possible applications for monitoring tasks are:
• statistical analysis of the experimental stream to evaluate the quality of the physics conditions.
• detailed analysis of the experimental data.
• occasional checking of the overall status of the data-acquisition system (e.g. operator status panel).

To perform these and other functions, DATE provides the monitoring package, whose objective is to offer a uniform interface for the development and the support of user-written monitoring programs tailored to specific needs.
The monitoring interface allows access to events coming from the live experimental stream or from a Permanent Data Storage (PDS, see footnote 1) media, with statistical or strict monitoring purposes, on online (part of the data-acquisition system) or offline (totally detached) hosts.

When monitoring is performed in its full online configuration (see Figure 5.1 top diagram), the monitoring program gets the data from a local monitoring buffer, filled from the online data producer (the readout process on LDCs and the eventBuilder process on GDCs). This approach is the most efficient in terms of system resources but might impose an unacceptable load on the online host, already charged with acquisition and control tasks.

Figure 5.1 The DATE online monitoring, local and remote configurations

Footnote 1: The term PDS - defined in the ALICE technical proposal [14] - is used here with a wider meaning, also covering permanent, semi-permanent and temporary storage, usually located in the physical path between the data-acquisition system online buffer and the final PDS.

To "off-load" the online environment, it is possible to run the monitoring program on another host, linked to the first via LAN or WAN (see Figure 5.1 bottom diagram). The result is similar to that of the first configuration, with the advantage of freeing resources on the data-acquisition host, at the price of an increased load on the interconnecting network between the two machines.

The same data-acquisition system can have - without reconfiguration - several local and remote monitoring programs, all running simultaneously and getting their data from the same source. However, each monitoring program can receive its data from only one source at a time.
It is possible to switch back and forth between different data sources within the same monitoring program, although this practice is not recommended.

ALICE needs to monitor only the sub-events coming from selected detector(s). A special function has therefore been added to the monitoring library: monitoring by detector. This function extends the remote monitoring scheme, applying it to a set of LDCs, the active hosts attached to one or more detectors. The monitoring library gets the sub-events from all the LDCs, performs a "reduced" event building procedure and delivers the result to the monitoring program. Only events where all the selected LDCs contribute with one sub-event will be selected.

Another operating mode of the monitoring library - shown in Figure 5.2 - allows the same functions on offline streams, usually coming from the Permanent Data Storage. This setup allows direct monitoring from the PDS server or from other hosts (batch server, desktop or workstation) not connected to the PDS media. This configuration can optionally make use of the SHIFT/CASTOR disk servers available at CERN.

Figure 5.2 The DATE offline monitoring

During the connection phase, monitoring programs can declare themselves to the monitoring scheme. This allows easy tracing of each client and makes it possible to "fine tune" the runtime parameters of the monitoring system.

When a monitoring program connects itself to the experimental stream, it can declare a monitoring policy for any given event type. This policy can require:
• all events for monitoring (must policy).
• as many as possible of the events (most policy).
• a random share of events for monitoring (yes policy).
• whatever events are available (few policy).
• no monitoring at all (none policy).
It is important to understand the impact of a given monitoring policy on the data-acquisition system and on the monitoring environment. A monitoring program requesting a "must" policy must process the information as fast as it is offered or it might stall the entire data-acquisition stream. On the other hand, the exclusion of certain classes of events - unwanted for a given type of monitoring - will reduce the overhead on the online host and on the interconnecting network, as less data will be stored and transferred between the online producer (readout or eventBuilder) and the consumer (the monitoring program).

Monitoring programs have the choice to stall if no data is available or to continue with their execution (knowing that no data has been received). This allows the implementation of event-driven processes (such as X11 clients) that should not be blocked in the absence of data.

Another feature of the monitoring library is to let a monitoring program discard all data that may be stored in the monitoring buffer. This is useful to access only future events at any given point in time.

Some experimental setups might "hide" their data-acquisition hosts behind routers or firewalls, making remote monitoring difficult or impossible. To solve this problem, the DATE monitoring library provides a mechanism called "relayed monitoring", where the monitoring channel travels through a dedicated relay host (visible from the offline host and with access to the hidden online host). The scheme is described in Figure 5.3. It is possible to restrict access through the relay host to a given set of clients, according to the type of monitoring requested. Relayed monitoring performs worse than direct monitoring and should be used only when unavoidable.
Figure 5.3 The DATE relayed monitoring

5.2 Online monitoring and role name

DATE allows any given host (LDC or GDC) to operate within multiple independent setups (e.g. one setup for production and a second setup for commissioning), even simultaneously. To identify the environment of each setup, DATE assigns to the same host different role names, one for each setup. The monitoring library uses the same mechanism in order to monitor a setup when a choice between multiple data streams is available.

The recommended way to select the appropriate setup is to use role names rather than host names during the declaration of the data source (see the description of the monitorSetDataSource routine). When doing so, the monitoring library automatically sets the environment in order to access the appropriate data stream. This mechanism requires at runtime an active and valid connection to the DATE configuration database in order to resolve the role name and the associated host name. This is also the mechanism recommended for local monitoring of a machine that belongs to multiple setups (the monitoring library sets the environment according to the selected role and then proceeds with the same path as for local online monitoring, therefore not using TCP/IP to move the events).

If it is not possible to specify a role name (e.g. for monitoring clients that do not have a connection to the DATE configuration database) it is still possible to select any setup by defining the environment variable DATE_ROLE_NAME before connecting to the remote host (in this case the host name must be used as data source).

If no role name is selected at runtime, the monitoring library chooses the first alphabetical match on the target host name (or for the local host in case of local monitoring).
For single-setup environments this solution chooses the only available data stream and therefore always gives the expected result. For machines running multiple instances of DATE, an arbitrary selection is implied and this may lead to unexpected behavior at runtime (according to the content of the DATE configuration database). For this reason we strongly recommend using the role name whenever possible.

To summarize:
1. If the monitoring program has access to the DATE configuration database, always use the role name as data source (also for local monitoring): this procedure gives the maximum flexibility, is fully reconfigurable via the DATE database and always connects to the right data source regardless of the HW/SW configuration in use, with no runtime overhead.
2. If the machine being monitored plays a single role, it is still possible to use the anonymous syntax ":" for local monitoring or the TCP/IP hostname for remote monitoring. This scheme is less flexible and is not recommended, but it still works.
3. If the monitoring program has no access to the DATE configuration database, then it is not possible to use the role name to connect to the (obviously remote) data source. In this case, the data source must contain the TCP/IP hostname and DATE_ROLE_NAME should be given as an environment variable within the monitoring process.

5.3 Monitoring and Analysis in C/C++

A monitoring program should accomplish the following steps in order to perform its function:
1. declare the source providing the data to monitor.
2. declare itself to the monitoring scheme.
3. declare - if necessary - the monitor policies to be followed.
4. declare - if necessary - the wait/nowait policy to be followed.
5. get the available event(s) from the monitoring stream.

The monitoring scheme can be used from programs written in C or C++.
This chapter describes the C/C++ callable interface available within the DATE monitoring package and its characteristics. 5.3.1 Some simple examples In Listing 5.1 we have a very simple example of a monitoring program written in C. Listing 5.1 Example of event dump in C: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: #include #include #include #include “event.h” “monitor.h” void printError( char *where, int errorCode ) { fprintf( stderr, “Error in %s: %s\n”, where, monitorDecodeError( errorCode ) ); exit( 1 ); } /* End of printError */ int main() { int status; status = monitorSetDataSource( “:” ); if ( status != 0 ) printError( “monitorSetDataSource”, status ); status = monitorDeclareMp( “C demo mp” ); if ( status != 0 ) printError( “monitorDeclareMp”, status ); for (;;){ /* Start of endless loop */ void *ptr; struct eventStruct *event; status = monitorGetEventDynamic( &ptr ); if ( status != 0 ) printError( “monitorGetEventDynamic”, status ); event = (struct eventStruct *)ptr; printf( “Run #:%d, EventId #:%08x%08x, Type:%ld, size:%ld, Data size:%d\n”, event->eventHeader.eventRunNb, event->eventHeader.eventId[0], event->eventHeader.eventId[1], event->eventHeader.eventType, event->eventHeader.eventSize, event->eventHeader.eventSize event->eventHeader.eventHeadSize ); free( ptr ); } /* End of endless loop */ } /* End of main */ The program consists of a declaration phase followed by an endless loop where events are fetched from the monitoring stream and their header is printed. Please note that the program never terminates: the process must be killed via an external signal (e.g. ^C - obtained pressing the “control” and the “C” keys - via the keyboard for interactive processes). ALICE DAQ and ECS manual Monitoring and Analysis in C/C++ 91 Looking at the example more in details, we can observe the following features: Line 3: inclusion of the DATE event declaration module. 
Line 4: inclusion of the DATE monitoring declaration module.
Line 16: declaration of the source of monitoring data (in this case, the online local host).
Line 19: declaration of the monitoring program.
Line 26: the next available event is transferred from the monitoring buffer.

Other examples are available in the directory ${DATE_MONITOR_DIR}, namely the source code for the eventDump utility (described in Section 5.6), named eventDump.c.

5.3.2 The monitoring package files

The distribution point for the monitoring package is ${DATE_MONITOR_DIR} (defined by the DATE setup procedure). This area contains the following files:

• ${DATE_MONITOR_DIR}/monitor.h: prototypes and definitions for monitoring programs written in C.
• ${DATE_MONITOR_DIR}/${DATE_SYS}/libmonitor.a: monitoring library for any language capable of calling C code (e.g. C and C++).
• ${DATE_MONITOR_DIR}/${DATE_SYS}/libmonitorstdalone.a: monitoring library with reduced functionality for non-SHIFT hosts (see below).
• ${DATE_MONITOR_DIR}/${DATE_SYS}/libmonitor.so: non-SHIFT shareable monitoring library.

C monitoring programs should include the prototype declarations in monitor.h, either by adding the output of the command date-config --cflags to the C compilation statement (on machines running DATE or with DATE installed) or by copying the prototype declarations locally (non-DATE machines) and providing the appropriate C compilation directives (specifications vary from architecture to architecture). Monitoring programs require the libraries specified by the output of the command date-config --monitorlibs (with SHIFT access) or date-config --monitorlibs=noshift (without SHIFT access).

The SHIFT library, referenced by the monitoring I/O package, is used to access hosts whose PDS is available on SHIFT servers, e.g. the CERN ALICE WorkGroup server (LXPLUS), the CERN batch processing facility (SHIFT) and the CERN CASTOR servers.
If access to any of these facilities is not required, the inclusion of the SHIFT libraries is not necessary. The resulting image can then be used for local file I/O or for remote network monitoring. The SHIFT libraries are distributed by the CASTOR support team at CERN.

Hardware and software platforms that are not part of the standard DATE distribution, but are possible clients of the DATE monitoring scheme, can still use the monitoring library by copying the necessary files and performing a local compilation and link.

5.3.3 Error codes

The entries belonging to the monitoring library may return a monitoring-specific error code. This code is either zero for success or non-zero for failure. To decode an error code, please refer to the ${DATE_MONITOR_DIR}/monitor.h file or call the entry monitorDecodeError described in the next section.

5.3.4 The monitoring callable library

This section describes the entries available in the monitoring library. Each entry is described in the C version. For the decoding of error codes that may be returned by the entries, please refer to Section 5.3.3.

monitorSetDataSource

C Synopsis
  #include "monitor.h"
  int monitorSetDataSource( char* source )

Description
  The source of events to monitor is declared. The syntax of the monitor source parameter is given in Table 5.1.
Table 5.1 Monitor source parameter syntax

  ":"                      local online (default)
  "file"                   local file (both full and relative paths are accepted, full path recommended)
  "@target:"               remote online on machine "target"
  "file@target"            remote file on machine "target" (the full path to the file should be given)
  "@target1@target2:"      remote online on machine "target1" via the relay host "target2"
  "file@target1@target2"   remote file on machine "target1" via the relay host "target2" (the full path to the file should be given)
  "^det[+det]"             remote online on the LDCs belonging to the detector "det" (plus-separated lists can be used to specify more than one detector) and active in the current run
  "@*:"                    remote online on the GDCs active in the current run
  "=partition"             remote online on the GDCs active in the partition

  The parameter “target” can specify either a role name (recommended) or a TCP/IP host name (see Section 5.2). If remote monitoring is used and “target” points to the local host, then local monitoring is assumed and no transfer takes place over TCP/IP (not even via local sockets). The monitoring library is able to resolve host aliasing and multi-interface hosts.

  For detector and GDC monitoring, the run number must be specified in the environment variable DATE_RUN_NUMBER. The run number can also be re-defined during the monitoring, but in this case a monitorLogout followed by a new call to monitorSetDataSource is recommended.

  For partition monitoring, the run number is not needed. The monitoring library will reconfigure at each start of run, adding or removing GDCs according to the configuration of the partition itself. Monitoring programs must take care in handling the events when these come from consecutive runs.

Returns
  Zero in case of success, else an error code (see Section 5.3.3 for more details).
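The local, remote and relayed forms in Table 5.1 follow a regular pattern, which the following sketch tries to make explicit. The helper buildDataSource is hypothetical and not part of the DATE library: it only composes the string that a program would then pass to monitorSetDataSource, and it does not cover the detector, GDC or partition forms.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a DATE entry): compose a data source string
 * following the local/remote/relayed rows of Table 5.1.
 *   file   : path of a file, or NULL/"" for an online stream
 *   target : remote host or role name, or NULL/"" for the local host
 *   relay  : relay host, or NULL/"" for a direct connection
 * Returns 0 on success, -1 if the combination is invalid or does not fit. */
int buildDataSource( char *out, size_t outSize,
                     const char *file, const char *target, const char *relay ) {
  int hasFile   = ( file   != NULL && file[0]   != '\0' );
  int hasTarget = ( target != NULL && target[0] != '\0' );
  int hasRelay  = ( relay  != NULL && relay[0]  != '\0' );
  int n;

  assert( out != NULL && outSize > 0 );
  if ( hasRelay && !hasTarget ) return -1;  /* a relay needs a target */

  /* Online streams end with ":", file streams do not. */
  n = snprintf( out, outSize, "%s%s%s%s%s%s",
                hasFile   ? file   : "",
                hasTarget ? "@"    : "",
                hasTarget ? target : "",
                hasRelay  ? "@"    : "",
                hasRelay  ? relay  : "",
                hasFile   ? ""     : ":" );
  return ( n < 0 || (size_t)n >= outSize ) ? -1 : 0;
}
```

With no file, target or relay the helper yields the default local online source; the host names used when calling it are up to the caller and are not checked here.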
monitorDeclareMp

C Synopsis
  #include "monitor.h"
  int monitorDeclareMp( char* mpName )

Description
  The given string is used to declare the monitoring program. This can be used for debugging, for fine tuning and to monitor the online monitoring scheme (see Section 5.7).

Returns
  Zero in case of success, else an error code (see Section 5.3.3 for more details).

monitorDeclareTable

C Synopsis
  #include "monitor.h"
  int monitorDeclareTable( char** table )

Description
  A table describing the desired monitoring policy is declared within the monitoring scheme. Each monitoring program can declare a monitoring table at any time. This table will be used for all subsequent calls to monitorSetDataSource and will be kept valid in case monitorLogout is called. It is possible to declare a table in the middle of a monitoring stream: this will force a flush of all events that may be available in the monitoring buffer and in the monitoring channel.

  The input parameter should have the following C syntax:

  char *table[] = {
    [ "event type", "monitor type", ]*
    NULL
  };

  where the fields event type and monitor type can assume one of the values and aliases given in Table 5.2 and in Table 5.3.
Table 5.2 Event types

  Event type                          Single-word alias                   Short alias
  "All events"                        "All_events"                        "ALL"
  "Start of run"                      "Start_of_run"                      "SOR"
  "Start of run files"                "Start_of_run_files"                "SORF"
  "Start of data"                     "Start_of_data"                     "SOD"
  "Start of burst"                    "Start_of_burst"                    "SOB"
  "End of burst"                      "End_of_burst"                      "EOB"
  "Physics event"                     "Physics_event"                     "PHY"
  "Calibration event"                 "Calibration_event"                 "CAL"
  "System Software Trigger event"     "System_software_trigger_event"     "SST"
  "Detector Software Trigger event"   "Detector_software_trigger_event"   "DST"
  "End of data"                       "End_of_data"                       "EOD"
  "End of run"                        "End_of_run"                        "EOR"
  "End of run files"                  "End_of_run_files"                  "EORF"
  "Event format error"                "Event_format_error"                "FERR"

Table 5.3 Monitor types

  Monitor type   Action
  "all"          all events of this type are monitored (100%)
  "most"         a priority sample of the events of this type is monitored
  "yes"          a sample of the events of this type is monitored
  "few"          events of this type may be monitored
  "no"           no events of this type are monitored

All declarations are case-insensitive and can be shortened to the nearest unique string (watch out for ambiguous shortening, e.g. “end of run” can match either “end of run” or “end of run files”).

The features of the various sampling modes are:

– all: all the events that match the selection are stored in the monitoring buffer. This mode must be used with extreme care as the DAQ stops if the monitoring program(s) cannot keep up with the throughput of the data flow.
– most: as long as there is buffer space, the events that match the selection are copied in the monitoring buffer. These events may be dropped to make space for “all” events. If the monitoring program cannot keep up with the data flow, the overflowing events are dropped. This monitoring mode should be used only to select rare events, not to disrupt the distribution of events received by other monitoring programs.
  Although delivery of events is not guaranteed (they may be dropped to make space for “all” events and they may not get recorded if the monitoring buffer contains only “most” events), this mode should have a much higher delivery success than a “yes” policy when multiple monitoring programs run with high input event rates.
– yes: as long as there is buffer space, the events that match the selection are stored in the monitoring buffer. These events may be dropped if “all” or “most” events need space to be stored, and may in turn displace “few” events when there is no free space left in the monitoring buffer.
– few: events matching the selection are monitored as long as they can be stored and may be removed in case events matching other monitoring type criteria need space in the monitoring buffer. This policy may be useful for very slow monitoring programs such as event displays.
– no: the events of the given type are not published for monitoring.

Please note that when only one monitoring program is active, “most”, “yes” and “few” yield the same result. For setups with multiple monitoring clients, events monitored as “no” by one client may still be stored in the monitoring buffer if other monitoring program(s) requested them.

The default table is:

  char *defaultTable[] = { "All", "yes", NULL };

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

monitorDeclareTableExtended

C Synopsis
  #include "monitor.h"
  int monitorDeclareTableExtended( char** table )

Description
  The purpose of this entry is to declare a monitoring policy where event attributes and/or trigger classes can be used for the selection method. Functionally, this routine is equivalent to monitorDeclareTable.
  The input parameter has the following syntax:

  char *table[] = {
    [ "event type", "monitor type", "attributes", "triggers", ]*
    NULL
  };

  The event type and monitor type fields have the same syntax as for monitorDeclareTable (see Table 5.2 and Table 5.3).

  The attributes and the triggers fields may contain either a single number or a list of numbers separated by "|" (any of the given patterns selects the event) or by "&" (all the bits of the pattern must be asserted to select the event). For example:

  • "2" selects events with bit 2 asserted.
  • "2|3" selects events with bit 2 or with bit 3 asserted.
  • "2&3" selects events with bits 2 and 3 asserted.

  It is not possible to mix "|"s and "&"s in the same declaration, e.g. "2&3|4" returns a runtime error.

  Empty lists or wildcard "*" lists can be specified to disable the selection criteria. For example:

  • "PHY" "Y" "1" "" selects physics events with attribute 1 asserted, regardless of the status of their trigger pattern.
  • "PHY" "Y" "*" "1" selects physics events with trigger pattern 1 asserted.
  • "PHY" "Y" "1" "2" selects physics events with attribute 1 and trigger pattern 2 asserted.
  • "PHY" "Y" selects physics events (same as "PHY" "Y" "*" "*").

  Both system and user attributes can be specified: for user attributes, use the attribute number (as used in the *_USER_ATTRIBUTE() macro). System attributes should be specified via the corresponding symbol from the ${DATE_COMMON_DEFS}/event.h definition file.

  If a non-empty trigger pattern is declared, events whose trigger pattern has not been validated are NOT selected for monitoring. If the trigger pattern is not specified, all events are potentially selected regardless of the validation status of their trigger pattern.

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).
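The selection rules for the attributes and triggers fields can be illustrated with a small evaluator. This is only a sketch of the semantics described above, not the DATE implementation: the function name patternSelects is hypothetical, and it assumes for simplicity that the pattern fits in 64 bits with bit n set when attribute (or trigger class) n is asserted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only, NOT the DATE implementation.
 * Evaluates an expression such as "2", "2|3" or "2&3" against a pattern.
 * Returns 1 if the event would be selected, 0 if not, and -1 for a
 * malformed expression (including one that mixes '|' and '&').
 * An empty string or "*" disables the criterion and always selects. */
int patternSelects( const char *expr, uint64_t pattern ) {
  char op = 0;                 /* separator seen so far: 0, '|' or '&' */
  int anySet = 0, allSet = 1;
  const char *p = expr;

  assert( expr != NULL );
  if ( expr[0] == '\0' || strcmp( expr, "*" ) == 0 ) return 1;

  for (;;) {
    char *end;
    long bit = strtol( p, &end, 10 );
    if ( end == p || bit < 0 || bit > 63 ) return -1;
    if ( ( pattern >> bit ) & 1u ) anySet = 1; else allSet = 0;
    if ( *end == '\0' ) break;
    if ( *end != '|' && *end != '&' ) return -1;
    if ( op != 0 && op != *end ) return -1;   /* mixing is rejected */
    op = *end;
    p = end + 1;
  }
  return ( op == '&' ) ? allSet : anySet;
}
```

A single number is treated as a one-element list, so "2" behaves the same under either operator.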
monitorDeclareTableWithAttributes

C Synopsis
  #include "monitor.h"
  int monitorDeclareTableWithAttributes( char** table )

Description
  The purpose of this entry is to declare a monitoring policy where event attributes can be used for the selection method. Functionally, this routine is equivalent to monitorDeclareTableExtended. This entry is deprecated and is left for backward compatibility only. The entry monitorDeclareTableExtended should be used instead (with the triggers field left empty).

  The input parameter has the following syntax:

  char *table[] = {
    [ "event type", "monitor type", "attributes", ]*
    NULL
  };

  For the description of the attributes parameter, see the description of monitorDeclareTableExtended.

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

monitorGetEvent

C Synopsis
  #include "monitor.h"
  int monitorGetEvent( void *buffer, long size )

Description
  The next available event (if any) is copied into the region pointed to by buffer, for a maximum length of size bytes. In case of failure, a zero-length event is returned.

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

monitorGetEventDynamic

C Synopsis
  #include "monitor.h"
  int monitorGetEventDynamic( void **buffer )

Description
  The next available event (if any) is copied into space reserved from the process heap and returned to the caller. The caller must take care of properly disposing of the event via a call to free: failure to do so will exhaust the resources associated with the process and can severely degrade the overall system performance. If no data is available and the channel is set in nowait mode, the pointer returned will be NULL; in this case the event does not need to be disposed of.

Returns
  Zero in case of success (also if no event is available), otherwise an error code (see Section 5.3.3 for more details).
monitorFlushEvents

C Synopsis
  #include "monitor.h"
  int monitorFlushEvents( void )

Description
  All the data available in the monitoring buffer is discarded. The next event transferred over the monitoring channel will be injected in the monitoring stream after this call terminates.

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

monitorSetWait

C Synopsis
  #include "monitor.h"
  int monitorSetWait( void )

Description
  After this call completes, if the monitoring program requests an event when the monitoring buffer and the monitoring channel are empty, the monitoring program will stop and wait for new events. This is the default behaviour of the monitoring library.

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

monitorSetNowait

C Synopsis
  #include "monitor.h"
  int monitorSetNowait( void )

Description
  After this call completes, if the monitoring program requests an event when the monitoring buffer and the monitoring channel are empty, the monitoring program will continue and an empty event will be returned.

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

monitorControlWait

C Synopsis
  #include "monitor.h"
  int monitorControlWait( int flag )

Description
  The wait/nowait behavior of the monitoring library is set according to the flag parameter:
  • true (wait): TRUE or (0 == 0)
  • false (nowait): FALSE or (0 == 1)

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

monitorSetNoWaitNetworkTimeout

C Synopsis
  #include "monitor.h"
  int monitorSetNoWaitNetworkTimeout( int timeout )

Description
  Set the timeout for nowait reads of events via the network. When the timeout parameter is negative or zero, nowait reads of events through the network return immediately.
  When the timeout parameter is > 0, reads of events through the network may wait, if no events are available, up to the given time expressed in milliseconds. The default value of the timeout is -1 (no timeout). This call does not apply to the “monitoring by detector” scheme.

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

monitorSetSwap

C Synopsis
  #include "monitor.h"
  int monitorSetSwap( int 32BitWords, int 16BitWords )

Description
  This entry controls the behavior of the monitoring library when a network channel is opened with a host of different endianness (e.g. PC/ALPHA vs. Motorola/IBM/Sun). The two parameters control the swapping algorithm to be used for the data portion of the incoming events; their use depends on the actual content of the data (payload) portion of the event and is summarized in Table 5.4.

Table 5.4 Bytes swapping control

  Data buffer data type                              32BitWords   16BitWords
  8-bit entities (signed or unsigned characters)     FALSE        FALSE
  32-bit entities (e.g. VMEbus data)                 TRUE         FALSE
  16-bit entities                                    FALSE        TRUE

  If the data type is not known beforehand, monitoring programs should set the two flags to FALSE and swap the data manually once their type is known: this will avoid unnecessary double-swapping at run-time.

  The values that can be given to the two flags are:
  • true (perform swapping): TRUE or (0 == 0)
  • false (do not swap): FALSE or (0 == 1)

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

monitorDecodeError

C Synopsis
  #include "monitor.h"
  char *monitorDecodeError( int code )

Description
  The entry returns the pointer to a string describing the given error code.

Returns
  Pointer to a zero-terminated static, read-only string.
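For the manual swapping that the monitorSetSwap description recommends when both flags are set to FALSE, a program could use helpers along the following lines. The names swap32Words and swap16Words are hypothetical and not DATE entries; the sketch assumes the payload has already been identified as an array of 32-bit or 16-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of each 32-bit word in place. */
void swap32Words( uint32_t *data, size_t nWords ) {
  size_t i;
  assert( data != NULL || nWords == 0 );
  for ( i = 0; i < nWords; i++ ) {
    uint32_t w = data[i];
    data[i] = ( w >> 24 ) | ( ( w >> 8 ) & 0x0000FF00u ) |
              ( ( w << 8 ) & 0x00FF0000u ) | ( w << 24 );
  }
}

/* Reverse the byte order of each 16-bit word in place. */
void swap16Words( uint16_t *data, size_t nWords ) {
  size_t i;
  assert( data != NULL || nWords == 0 );
  for ( i = 0; i < nWords; i++ ) {
    uint16_t w = data[i];
    data[i] = (uint16_t)( ( w >> 8 ) | ( w << 8 ) );
  }
}
```

Applying either helper twice restores the original data, which is why leaving the library flags at FALSE and swapping once in the program avoids the double swap mentioned above.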
monitorLogout

C Synopsis
  #include "monitor.h"
  int monitorLogout( void )

Description
  The monitoring link is closed and all resources allocated for this monitoring program are freed. The link will be automatically re-opened when the monitoring program requests the next event. This entry can be used whenever the monitoring program expects long pauses, such as operator input. It imposes a certain overhead on the monitoring scheme and therefore should not be used too frequently.

Returns
  Zero in case of success, otherwise an error code (see Section 5.3.3 for more details).

5.4 Monitoring by detector

Monitoring by detector is the mechanism where events are received from all the active LDCs belonging to a particular detector (or a set of detectors). This is done by opening a set of TCP/IP channels and handling them individually. Events are received on a channel-by-channel basis and a reduced event building procedure is applied. All the LDCs active within the given set of detectors must provide one sub-event for the event building to succeed. Whenever one or more LDCs fail to provide a sub-event for a given event, the event is discarded.

The output of a “monitoring by detector” procedure is an event with the following characteristics:

• Event ID, event type, version ID and run number as for the input events.
• Trigger pattern: picked up at random from any of the sub-events.
• Detector pattern set according to the monitoring data source.
• LDC ID and GDC ID set to VOID_ID.
• ATTR_SUPER_EVENT and ATTR_BY_DETECTOR_EVENT set.
• Time set to the moment the event has been delivered to the monitoring program.

Almost all the calls of the monitoring library are supported by the “monitoring by detector” scheme. Please note that, as the library must handle a set of monitoring channels, if a runtime error is returned by any call the error could come from any of the active channels.
More than one error may occur during such an operation, in which case only the first error is reported. The requested operation is attempted on all the channels anyway, regardless of the status of each individual transaction. If any of the calls returns an error status, the status of the connected channels is unpredictable: a full logout/login procedure is therefore recommended.

5.5 Monitoring from ROOT

5.5.1 The ROOT system

The ROOT [10] system provides a set of Object Oriented frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficient way. As the data is defined as a set of objects, specialized storage methods are used to get direct access to the separate attributes of the selected objects without having to touch the bulk of the data. Included in ROOT are histogramming methods in 1, 2 and 3 dimensions, curve fitting, function evaluation, minimization, graphics and visualization classes to allow the easy setup of an analysis system that can query and process the data interactively or in batch mode.

Thanks to the built-in CINT C++ interpreter, the command language, the scripting (or macro) language and the programming language are all C++. The interpreter allows for fast prototyping of the macros since it removes the time-consuming compile/link cycle. It also provides a good environment to learn C++. If more performance is needed, the interactively developed macros can be compiled using a C++ compiler.

The system has been designed in such a way that it can query its databases in parallel on MPP machines or on clusters of workstations or high-end PCs. ROOT is an open system that can be dynamically extended by linking external libraries.

The “standard” C library is loaded inside ROOT and called directly. The features available to the monitoring module are the same as those offered to any C/C++ program. Special care must be given to all asynchronous events, such as timers, signals and graphics handling.
Monitoring from ROOT can be done either directly, using the libraries and files provided by the basic DATE distribution, or via AMORE (see Chapter 23).

5.5.2 Direct monitoring in ROOT

When the direct monitoring approach is used, the “standard” DATE monitoring library is loaded in the ROOT context and programs run as any other C/C++ application. In this environment, special care must be given to a proper integration between the monitoring program and the ROOT framework, especially concerning the handling of signals, timers and graphics events.

The calling sequence for direct monitoring in ROOT is the same as for standard C/C++ programs. ROOT/CINT is not supported: only compiled code can be used.

5.6 The “eventDump” utility program

Part of the standard DATE kit is the utility eventDump. This program allows easy monitoring of any stream, which is useful for a quick check or for debugging a running system.

The standard DATE kit provides a version of the eventDump utility for each architecture that is either fully supported or supported for monitoring only. To run the utility on DATE hosts, execute the standard DATE setup and issue the command

  > eventDump buffer

For non-DATE hosts, copy the utility in your ${PATH} (or declare a proper alias) and then issue the same command as for DATE hosts.

A list of all available options can be shown via the “-?” command-line flag. Some of the parameters are:

• -b brief output (does not display event data).
• -c check events data against a pre-defined data pattern (test environment only).
• -s use static data buffer rather than dynamic memory.
• -a use asynchronous reads (nowait mode).
• -i interactive: pauses after each event and proposes a mini-menu with several options.
• -t allows the declaration of a monitoring table, e.g.: -t "SOR yes EOR y" will show only start-of-run and end-of-run events, skipping all events of other types.
• -T allows the declaration of an extended monitoring table, e.g.: -T "SOR y * * PHY Y 1&3 *" will monitor start-of-run records and physics events with attributes 1 and 3 set, skipping all the other events.

The buffer parameter must always be specified. The syntax to be used is the same as for the parameter of the monitorSetDataSource entry (see Table 5.1).

The attributes shall be specified according to the SITE-specific conventions (USER attributes) or the central ${DATE_COMMON_DEFS}/event.h definition (e.g. ATTR_P_START, corresponding to a phase start event, can be monitored by specifying the attribute 64).

5.7 Monitoring of the online monitoring scheme

The DATE monitoring environment may affect the performance of the data acquisition system it is connected to, usually by reducing it. This can be caused either by high-rate monitoring schemes (too many clients and/or too demanding clients) or by “bad” architectural designs. For this reason, DATE provides a set of tools to monitor the status of the monitoring scheme in a “live” environment. This set includes the following utilities (whose alias is defined by the DATE setup procedure):

• monitorClients: live display of the list of all registered clients, optionally with the name of the host they are running on.
• monitorSpy: live, highly detailed snapshot of the monitor scheme data structures.

It is not possible, and not logical, to monitor the status of an offline or relayed monitoring host.

For machines belonging to multiple DATE setups, it is mandatory to set the environment variable DATE_ROLE_NAME to its appropriate value prior to running any of the monitoring utilities. If this is not done, the monitoring library arbitrarily chooses the first match in alphabetical order with the host name and uses it, which may lead to incorrect results for multi-role hosts.
5.7.1 The monitorClients utility

The monitorClients utility gives a report on the usage of the monitoring scheme local to the host it runs on. Without parameters, it gives the list of monitor programs currently registered and their monitoring policies. If used with the “-t” option, it gives a continuously updated list of the clients and their usage of the monitoring streams (in number of events and number of bytes transferred); the display can be interrupted with the ^C quit signal (obtained by pressing the “control” and “C” keys).

Two examples of the use of the monitorClients utility are given in Listing 5.2.

Listing 5.2 Examples of use of the monitorClients utility

  > monitorClients
  10 clients max, 1 client declared, 2 processes attached
  PID   SOR EOR SORF EORF SOB EOB PHYS CAL EOL FERR Monitoring program:
  53648 yes yes yes  yes  yes yes yes  yes yes yes  DATE V3 event dump V1.08@host
  > monitorClients -t
  Displaying top clients: ^C to stop
  PID   Bytes/s  Mp
  37678 843113   DATE V3 event dump V1.08@host
  PID   Events/s Mp
  37678 242      DATE V3 event dump V1.08@host

5.7.2 The monitorSpy utility

The monitorSpy utility can be used to obtain a snapshot of the entire monitoring scheme in use on the host where the utility is executed. An example of the information that can be obtained with this utility is given in Listing 5.3.
Listing 5.3 Examples of use of the monitorSpy utility

  > monitorSpy
  mbMonitorBufferSize: 0x30000000 13114156 bytes
  --------------------------------------------------------------------
  mbMaxClients:        0x30000004 10 clients max
  mbMaxEvents:         0x30000008 100 events max
  mbEventSize:         0x3000000c 131072 bytes/event
  mbEventBufferSize:   0x30000010 13107200 bytes event data
  --------------------------------------------------------------------
  mbNumSignIn:         0x30000024 1 clients
  mbNumSignOut:        0x30000028 0 clients
  mbForcedExits:       0x3000002c 0 clients
  mbTooManyClients:    0x30000030 0 clients
  mbEventsInjected:    0x30000034 0 events
  mbBytesInjected:     0x30000038 0 bytes
  --------------------------------------------------------------------
  monitorEvents:       0x30001030
  mbOldestEvent:       0x30000018 -1 (index)
  mbFirstFreeEvent:    0x3000001c -1 (index)
  mbCurrentEventNumber:0x30000020 0 (sequential number)
  --------------------------------------------------------------------
  monitorClients:      0x30000040 10 clients [ 0 .. 9 ]
  mbNumClients:        0x30000014 1 client(s), 1 process(es) attached
  0@30000040..300001d7: PID:22202, reg#:1, mp:"DATE V3 event dump V1.08",
      last event:0, events:0, bytes:0, WAITING
    Monitor policy:
      SOR :monitor -=[...]
      FORMATERROR :monitor -=
  1 .. 9: unused
  --------------------------------------------------------------------
  mbFreeEvents: 0x30001b20, 1 frag, size: 13107188 tot, 13107188 avg
  Free list:
    0x30001b2c (13107188, 0x00c7fff4) 0x00000000 0x00000000 0x00000000 0x00000000
  monitorEventsData: 0x30001b2c

5.8 Monitoring configuration

Functionally, hosts participating in a DATE monitoring scheme can be defined as:

1. monitoring hosts running specific monitoring programs, either part of the standard monitoring package (e.g., the utility eventDump) or written by any DATE user.
2. monitored hosts offering monitoring streams to monitoring programs: these streams can be online streams (from a live data-acquisition system) or offline streams (typically files available from permanent data storage).
3. relaying hosts offering a liaison between monitored and monitoring hosts that cannot establish a direct link due to the presence of network firewalls or gateways.

It is possible to have any combination of those three functions, e.g. hosts that are monitoring, are monitored and offer relayed monitoring to other hosts.

The DATE monitoring scheme needs to be configured only for monitored and relaying hosts in the following situations:

1. online monitored hosts (LDCs or GDCs) offering online, offline or relayed monitoring to themselves and/or to other hosts.
2. hosts that are part of a DATE system offering offline or relayed monitoring to other hosts.

No setup is required for hosts that only perform monitoring, either on the same host or on remote hosts, and a complete DATE installation is not required on them. For the developer of the monitoring program itself, a library is available and can be used in stand-alone mode. Otherwise, monitoring programs can be exported to any type of host (within the set of supported architectures) with no need for extra files or special setups. No daemons are necessary and no configuration is required on the monitoring hosts.

We will now review the configuration needed on monitored and relaying hosts to let them perform their function.

5.8.1 Creation of configuration files

The monitoring scheme can be configured using three separate files:

• ${DATE_SITE_CONFIG}/monitoring.config: this file is optional and can be used to control a complete DATE site, all types of hosts.
• ${DATE_SITE}/${DATE_HOSTNAME}/monitoring.config: this file is mandatory for online hosts and must be created by the DATE system administrator. It is not required for offline, relayed or monitoring hosts.
• /etc/monitoring.config: this file is optional and can be used to control the behavior of relaying hosts; it is not used by online or offline monitored hosts.

The above files should be created using the following commands:

Listing 5.4 Creation of configuration files

  > touch file
  > chmod u=rw,g=rw,o=r file

where “file” is the full path of the file to be created.

Once created, the configuration files can be edited and parameters can be specified as a list of names followed by their associated values. Comments can be inserted via the “#” sign, e.g.:

  # This is a comment
  PARAMETER VALUE # comment

These files can be changed at any time. Some of the parameters (those labelled in Table 5.5 as “Online monitoring only”) require the acquisition to be stopped and no clients to be active (the command monitorClients, see Section 5.7.1, can be used to check for registered clients). All the other parameters can be changed at any time and will become active for all new clients (monitored and monitoring hosts) started after the modification(s).

When the same parameter is defined in multiple files, a “last given” policy is followed, that is:

• parameters defined in ${DATE_SITE_CONFIG}/monitoring.config can be overwritten by equivalent definitions from any of the other files.
• parameters defined in ${DATE_SITE}/${DATE_HOSTNAME}/monitoring.config are final for local monitoring and can be overwritten by equivalent definitions from /etc/monitoring.config for relayed monitoring.
• parameters defined in /etc/monitoring.config are final and cannot be overwritten.

The only exception to this scheme is the parameter LOGLEVEL, where the highest given level is used regardless of where it is defined (e.g. if the values 0, 20 and 10 are specified, the value used will be 20).

The parameters that can be specified in the configuration files are listed in Table 5.5.
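The “last given” policy and the LOGLEVEL exception can be summarized in a few lines of C. This is a hypothetical sketch of the rules as stated above, not code from the DATE library: the function names are invented, a NULL value stands for “not defined in that file”, and /etc/monitoring.config is only considered for relayed monitoring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the "last given" policy (not DATE code).
 *   site : value from ${DATE_SITE_CONFIG}/monitoring.config, or NULL
 *   host : value from ${DATE_SITE}/${DATE_HOSTNAME}/monitoring.config, or NULL
 *   etc  : value from /etc/monitoring.config, or NULL
 *   relayed : non-zero for relayed monitoring, zero for local monitoring
 * Returns the value that takes effect, or NULL if none is defined. */
const char *resolveLastGiven( const char *site, const char *host,
                              const char *etc, int relayed ) {
  const char *value = site;
  if ( host != NULL ) value = host;            /* host file overrides site */
  if ( relayed && etc != NULL ) value = etc;   /* /etc is final when relaying */
  return value;
}

/* LOGLEVEL exception: the highest level given in any file is used. */
int resolveLogLevel( const int *levels, size_t n ) {
  size_t i;
  int best;
  assert( levels != NULL && n > 0 );
  best = levels[0];
  for ( i = 1; i < n; i++ )
    if ( levels[i] > best ) best = levels[i];
  return best;
}
```

With the levels 0, 20 and 10 from the example in the text, resolveLogLevel gives 20 regardless of the order in which the files are read.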
Table 5.5 Monitoring configuration parameters

Parameter name | Used for | Description
LOGLEVEL | All types of monitoring | Level for error, information and debug statements generated by the monitoring scheme
MAX_CLIENTS | Online monitoring only | Maximum number of clients allowed to be registered simultaneously
MAX_EVENTS | Online monitoring only | Maximum number of events available for monitoring
EVENT_SIZE | Online monitoring only | Average event size
EVENT_BUFFER_SIZE | Online monitoring only | Size of buffer used to store events data
EVENTS_MAX_AGE | Online monitoring only | Maximum age (in seconds) of the events available for monitoring
MONITORING_HOSTS | Online monitoring, Networked monitoring | Comma-separated list of hosts allowed to perform monitor-when-available from this host
MUST_MONITORING_HOSTS | Online monitoring, Networked monitoring | Comma-separated list of hosts allowed to perform all types of monitoring from this host

For the MONITORING_HOSTS and MUST_MONITORING_HOSTS parameters, a comma-separated list of hosts should be given, e.g.:

    MONITORING_HOSTS localhost,pcxy,pcabc01

In the above example, the hosts allowed to perform “normal” monitoring are the local host, all hosts whose name begins with pcxy (pcxy01, pcxy02 and so on) plus the host pcabc01.

A host that is listed in MONITORING_HOSTS can only perform monitoring-when-available. To be able to perform 100% monitoring, a host must be in the MUST_MONITORING_HOSTS list. If the parameter MUST_MONITORING_HOSTS is not specified, all hosts can perform 100% monitoring on the monitored machine. Conversely, if the parameter MONITORING_HOSTS is not specified, all hosts can perform monitoring functions on the given machine.

If processes on the local host are allowed to perform monitoring, the MONITORING_HOSTS and MUST_MONITORING_HOSTS lists must contain the localhost keyword. If the keyword localhost is not present, local monitoring will not be allowed.
This keyword is needed as local monitoring “escapes” from the network protocol and is instead performed via memory-mapped direct access.

6 The readout program

This chapter describes the DATE software running on the LDC that manages the data stream. There are two processes in an LDC, namely readout and recorder. The readout process waits for a trigger, reads out the front-end electronics, and fills a FIFO with the sub-event data. The recorder process off-loads this FIFO and sends the sub-event data to a local disk, to a named pipe, or to a GDC over the network.

In particular this chapter explains how to build and how to customize a readout program. By using the generic readList library, the readout program is organized as a collection of equipments. Each equipment can be programmed independently, can be selected (activated and deactivated) and parameterized before the run starts without changing the code. All the software is contained in the DATE packages readout and readList.

6.1 The readout process
6.2 The generic readList concept
6.3 Using the generic readList
6.4 The equipmentList library

6.1 The readout process

The readout process and the recorder process are running in all the LDCs participating in the data taking. This chapter is devoted to the readout process, whereas the recorder process is covered in Section 10.3.

The readout process executes the suitable code to perform the front-end electronic readout. This code is specified in a separate software module, called readList, which has to be compiled and linked with the readout main program.
The readList module consists of the following five routines:
• ArmHw, called at each start of run to perform the initialization;
• AsynchRead, called in the main event loop to perform the readout of the hardware that produces an asynchronous flow of data;
• EventArrived, called in the main event loop to discover whether a trigger has occurred;
• ReadEvent, called in the main event loop after the arrival of a trigger to perform the readout of the hardware;
• DisArmHw, called at each end of run to perform the hardware rundown.

Figure 6.1 shows the structure of the readout program and how these routines are called in the main event loop.

Figure 6.1 Main event loop executed by the readout process. (Flow: begin, SOR scripts, ArmHw, SOR files; then, until a finish is requested: AsynchRead, EventArrived and, if an event has arrived, ReadEvent followed by the handling of the sub-event; finally EOR scripts, DisArmHw, EOR files, end.)

6.1.1 Start of run phases

At each start of run (SOR), the readout process performs the following sequence of operations in the order described below:
1. maps to all memory banks that are configured for this LDC in the banks database (see Section 4.3.6);
2. executes the common SOR scripts (if any), see “SOR.commands” in the ALICE DAQ WIKI;
3. executes the detector SOR scripts (if any);
4. executes the specific SOR scripts (if any);
5. calls the routine ArmHw;
6. produces a SOR record: the eventType field of the base event header is set to START_OF_RUN, the eventTypeAttribute field of the base event header is set to START_OF_RUN_START and the payload is empty;
7. prepares the common SOR files (if any, one record per file);
8. prepares the detector SOR files (if any, one record per file);
9. prepares the specific SOR files (if any, one record per file);
10. produces a SOR record: the eventType field of the base event header is set to START_OF_RUN, the eventTypeAttribute field of the base event header is set to START_OF_RUN_END and the payload is empty.
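The calling order of the five readList routines shown in Figure 6.1 can be sketched in C as follows. This is only an illustration under stated assumptions: the five routines are replaced by trivial stand-ins without arguments, the SOR/EOR scripts, files and records are omitted, and the names runLoopSketch, pendingTriggers and subEventsRead are hypothetical. The "end of run" request is simulated by running out of pending triggers.

```c
#include <assert.h>

/* Hypothetical stand-ins for the five readList routines. In a real
 * readout program they are provided by the readList module. */
static int pendingTriggers;   /* simulated triggers still to arrive */
static int subEventsRead;     /* number of ReadEvent calls so far   */

static void ArmHw(void)        { subEventsRead = 0; }
static void AsynchRead(void)   { /* nothing asynchronous in this sketch */ }
static int  EventArrived(void)
{
    if (pendingTriggers > 0) {
        pendingTriggers--;
        return 1;              /* a trigger has occurred */
    }
    return 0;                  /* no trigger */
}
static void ReadEvent(void)    { subEventsRead++; }
static void DisArmHw(void)     { /* hardware rundown would go here */ }

/* Simplified run: start of run, main event loop, end of run.
 * Returns the number of sub-events read. */
int runLoopSketch(int simulatedTriggers, int maxSubEvents)
{
    assert(simulatedTriggers >= 0 && maxSubEvents >= 0);
    pendingTriggers = simulatedTriggers;

    ArmHw();                                   /* start of run */
    while (pendingTriggers > 0 && subEventsRead < maxSubEvents) {
        AsynchRead();                          /* asynchronous hardware */
        if (EventArrived()) {                  /* trigger present? */
            ReadEvent();                       /* read out the sub-event */
        }
    }
    DisArmHw();                                /* end of run */
    return subEventsRead;
}
```

In this sketch the loop stops either when the simulated triggers are exhausted or when the assumed maximum number of sub-events is reached, which mirrors two of the exit paths of the real main event loop.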
The SOR sequences have been split into phases, corresponding to the points enumerated above. At the start of each phase the timeout (given by the LDC run parameter Max. time for SOR/EOR phases) is restarted, which also allows longer initialization or ending phases.

Once the SOR phase has been completed, readout reads two configuration files from the DATE database:
1. readout.config, which selects the GDC selection algorithm, enables the dump of the event payload during the run and activates the back-pressure monitor (see “readout.config” in the ALICE DAQ WIKI);
2. CDH.config, which tells readout how to print the detector-specific information stored in the CDH (see “CDH.config” in the ALICE DAQ WIKI).

If these two files don’t exist, readout uses the default configuration, described in the ALICE DAQ WIKI:
• GDC_SELECTION_ALGORITHM = ORBIT_RAND
• READOUT_DUMP_PAYLOD = NO
• BCKP_MONITOR = NO
• CDH.print = FALSE

6.1.2 Main event loop

After the start of run phases, the readout process enters the main event loop (see Figure 6.1). It allocates memory for one sub-event; what is allocated depends on the value of the LDC run parameter Paged data flag (see Chapter 3):
• for streamlined data (Paged data flag is 0) readout allocates the following: an entry for an event descriptor in the readoutReadyFifo and memory from the readoutData memory bank for the payload, whose size is given by the LDC run parameter Max. event size;
• for paged data (Paged data flag is 1) readout allocates the following: an entry for an event descriptor in the readoutReadyFifo and memory for the first level vector from the readoutFirstLevel memory bank.

The readout process calls the routine AsynchRead to activate the readout of hardware that produces an asynchronous flow of data and then waits for a trigger by calling the routine EventArrived: the arrival of a trigger can signal for example a physics event or a start of burst (SOB) or an end of burst (EOB).
If no events are arriving (routine EventArrived returns 0), the innermost main event loop is executed at maximum speed as long as there is no “end of run” request. If the LDC run parameter startOfData/endOfData event enabled is set, the readout process also exits the main event loop when the timeout (given by LDC run parameter startOfData timeout or endOfData timeout) to wait for events of type START_OF_DATA or END_OF_DATA (see “DATA FORMAT” in the ALICE DAQ WIKI) has expired.

Each time an event has arrived, the readout process fills the base event header fields for which it is responsible (including the field eventTimestamp to tag the sub-event with an absolute timestamp), and it increments the run-time variable Number of sub-events for all event types. Then the routine ReadEvent is called, which is in charge of transferring the event data and of filling the base event and equipment header fields (see Chapter 3). Afterwards the readout process performs the following operations in the order described below:
1. checks that the mandatory fields in the base event header have been set by the ReadEvent routines;
2. checks that the eventId field in the base event header filled by the routine ReadEvent is not zero and has an increasing value for events of types PHYSICS_EVENT and CALIBRATION_EVENT;
3. checks whether the eventType field is START_OF_DATA for the first arrived event. If this event has a different type or if it arrives after the startOfData timeout period, then an “end of run” request with an error condition is issued. The LDC run parameter startOfData/endOfData event enabled must be set to enable this check, otherwise it is omitted;
4. checks whether the eventType field is END_OF_DATA after having received an “end of run” request. If such an event does not arrive within the endOfData timeout period, then the end of run phases (see Section 6.1.3) are executed with an error condition.
The LDC run parameter startOfData/endOfData event enabled must be set to enable this check, otherwise an “end of run” request leads directly to the end of run phases;
5. fills the eventGdcId field in the base event header for PHYSICS_EVENT and CALIBRATION_EVENT events with a default value in order to achieve a fair distribution of sub-events to multiple GDCs. The dispatch algorithm uses the “number in run” part of the eventId field in FIXED TARGET mode (see Section 3.4) or the “orbit counter” part of the eventId field in COLLIDER mode. The eventGdcId field can be overwritten by the edmAgent process (see Chapter 13). By default the destination of special event types (START_OF_RUN, START_OF_RUN_FILES, END_OF_RUN, END_OF_RUN_FILES, START_OF_BURST, END_OF_BURST) is the first GDC;
6. if FIXED TARGET mode is selected (see Section 3.4), it fills the “number in run” part within the eventId field in the base event header for END_OF_BURST events in such a way that it contains the last event number held in the header of the last physics event in the burst and the number of events in the last burst. These values are used by the event builder to make consistency checks based upon independent criteria and redundant information;
7. for START_OF_BURST events, it sets the run-time variable inBurst flag to 1, and for END_OF_BURST events it sets the run-time variable inBurst flag to 0;
8. for FIXED TARGET mode it fills the run-time variables Number of sub-events in burst and Number of bursts in the shared memory control region;
9. increments the run-time variable Number of triggers in the shared memory control region for events of type PHYSICS_EVENT;
10. fills the eventSize field in the base event header by taking into account the event scheme (streamlined or paged);
11. sets the run-time variable Bytes injected in the shared memory control region for all event types.
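The eventId check performed after ReadEvent (non-zero and increasing for PHYSICS_EVENT and CALIBRATION_EVENT, other types exempt) can be illustrated with a minimal C sketch. It is not the readout implementation: the eventId is modelled as a plain 64-bit integer instead of the structured field described in Section 3.4, and the type and function names (SketchEventType, SketchIdChecker, sketchCheckEventId) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types; the real constants come from event.h. */
typedef enum {
    SK_PHYSICS_EVENT,
    SK_CALIBRATION_EVENT,
    SK_OTHER_EVENT
} SketchEventType;

/* State kept between events: the last accepted eventId.
 * Assumption: eventId is simplified to a single integer here. */
typedef struct {
    uint64_t lastId;
} SketchIdChecker;

/* Returns 1 if the event passes the check, 0 if the run should be
 * stopped. Gaps are allowed; zero or non-increasing ids are not. */
int sketchCheckEventId(SketchIdChecker *checker,
                       SketchEventType type,
                       uint64_t eventId)
{
    assert(checker != NULL);

    /* Only physics and calibration events are checked. */
    if (type != SK_PHYSICS_EVENT && type != SK_CALIBRATION_EVENT) {
        return 1;
    }
    if (eventId == 0 || eventId <= checker->lastId) {
        return 0;
    }
    checker->lastId = eventId;
    return 1;
}
```

A caller would start each run with lastId set to 0 and feed every sub-event to the checker; a return value of 0 corresponds to the situation in which readout asks to stop the run.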
The readout process exits the main event loop if one of the following six conditions is met:
• the maximum number of events to be collected (given by the LDC run parameter Max. number of sub-events) has been reached;
• the maximum number of bursts to be collected (given by the LDC run parameter Max. number of bursts) has been reached and there is an END_OF_BURST type of event;
• the maximum number of bytes to be collected (given by the LDC run parameter Max. bytes to record) has been reached;
• the arrival of an “end of run” request combined with one of the following three cases:
  - the parameter startOfData/endOfData event enabled is not set, hence there is no waiting for an event of type END_OF_DATA;
  - the parameter startOfData/endOfData event enabled is set and an event of type END_OF_DATA has been received within the timeout (given by the parameter endOfData timeout);
  - the parameter startOfData/endOfData event enabled is set and an event of type END_OF_DATA has not been received within the timeout (given by the parameter endOfData timeout);
• an event of type START_OF_DATA has not been received within the timeout (given by the parameter startOfData timeout) when the parameter startOfData/endOfData event enabled is set;
• a fatal error has occurred.

All records are inserted in the readoutReadyFifo. In addition, they are also injected in the buffer reserved for monitoring (see Chapter 5) if the following conditions are met:
• the LDC run parameter Monitor enable flag is set to 1;
• a monitor program requesting this type of events is running;
• there is enough space in the monitoring buffer.

6.1.3 End of run phases

At each end of run (EOR), the readout process performs the following sequence of operations in the order described below:
1. executes the common EOR scripts, if any (see “EOR.commands” in the ALICE DAQ WIKI);
2. executes the detector EOR scripts, if any;
3. executes the specific EOR scripts, if any;
4.
calls the routine DisArmHw;
5. produces an EOR record: the eventType field of the base event header is set to END_OF_RUN, the eventTypeAttribute field of the base event header is set to END_OF_RUN_START and the payload is empty;
6. prepares the common EOR files, if any (one record per file);
7. prepares the detector EOR files, if any (one record per file);
8. prepares the specific EOR files, if any (one record per file);
9. produces an EOR record: the eventType field of the base event header is set to END_OF_RUN, the eventTypeAttribute field of the base event header is set to END_OF_RUN_END and the payload is empty;
10. updates the bookkeeping information with the run number, the physics events count, the SOB records count, the EOB records count, the trigger count and the burst count.

The EOR sequences have been split into phases, corresponding to the points enumerated above. At the start of each phase the timeout (given by the LDC run parameter Max. time for SOR/EOR phases) is restarted, which also allows longer initialization or ending phases.

6.1.4 Log messages

It is possible to choose where to direct the output of messages produced by the readout process by editing the script readout_startup.sh located in the directory ${DATE_READOUT_BIN}. The options proposed are the following:
• output via infoLogger (default);
• creation of an iconized xterm where all the output produced should appear;
• no output;
• output to a file (e.g. /tmp/Readout@hostname.log);
• output to a file (e.g. ${DATE_SITE_TMP}/Readout@hostname.log).

By default the readout process uses the infoLogger package to report and trace errors or abnormal conditions, and to trace state changes. The operator can tailor these features to the required needs by setting the value of the LDC run parameter Logging level (see the ALICE DAQ WIKI to operate the system).
The readout process also updates the bookkeeping information at the end of the run through the LOGBOOK facility. If the readout process crashes, a core dump for post-mortem analysis is produced in the ${DATE_SITE_TMP}/${DATE_HOSTNAME}/readout directory.

6.2 The generic readList concept

This section describes how the readout program accesses the hardware by calling the five routines of the readList module (see Figure 6.1), which contains the code specific to a given electronics setup. Instead of writing several of these modules, the generic readList concept allows grouping the code for all the electronics setups in another library called equipmentList (see Figure 6.2).

Figure 6.2 The generic readList concept of the readout process. (readout.c contains the main event loop; readList.c, driven by the equipment configuration, provides ArmHw, AsynchRead, EventArrived, ReadEvent and DisArmHw; equipmentList provides ArmNNN, AsynchReadNNN, EventArrivedNNN, ReadEventNNN and DisArmNNN.)

The readout software specific to an electronics setup can be written separately for a so-called equipment. One equipment is responsible for generating data from an electronics board or a set of electronics boards, depending on how the readout software is structured. A set of equipment-handling routines deals with one single equipment, thus the code is more modular and readable. The routine names are fixed by convention: the name is obtained by concatenating the prefix Arm, DisArm, AsynchRead, ReadEvent or EventArrived with the name of the equipment type NNN as declared by the equipment configuration. All five routines must be implemented for an equipment. If one of these functionalities is not required, a dummy routine should be provided. The details of this library are described in Section 6.4.

The equipment configuration defines the equipments used for each LDC. It is done with the equipment databases (see Chapter 4).
An equipment may be repeated several times in a detector; each run-time call will be distinguished by a different set of equipment parameters. The configuration file specifies the selection of the active equipments and the setting of the parameters that will be passed to the readout routines. Therefore, it is possible to modify the readout program behaviour without changing the readout executable code.

As a result of this generic readList concept, the sub-events of an LDC are divided further into smaller parts, called equipment data or fragments. Each equipment data block begins with an equipment header, followed by the equipment raw data (see Chapter 3). Hence a fully built event contains the sub-events from the various LDCs, which in turn contain the fragments from the equipments.

In order to realize this concept, the generic readList module implements the following functions:
• ArmHw: It loads the equipment configuration, identifies the equipments involved in the readout of the LDCs, saves them in a table and then calls the ArmNNN routine for each active equipment in the order specified in the configuration file. It also sets the run-time variable Number of equipments to the number of data generating equipments.
• AsynchRead: It calls the AsynchReadNNN routine for each active equipment configured in the equipment database.
• EventArrived: It calls the EventArrivedNNN routine of the active equipment configured in the equipment database.
• ReadEvent: It calls the ReadEventNNN routine for each active equipment. In addition it generates the equipment header if it is an equipment generating data.
• DisArmHw: It calls the DisArmNNN routine for each active equipment.

The handling of the equipment routines must be further configured with the help of the optional attributes GENDATA and TRIGGER. They are assigned to an equipment type as part of the equipment configuration.
• GENDATA: Usually an equipment generates data, but it may be convenient to isolate some specific processing in an equipment that does not generate any data. Only the equipment routine ReadEventNNN is able to produce data when the attribute GENDATA is set. If this attribute is set, then the equipment header is generated and the ReadEventNNN routine is called with parameters to access the equipment header and the raw data. If this attribute is omitted, then no equipment header is generated and the equipment routine is called with parameters that do not give access to the equipment header and the raw data.
• TRIGGER: The trigger hardware plays a special role, since it is in charge of indicating whether a sub-event has arrived or not. This decision needs to be captured by the EventArrivedNNN routine of an equipment. However, this equipment routine is only called if the attribute TRIGGER is set. Several trigger equipments may be declared in the equipment configuration, but only one per LDC should be active. If there are more of them, no warnings are given and only the first one is used.

6.3 Using the generic readList

The use of the generic readList (see Section 6.2) requires the preparation of two components:
• A source code file that contains the equipment routines to handle the readout of all the equipments that may be activated in an LDC. A detailed description on how to prepare it is given in Section 6.4.
• An entry in the equipment database describing the configuration of each active equipment.

These two components are strictly correlated and must match one another. There is no mechanism to make sure that this is the case. Error conditions due to a mismatch are discovered at start of run and will immediately stop the run. The conventions tying these files are the following:
• The name of the equipments in the equipment database constrains the name of the equipment routines in the source code file.
A prefix is added to the equipment type, as explained in Section 6.2.
• The equipment routines must be declared using macros provided in readList_equipment.h. These macros provide the link between the equipment name (read by readList from the equipment database) and the address of the readout routines (to be called by readList). The usage of these macros is explained in Section 6.4.3.
• The configuration described in the equipment database is shared by all the LDCs in the system, therefore all the equipments used in each of the LDCs must be declared there. A readout program has to be linked to the equipmentList library. It is possible to declare in the equipment database equipments for which the handling code is not provided. The readout program will abort the run during the Arm phase only if equipments whose handling code is missing are selected as active. The advantage is that a readout program can be built containing only the equipments that will be used in the target LDC, whereas the equipment configuration file can still contain more equipments than the ones encoded in the readout program.
• Parameters can be passed to the equipment routines. The parameter declarations in the equipment database and in the source code file must match. The handling of these equipment parameters is explained in Section 6.4.2.

The DATE package readList contains four suites of equipment software: TEST, CTP, DDL and UDP. Only one of these suites will be present in any LDC. Users can add their own specific equipment software in one of these suites or create a new one. With the aid of the TEST suite, software-simulated events are produced, mostly for testing purposes. The DDL suite contains all the equipment software to handle the RORC readout (see Section 7.1). The CTP suite is an equipment developed for the ALICE trigger system to read out information sent by the Central Trigger Processor (CTP). The UDP
The UDP ALICE DAQ and ECS manual The readout program 118 suite contains all the equipment software to handle the UDP readout (see Section 7.3). To each equipment suite belongs the source code file for the equipment routines and the associated GNUmakefile. A summary is given in Table 6.1. Table 6.1 Equipment suites in the readList package equipment suite components TEST equipmentList_TEST.c GNUmakefile_TEST CTP equipmentList_CTP.c GNUmakefile_CTP DDL equipmentList_DDL.c/.h GNUmakefile_DDL UDP equipmentList_UPD.c/.h GNUmakefile_UDP As an example, to prepare a readout program for the TEST suite by using the generic readList, see “How to prepare a readout program” in the ALICE DAQ WIKI. The equipment configuration may be changed between runs just by modifying the equipment database. Changes are taken into account only at the next start of run. If the modifications concern only the descriptive parts of the equipment, such as: • adding or removing equipments assigned to an LDC; • activating or deactivating equipments assigned to an LDC; • changing the value of an equipment parameter. Then the readout program does not need to be re-built. However, the readout program must be re-build if the modifications in the equipment configuration (e.g. adding equipment parameters) entail changes in the source code files. 6.4 The equipmentList library This section provides the synopsis, the parameter handling and the functional references of the five equipment routines that are required for each equipment in the equipmentList library. The user has to provide these routines that are specific to a readout electronics setup. Examples of these routines can be found in the source code files of the four equipment suites, as shown in Table 6.1. 6.4.1 Synopsis of the equipment routines The synopsis of the routines ArmNNN, AsynchReadNNN, EventArrivedNNN, ReadEventNNN, and DisArmNNN for an equipment of type NNN is given below. 
All five routines must be provided for an equipment, even if some of them are empty.

Upon return from these routines the readout process checks the content of the global variable readList_error, whose value allows signaling error conditions. If its value is different from 0, the readout process logs an error message containing the value of the variable and the name of the routine originating the error through the global variable readList_errorSource (which is filled by the generic readList module), and asks to stop the run.

ArmNNN

Synopsis
    #include "rcShm.h"
    #include "event.h"
    #include "readList_equipment.h"
    void ArmNNN( char *par )

Description
The ArmNNN routine is called by ArmHw at each start of run for equipment type NNN, after the execution of the start of run scripts and before the transfer of the start of run files on the output medium. This equipment routine should perform all the actions needed at the beginning of the run, e.g. the initialization of the hardware and of the trigger, or the assignment of values to global static variables. The routine cannot generate any data.

Parameters:
• The parameter par is a pointer to a memory region containing the sequence of pointers to the values of the parameters of the component being armed; these values are read at run time from the equipment database and are assigned to the equipment before arming the LDC.

Returns
The routine does not return any value. The global variable readList_error should be used to signal error conditions, which will provoke the log of a message and the termination of the run.

AsynchReadNNN

Synopsis
    #include "rcShm.h"
    #include "event.h"
    #include "readList_equipment.h"
    void AsynchReadNNN( char *par )

Description
The AsynchReadNNN routine is always called by AsynchRead in the main event loop (see Figure 6.1) for equipment type NNN. It is called in a strictly closed loop without sleeping just before the EventArrivedNNN routine.
Since this equipment routine is invoked before the ReadEventNNN routine, it offers the possibility to perform the readout of hardware that produces an asynchronous data flow. However, it cannot pass any data to readout. Only the routine ReadEventNNN is designed for this purpose. If this feature is not needed, this routine should be left empty.

Parameters:
• The parameter par is a pointer to a memory region containing the sequence of pointers to the values of the parameters of the component being armed; these values are read at run time from the equipment database and are assigned to the equipment before arming the LDC.

Returns
The routine does not return any value. The global variable readList_error should be used to signal error conditions, which will provoke the log of a message and the termination of the run.

EventArrivedNNN

Synopsis
    #include "rcShm.h"
    #include "event.h"
    #include "readList_equipment.h"
    int EventArrivedNNN( char *par )

Description
The EventArrivedNNN routine is called by EventArrived in the main event loop (see Figure 6.1) only if the TRIGGER attribute is assigned to this equipment type NNN. It is called in a strictly closed loop without sleeping just after the AsynchReadNNN routine. The purpose of this routine is to indicate whether a trigger has occurred or not. This can be done either by polling and returning immediately (with 0 if no trigger has occurred) or by waiting for an interrupt with an appropriate driver call for the hardware.

Parameters:
• The parameter par is a pointer to a memory region containing the sequence of pointers to the values of the parameters of the component being armed; these values are read at run time from the equipment database and are assigned to the equipment before arming the LDC.

Returns
The function must return the value 1 if a new event has arrived, or 0 otherwise.
The global variable readList_error should be used to signal error conditions, which will provoke the log of a message and the termination of the run.

ReadEventNNN

Synopsis
    #include "rcShm.h"
    #include "event.h"
    #include "readList_equipment.h"
    int ReadEventNNN( char *par,
                      struct eventHeaderStruct *ev_header,
                      struct equipmentHeaderStruct *eq_header,
                      int *data )

Description
The ReadEventNNN routine is always called by ReadEvent in the main event loop (see Figure 6.1) for equipment type NNN after a trigger has arrived. This equipment routine is the place to make the data available to the readout and to fill the fields in the base event and equipment headers.

Parameters:
• The parameter par is a pointer to a memory region containing the sequence of pointers to the values of the parameters of the component being armed; these values are read at run time from the equipment database and are assigned to the equipment before arming the LDC.
• The parameter ev_header is a pointer to the base event header (see Chapter 3), which is defined in the ${DATE_COMMON_DEFS}/event.h header file.
• The parameter eq_header is a pointer to the equipment header (see Chapter 3), which is defined in the ${DATE_COMMON_DEFS}/event.h header file. This pointer is only valid (i.e. different from NULL) if the attribute GENDATA is assigned to this equipment type in the declaration part of the equipment database.
• The parameter data is a pointer to the raw data block to fill in. This pointer is only valid (i.e. different from NULL) if the attribute GENDATA is assigned to this equipment type in the declaration part of the equipment database. If this equipment is designed for streamlined events (LDC run parameter Paged data flag is 0), then the data must be copied to the memory where parameter data is pointing.
If this equipment is designed for paged events (LDC run parameter Paged data flag is 1), then the equipment vector (see Chapter 3) needs to be placed where parameter data is pointing.

The main readout program sets the variable readList_bufferSize to the value of the available space for the data of the current equipment. The ReadEventNNN routine is supposed to use this value in order to prevent writing beyond the space available. This variable is accessible by making the following declaration:

    extern int readList_bufferSize;

For streamlined events, the value of the variable readList_bufferSize is given by the LDC run parameter Max. event size (see Chapter 2) minus the size of the base event and the equipment header. For paged events, the value of the variable readList_bufferSize is given by the size of the first level vector minus the size of the equipment header. After calling the ReadEventNNN routine, the new value of the variable readList_bufferSize is calculated (for both streamlined and paged events) by the generic readList module. If the value becomes negative (i.e. overflow) the following error is provoked: the variable readList_error is set to 15 and the variable readList_errorSource is set to the string “ReadEvent equipment N overflow”, where N is the ordinal number of the faulty equipment. N is obtained by counting the active equipments only. The run will then be aborted.

The ReadEventNNN routine is supposed to fill the fields in the base event and in the equipment headers. Refer to Chapter 3 for a detailed description of them and how to access them with the help of macros. The most important header fields are:
• eventId in the base header: it is mandatory, except for END_OF_BURST events. This variable is the event number in the run, encoded for either the COLLIDER mode or the FIXED TARGET mode, see Section 3.4. It must always increase during a run, but it allows for gaps.
The readout process checks that the value of this field is non-zero and increasing for consecutive events inside a run for events of the type PHYSICS_EVENT and CALIBRATION, and asks to stop the run if not. This field must be the same in all the sub-events of the same event, since it is used by the event builder to perform consistency checks. This field should be set to 0 for START_OF_BURST events, while for END_OF_BURST events this field is overwritten by the readout process, which sets it to the event number held in the header of the last physics event in the burst.
• eventType in the base header: it is mandatory and initialized to the type PHYSICS_EVENT. This variable marks the type of record. The readout process increments the trigger number only for the PHYSICS_EVENT type of record and not for other types of records.
• eventTypeAttribute in the base header: it is optional and initialized to 0. This variable contains the system-defined attributes and the user-defined attribute associated with an event.
• eventTriggerPattern in the base header: it is optional and initialized to 0. This variable contains the level 2 trigger pattern.
• eventDetectorPattern in the base header: it is optional and initialized to 0. This variable contains the level 2 detector pattern.
• equipmentId in the equipment header: it is optional and initialized to 0. It is set by an equipment parameter to distinguish between equipments of the same type.
• equipmentBasicElementSize in the equipment header: it is optional and initialized to 0. Usually it is set to 4 bytes.
• equipmentTypeAttribute in the equipment header: it is optional and initialized to 0. This variable contains the user-defined attribute associated with an equipment.

Upon return from this routine, the main readout program checks that the user has filled the mandatory fields in the event header and updates some variables used in the runControl status display.

Returns

If the equipment produces data (i.e. the attribute GENDATA is assigned to this equipment type), the routine must return the number of bytes actually taken. This rule applies to both streamlined and paged events. If the equipment does not produce data (i.e. the attribute GENDATA is omitted for this equipment type), the routine should return 0. The global variable readList_error should be used to signal error conditions; setting it causes a message to be logged and the run to be terminated.

DisArmNNN

Synopsis

#include "rcShm.h"
#include "event.h"
#include "readList_equipment.h"

void DisArmNNN( char *par )

Description

The DisArmNNN routine is always called by DisArmHw at each end of run for equipment type NNN, after the execution of the end of run scripts and before the transfer of the end of run files to the output medium. This equipment routine should perform all the actions needed at end of run, such as the release of unused memory, the switching off of high voltages, or the saving of error statistics that may have been collected.

Parameters:

• The parameter par is a pointer to a memory region containing the sequence of pointers to the values of the parameters of the component being armed; these values are read at run time from the equipment database and are assigned to the equipment before arming the LDC.

Returns

The routine does not return any value. The global variable readList_error should be used to signal error conditions; setting it causes a message to be logged and the run to be terminated.

6.4.2 Accessing the parameters

The equipment routines have access to the following parameters:

• Equipment parameters are specific to an equipment type. They are accessible via a pointer received as first parameter in the routine call: char *par;
• Global parameters can be used by all equipments. They are accessible via a global pointer: char *globPar;

The order, type and format of these parameters are a matter of convention.
The equipment database and the source code file must agree on this convention, because readList performs no check before calling the library. The values of the equipment parameters are copied into memory at run time, while the equipment database configuration is read, following this convention on their formats. To ease the use of the parameters, it is suggested to cast their memory pointer into a pointer to a structure with the proper fields, according to the declaration in the equipment database.

Listing 6.1 shows a skeleton of a source code file for the routines of equipment type Rand (lines 1-21). This is a simple equipment for testing purposes in the TEST suite. The important point is how the parameters are declared as pointers in a C typedef (lines 2-7), cast to a local pointer (line 10) and eventually accessed (line 11).

6.4.3 The function references

In order to make the functions contained in the library accessible from the generic readList, references to them must be created in the library through an array and a macro defined in the readList_equipment.h header file. Independently of the equipment attributes, the references to the routines have to be made as follows:

1. The value returned by applying the EQUIPMENTDESCR macro to the name of each equipment type must be put into the equipmentDescrTable array. The order is not important.
2. The variable nbEquipDescr must be set to the number of entries in the equipmentDescrTable array.

Listing 6.1 (lines 23-27) shows an example of how to make the function references for the equipment type Rand sketched above.
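The ReadEventNNN contract described earlier in this section (write at most readList_bufferSize bytes, return the number of bytes taken, and report problems through readList_error) can be illustrated with a minimal sketch for streamlined events. The two globals are defined locally here as stand-ins so that the fragment compiles on its own; in a real equipment library they would be declared extern and provided by readList. The helper name copyPayloadBounded and the error value SKETCH_ERR_OVERFLOW are hypothetical and not part of DATE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the DATE globals (assumption: a real equipment library
   declares them extern; readList owns and updates them). */
int readList_bufferSize = 0;  /* bytes available for the current equipment */
int readList_error = 0;       /* non-zero asks the readout to stop the run */

/* Hypothetical error value used by this sketch only. */
#define SKETCH_ERR_OVERFLOW 1

/* Copy nbBytes of payload to the area pointed to by data and return the
   number of bytes taken, as a ReadEventNNN routine would for streamlined
   events. Returns 0 when the equipment has no data area or on error. */
int copyPayloadBounded(int *data, const void *payload, int nbBytes)
{
    if (data == NULL)
        return 0;                          /* equipment without GENDATA */
    assert(payload != NULL);
    if (nbBytes < 0 || nbBytes > readList_bufferSize) {
        readList_error = SKETCH_ERR_OVERFLOW;  /* refuse to overrun the buffer */
        return 0;
    }
    memcpy(data, payload, (size_t)nbBytes);
    return nbBytes;
}
```

Checking the size before copying keeps the equipment from ever reaching the overflow condition that readList detects after the call.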
Listing 6.1 Example of an equipment source code file

 1: /******************** equipment Rand *********************/
 2: typedef struct {
 3:   long32 *eventMinSizePtr;
 4:   long32 *eventMaxSizePtr;
 5:   short *eqIdPtr;
 6:   short *triggerPattern;
 7: } RandParType;
 8:
 9: void ArmRand(char *parPtr) {
10:   RandParType *randPar = (RandParType *)parPtr;
11:   printf("Arming random generator (id = %hd)"
             " with min = %ld max = %ld triggerPattern = %d\n",
             *randPar->eqIdPtr,
             *randPar->eventMinSizePtr, *randPar->eventMaxSizePtr,
             *randPar->triggerPattern);
12:   ... }
13:
14: void DisArmRand(char *parPtr) {}
15:
16: void AsynchReadRand(char *parPtr) {}
17:
18: int ReadEventRand(char *parPtr, struct eventHeaderStruct *header_ptr,
                      struct equipmentHeaderStruct *eq_header_ptr, int *data_ptr) {
19:   ... }
20:
21: int EventArrivedRand(char *parPtr) {}
22:
23: /****************** table of functions *******************/
24: equipmentDescrType equipmentDescrTable[] = {
25:   EQUIPMENTDESCR( Rand )
26: };
27: int nbEquipDescr = sizeof(equipmentDescrTable) / sizeof(equipmentDescrTable[0]);

7 The RORC readout software

This chapter describes the DATE readout software for:

• The RORC (Read-Out Receiver Card), which is the interface between the DDL (Detector Data Link) and the LDC.
• The Ethernet port, which can be used by DATE as an alternative data source.

Information on the implementation of the two equipments can be found in the following sections:

7.1 Introduction to the RORC equipment
7.2 Internals of the RORC equipment
7.3 Introduction to the UDP equipment
7.4 Internals of the UDP equipment

7.1 Introduction to the RORC equipment

The Read-Out Receiver Card (RORC) is a PCI master card that provides an interface between the Detector Data Link (DDL) and the PCI, PCI-X or PCI Express bus of a commodity PC. The DDL consists of a Source Interface Unit (SIU), which is attached to the front-end electronics inside the detector, and a Destination Interface Unit (DIU). The SIU and the DIU are connected through a pair of optical fibres that transmit data at a rate of up to 200 MB/s. Five types of RORCs have been designed:

• The pRORC has a 32 bit/33 MHz PCI bus interface and handles one DDL channel using a piggy-backed DIU.
• The single channel D-RORC has a 64 bit/64 MHz PCI bus interface and handles one DDL channel via a piggy-backed DIU.
• The dual channel D-RORC has a 64 bit/64 MHz PCI bus interface and handles two DDL channels with embedded DIUs.
• The dual channel D-RORC has a 64 bit/100 MHz PCI-X bus interface and handles two DDL channels with embedded DIUs.
• The dual channel D-RORC has a x8 PCI Express (Gen. 2) bus interface and handles two DDL channels with embedded DIUs.

In this chapter, these cards are commonly referred to as RORC, since they do not differ from the software point of view. Depending on the acquisition needs and on the number of available PCI bus slots, one PC can be equipped with several RORCs (up to 6). Each RORC has a revision number and a unique serial number in its configuration EPROM. The RORC can generate pre-defined data streams for testing purposes. Dual channel RORCs can be switched to “splitter mode”: data arriving in one channel is automatically sent out on the other channel.

The data flow from the DIU to the PC memory is driven by the DMA engine of the RORC firmware in co-operation with the RORC readout software. During one DMA, only one data page can be written. The data pages that belong to the same sub-event are referred to as a fragment, which is transferred over the DDL by one or more DDL blocks.
Each block can be up to 4 x (2^19 - 1) = 2097148 bytes. For a comprehensive description of the DDL and the RORC see the Web site http://cern.ch/ddl. The stand-alone utility programs for the RORC are documented in Chapter 20.

DATE provides all the necessary readout software to operate RORC devices on a PC running Linux via the following two packages:

• Package rorc: it contains the Linux driver module, the library functions, and some utility programs to interface with a RORC device. This package is self-contained in the ${DATE_ROOT}/rorc directory.
• Package readList: it contains the equipment software to realize a readout program for an LDC with a RORC device. The software is located in the ${DATE_ROOT}/readList directory and depends on package rorc.

7.2 Internals of the RORC equipment

The goal of the RORC readout software is to operate several RORC devices attached to one LDC, taking into account the asynchronous data flow and the scattered location of data pages in the main memory. Moreover, the RORC readout software has to be structured in equipment routines as explained in Chapter 6.

The rest of this chapter presents the internals of the RORC equipment software: the mechanism to transfer data with a RORC device (see Section 7.2.2), the software elements to handle it (see Section 7.2.3), the data flow for multiple RORC devices active in the LDC (see Section 7.2.4), and the pseudo code of the RORC equipment routines (see Section 7.2.6).

7.2.1 Event Identification

The identification of the sub-event is given by the eventId field in its base event header. The RorcData equipment writes this field by taking into account the two common run parameters Collider mode and Common Data Header Present (see the ALICE DAQ WIKI). Their usage is illustrated in Figure 7.1.
If the raw data contains the Common Data Header (see Section 3.9) with the cdhEventId1 field (12 bit bunch-crossing number) and the cdhEventId2 field (24 bit orbit number), then both Collider mode and Common Data Header Present should be set. This case is depicted in the upper half of Figure 7.1. Setting the parameter Common Data Header Present instructs the software to extract these Common Data Header fields and to use them for filling the eventId field (orbit counter and bunch-crossing counter part). Setting the parameter Collider mode ensures that the eventId field is decoded in COLLIDER mode (see Section 3.4). The 28 bit period counter of the COLLIDER mode identification is incremented by software whenever the orbit number wraps. Not setting the parameter Collider mode in this scenario leads to an unsuitable event identification. If several fragments need to be assembled into one sub-event, then the RorcData equipment executes consistency checks among the Common Data Header fields of their fragments.

If the raw data does not contain the Common Data Header, then neither of the common run parameters Collider mode and Common Data Header Present should be set. This case is depicted in the lower half of Figure 7.1. When the parameter Common Data Header Present is not set, the identification is done by the run-time variable Number of triggers. This is a 32 bit software counter, which is incremented for each arrived sub-event. It is used to set the eventId field (number in run part) when decoded in FIXED TARGET mode (see Section 3.4). Therefore the parameter Collider mode should not be set in this scenario, to avoid an incorrect event identification.
Figure 7.1 Event identification mechanism of the RorcData equipment. (Diagram: with Common Data Header Present = 1 and Collider mode = 1, the Common Data Header fields cdhEventId2 (24 bit) and cdhEventId1 (12 bit) feed the orbit counter and bc counter of eventId, next to the period counter; with Common Data Header Present = 0 and Collider mode = 0, the 32 bit software counter Number of triggers feeds the number in run part of eventId, next to burst nb and number in burst.)

7.2.2 Data transfer mechanism of the RORC device

The mechanism to transfer data from the DDL to the PC memory by one RORC device involves three activities that may run in parallel:

1. Fill: a process using the RORC has to fill the rorcFreeFifo with references to free data pages to which the data stream from the DDL has to be transferred. The rorcFreeFifo is located in the firmware of the RORC and has 128 entries. Each entry consists of three fields: the physical start address (32 or 64 bits according to the address mode) of the data page, the size (24 bits) in words (= 4 bytes) of the data page, and the index (8 bits) of the rorcReadyFifo entry holding information about the data transfer. In fact, the index field is part of the physical address (bits 3 to 10) of an rorcReadyFifo entry. The latter FIFO is further described under the activities Transfer and Scan.
2. Transfer: the RORC transfers data from the DDL to the data page addressed by the top entry of the rorcFreeFifo. This can only take place if there is data arriving from the DDL and if the rorcFreeFifo is not empty. When the data transfer is completed, the RORC fills the rorcReadyFifo with information about the transfer. The rorcReadyFifo is located in the memory of the PC (the RORC needs to know its physical start address) and also has 128 entries. The RORC writes to the corresponding entry of the rorcReadyFifo, which is determined by the index field of the top entry of the rorcFreeFifo.
Each entry consists of two fields: the length (32 bits) in words of the transferred data, and the transfer status (32 bits). The status field can either be a DTSTW (Data Transmission Status Word) or 0 if more pages are about to follow. A DTSTW marks the end of a DDL block (to allow fragments larger than 2 MB there can be several DDL blocks) or the end of a fragment. In all cases the status field also contains an error bit to indicate a transfer problem. Whenever a free data page with a particular index is given to the RORC during a fill activity, the status field of this indexed rorcReadyFifo entry has to be initialized to -1.
3. Scan: a process using the RORC needs to scan the rorcReadyFifo entries in order to find out if there are fragments ready in one or more pages. By looking up the status fields, a sequence can be obtained such as "0 0 0 DTSTW DTSTW 0 DTSTW 0 0 0 -1 -1". In this example there are 3 fragments ready (the first one consists of 4 pages, the second of 1 page, and the third of 2 pages) and one fragment is arriving but not finished. To simplify the scan activity, these entries are in increasing order with a wrap-around at 128.

7.2.3 Elements to handle the RORC device

Based on the mechanism to transfer data with the RORC device, Figure 7.2 shows the software elements to handle it.
Figure 7.2 The software elements to handle the RORC device. (Diagram: the DDL feeds the RORC, which receives page address (32 bits), page size (24 bits) and page index (8 bits), and writes page length (32 bits) and page status (32 bits) to the entry given by the page index. The rorcPageOffsetFifo (offset, 32 bits) and the rorcReadyFifo (length and status, 32 bits each) have up to RORC_READY_FIFO_MAX_SIZE entries and are handled by nextPageInIdx and nextPageOutIdx. The FragmentVector(s) have up to FRAGMENT_VECTOR_MAX_SIZE entries, handled by nextPageInFragmentIdx, with the fields eventVectorSize (32 bits), eventVectorStartOffset (32 bits), eventVectorBankId (16 bits) and eventVectorPointsToVector (16 bits). The FragmentReadyFifo has up to FRAGMENT_READY_FIFO_MAX_SIZE entries, handled by nextFragmentInIdx and nextFragmentOutIdx, with the fields bankId, bankOffset, nbOfPagesInFragment, fragmentDataSize and rorcStatus (32 bits each).)

They represent the main data structures of the readout software for the RORC device:

1. rorcPageOffsetFifo: it is used to remember the bank offset of a free page given to the RORC during a fill activity, since this information is missing when the RORC has finished the transfer and is needed to find the location of the data page during a scan activity. The rorcPageOffsetFifo has up to 128 entries, where each entry holds the offset (32 bits) within the bank where the free data page is located. The handling is done by the indices nextPageInIdx and nextPageOutIdx, which are further explained in the rorcReadyFifo description.
2. rorcReadyFifo: it is used by the RORC during a Transfer activity to store information about the completed data transfer(s), and during a Scan activity to retrieve the written data pages belonging to a fragment. As already pointed out in Section 7.2.2, the rorcReadyFifo has up to 128 entries, where each entry consists of the length field (32 bits) and the transfer status field (32 bits).
The RORC needs to know the physical start address of the rorcReadyFifo. To operate this FIFO, two indices nextPageInIdx and nextPageOutIdx and a flag rorcReadyFifoFull are used (see Section 7.2.6).

The index nextPageInIdx always points to the entry where the RORC device will make a notification about its next completed data transfer. This index is advanced (wrap-around at 128) only during a scan activity. The index nextPageOutIdx always points to the entry that determines the next free data page for the fill activity. The index nextPageOutIdx can be advanced (wrap-around at 128) only during the fill activity, and it must not overtake the index nextPageInIdx.

At initialization the RORC is filled with 128 free pages; as long as no data page has arrived, all entries in the rorcReadyFifo have -1 in their status field, the index nextPageInIdx is equal to the index nextPageOutIdx, and the flag rorcReadyFifoFull is TRUE. During the scan, the status field is read at index nextPageInIdx. If a page has arrived, the status field has turned to 0 or to a DTSTW, and the index nextPageInIdx can be advanced until a status field of -1 is hit. At the same time the RORC can be filled with new free pages, whose indices are taken by advancing the index nextPageOutIdx up to nextPageInIdx. If the filling of new pages fails (e.g. allocation of pages not possible) over a longer period, it may happen that nextPageOutIdx reaches nextPageInIdx after wrapping around and the flag rorcReadyFifoFull is FALSE. In this condition the rorcReadyFifo is empty and the data transfer cannot continue, since there are no free pages in the rorcFreeFifo.
3. FragmentVector(s): they are used to link together the data pages of one fragment transferred by the RORC. The FragmentVectors become the 2nd level vectors in the event tree (see Figure 7.3). During the Scan activity the status field of the rorcReadyFifo entries is read.
For each complete fragment (the status fields form a sequence of zeros, interspersed DTSTWs for DDL blocks, and a terminating DTSTW), a FragmentVector is produced, where each entry represents one data page. An entry has four fields: the eventVectorBankId field (16 bits) and the eventVectorStartOffset field (32 bits) of a data page, the eventVectorSize field (32 bits) in bytes of the data block within the page, and the eventVectorPointsToVector field (16 bits) to indicate that the pair points directly to a data page and not to another vector. All this information is obtained from the rorcReadyFifo in conjunction with the rorcPageOffsetFifo. Moreover, the filling is assisted by an index nextPageInFragmentIdx. The parameter FRAGMENT_VECTOR_MAX_SIZE gives the maximum number of data pages per fragment, which must be known in advance for the allocation.
4. FragmentReadyFifo: it is used to queue the fragments transferred by the RORC. Given that one LDC can host more than one RORC, the appropriate fragments from each RORC device need to be built together into a sub-event before any further processing. Since the delivery rate of fragments may differ between the RORCs, the FragmentReadyFifo is designed to buffer these fragments. The parameter FRAGMENT_READY_FIFO_MAX_SIZE gives the maximum number of entries, where each entry has 5 fields: the bankId field (32 bits) and the bankOffset field (32 bits) to locate the FragmentVector, the nbOfPagesInFragment field (32 bits) to know the number of entries in the FragmentVector, the fragmentDataSize field (32 bits) to know the size in bytes of the fragment, and the rorcStatus field (32 bits) to store the DTSTW (the last one terminating the fragment). To operate this FragmentReadyFifo, the two indices nextFragmentInIdx and nextFragmentOutIdx and the flag fragmentFifoFull are used in a simple way (see Section 7.2.6).
An entry is made into the FragmentReadyFifo whenever one complete fragment has been found during the scan activity. All the fields in the FragmentReadyFifo entry can be filled by exploiting the current FragmentVector and the status field of the rorcReadyFifo entry. When a sub-event is being built, an entry is taken out from the FragmentReadyFifo.
5. Data page(s): they contain the raw data delivered by the RORC. For simplicity they are not shown in Figure 7.2. A data page goes through the following cycle: during the Fill activity it is allocated from buffer readoutData and given to the RORC, held by the rorcFreeFifo and waiting to be filled by the RORC; during the Transfer activity it is written by the RORC, held by the rorcReadyFifo and waiting to be scanned as ready (status 0 or a DTSTW). During the Scan activity it is attached to a FragmentVector and a complete fragment is carried along with the FragmentReadyFifo. During sub-event building it is transferred to the readoutReadyFifo, processed and de-allocated by the recorder process (see Section 10.3). As a matter of fact, there is no memory-to-memory copy of data pages in this scheme.

7.2.4 Equipments to handle the RORC device

The RORC readout software is implemented by three equipment types in the DDL equipment suite, which are provided in the package readList (see Chapter 6):

1. RorcData is responsible for initializing the RORC and handling the autonomously delivered data pages from the RORC. One such equipment needs to be instantiated for each DDL channel in an LDC. It has the attribute GENDATA and can be configured by several parameters (see Section 7.2.4.1).
2. RorcTrigger is responsible for indicating the availability of a sub-event, where each DDL channel contributes a fragment. One equipment for each LDC needs to be instantiated. It has the attribute TRIGGER and can be configured by one parameter (see Section 7.2.4.2).
3. RorcSplitter enables a dual channel D-RORC to work in “split mode”.
It does not have any attribute and can be configured by parameters (see Section 7.2.4.3).

The first one, RorcData, has the attribute GENDATA and is in charge of reading the data from one RORC channel. The second one, RorcTrigger, has the attribute TRIGGER and is used for triggering one or more RORC channel(s). Their pseudo code is given in Section 7.2.6.

The equipment routines participate in the construction of a sub-event in paged mode (see Chapter 3), as shown in Figure 7.3. The sub-event is described by a 1st level vector, which is composed of the base header and three equipments; each of the equipments is represented by an equipment header and an equipment vector. The equipment vector of the first equipment points via a pair to a 2nd level vector, which is composed of three payload vectors in sequence. Each payload vector points in turn via a pair to one data page. The equipment vector of the second equipment points to a 2nd level vector with two payload vectors. The equipment vector of the third equipment points directly via a pair to one data page. As an option, an equipment vector may always point to a 2nd level vector, even if it contains only one payload vector.

Figure 7.3 Example of one sub-event in paged event mode. (Diagram: a 1st level vector with the base header and, for each of equipment 1, 2 and 3, an equipment header and an equipment vector; 2nd level vectors (fragment vectors) made of payload vectors; data pages in memory bank(s).)

The equipment RorcSplitter controls the feature of a dual channel D-RORC to duplicate the data stream from an incoming channel to an outgoing channel. It does not carry any attribute, since it neither generates data nor functions as a trigger. In order to enable this feature, the run options HLT checkbutton (see the ALICE DAQ WIKI) must be set.
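The Scan activity of Section 7.2.2, which the RorcData equipment performs on its rorcReadyFifo, can be sketched as follows. The sketch models only the status field: STATUS_FREE stands for -1, STATUS_MORE for 0, and any other value is treated as a DTSTW that terminates a fragment. DTSTWs that merely end a DDL block, the error bit, and the FragmentVector bookkeeping are left out. All names are hypothetical and SKETCH_DTSTW is a placeholder, not the real DTSTW encoding. As a further simplification, the pages of an unfinished fragment are left in place for the next scan instead of being attached to a partially filled FragmentVector.

```c
#include <assert.h>
#include <stddef.h>

#define READY_FIFO_SIZE 128   /* number of rorcReadyFifo entries */
#define STATUS_FREE     (-1)  /* page given to the RORC, not yet written */
#define STATUS_MORE     0     /* page written, more pages follow */
#define SKETCH_DTSTW    1     /* placeholder for a fragment-terminating DTSTW */

/* Scan the status fields starting at *nextPageInIdx. For every complete
   fragment, store its number of pages in pagesPerFragment[] and move
   *nextPageInIdx past it. Returns the number of complete fragments found
   (at most maxFragments). */
int scanReadyFifo(const int status[READY_FIFO_SIZE], int *nextPageInIdx,
                  int pagesPerFragment[], int maxFragments)
{
    int nbFragments = 0;
    int pages = 0;
    int idx;
    int visited;

    assert(status != NULL && nextPageInIdx != NULL && pagesPerFragment != NULL);
    idx = *nextPageInIdx;
    for (visited = 0;
         visited < READY_FIFO_SIZE && nbFragments < maxFragments;
         visited++) {
        int s = status[idx];
        if (s == STATUS_FREE)
            break;                          /* nothing more has arrived */
        pages++;
        idx = (idx + 1) % READY_FIFO_SIZE;  /* wrap-around at 128 */
        if (s != STATUS_MORE) {             /* DTSTW: fragment is complete */
            pagesPerFragment[nbFragments++] = pages;
            pages = 0;
            *nextPageInIdx = idx;
        }
    }
    return nbFragments;
}
```

Applied to the status sequence quoted in Section 7.2.2, the function reports three fragments of 4, 1 and 2 pages and leaves the three pages of the unfinished fragment untouched.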
7.2.4.1 Equipment RorcData

The equipment routines of RorcData for reading out data from one RORC channel are the following:

1. ArmRorcData(): it checks the equipment parameters and logs a message. It allocates and initializes the rorcPageOffsetFifo, the rorcReadyFifo, and the FragmentReadyFifo. It resets and starts the RORC device.
2. AsynchReadRorcData(): it tries to fill the rorcFreeFifo with free pages. It scans the rorcReadyFifo to determine if data pages have arrived from the RORC device. If the status field of a data page has 0 or a DTSTW that terminates a DDL block, then this page will be added to the FragmentVector, which will be allocated if it is the first page of a fragment. If the status field of a data page has a DTSTW that terminates a fragment, then this page will be added to the FragmentVector as well, and the FragmentVector will be put into the FragmentReadyFifo. For triggering purposes (see Section 7.2.4.2), the global flag allFragmentsReadyFlag will be set to FALSE if the FragmentReadyFifo is empty.
3. EventArrivedRorcData(): this routine is empty.
4. ReadEventRorcData(): it takes out a fragment from the FragmentReadyFifo and uses it to fill the equipment header and equipment vector of the 1st level vector.
5. DisArmRorcData(): it stops the RORC device and deallocates all memory blocks.

7.2.4.2 Equipment RorcTrigger

The equipment routines of RorcTrigger are used for triggering the RORC devices. The global flag allFragmentsReadyFlag is used as trigger mechanism. This flag is set to TRUE at the beginning of each iteration of the inner data-taking loop, and set to FALSE if a FragmentReadyFifo is empty. Hence, if there is at least one fragment in each FragmentReadyFifo, the value of this flag remains TRUE (“trigger arrived”). The routines are the following:

1. ArmRorcTrigger(): it checks the existence of memory banks, checks if the rcShm flag Paged data flag is set, and logs a message.
2. AsynchReadRorcTrigger(): it sets the allFragmentsReadyFlag to TRUE.
3. EventArrivedRorcTrigger(): it returns the value of allFragmentsReadyFlag.
4. ReadEventRorcTrigger(): it initializes in the base event header the eventId field (needed for CDH processing) and the eventTriggerPattern field.
5. DisArmRorcTrigger(): this routine is empty.

7.2.4.3 Equipment RorcSplitter

The equipment routines of RorcSplitter are the following:

1. ArmRorcSplitter(): it enables the data splitting mode for the selected channel. If configured via the parameters, it enables the flow control handling.
2. AsynchReadRorcSplitter(): this routine is empty.
3. EventArrivedRorcSplitter(): this routine is empty.
4. ReadEventRorcSplitter(): this routine is empty.
5. DisArmRorcSplitter(): it disables the data splitting mode for the selected channel. It also disables the flow control handling.

7.2.4.4 Configuring the RorcData equipment

The various parameters for the equipment RorcData can be separated into two groups, depending on whether they are related to the payload or not. There is a choice between four data sources:

• Equipment software (parameter dataSource = 1): events are generated by the RORC equipments without any RORC hardware, thus only by software. In this mode the DATE setup along with the readout program can be tested.
• RORC internal data generator (parameter dataSource = 2): events are generated by the RORC internal data generator. In this mode the RORC can be tested stand-alone.
• Front-end emulator (parameter dataSource = 3): events are generated by the front-end interface card (FEIC). Consult the Web at http://cern.ch/ddl for the documentation about the FEIC. In this mode the RORC and the DDL chain (SIU, optical fibers, DIU) can be fully tested.
• Detector electronics (parameter dataSource = 0): events are generated by the detector electronics.
The complete DDL chain starting from the detector electronics is in operation. This mode needs to be chosen for data taking.

Table 7.1 describes the RorcData equipment parameters that are not related to the payload. The section about the internals of the RORC equipments (see Section 7.2) helps in understanding the meaning of these parameters. Typical values can be found in Listing 7.5 (lines 37 and 39). To achieve optimal performance, the parameter rorcReadyFifoSize should not exceed 128 and the size of the data pages (given by the parameter rorcPageSize) should be at least the average size of a fragment. The fragment size is limited by the number of pages per fragment (given by the parameter fragmentVectorSize) times the page size. For example in Listing 7.5 (line 37) the maximum fragment size is 1024 * 100000 bytes.

Table 7.1 RorcData equipment parameters for all data sources

Parameter                Description
eqId                     equipment id for the equipment header
rorcRevision             1 = pRORC
                         2 = single channel D-RORC
                         3 = dual channel D-RORC
                         4 = dual channel D-RORC with PCI-X interface
                         5 = dual channel D-RORC with PCI eXpress interface
rorcSerialNb             serial number of the RORC
rorcChannelNb            0 for the pRORC and single channel D-RORC
                         0 or 1 for the dual channel D-RORC
dataSource               0 = Detector electronics
                         1 = Equipment software
                         2 = RORC internal data generator
                         3 = FEIC
rorcPageSize             data page size in bytes
rorcReadyFifoSize        number of rorcReadyFifo entries
fragmentVectorSize       maximum number of pages per fragment
fragmentReadyFifoSize    number of fragmentReadyFifo entries
ctrlPtr                  for internal use (any value can be chosen)
readyFifoPtr             for internal use (any value can be chosen)

The parameters of the RorcData equipment that are related to the payload are described in the following tables. The parameters in Table 7.2 refer to the equipment software as data source.
At the moment only incremental data can be generated in this mode, and the parameters rorcRevision, rorcSerialNb, and rorcChannelNb have no meaning. The parameters in Table 7.3 refer to the RORC internal data generator as data source, and the parameters in Table 7.4 refer to the FEIC as data source. Common to these three modes is that an event counter (starting at 1) is generated as the very first data word, which is counted in the fragment size. Moreover, fragments have a fixed data size if the parameters dataGenMinSize and dataGenMaxSize are equal, or a random data size if these parameters differ. Finally, the parameters in Table 7.5 refer to the detector electronics as data source.

Table 7.2 RorcData equipment parameters (equipment software)

Parameter                 Description
dataGenMinSize            minimum event size in bytes:
                          - fixed/random: minimum is 4 bytes
dataGenMaxSize            maximum event size in bytes:
                          - fixed/random
dataGenInitWord           first incremental data word
dataGenPatternNo          not in use since only incremental data with event counter is generated (any value can be chosen)
dataGenSeed               seed for random generator
expectedCdHVersion        dummy (not checked)
consistencyCheckLevel     0 = no data checks
                          1 = first and last data word are checked
                          2 = all data words are checked
consistencyCheckPattern   5 = incremental data with event counter
                          8 = incremental data without event counter

Table 7.3 RorcData equipment parameters (RORC internal data generator)

Parameter                 Description
dataGenMinSize            minimum event size in bytes:
                          - fixed: minimum is 4 bytes
                          - random: no effect, always 4 bytes
dataGenMaxSize            maximum event size in bytes:
                          - fixed: maximum is 2097152 bytes
                          - random: the value will be rounded to the next lower power of 2, maximum is 2097152 bytes
dataGenInitWord           first incremental/decremental data word, constant or alternating data word
dataGenPatternNo          1 = constant data
                          2 = alternating pattern
                          3 = flying 0
                          4 = flying 1
                          5 = incremental data
                          6 = decremental data
                          7 = random data
dataGenSeed               seed for random generator
expectedCdHVersion        dummy (not checked)
consistencyCheckLevel     0 = no data checks
                          1 = first and last data word are checked
                          2 = all data words are checked
consistencyCheckPattern   5 = incremental data with event counter
                          8 = incremental data without event counter

Table 7.4 RorcData equipment parameters (FEIC)

Parameter                 Description
dataGenMinSize            minimum event size in bytes:
                          - fixed: minimum is 64 bytes
                          - random: no effect, always 4 bytes
dataGenMaxSize            maximum event size in bytes:
                          - fixed: the value will be rounded to the next lower power of 2, maximum is 1073741824 bytes
                          - random: the value will be rounded to the next lower power of 2, maximum is 1073741824 bytes
dataGenInitWord           not in use since the first incremental data word is always 0 (any value can be chosen)
dataGenPatternNo          1 = external pattern generator
                          2 = alternating pattern
                          3 = flying 0
                          4 = flying 1
                          5 = incremental data
                          6 = decremental data
dataGenSeed               seed for random generator
expectedCdHVersion        dummy (not checked)
consistencyCheckLevel     0 = no data checks
                          1 = first and last data word are checked
                          2 = all data words are checked
consistencyCheckPattern   5 = incremental data with event counter
                          8 = incremental data without event counter

Table 7.5 RorcData equipment parameters (detector electronics)

Parameter                 Description
dataGenMinSize            not in use (any value can be chosen)
dataGenMaxSize            not in use (any value can be chosen)
dataGenInitWord           not in use (any value can be chosen)
dataGenPatternNo          not in use (any value can be chosen)
dataGenSeed               not in use (any value can be chosen)
expectedCdHVersion        version number of the CDH (current version is 1)
consistencyCheckLevel     0 = no data checks (recommended)
                          1 = first and last data word are checked
                          2 = all data words are checked
consistencyCheckPattern   5 = incremental data with event counter
                          8 = incremental data without event counter

Several checks implemented in the RORC equipment software are always applied; for example, the length and status of each delivered data page have to be correct (see Section 7.2). These checks are not related to the payload, and if no other checks are desired, the parameter consistencyCheckLevel must be 0. However, consistency checks may be applied to payloads having a specific test pattern if the parameter consistencyCheckLevel is set to the value 1 or 2:

• If it is set to 1, the first and last data word of each page are checked against the pattern given by the parameter consistencyCheckPattern. A check of the first data word (event counter) against the DAQ event counter is optional.
• If it is set to 2, all data words of each page are checked against the pattern given by the parameter consistencyCheckPattern. A check of the first data word (event counter) against the DAQ event counter is optional.

7.2.4.5 Configuring the RorcTrigger equipment

There is only one parameter for the equipment RorcTrigger, called EvInterval. It specifies an additional delay interval in microseconds, which can be useful for testing purposes. The default value is 0, as shown in Listing 7.5 (lines 32-33).

7.2.4.6 Configuring the RorcSplitter equipment

A dual channel D-RORC can be used in “split mode”, where the data arriving over the incoming channel is copied bit by bit to the other, outgoing channel. The parameters of the RorcSplitter equipment identify the outgoing channel (number 0 or 1) and define how the data flow is handled. After a RORC reset, the split mode is disabled. Table 7.6 describes the RorcSplitter parameters.
Table 7.6 RorcSplitter equipment parameters

Parameter         Description
rorcSerialNb      serial number of the RORC
rorcChannelNb     0 or 1 defining the outgoing channel
rorcFlowControl   0 = the flow control from the receiving side on the outgoing channel is ignored
                  1 = the flow control from the receiving side on the outgoing channel is taken into account
ctrlPtr           for internal use (any value can be chosen)

7.2.5 Data flow for multiple RORC devices

One LDC can host more than one RORC device with one or two channels, in which case the fragments from each channel need to be built together to constitute the sub-event. Figure 7.4 shows the logical view for three devices to illustrate the asynchronous data flow with multiple RORCs in one LDC.

After all elements have been initialized, each AsynchReadRorcData() equipment routine keeps one RORC channel going (Fill and Scan activity). When a complete fragment has arrived, this routine puts the constructed FragmentVector, which points to the attached data pages, into the corresponding FragmentReadyFifo. There is one of these FIFOs for each RORC device in an LDC. If none of them is empty, the equipment routine EventArrivedRorcTrigger() returns TRUE; one entry is then taken out of each FragmentReadyFifo and assembled into a sub-event via the ReadEventRorcData() routine. The result of this process is the 1st level vector of this sub-event. The bank id and offset of this 1st level vector are put into the readoutReadyFifo. If the common run parameter Common Data Header Present flag is set, additional consistency checks verify whether the assembled fragments belong to the same particle collision by analyzing their CDHs (see Section 3.9).
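The readiness rule described above (a sub-event can be built only when no FragmentReadyFifo is empty) can be sketched in C as follows. This is a minimal model, not the DATE code: the type FragmentReadyFifoModel and the function pollOnce() are hypothetical, each FIFO is reduced to a fill-level counter, and the lower-case routine names only mirror the equipment routines and the flag allFragmentsReadyFlag of Listing 7.2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one FragmentReadyFifo: only its fill level is modelled. */
typedef struct {
    int nbFragmentsReady;
} FragmentReadyFifoModel;

/* Global "trigger arrived" condition shared by all equipments of one LDC. */
static bool allFragmentsReadyFlag;

/* Counterpart of AsynchReadRorcTrigger(): start each iteration optimistically. */
static void asynchReadRorcTrigger(void) {
    allFragmentsReadyFlag = true;
}

/* Counterpart of the last step of AsynchReadRorcData(): veto if this FIFO is empty. */
static void asynchReadRorcData(const FragmentReadyFifoModel *fifo) {
    assert(fifo != NULL);
    if (fifo->nbFragmentsReady == 0) {
        allFragmentsReadyFlag = false;
    }
}

/* Counterpart of EventArrivedRorcTrigger(): report the condition. */
static bool eventArrivedRorcTrigger(void) {
    return allFragmentsReadyFlag;
}

/* One iteration of the polling loop; returns true if a sub-event was assembled. */
static bool pollOnce(FragmentReadyFifoModel fifos[], size_t nbFifos) {
    asynchReadRorcTrigger();
    for (size_t i = 0; i < nbFifos; i++) {
        asynchReadRorcData(&fifos[i]);
    }
    if (!eventArrivedRorcTrigger()) {
        return false;                    /* at least one FIFO is still empty */
    }
    for (size_t i = 0; i < nbFifos; i++) {
        fifos[i].nbFragmentsReady--;     /* take exactly one fragment from each FIFO */
    }
    return true;
}
```

With three modelled devices holding 2, 1 and 0 fragments, pollOnce() returns false and leaves all FIFOs untouched; as soon as the third FIFO receives a fragment, the next call consumes one fragment from each FIFO.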
Figure 7.4 The data flow for an LDC with 3 RORC devices

7.2.6 Pseudo code of the RORC equipment routines

The following section presents the pseudo code of the routines ArmRorcData(), AsynchReadRorcData(), ReadEventRorcData(), and DisArmRorcData(), and of the handling of FIFOs by a single process. The actual code can be found in the ${DATE_ROOT}/readList/equipmentList_DDL.c file.

Listing 7.1 shows the pseudo code of the routine ArmRorcData(). It can be divided into 3 parts. In the first part (line 1), the validity of the equipment parameters (see Section 7.2.4) is checked. In the second part, the data structures to handle one RORC device are allocated (lines 2-4) and initialized (lines 5-10). As shown in Figure 7.2, these are the rorcReadyFifo, the rorcPageOffsetFifo, and the fragmentReadyFifo. They are allocated from the banks readoutData and readoutFirstLevel. The indices of these FIFOs are all set to 0 and the flags to FALSE. At this point, the contents of these data structures can be ignored. In the third part, the RORC device is initialized (line 11) by calling the rorc library functions rorcFind(), rorcOpen(), rorcReset(), rorcArmDDL(), rorcStartTrigger(), and rorcStartDataReceiver(), whose synopsis is given on the Web site at http://cern.ch/ddl. In the function rorcStartDataReceiver(), the physical address of the rorcReadyFifo is given to the RORC. Filling the RORC with free data pages is done by the routine AsynchReadRorcData().
Listing 7.1 Pseudo code of equipment routine ArmRorcData()

 1: check the equipment parameters
 2: allocate a block for rorcReadyFifo from readoutData
 3: allocate a block for rorcPageOffsetFifo from readoutFirstLevel
 4: allocate a block for fragmentReadyFifo from readoutFirstLevel
 5: nextPageInIdx = 0
 6: nextPageOutIdx = 0
 7: rorcReadyFifoFull = FALSE
 8: nextFragmentInIdx = 0
 9: nextFragmentOutIdx = 0
10: fragmentFifoFull = FALSE
11: call functions rorcFind(), rorcOpen(), rorcReset(), rorcArmDDL(), rorcStartTrigger(), rorcStartDataReceiver()

Listing 7.2 shows the pseudo code of the routine AsynchReadRorcData(). Each time this routine is entered, it tries to completely fill the RORC with free data pages in the fill-loop (lines 1-14). This is important for the initialization, when this routine is called for the very first time. There are two conditions to exit the fill-loop: either the rorcReadyFifo is full with free data pages (lines 2-4), or the allocation of a free data page was not successful (lines 11-13). In either case the execution can continue, since allocations will be tried again in the next call. If there are no problems with the allocation of a new data page (line 5), the corresponding status field in the rorcReadyFifo is set to -1 (line 7), its bank offset is stored in the rorcPageOffsetFifo (line 8), the data page is communicated to the RORC (line 9) via the rorc library routine rorcPushFreeFifo(), and the index nextPageInIdx is increased (line 10).

After the fill-loop, the scan-loop (lines 15-55) checks the entries in the rorcReadyFifo to find out whether the RORC has transferred data pages. The scan always starts at the index nextPageOutIdx (line 19) and stops when a status field with -1 is hit (lines 20-22). Since at each advancement of the index nextPageOutIdx (line 54) the former status field is set to -1 (line 53), it is assured that the scan-loop cannot be executed forever.
In the special case that the rorcReadyFifo is empty (lines 16-18), the scan-loop exits. If the error bit is not set in the status field (lines 23-25), the raw data has been written into this page by the RORC device. This page has to be added to the current fragmentVector, which is allocated (line 27) if this is the beginning of a new fragment (line 26). The handling of a fragmentVector is assisted by the index nextPageInFragmentIdx, which indicates where the next page entry will be put, and by the variable fragmentVectorDataSize, which counts the total number of bytes of this fragment. Both are initialized to 0 (lines 28-29) when a new fragmentVector is allocated. A written data page is put into the current fragmentVector by filling its fields (lines 31-35), by advancing the index nextPageInFragmentIdx (line 36), and by increasing fragmentVectorDataSize (line 37).

If the status field indicates a DTSTW that terminates a DDL block (line 38), then only its length is checked. If the status field indicates a DTSTW that terminates a fragment (line 41), then some additional work is done. First the current fragmentVector and the last data page of this fragment are resized (lines 42-43), since they might be larger than needed. Then an entry for the current fragmentVector is made in the fragmentReadyFifo by filling the fields (lines 44-49) and by advancing the index nextFragmentInIdx (line 51).

After exiting the scan-loop, the fragmentReadyFifo is checked for emptiness (line 56); if it is empty, the “trigger arrived” condition is set to FALSE (line 57). The equipment RorcTrigger holds the counterparts: the routine AsynchReadRorcTrigger() initializes this condition to TRUE, and the routine EventArrivedRorcTrigger() signals this condition.

Listing 7.3 shows the pseudo code of the routine ReadEventRorcData().
It takes out the entry (pointer to a fragment) from the FragmentReadyFifo to which the index nextFragmentOutIdx is pointing (line 1), and it uses this fragment to fill the equipment header fields (lines 2-7) and the equipment vector fields (lines 8-12). These copying operations construct one equipment entry in the 1st level vector of the sub-event. If the common run parameter Common Data Header Present flag is set, then the CDH (see Section 7.2.5) of the fragment is processed (lines 13-24); in particular, the eventId field of the base event header is filled. If the CDH processing is switched off, then the software counter Number of triggers is used for the event identification.

Listing 7.4 shows the pseudo code of the routine DisArmRorcData(). First the RORC is stopped by calling the rorc library routines rorcStopTrigger(), rorcStopDataReceiver(), and rorcClose(), whose synopsis is given on the Web site http://cern.ch/ddl. Then all the data structures are de-allocated in the following order: pages in the rorcReadyFifo, pages in the FragmentVector(s), the FragmentVector(s), the FragmentReadyFifo, the rorcPageOffsetFifo, and the rorcReadyFifo.

Finally, Listing 7.5 shows the pseudo code to handle a generic FIFO if only one process is using it. Assuming entries at the indices [0,...,maxIdx-1], it requires two indices, nextInIdx and nextOutIdx, and a flag fifoFull to implement the initialize/put/get primitives. This pseudo code applies to rorcReadyFifo and fragmentReadyFifo.
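The initialize/put/get primitives of Listing 7.5 can be sketched in C as follows. This is a minimal sketch, not the code of equipmentList_DDL.c: the type Fifo, the function names, the int payload, and the capacity MAX_IDX are assumptions chosen for illustration; only the two indices and the fifoFull flag come from the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IDX 4   /* illustrative capacity; entries live at indices 0..MAX_IDX-1 */

/* Hypothetical single-process FIFO with the two indices and the flag of Listing 7.5. */
typedef struct {
    int  entries[MAX_IDX];
    int  nextInIdx;    /* where the next element will be put */
    int  nextOutIdx;   /* where the next element will be taken from */
    bool fifoFull;     /* distinguishes "full" from "empty" when both indices are equal */
} Fifo;

/* Initializing the FIFO. */
static void fifoInit(Fifo *f) {
    assert(f != NULL);
    f->nextInIdx = 0;
    f->nextOutIdx = 0;
    f->fifoFull = false;
}

/* Equal indices mean "empty" only if the full flag is not set. */
static bool fifoIsEmpty(const Fifo *f) {
    return f->nextInIdx == f->nextOutIdx && !f->fifoFull;
}

/* Putting an element into the FIFO; returns false if the FIFO is full. */
static bool fifoPut(Fifo *f, int value) {
    if (f->fifoFull) {
        return false;
    }
    f->entries[f->nextInIdx] = value;
    f->nextInIdx = (f->nextInIdx + 1) % MAX_IDX;
    if (f->nextInIdx == f->nextOutIdx) {
        f->fifoFull = true;   /* the write index caught up with the read index */
    }
    return true;
}

/* Getting an element from the FIFO; returns false if the FIFO is empty. */
static bool fifoGet(Fifo *f, int *value) {
    if (fifoIsEmpty(f)) {
        return false;
    }
    *value = f->entries[f->nextOutIdx];
    f->nextOutIdx = (f->nextOutIdx + 1) % MAX_IDX;
    f->fifoFull = false;      /* one slot is free again */
    return true;
}
```

The fifoFull flag is what allows all MAX_IDX slots to be used: without it, equal indices could not distinguish a full FIFO from an empty one. No locking is needed because, as stated above, only one process uses the FIFO.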
Listing 7.2 Pseudo code of equipment routine AsynchReadRorcData()

 1: begin of fill-loop
 2:   if( rorcReadyFifo is full )
 3:     break fill-loop
 4:   endif
 5:   allocate one data page from readoutData
 6:   if( allocation successful )
 7:     set status field to -1 in rorcReadyFifo[nextPageInIdx]
 8:     put bank offset in rorcPageOffsetFifo[nextPageInIdx]
 9:     call function rorcPushFreeFifo()
10:     advance index nextPageInIdx
11:   else
12:     break fill-loop
13:   endif
14: end of fill-loop
15: begin of scan-loop
16:   if( rorcReadyFifo is empty )
17:     break scan-loop
18:   endif
19:   status = status field in rorcReadyFifo[nextPageOutIdx]
20:   if( status == -1 )
21:     break scan-loop
22:   endif
23:   if( status contains error bit )
24:     report error and exit
25:   endif
26:   if( a new fragment )
27:     allocate a fragmentVector from readoutSecondLevel
28:     nextPageInFragmentIdx = 0
29:     fragmentVectorDataSize = 0
30:   endif
31:   fill the fragmentVector[nextPageInFragmentIdx]
32:     - eventVectorBankId field = readoutDataBank
33:     - eventVectorPointsToVector field = FALSE
34:     - eventVectorSize field from length field of rorcReadyFifo[nextPageOutIdx]
35:     - eventVectorStartOffset field from rorcPageOffsetFifo[nextPageOutIdx]
36:   advance index nextPageInFragmentIdx
37:   increase fragmentVectorDataSize by the eventVectorSize field
38:   if( status is a DTSTW terminating a DDL block )
39:     check the length of the DDL block
40:   endif
41:   if( status is a DTSTW terminating a fragment )
42:     resize the fragmentVector
43:     resize the data page
44:     put an entry into fragmentReadyFifo[nextFragmentInIdx]
45:       - bankId field = readoutSecondLevelBank
46:       - bankOffset field from the fragmentVector
47:       - nbOfPagesInFragment field = nextPageInFragmentIdx
48:       - fragmentDataSize field = fragmentVectorDataSize
49:       - rorcStatus field from the status field of rorcReadyFifo[nextPageOutIdx]
50:     update the DDL monitoring fields
51:     advance index nextFragmentInIdx
52:   endif
53:   set status field to -1 in rorcReadyFifo[nextPageOutIdx]
54:   advance index nextPageOutIdx
55: end of scan-loop
56: if( fragmentReadyFifo is empty )
57:   allFragmentsReadyFlag = FALSE
58: endif

Listing 7.3 Pseudo code of equipment routine ReadEventRorcData()

 1: get fragment from FragmentReadyFifo[nextFragmentOutIdx]
 2: fill the equipment header
 3:   - equipmentSize from the fragmentDataSize field
 4:   - equipmentType from the equipment parameter
 5:   - equipmentId from the equipment parameter
 6:   - equipmentTypeAttribute from the rorcStatus field
 7:   - equipmentBasicElementSize in bytes
 8: fill the equipment vector
 9:   - eventVectorBankId from the bankId field
10:   - eventVectorPointsToVector = TRUE
11:   - eventVectorSize from the nbOfPagesInFragment field
12:   - eventVectorStartOffset from the bankOffset field
13: if( CDH processing )
14:   - version number
15:   - MBZ field
16:   - block length
17:   - status&error bits
18:   - L1 trigger message
19:   - event id
20:   - mini event id
21:   - block attributes
22:   - trigger classes
23:   - participating subdetectors
24:   - ROI
25: else
26:   - set the event id from software counter
27: endif

Listing 7.4 Pseudo code of equipment routine DisArmRorcData()

 1: stop the RORC by calling functions rorcStopTrigger(), rorcStopDataReceiver(), rorcClose()
 2: deallocate all data structures
 3:   - data pages in the rorcReadyFifo
 4:   - data pages in the FragmentVector(s)
 5:   - FragmentVector(s)
 6:   - FragmentReadyFifo
 7:   - rorcPageOffsetFifo
 8:   - rorcReadyFifo

Listing 7.5 Pseudo code for handling a FIFO for a single process

 1: // initializing the FIFO
 2: nextInIdx = 0
 3: nextOutIdx = 0
 4: fifoFull = FALSE
 5:
 6: // putting an element into the FIFO
 7: if( fifoFull )
 8:   error “FIFO is full”
 9: else
10:   put FIFO element at index nextInIdx
11:   nextInIdx = (nextInIdx + 1) MOD maxIdx
12:   if( nextInIdx == nextOutIdx )
13:     fifoFull = TRUE
14:   endif
15: endif
16:
17: // getting an element from the FIFO
18: if( nextInIdx == nextOutIdx && NOT fifoFull )
19:   error “FIFO is empty”
20: else
21:   get FIFO element at index nextOutIdx
22:   nextOutIdx = (nextOutIdx + 1) MOD maxIdx
23:   fifoFull = FALSE
24: endif

7.3 Introduction to the UDP equipment

The Ethernet socket has been added in DATE as an alternative data source. The UDP equipment reads data coming from the Ethernet port of a PC using the UDP protocol. The UDP readout consists of one Ethernet port used by the front-end electronics to send data, a second port used by readout to receive data, and one Ethernet cable that connects the two ports. The cable can be a copper or an optical fiber cable. Depending on the hardware used, a data throughput from 1 Gb/s up to 10 Gb/s can be obtained.

Depending on the acquisition needs and on the number of available Ethernet sockets, a PC can be equipped with several UDP equipments (up to 3 Ethernet ports in one PC have been tested so far). DATE provides all the necessary readout software to operate the Ethernet port on a PC running Linux via the driver of the network card. The following sections concentrate on the equipment software for the UDP readout.

7.4 Internals of the UDP equipment

The goal of the UDP equipment is to read data from one or more LDC Ethernet ports. The front-end electronics send data using the UDP protocol, packing events in frames of a maximum size of 9 KB. The UDP readout software has to be structured in equipment routines as explained in Chapter 6.

7.4.1 Data transfer mechanism of the UDP equipment

The mechanism to transfer data from an Ethernet socket to the memory of the PC has been inherited from the RORC algorithm and requires the following activities:

1. Read: the process reads data, if any, from the UDP receiver buffer. A counter is increased every time a packet has been read by this process.
When the counter reaches a value equal to the maximum number of packets that the UDP receiver buffer can accept, the process checks whether the buffer is empty. If the buffer is empty, it sends a word to the front-end electronics asking for more data (see Section 7.4.2).

2. Transfer: the process transfers data from the Ethernet socket to the data page if rorcSimulatorFreePages is greater than 0. This can only take place if there is data arriving from the socket and if the rorcReadyFifo is not full. When the data transfer is completed, the process fills the rorcReadyFifo with information about the transfer. The rorcReadyFifo is located in the memory of the PC and has 128 entries. Each of these entries consists of two fields: the length (32 bits) in words of the transferred data, and the transfer status (32 bits). The status field can be either a DTSTW (Data Transmission Status Word) or 0 if more pages are about to follow. A DTSTW marks the end of a sub-event. Whenever a free data page with a particular index is used by the process during a fill activity, the status field of this indexed rorcReadyFifo entry has to be initialized to -1.

7.4.2 The back-pressure algorithm

UDP uses a simple transmission model without the explicit hand-shaking dialogues that would guarantee reliability, ordering, and data integrity. Error checking and correction must be performed in the application. The UDP readout equipment implements a software back-pressure to avoid an overflow of the socket receiving buffer. Figure 7.5 shows the behavior of the back-pressure algorithm. The board sends data to the DAQ system at the maximum speed until a fixed number of packets has been sent. This number of packets can be calculated using the following formula:

fixed number of packets = SOCKET RECV BUF SIZE / MAX UDP PACKET SIZE

Once the number of packets sent reaches the number of packets expected, the detector electronics enters an idle loop, waiting for a specific word coming from the readout software.
Once this word is received, the board continues sending the data stored in its buffer, if any is present.

Figure 7.5 The back-pressure algorithm.

7.4.3 Equipments to handle the Ethernet port

The UDP readout software is implemented by two equipment types in the UDP equipment suite, which is provided in the package readList (see Chapter 6):

• RorcDataUDP is responsible for initializing the socket and for handling the data packets from the Ethernet port. One such equipment needs to be instantiated for each Ethernet socket in an LDC. It has the attribute GENDATA and can be configured by several parameters (see Section 7.2.4.1).
• RorcTrigger is responsible for indicating the availability of a sub-event, where each port contributes a fragment. Exactly one such equipment needs to be instantiated for each LDC. It has the attribute TRIGGER and can be configured by one parameter (see Section 7.2.4.2).

The equipment routines participate in the construction of a sub-event in paged mode (see Chapter 3), as shown in Figure 7.3. The sub-event is described by a 1st level vector, which is composed of the base header and three equipments. Each of the equipments is represented by an equipment header and an equipment vector. The equipment vector of the first equipment points via a (bank id, offset) pair to a 2nd level vector, which is composed of three payload vectors in sequence. Each vector points again via a (bank id, offset) pair to one data page. The equipment vector of the second equipment points to a 2nd level vector with two payload vectors. The equipment vector of the third equipment points directly to one data page via a (bank id, offset) pair. As an option, an equipment vector may always point to a 2nd level vector, even if it contains only one payload vector.

7.4.3.1 Equipment RorcDataUDP

The equipment routines of RorcDataUDP for reading data from one port are the following:

1. ArmRorcDataUDP(): it checks the equipment parameters and logs a message. It allocates and initializes the rorcPageOffsetFifo, the rorcReadyFifo, and the FragmentReadyFifo. It opens the socket connected to the Ethernet port.
2. AsynchReadRorcDataUDP(): it reads the data stored in the socket receiving buffer and copies them into the memory of the PC.
3. EventArrivedRorcDataUDP(): this routine is empty.
4. ReadEventRorcDataUDP(): it takes out a fragment from the FragmentReadyFifo and uses it to fill the equipment header and equipment vector of the 1st level vector.
5. DisArmRorcDataUDP(): it closes the socket and de-allocates all memory blocks.

7.4.3.2 Equipment RorcTriggerUDP

The equipment routines of RorcTriggerUDP are used for triggering the Ethernet port. The global flag allFragmentsReadyFlag is used as trigger mechanism. This flag is set to TRUE at the beginning of each iteration of the inner data-taking loop, and set to FALSE if any FragmentReadyFifo is empty. Hence, if there is at least one fragment in each FragmentReadyFifo, the value of this flag remains TRUE (“trigger arrived”). The routines are the following:

1. ArmRorcTriggerUDP(): it checks the existence of memory banks, checks if the rcShm flag Paged data flag is set, and logs a message.
2. AsynchReadRorcTriggerUDP(): it sets the allFragmentsReadyFlag to TRUE.
3. EventArrivedRorcTriggerUDP(): it returns the value of allFragmentsReadyFlag.
4. ReadEventRorcTriggerUDP(): it initializes in the base event header the eventId field (needed for CDH processing) and the eventTriggerPattern field.
5. DisArmRorcTriggerUDP(): this routine is empty.

7.4.4 Data flow for multiple UDP equipments

One LDC can host more than one Ethernet port, in which case the fragments from each channel need to be built together to constitute the sub-event. To understand the asynchronous data flow, see Figure 7.4, which shows the logical view for three devices.
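The read activity of Section 7.4.1 and the packet budget of Section 7.4.2 can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the DATE implementation: the type BackPressureState and the functions packetBudget() and onPacketRead() are hypothetical, resetting the counter after a request is an assumption, and the actual sending of the request word to the front-end electronics is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* fixed number of packets = SOCKET RECV BUF SIZE / MAX UDP PACKET SIZE (integer division) */
static size_t packetBudget(size_t socketRecvBufSize, size_t maxUdpPacketSize) {
    assert(maxUdpPacketSize > 0);
    return socketRecvBufSize / maxUdpPacketSize;
}

/* Hypothetical receiver-side bookkeeping for the software back-pressure. */
typedef struct {
    size_t budget;        /* packets the board may send before it idles */
    size_t packetsRead;   /* packets read since the last request for more data */
} BackPressureState;

/*
 * To be called after each packet read from the socket.
 * Returns true when the readout should send the "more data" word:
 * the budget has been consumed and the receive buffer is empty.
 */
static bool onPacketRead(BackPressureState *s, bool recvBufferEmpty) {
    assert(s != NULL);
    s->packetsRead++;
    if (s->packetsRead < s->budget || !recvBufferEmpty) {
        return false;
    }
    s->packetsRead = 0;   /* assumption: counting restarts with the next burst */
    return true;
}
```

As an illustration with made-up sizes, a receive buffer of 90000 bytes and packets of at most 9000 bytes give a budget of 10 packets; the board would then idle after 10 packets until the readout has read them all and asks for more.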
8 The trigger system

In the ALICE DAQ system, the detector readout is based on the DDL or the Ethernet/UDP link. The trigger mainly interacts with the detectors, while DATE accepts a continuous flow of data. The DATE software is self-triggered by the availability of complete sub-events in the LDC memory. This chapter discusses the trigger requirements of DATE and gives some indications on how to set up the trigger system.

8.1 The trigger system
8.2 LDC synchronization via the equipments

8.1 The trigger system

The ALICE trigger is designed for two different types of beams: Pb-Pb beams with 125 ns bunch crossings and pp beams with 25 ns bunch crossings. The trigger system identifies the events that are considered worth reading out and activates their readout. Triggering the data-acquisition system is a complex operation that involves a variety of actions, such as sending signals to each detector with the proper timing, activating the readout processes, and distributing some information about the event (e.g. event identification). In ALICE, different types of triggers are generated, involving different sets of LDCs. DATE is made to cope with this set of requirements.

8.1.1 The Central Trigger Processor (CTP)

The general architecture of the ALICE Trigger is shown in Figure 8.1. The Central Trigger Processor (CTP) receives the input from the trigger detectors and the LHC clock from the TTC Machine Interface (TTCmi). For every bunch crossing, and according to the busy status of all the detectors, the CTP produces trigger decisions, which are transmitted to every detector via its own Local Trigger Unit (LTU). The LTU converts these decisions into messages, which are distributed to the detector electronics via the TTC broadcast system by means of the TTCvi and TTCex modules.
More information about the ALICE Trigger and the TTC system can be found in Refs. [11] and [12], respectively.

Figure 8.1 ALICE Trigger.

The information transmitted by the TTC messages includes trigger information (trigger class for physics triggers and list of detectors for software triggers) and a unique event identification (orbit number and bunch crossing).

In DATE, the major requirement for the trigger system is to inform the LDCs of the availability of the next event, in such a way that they can collect the sub-events synchronously. The synchronization of the LDCs is implemented in DATE by a mechanism that recognizes the events, based on the readout program (see Chapter 6). The processing of the readout program consists of a series of operations made in a tight loop. One of them polls the EventArrived routine, which provides the event synchronization. The EventArrived routine strongly depends on the synchronization method adopted.

The synchronization of the LDCs is achieved by a data-driven mechanism (see Section 8.2). The DDL injects data into the LDC memory autonomously. The DDL data structure keeps the knowledge of the original blocks generated by the detector and marks their boundaries. The arrival of a new block is notified to the readout software and is assumed to be a new event.

Since the LDCs work independently (there is no communication between them), it is important that they receive an ordered sequence of triggers. Keeping the order of the sub-events collected by each LDC is essential for the event builders, which subsequently assemble the sub-events into a full event. Independent verification mechanisms are put in place to detect an LDC losing a sub-event.
One of these mechanisms is based on the unique event identifier (orbit number and bunch crossing) that is transmitted by the trigger system. This identifier must be transmitted to an electronic module in order to be included in the sub-events by the data source. DATE uses the event identifier located in the sub-event header to perform various consistency checks, which are made both by readout in the LDCs and by the eventBuilder in the GDCs.

8.2 LDC synchronization via the equipments

Two types of readout links are supported by DATE: the DDL and the Ethernet/UDP links. The DDLs are read by the LDCs via the Read-Out Receiver Card (RORC). Two types of RORCs are presently supported by the ALICE DAQ:

• The dual channel D-RORC interfaces two DDL channels with embedded DIUs to a 64 bit/64 MHz PCI bus.
• The dual channel D-RORC interfaces two DDL channels with embedded DIUs to a PCI Express (PCIe) bus.

The Ethernet/UDP links are read by using the Gigabit Ethernet or 10 Gigabit Ethernet interface available on the PC motherboard or added as a PCI or PCIe add-on board. In this section these different versions of RORC cards and of Ethernet interfaces are all referred to as equipments, since they are identical from the point of view of the DATE/trigger interface.

The equipmentList software contains several equipments that handle the RORC hardware and the Ethernet link. Two of these equipments are relevant for the trigger handling. Their behaviour is briefly described here. More details on their operation can be found in Chapter 6.

The equipment injects a continuous stream of event fragments into the LDC memory autonomously. There may be several equipments in an LDC; each of them owns an instantiation of the RorcData equipment (for the RORC) or of the RorcDataUDP equipment (for the Ethernet/UDP link), which keeps a list of the fragments that have arrived.
The RorcData equipment keeps the list of available data fragments (FragmentReadyFifo) up to date. It also updates the global flag allFragmentsReadyFlag by setting it to FALSE if its own FragmentReadyFifo is empty.

The RorcTrigger equipment is unique in each LDC. It is used for triggering the readout of one or more RORC or Ethernet/UDP channel(s). The global flag allFragmentsReadyFlag is used as trigger mechanism. This flag is set to TRUE at the beginning of each iteration of the inner data-taking loop, and set to FALSE if any FragmentReadyFifo is empty. Hence, if there is at least one fragment in each FragmentReadyFifo, the value of this flag remains TRUE (“trigger arrived”).

The readout software is informed when a sub-event is complete and ready to be acquired. The sub-event is ready for readout when all the RORCs or Ethernet/UDP channels have received all the fragments belonging to an event and the fragments have been joined into a sub-event.

9 COLE - COnfigurable LDC Emulator

COLE (COnfigurable LDC Emulator) is designed to create ALICE-like events according to a simple user-defined configuration file. It replaces the standard DATE readList and follows the directions given in the DATE configuration files and databases.

9.1 Introduction
9.2 Delayed mode vs. free-running mode
9.3 System requirements and configuration
9.4 COLE as an Equipment
9.5 Basic Design
9.6 The colecheck utility

9.1 Introduction

COLE (COnfigurable LDC Emulator) is designed to create ALICE-like events according to a simple user-defined configuration file.
It is a replacement for the standard DATE readList module that is compiled and linked with the readout program. The aims of COLE are to provide complete and flexible control over the structure of the input stream to DATE and the functionality to reconfigure the readout program without the need for re-compilation. The COLE configuration file is used to define details of the events to be created by the readout program, event by event.

COLE features include:
1. configurable/flexible:
   • no re-compilation required to reconfigure DATE stream.
   • fully scalable.
   • a single configuration file controls all of COLE.
   • integrated with ALICE DATE databases used for the DATE configuration (detector, trigger, event building policies).
   • support for streamlined and paged event mode.
2. pre-defined event stream:
   • different event types.
   • global synchronized event ID.
   • pre-set trigger/detector patterns.
3. simulated ALICE trigger classes:
   • support for partial event building and use of trigger/detector patterns to create events.
   • non-global events (to perform partial event building).
   • use of pre-configured trigger patterns to create events.
4. simulate ALICE raw data:
   • configurable on an event-by-event, equipment-by-equipment and host-by-host basis.
5. simulate trigger and detector delays:
   • possible to simulate delays on a detector-by-detector and trigger-by-trigger basis.
6. simulate burst mode structure:
   • test-beam like data traffic.
   • creates bursts of a given number of events defined in common run parameters.
   • control of burst number and number in burst.
7. DATE Equipment:
   • COLE is fully implemented as a standard DATE equipment.

9.2 Delayed mode vs. free-running mode

As COLE is mainly used to emulate DAQ systems, it may suffer from the absence of a real triggering system.
LDCs may go out of sync and find themselves hundreds of events apart, depending on the relative loads on individual machines and on the event building network. For this reason the delayed mode was introduced.

When running in delayed mode, COLE waits for a given delay between events. Synchronization between LDCs is done at start-of-run. Jitter may still occur because there is no trigger system: the actual start-of-run time differs from LDC to LDC, and so does the load on each LDC. The LDCs keep track of the cumulated delays and, if needed, try to re-synchronize whenever possible. A threshold can be set to generate warning messages from LDCs that cannot keep up with the requested timing.

Delays can be specified on an event-by-event basis and on a detector-by-detector basis. All delays are given in microseconds.

9.3 System requirements and configuration

DATE must be installed and available on all hosts that are to use COLE. The data-acquisition system must be correctly configured. In addition the following files must be created:
a. event payloads: stored in ${DATE_SITE}/${HOSTNAME}, they contain the payload associated with all the events created by COLE during a single run. One LDC can have multiple payloads associated with multiple events. Several events can share the same payload file if needed. Several LDCs may share the same payload files by means of Unix symbolic links.
b. COLE configuration: specified by the symbol ${DATE_COLE_CONFIG}, it controls the stream created by COLE. It consists of an ASCII file divided into three sections:
   • Options: global options used for all events.
   • Events: the stream of events created by the DATE system as a whole.
   • Detectors: detector-specific parameters.

An example of COLE configuration is given in Listing 9.1.
Listing 9.1 Example of COLE configuration:

    >Options_section
    UseDelay 1
    UseRandomEvent 1 12321
    Threshold 1000
    >Events_section
    trdDetector CAL * 10+4 raw2 40
    * PHYS trdTrigger 5 raw1 40
    * PHYS centralTrigger 1+2+3 raw3 40
    >Detectors_section
    trdDetector 20 12
    tpcDetector 30 13
    itsDetector 40

The available options are:
• UseDelay: delayed emulation mode active (1) or inactive (0).
• UseRandomEvent: directs COLE to create events according to the given list (0, default) or to create pseudo-random events (1). In the second case an optional initial seed for the pseudo-random number generator can be provided (default: 12345).
• Threshold: delay (in microseconds) used to issue warning messages when the LDC is unable to keep the requested delay between events (delayed emulation mode only).

The events section defines the events as they are created by the DATE system as a whole. Each event is specified by the following fields, in this order:
• detector pattern: the detector(s) that participate in the event (“*”: no detector pattern). This list could also contain names of individual LDCs. Mixing detector(s) and LDC(s) is not allowed. Multiple names must be separated by “+”.
• event type: SOD for start of data events, EOD for end of data events, PHYS for physics events, CAL for calibration events, SST for system software trigger events or DST for detector software trigger events.
• trigger pattern associated with the event (“*”: no trigger pattern). Multiple trigger classes (separated by “+”) can be specified.
• attributes set in the event (“*”: no attributes set). Attributes are specified by value. Multiple attributes can be given separated by “+”.
• payload of the event (path relative to ${DATE_SITE}/${DATE_HOSTNAME}). The filename is combined with the equipmentId to create a unique, per-equipment filename. If the filename has no extension, COLE will append _equipmentId to it (e.g. for equipment 123, coleData becomes coleData_123). If the filename has an extension, COLE will insert _equipmentId between the base name and the extension (e.g. for equipment 123, coleData.raw becomes coleData_123.raw).
• delay (in microseconds) for the generation of the event (0: no delay). If two values are given, a pseudo-random number will be drawn within the given range. This number will be the same for all the LDCs belonging to the same detector.

The LDCs participating in a data-acquisition system will loop on the given list. For each line encountered (in sequence or following a pseudo-random sequence), they will wait for the given delay. When they are supposed to contribute to the event, they will first wait for the detector-specific delay and then create their sub-event with the given parameters and payload.

Events can be driven on a detector basis or on a trigger basis. In the first case, a list of detectors must be given and the trigger pattern must contain “*”. In the second case the detector pattern must contain “*”. In both cases, only the LDCs supposed to create events will do so.

The detector section specifies for each detector the delay (or the range of delays) to be applied to all events coming from LDCs from this detector. One number corresponds to a fixed delay. Two numbers will instruct COLE to draw a pseudo-random number within the given range and use that as a delay.

COLE can simulate a burst structure. This is done when the common run parameter burstPresentFlag is set. A burst is closed when more than simBurstLength events have been created at the level of the full DAQ system (fewer events may have been created at the individual LDC level, according to the triggering criteria).

9.4 COLE as an Equipment

COLE is fully implemented as an equipment and can be used in both streamlined and paged event mode.
The equipment configuration file must be correctly defined to use COLE as an equipment. An example file is shown in Listing 9.2. COLE will fill the equipmentHeader structure as any other standard DATE equipment (basic element size will be set to 4).

Listing 9.2 Example of COLE equipment configuration:

    >EQTYPES
    >Cole 1 TRIGGER GENDATA
    EqId %hd
    >LDCS
    >host1
    + Cole firstCole 1
    >host2
    + Cole secondCole 2

9.5 Basic Design

COLE will use and extend the basic structure of the four functions required for each DATE equipment.

9.5.1 ArmHw()

Called at start of run to perform system initialization such as loading the detectors, triggers and roles databases. ArmHw will also parse the COLE configuration file and load all the payloads.

9.5.2 EventArrived()

Simulates the trigger delay used for the delayed emulation mode. A simple state machine will implement the emulation of the trigger and detector delays.

9.5.3 ReadEvent()

Called in the main event loop after the arrival of a trigger to perform the readout of the hardware. It fills in the event header information as defined in the ${DATE_COLE_CONFIG} configuration file. The following fields are handled:
• nbInRun - event number within run. This is the unique number identifying the event and must be set. This value is used by the event builder and must increase for each event.
• burstNb - burst number; initialized to 0 by readout.
• nbInBurst - event number within the burst (starts at 0 at each start-of-burst event).
• event type - physics, calibration, start-of-burst, end-of-burst.
• trigger pattern.
• detector pattern.
• user attributes.

This function also loads the payload of the event.

9.5.4 DisArmHw()

Called at each end of run to perform rundown. Also used to unload the databases upon completion and to free the dynamic memory allocated by COLE.
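The per-equipment payload file naming rule given in Section 9.3 (the names ArmHw has to resolve when it loads the payloads) can be sketched in a few lines of C. This is only an illustration of the rule as stated in the text: the function name cole_payload_name and its signature are hypothetical and not part of COLE, and treating a leading dot (e.g. ".raw") as "no extension" is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the per-equipment payload file name.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 * Hypothetical helper: name and signature are not part of COLE. */
int cole_payload_name(const char *name, int equipmentId,
                      char *out, size_t outSize)
{
    const char *slash, *base, *dot;
    int n;

    assert(name != NULL && out != NULL && outSize > 0);

    /* Look for an extension only in the last path component. */
    slash = strrchr(name, '/');
    base = slash ? slash + 1 : name;
    dot = strrchr(base, '.');

    if (dot == NULL || dot == base) {
        /* No extension: append _equipmentId. */
        n = snprintf(out, outSize, "%s_%d", name, equipmentId);
    } else {
        /* Extension present: insert _equipmentId before it. */
        n = snprintf(out, outSize, "%.*s_%d%s",
                     (int)(dot - name), name, equipmentId, dot);
    }
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```

Called with "coleData" and "coleData.raw" for equipment 123, the helper yields the two names quoted in Section 9.3.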
9.6 The colecheck utility

The colecheck command line utility is available to validate the structure of a ${DATE_COLE_CONFIG} file and to display a summary of any errors found.

Command line syntax:

    colecheck -n <hostname> -f <filename> [-q] [-h]

• <hostname>: the name of the host you wish to check for in the cole.config file.
• <filename>: the full path to the COLE configuration file.
• [-q]: quiet mode (only check for errors).
• [-h]: print usage and exit.

colecheck checks the syntax of all events defined in the given COLE configuration file and checks whether the machine <hostname> contributes to each event. If no hostname is given, colecheck only checks the syntax of the ${DATE_COLE_CONFIG} file.

10 Data recording

This chapter describes the data recording process and how the data can be recorded in the LDCs and in the GDCs. It also explains the conventions concerning the filenames of the data streams that can be created using DATE.

10.1 Introduction
10.2 Common data recording procedures
10.3 Recording from the LDC
10.4 Recording from the eventBuilder
10.5 Recording with the Multiple Stream Recorder

10.1 Introduction

Data recording can be done either on LDCs or on GDCs. A common library is used by all DATE actors doing data recording (local or remote); the same features are therefore available throughout the whole DATE system. The basic GDC recording functionality can be enhanced by using a dedicated DATE component, the mStreamRecorder (MSR). This chapter describes the configuration and the behavior of the data recording process, both common and DATE-role specific.

10.2 Common data recording procedures

The data recording process uses the recordingDevice runParameter, specified via the runControl.
This string can be suffixed by a special character used to define the type of recording channel. The string can specify an arbitrary number of output channels all of the same type (no mixture of different channels is allowed within the same run). These channels are then handled according to the various configuration parameters concerning file size and maximum amount of data to be recorded during the run. It is also possible to include in the file name special characters, to be translated at run-time into machine-specific and run-specific strings.

The data generated in an LDC can be recorded:
• to a (set of) local disk file(s) (no suffix needed).
• to a (set of) local named pipe(s) (suffix: “|”).
• by sending them to a (set of) GDC(s) (suffix: “:”).

The data generated in a GDC can be recorded:
• to a (set of) local disk file(s) (no suffix needed).
• to a (set of) local named pipe(s) (suffix: “|”).
• using an external recording process (“:” as recordingDevice).

The recorder and the eventBuilder processes can store their events on the local machine (either to a file or to a named pipe). In this case, the full path of the output stream has to be specified in the recordingDevice runParameter, e.g.:

    /tmp/my_raw_data.dat
    /tmp/my_pipe|

To record on a file, the directory in which the file resides should have write access for “${DATE_USER_ID}”; one can either give write access for this user (or for the whole world) to the directory or set the ownership of the directory. Files are created (and possibly overwritten) according to the status of the data-acquisition system. A list of comma-separated files can be given as recording device, e.g. “/tmp/a,/tmp/b,/tmp/c”.

To record onto a named pipe, this must be created - before starting the run - via the Unix command mknod.
The appropriate file protection must be set to allow user “${DATE_USER_ID}” to write into the pipe and to allow whatever daemon is required to read from it. The filename must be suffixed by “|” (the suffix is removed from the recordingDevice string to derive the real name of the pipe). Special care must be taken in this recording mode, as the absence or the unexpected termination of the data consumer may stall the data-taking process. Some mechanism external to DATE must be implemented to avoid this scenario.

To record with an external recording process, the eventBuilder must be running in the “online recording” mode. The recording process, launched at start of run, must use the API provided within the DATE eventBuilder package (Section 10.4.2). DATE provides the online recording application MSR, described in Section 10.5.

If the DAQ system includes one or more GDCs, the recorder process on the LDC can send the data to them. In this case, the recording device name must be the host name(s) of the event-builder machine(s), each followed by “:”, e.g.:

    GDC01:
    GDC01:GDC02:GDC03:

In the above example, the machine GDC01 receives all events when the first string is used, while the machines GDC01, GDC02 and GDC03 each receive about 1/3 of the events when the second string is used. For multiple-GDC environments, the algorithm currently implemented in the LDCs sends all the non-physics events to the first GDC of the list (in the example above: GDC01) and then distributes the physics and the calibration events between all the GDCs of the list, using an algorithm based on the event number and on the decisions taken from the EDM (when the EDM is active).

It is possible to limit the total amount of information to be recorded in a run by setting the run parameters maxBytes, maxEvents, and maxBursts.
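The suffix conventions described above for the recordingDevice string can be summarised with a small classifier. This is a sketch only, not the recordingLib implementation: the type ChannelType and the functions recording_channel_type and gdc_count are hypothetical names, and the classifier assumes that inspecting the last character of the whole string is sufficient (the text states that all channels of a run are of the same type).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical channel categories, named after the cases in the text. */
typedef enum { CHANNEL_FILE, CHANNEL_PIPE, CHANNEL_NETWORK } ChannelType;

/* Classifies a recordingDevice string by its last character:
 * '|' means named pipe, ':' means GDC list (LDC) or online
 * recording (GDC), anything else means local file(s). */
ChannelType recording_channel_type(const char *device)
{
    size_t len;

    assert(device != NULL);
    len = strlen(device);
    if (len > 0 && device[len - 1] == '|') return CHANNEL_PIPE;
    if (len > 0 && device[len - 1] == ':') return CHANNEL_NETWORK;
    return CHANNEL_FILE;
}

/* Counts the host names in a list such as "GDC01:GDC02:GDC03:".
 * A lone ":" (online recording on a GDC) yields 0. */
int gdc_count(const char *device)
{
    int count = 0;
    size_t nameLen = 0;
    const char *p;

    assert(device != NULL);
    for (p = device; *p != '\0'; p++) {
        if (*p == ':') {
            if (nameLen > 0) count++;   /* a non-empty name just ended */
            nameLen = 0;
        } else {
            nameLen++;
        }
    }
    return count;
}
```

With the two example strings above, gdc_count returns 1 and 3 respectively, which matches the "all events" and "about 1/3 of the events" cases.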
It is also possible to limit the maximum file size (excluding the case of a network channel) using the run parameter maxFileSize. In this case, special care should be taken when the output device is a named pipe: the consumer must be able to handle the EOF event correctly.

When recording on a local file, it is recommended to use one file per host and per run. To allow automatic generation of unique file names based on those parameters, the recording library allows the use of some special characters, namely:
• “@” is replaced by the current host name.
• “$” is replaced by the current role name.
• “!” is replaced by the current role ID.
• “#” is replaced by the current run number.

For example, using the recordingDevice:

    /data/run_#.raw

the data of the run 1020 is recorded into the file /data/run_1020.raw (assuming there is no limit on the file size). The recording library replaces the first occurrence (left to right) of the special characters with the corresponding run-time value.

If there is a limit on the maximum file size, the data will be recorded to a sequence of files. Their filenames are formed by adding a sequential number to the original filename for this run (preserving the file extension, if any). In the example above, the following files are created:

    /data/run_1020.000.raw
    /data/run_1020.001.raw
    /data/run_1020.002.raw

The file with sequential number “000” can be reserved (when the appropriate runtime parameter is set) for the data recorded during the start of run phase. In this case, the first file includes the records of the types START_OF_RUN and START_OF_RUN_FILES and it is closed as soon as all the START_OF_RUN record(s) tagged with the ATTR_P_END attribute have been recorded.

When writing to a set of identical local devices (files, tapes, pipes, etc.), the recording library sends events to the first channel found available (as seen from the Operating System output library).
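The file naming rules above (special characters and sequential numbers) can be illustrated with the following C sketch. It is not the recordingLib code: RunContext, expand_recording_name and sequenced_name are hypothetical names. Two points are assumptions: "first occurrence" is read here as the first occurrence of each special character, and a name without extension simply gets ".NNN" appended.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical run-time values used for the substitution. */
typedef struct {
    const char *hostName;  /* replaces '@' */
    const char *roleName;  /* replaces '$' */
    int roleId;            /* replaces '!' */
    int runNumber;         /* replaces '#' */
} RunContext;

/* Expands the first occurrence of each special character.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int expand_recording_name(const char *pattern, const RunContext *ctx,
                          char *out, size_t outSize)
{
    static const char specials[] = "@$!#";
    int seen[4] = { 0, 0, 0, 0 };
    size_t used = 0;

    assert(pattern != NULL && ctx != NULL && out != NULL && outSize > 0);
    out[0] = '\0';
    for (const char *p = pattern; *p != '\0'; p++) {
        const char *hit = strchr(specials, *p);
        size_t room = outSize - used;
        int n;

        if (hit != NULL && !seen[hit - specials]) {
            seen[hit - specials] = 1;
            switch (*p) {
            case '@': n = snprintf(out + used, room, "%s", ctx->hostName); break;
            case '$': n = snprintf(out + used, room, "%s", ctx->roleName); break;
            case '!': n = snprintf(out + used, room, "%d", ctx->roleId); break;
            default:  n = snprintf(out + used, room, "%d", ctx->runNumber); break;
            }
        } else {
            n = snprintf(out + used, room, "%c", *p);  /* copied verbatim */
        }
        if (n < 0 || (size_t)n >= room) return -1;
        used += (size_t)n;
    }
    return 0;
}

/* Inserts a three-digit sequence number before the extension, if any. */
int sequenced_name(const char *name, int seq, char *out, size_t outSize)
{
    const char *slash = strrchr(name, '/');
    const char *dot = strrchr(slash ? slash + 1 : name, '.');
    int n;

    if (dot == NULL)
        n = snprintf(out, outSize, "%s.%03d", name, seq);
    else
        n = snprintf(out, outSize, "%.*s.%03d%s",
                     (int)(dot - name), name, seq, dot);
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```

Expanding "/data/run_#.raw" for run 1020 and then applying sequenced_name with 0, 1 and 2 reproduces the three file names listed above.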
The decision whether a channel is busy or not is taken at the moment the request to write an event is issued by the data producer (recorder or eventBuilder). The actual destination of a given event is therefore a function of the Operating System and of the device itself and, in general, cannot be predicted beforehand.

10.3 Recording from the LDC

The processes that are always running in an LDC during the data taking phase are the readout process (see Chapter 6), which receives the event fragments from the detector electronics, and the recorder process, which moves the assembled sub-events either to local storage devices or to the GDC machines over the event building network. The recorder process is presented in more detail below, in particular its recording capabilities. Its source code is located in the readout package.

The recorder process performs the following sequence of operations, in the order given:
1. maps to all memory banks that are configured for this LDC in the banks database (see Chapter 4).
2. saves its own process ID in the shared memory control region, which allows the readout process to suspend and to resume the recorder process.
3. opens file(s) on local storage device(s) or connects to remote GDC machine(s) depending on the LDC run parameter recordingDevice, which is fully explained in Section 10.2:
   • if the name of the recording device does not terminate with “:”, then the recorder process writes all sub-events to files on the local storage device. This is typically the case when the data-acquisition system is composed of a single LDC without event building.
   • if the name of the recording device does terminate with “:”, then the recorder process takes it as the name of a remote GDC machine and opens a TCP/IP socket connection for transmitting the sub-events. In this case the data-acquisition system has event building provided by one or more GDC machines.
4. enters the event loop, in which the event descriptors are taken out from the recorderInput FIFO and each sub-event is either written to a file or sent over the event building network to a GDC, depending on the recording device. All recording operations are done by calling routines of the high level library from the recordingLib package (see Section 16.4). An event descriptor points either to a streamlined or to a paged sub-event (see Chapter 3). After a successful recording of a sub-event, the recordingLib routines also take care of updating the relevant run-time parameters and of deallocating the associated memory blocks of the sub-event.
5. closes the local file or the socket connection(s), after exiting the event loop.

In the event loop the recorder process continuously checks for the arrival of the end of run command. It exits the event loop if one of the following conditions is met:
• the maximum number of bytes to be written (given by the LDC run parameter maxBytes) has already been reached.
• there have been too many errors in writing the file or in the transfer over the event building network.
• the operator asked to stop the run.
• a fatal error occurred, which is indicated by a non-zero value of the runEndCondition variable in the shared memory control region.

In the first three cases the recorder process tries to write all the pending sub-events (represented by their event descriptors) in the recorderInput FIFO onto the recording device before exiting, whereas in the last case the recorder process does not empty the recorderInput FIFO before exiting. In the simplest case the readout process is the producer of the event descriptors, thus the recorderInput equals the readoutReadyFifo.

The recorder process uses the infoLogger package to report and trace error or abnormal conditions, and to trace state changes. The operator can tailor these features to the required needs by setting the value of the LDC run parameter logLevel.
Output messages produced by the recorder process are sent to ${DATE_SITE_TMP}/${DATE_ROLENAME}/recorder/recorder.log. The same directory will also store the core files that may be created by the recorder process in case it gets terminated by an unrecoverable signal. The recorder process runs as the user defined in the DAQ configuration database. Directory protections and ownerships must be set accordingly.

10.4 Recording from the eventBuilder

Recording on the GDC begins at the level of the eventBuilder process. Two options are available at this stage: direct recording, where the eventBuilder writes the raw events directly to a given local device, or online recording, where the events are transmitted to a further stage for handling and recording. The two schemes are mutually exclusive on the same GDC, while different GDCs can use different schemes and/or different run-time parameters.

10.4.1 Direct recording

The eventBuilder can record data directly using the DATE recordingLib package (the same package used by the recorder process on the LDCs). If multiple output streams are specified, the eventBuilder uses the first available recording channel. This channel is guaranteed to be able to accept a new I/O request (although it is not guaranteed to be able to complete it). All options made available by the DATE recording library are available on the GDCs.

10.4.2 Online recording

The eventBuilder can transfer “ready” events to an optional processing stage. Events are moved using an internal format and can be accessed by another process using a copy-less, memory mapped access scheme. Only one process can attach itself to the output of the eventBuilder during a given run. This process must use the API provided within the DATE eventBuilder package. Online recording can be activated by giving “:” as recording device.
The resource eventBuilderReadyFifo must be declared for the GDC in the banks database in order to be able to perform online recording.

Events are normally transmitted from the eventBuilder to the requester using iovec structures. These structures are standard entities used by Unix I/O libraries and cannot be shared across processes. When these vectors have to be transferred to other processes, event descriptors should be used instead. Event descriptors are process-independent entities and can be transformed at any time to equivalent iovec structures (which are process dependent). Event descriptors are allocated in the process’ local address space and therefore need external mechanisms for their transfer to other processes (e.g. shared message queues or shared memory blocks).

An API is available for C and C++. The file ${DATE_EB_DIR}/libDateEb.h should be included (use the “-I ${DATE_EB_DIR}” compiler directive) and the library ${DATE_EB_BIN}/libDateEb.a should be used during the link phase. Please note that several other include files and libraries are needed for a successful compilation: refer to the file ${DATE_EB_DIR}/simpleConsumer.c and the associated rules within ${DATE_EB_DIR}/GNUmakefile for a complete list.

struct iovec
C Synopsis
    #include <sys/uio.h>
    struct iovec { ... }
Description
    Structure used to describe an event created by the eventBuilder. For the actual implementation of the structure, refer to the system include files and/or to the relevant manual pages (e.g. “man readv”). The I/O vector described by this structure has numOfLdcs + 1 entries (where numOfLdcs is the number of LDCs contributing to the event) with entry number 0 being the header of the super event.

ebRegister
C Synopsis
    #include "libDateEb.h"
    int ebRegister( void )
Description
    Registers the process with the eventBuilder and attaches to its memory banks.
Returns
    TRUE in case of success, FALSE otherwise.
ebGetNextEvent
C Synopsis
    #include "libDateEb.h"
    struct iovec *ebGetNextEvent( void )
Description
    Gets the next available event from the output queue of the eventBuilder.
Returns
    NULL if the queue is empty, -1 on error, pointer to I/O vector otherwise.

ebGetNextEventDescriptor
C Synopsis
    #include "libDateEb.h"
    int ebGetNextEventDescriptor( void **descriptor, int *descriptorSize )
Description
    Gets the descriptor to the next available event from the output queue of the eventBuilder. On success, the descriptor parameter is loaded with the address of the result and the descriptorSize parameter contains the size in bytes of the descriptor. The format of the descriptor is not published and may vary between releases of DATE. The event descriptor can be manipulated only using the ebDescriptorToIovec() routine and it must be released using the ebReleaseDescriptor() routine.
Returns
    0 if the queue is empty, -1 on error, integer positive value otherwise.

ebReleaseEvent
C Synopsis
    #include "libDateEb.h"
    int ebReleaseEvent( struct iovec * )
Description
    Releases the event described by the given I/O vector. The parameter must be an I/O vector returned by a previous call to ebGetNextEvent(). The routine also disposes of the input iovec, which must not be used after this routine returns.
Returns
    TRUE in case of success, FALSE otherwise.

ebReleaseDescriptor
C Synopsis
    #include "libDateEb.h"
    int ebReleaseDescriptor( void *descriptor )
Description
    Releases the descriptor of the event pointed to by the input parameter, which must have been returned by a previous call to ebGetNextEventDescriptor(). Please note that this call will not release the event itself (for this purpose use the ebReleaseEvent() routine).
Returns
    0 in case of success, -1 otherwise.
ebDescriptorToIovec
C Synopsis
    #include "libDateEb.h"
    struct iovec *ebDescriptorToIovec( void *descriptor )
Description
    Converts the input parameter, which must have been returned by a previous call to ebGetNextEventDescriptor(), to a standard iovec structure. The descriptor and the iovec must be disposed of using the appropriate routines (ebReleaseDescriptor() and ebReleaseEvent()).
Returns
    0 in case of success, -1 otherwise.

ebEor
C Synopsis
    #include "libDateEb.h"
    int ebEor( void )
Description
    Checks for end of run.
Returns
    TRUE if the run is closed and no more data is available from the eventBuilder.

ebGetLastError
C Synopsis
    #include "libDateEb.h"
    const char * const ebGetLastError( void )
Description
    Gets a string describing the last error condition encountered by the library.
Returns
    Pointer to a read-only string.

10.5 Recording with the Multiple Stream Recorder

10.5.1 Overview

The online recording described in the previous section frees the eventBuilder from the overheads related to physical data recording and lets it execute its principal task more effectively. The benefits of this mode can be important when event pre-processing is required before recording, especially on multi-processor/multi-core platforms.

Additional gains can be achieved by having several concurrent recording streams: when one stream is busy (e.g. waiting for a file to open), other streams can carry on. This effect is marginal when recording to a fast local file system. However, when dealing with a remote mass storage system having a limited throughput per data stream, such as CASTOR [9], the use of multiple stream recording becomes essential. The experience of the ALICE data challenges suggests that more than 2 recording streams per GDC are needed to match CASTOR’s aggregate throughput with that of DATE.
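Any online recording application sits on the consumer API of Section 10.4.2. The sketch below shows one possible shape of such a consumer loop, using only ebRegister, ebGetNextEvent, ebReleaseEvent, ebEor and ebGetLastError with the return conventions documented above. To keep it compilable without DATE, the block defines its own stand-in versions of these routines that serve three fake events; they are placeholders written for this example, not the real library, and consume_events is a hypothetical name. The reference implementation remains ${DATE_EB_DIR}/simpleConsumer.c.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/uio.h>

#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* ---- Stand-ins for libDateEb (NOT the real library) ----------------
 * They serve three fake one-entry events so the sketch runs without
 * DATE. A real consumer drops this part and includes "libDateEb.h". */
static char fakeHeader[3][16] = { "event-0", "event-1", "event-2" };
static struct iovec fakeQueue[3];
static int fakeNext = 0, fakeReleased = 0;

int ebRegister(void)
{
    for (int i = 0; i < 3; i++) {
        fakeQueue[i].iov_base = fakeHeader[i];
        fakeQueue[i].iov_len = sizeof fakeHeader[i];
    }
    return TRUE;
}
struct iovec *ebGetNextEvent(void)
{
    return fakeNext < 3 ? &fakeQueue[fakeNext++] : NULL;
}
int ebReleaseEvent(struct iovec *iov) { (void)iov; fakeReleased++; return TRUE; }
int ebEor(void) { return fakeNext >= 3; }
const char *ebGetLastError(void) { return "no error (stub)"; }
/* ---- End of stand-ins ---------------------------------------------- */

/* Drains the eventBuilder output queue until end of run.
 * Returns the number of events handled, or -1 on error. */
long consume_events(void)
{
    long handled = 0;

    if (!ebRegister()) {
        fprintf(stderr, "ebRegister failed: %s\n", ebGetLastError());
        return -1;
    }
    for (;;) {
        struct iovec *iov = ebGetNextEvent();

        if (iov == (struct iovec *)-1) {        /* documented error value */
            fprintf(stderr, "ebGetNextEvent failed: %s\n", ebGetLastError());
            return -1;
        }
        if (iov == NULL) {                      /* queue empty */
            if (ebEor()) break;                 /* run closed, nothing left */
            continue;                           /* a real consumer would wait here */
        }
        /* Entry 0 is the super-event header; entries 1..numOfLdcs
         * hold the sub-events. Recording would happen at this point. */
        assert(iov[0].iov_len > 0);
        handled++;

        if (!ebReleaseEvent(iov)) {             /* iov is invalid afterwards */
            fprintf(stderr, "ebReleaseEvent failed: %s\n", ebGetLastError());
            return -1;
        }
    }
    return handled;
}
```

The loop checks ebEor() only when the queue is empty, so that events still queued at end of run are drained before the consumer exits; whether the real eventBuilder requires this ordering is an assumption of the sketch.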
The DATE mStreamRecorder process (MSR) is the default online recording application for the eventBuilder, designed with the above considerations in mind. It enhances the basic GDC recording features by offering:
• concurrent asynchronous and individually configurable recording streams.
• a possibility to record to CASTOR.
• real-time transcoding of raw events into a ROOT [10] tree format compatible with the ALICE offline analysis software, using AliRoot API [16].

MSR can be run either together with the eventBuilder configured for online recording, or as a stand-alone application to “replay” pre-recorded raw data files. It consists of the main steering and dispatching process, disp, and a number of concurrent stream processes running on the same machine (see Figure 10.1).

disp is launched at the “Start processes” phase of the run control. Its task is:
• to read and interpret the configuration file.
• to configure and fork the stream processes.
• to read the event descriptors from the output FIFO of the eventBuilder and dispatch them to the streams, via individual FIFOs.
• to report the status information to DATE.

Figure 10.1 A schematic block-diagram of the mStreamRecorder. (Diagram not reproduced. On the GDC host it shows the eventBuilder with its EB buffer, the disp process reading the config file, streams 1 to N, and the recorder shared memory used for internal logging. The legend covers: event descriptors via SimpleFifo, events via the EB consumer API, the file destination assigned to stream n, fatal condition signals, reporting to infoLogger, the dispatching algorithm and the optional raw/ROOT transformation.)

The available dispatching methods distribute the events uniformly between the streams. An option of having dedicated streams with custom filtering (e.g. according to trigger pattern) is reserved but not implemented yet.

Each stream is totally independent of other stream processes.
Its tasks are:
• to receive event descriptors dispatched to it by disp via the individual FIFO.
• to construct the iovec pointing to the corresponding event parts (the header and sub-events) in the eventBuilder buffer.
• to manage output files on a specified destination.
• to write the events, optionally transformed into ROOT structures, to the output.
• to handle I/O errors and report its status to disp and DATE.

MSR uses the following DATE components: the eventBuilder client API (Section 10.4.2), the simpleFifo package (Section 16.3) and the DATE infoLogger (Chapter 11). The source code and the related executables are located in the ${DATE_MSTREAM_DIR} and ${DATE_MSTREAM_BIN} directories, respectively.

The following sub-sections describe how to configure and run MSR.

10.5.2 MSR configuration file

10.5.2.1 Configuration file: naming and handling

The configuration of mStreamRecorder is done on a per-partition basis. This allows different partitions to use a dedicated set of parameters such as file size or destination path. Two approaches are possible:
1. create a specific configuration for each partition;
2. create a generic template to instantiate for each partition.

A specific configuration file has priority over a generic template.

Specific configuration files use the file name mStreamRecorder.PART_NAME.config where PART_NAME is:
• the name of the partition for standalone runs (e.g. started via DCA);
• the name of the partition preceded by ALL (e.g. ALLPHYSICS_1 for the partition PHYSICS_1) during global runs (e.g. started via PCA).

Specific configuration files are used “as is” by mStreamRecorder.
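The naming and priority rules above can be sketched as follows. This is an illustration, not MSR code: msr_config_name and msr_pick_config are hypothetical helpers, and how MSR actually detects that a specific file exists is not described here, so the caller passes that fact in as a flag. The template file name is the one given in the next paragraph.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Name of the generic template, as given in the text. */
#define MSR_TEMPLATE_NAME "mStreamRecorder.config.template"

/* Builds the partition-specific configuration file name.
 * globalRun != 0 selects the "ALL" prefix used during global runs.
 * Returns 0 on success, -1 if the name does not fit in `out`. */
int msr_config_name(const char *partition, int globalRun,
                    char *out, size_t outSize)
{
    int n;

    assert(partition != NULL && out != NULL && outSize > 0);
    n = snprintf(out, outSize, "mStreamRecorder.%s%s.config",
                 globalRun ? "ALL" : "", partition);
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}

/* A specific file, when present, has priority over the template. */
const char *msr_pick_config(const char *specificName, int specificExists)
{
    return specificExists ? specificName : MSR_TEMPLATE_NAME;
}
```

For the partition PHYSICS_1 in a global run this yields mStreamRecorder.ALLPHYSICS_1.config, in line with the example in the text.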
A generic template must be saved under the name mStreamRecorder.config.template. It contains the configuration defined below, with the inclusion of the following run-time fields:
• __WRITE_VOLUME__: this field can be used to select a partition-dependent write volume. It is replaced at run-time by the name of the partition;
• __FILE_NAME_ATTR__: this field can be used to create a file name which is partition-dependent and usable by the TDSM for run-time selection purposes. It is replaced at run-time by _nameOfDCA for standalone runs (nameOfDCA contains the name of the DCA, usually the name of the detector) or by the keyword "_Technical" during technical runs (to allow writing the data coming from technical runs into a separate directory, easier to handle from the PDS side). During physics runs, the field is simply removed from the file name.

The template approach shall be used for sites running multiple partitions. Test sites should rather use specific configuration files, which are easier to read and to maintain.

10.5.2.2 Configuration examples

A DATE user may wish to record to a variety of output destinations (local file system, CASTOR, remote file server) and use different data formats (raw eventBuilder format, ROOT tree) and/or protocols (local, RFIO, ROOTd). This diversity can be described in a concise and flexible way by the MSR configuration file, whose syntax is based on the "name key=value key=value..." paradigm. In simple cases of a uniform configuration for all GDCs, this file may consist of just a few lines, as illustrated by the examples in Listing 10.1 and Listing 10.2. They differ mainly in the definition of the default output stream (default_str): CASTOR files in Listing 10.2 need more attributes to be defined. Other minor differences illustrate the grammatical features which will be explained later.
Listing 10.1 A simple configuration with 3 streams per GDC, recording to a local disk

    >COMMON Nstreams=3
    >RECORDERS
    default_rec
    >OSTREAMS
    default_str path=/scratch fsize=1024

Listing 10.2 A simple configuration with 3 streams per GDC, recording to CASTOR

    >COMMON
    >RECORDERS
    default_rec Nstreams=3 stream=default_str
    >OSTREAMS
    default_str fsize=1024
      pool=alice_stage stager=stagealice
      path=/castor/cern.ch/alice/daq_dev/daq_recorder

Listing 10.3 shows the configuration for ROOT recording to CASTOR via the ROOTd protocol, the same for all GDCs.

Listing 10.3 The configuration for ROOT recording to CASTOR

    # ROOT recording to CASTOR
    >COMMON timing=1 loglevel=1 Nstreams=3 root=3
    >RECORDERS
    default_rec method=2 stream=default_str
      fsize=255 !! NB: fsize has no effect here!
    >OSTREAMS
    default_str sleep=1 path=/castor/cern.ch/user/d/developer
      filename=%h_%R_%s_%f_%T.root ! ROOT-style filename
      mxrecl=0 fsize=1024
      pool=default stager=lxs5007 !!!! special CASTOR stager!

Finally, a more complicated example in Listing 10.4 shows how to arrange separate configurations for different GDCs and define a variety of streams with different output destinations. It also illustrates most of the syntactic and semantic rules of the MSR configuration language.

Listing 10.4 The configuration with special properties for the GDC pcaldXXgdc

    !\ A special configuration for pcaldXXgdc
       Author: DATE developer
       April 2005 \!
    # common definitions
    >COMMON method=1 loglevel=1 sleep=1
    # recorder definitions
    >RECORDERS
    default_rec method=2 stream=default_str ! single stream
    pcaldXXgdc fsize=255 Nstreams=4 ! four streams
      filename=%h_%r_%s_%f.data stream=default_str stream=public \
      filename=Test_%r_%f.data stream=test
    # output stream definitions
    >OSTREAMS
    default_str sleep=2 path=/local \
      mxrecl=0 ! line-continuation sign "\" is optional
      pool=public stager=stagepublic
      fsize=2047 ! fsize is in MB
    test path=/castor/cern.ch/user/d/developer/test_dir \
      =public ! a copy of all attributes of stream public
    public mxrecl=20000
      path=/castor/cern.ch/user/d/developer

10.5.2.3 File names

MSR produces per host, per run and per stream output files. The meta-characters "@" and "#" can be used in their names, as described in Section 13.2. In addition, MSR interprets the percent sign "%" as a meta-character in all filenames appearing in the configuration file. The letters preceded by "%" are replaced with the corresponding values at run time, as follows:
• %r: (same effect as "#") the run number, without leading zeroes.
• %h: (same effect as "@") the short host name, like the one produced by the Linux command "hostname --short".
• %H: the full host name, e.g. pcaldXX.cern.ch, as produced by the Linux command "hostname --long".
• %G: the full GDC name, as specified by the hostname attribute in the role database.
• %g: same as "%G", with the "gdc" suffix stripped off, if present.
• %R: the 8-digit run number, with leading zero-padding.
• %s: the stream number (the numbering starts with 0).
• %f: the sequential file number (the numbering starts with 1).
• %F: the 3-digit sequential file number, with leading zero-padding.
• %T: the current time stamp, in the form YYYYMMDD_hhmmss.
• %S: the stream sequential number.

The default output file name is "%h_%R_%s_%f.data". Unlike with direct GDC recording, files with sequential number 0 are not created. The start-of-run events are transferred to the first stream(s) that become available to disp. A user can change the default file name template by using the filename attribute in the MSR configuration file.
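To make the "%" substitutions concrete, here is a minimal Python sketch of how such a template could be expanded. It is not the MSR implementation: the function name, its parameters and the host name pcald01 are hypothetical, unknown letters are simply left untouched (an assumption), and the "%S" value is passed in by the caller because its exact meaning is not detailed here. The "@" and "#" meta-characters of Section 13.2 are not handled.

```python
import re
from datetime import datetime


def expand_filename(template, *, run, host, full_host, gdc,
                    stream, file_no, now, stream_seq=0):
    """Expand the MSR "%" meta-characters in a file name template."""
    values = {
        "r": str(run),                                   # run number, no padding
        "R": f"{run:08d}",                               # 8-digit run number
        "h": host,                                       # short host name
        "H": full_host,                                  # full host name
        "G": gdc,                                        # full GDC name
        "g": gdc[:-3] if gdc.endswith("gdc") else gdc,   # GDC name without "gdc"
        "s": str(stream),                                # stream number (from 0)
        "S": str(stream_seq),                            # stream sequential number
        "f": str(file_no),                               # file number (from 1)
        "F": f"{file_no:03d}",                           # 3-digit file number
        "T": now.strftime("%Y%m%d_%H%M%S"),              # YYYYMMDD_hhmmss
    }
    # Replace each "%" + letter pair; unknown letters are kept verbatim.
    return re.sub(r"%([A-Za-z])",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)


if __name__ == "__main__":
    name = expand_filename(
        "%h_%R_%s_%f.data",
        run=287, host="pcald01", full_host="pcald01.cern.ch",
        gdc="pcald01gdc", stream=0, file_no=1,
        now=datetime(2005, 5, 5, 12, 0, 0),
    )
    print(name)  # pcald01_00000287_0_1.data
```

Including both the stream number and the file number in the template, as the default does, keeps the names produced by concurrent streams distinct.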
The name "%h_%R_%s_%f_%T.root" is strongly recommended for ROOT data files. The directory part of the fully specified file name must be defined by the obligatory path attribute. CASTOR files are distinguished by the "/castor/cern.ch" prefix in the path name. Note that ROOT recording is not automatically enabled even if the ".root" suffix is specified; the root attribute must be used for that purpose.

10.5.2.4 The configuration file syntax: tags and attributes

This section presents a formal description of the MSR configuration file syntax and semantics.

An MSR configuration file is a free-format plain text file in which the word items (contiguous strings of non-blank text characters, up to 512 characters long) are interpreted as either tags or attributes. The items are separated by blanks, tabs, new lines, or an arbitrary mixture of them.

The tags starting with a bracket ">" are called structural: they identify different levels in the configuration tree. All other tags are just names of objects belonging to those levels.

The attributes, distinguished by the equal sign "=" in the item, have to follow the tag which they qualify. The part to the left of the equal sign is the attribute key, the part to the right is the attribute value, internally stored as a text string. The combination of a tag and its attributes will be referred to as a tag definition. Tags without attributes are syntactically correct.

The attributes with an empty key, such as "=something", have a special meaning: they are replaced with a copy of all attributes belonging to another tag, given as the value. For example, in Listing 10.4 the stream "test" has an attribute "=public", so all attributes of the stream "public" will be appended to "test". As a result, the stream "test" will receive the new attribute mxrecl, as well as a second path attribute (which will be ignored at the semantic level because of the rules of precedence, described later).
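The item rules above can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the MSR parser: the input is assumed to be free of comments, tag names are assumed to be unique across the whole text, copy attributes are expanded one level deep only, and the function names are hypothetical.

```python
def parse_items(text):
    """Split text into items and group them into tag definitions.

    Returns an ordered list of (tag, [(key, value), ...]) pairs.
    An item containing "=" is an attribute of the preceding tag.
    """
    definitions = []
    for item in text.split():                 # blanks, tabs and new lines
        if "=" in item:
            if not definitions:
                raise ValueError(f"attribute {item!r} has no tag to qualify")
            key, value = item.split("=", 1)   # key may be empty (copy attribute)
            definitions[-1][1].append((key, value))
        else:
            definitions.append((item, []))
    return definitions


def expand_copies(definitions):
    """Replace each empty-key attribute by the attributes of the named tag."""
    by_name = {tag: attrs for tag, attrs in definitions}
    expanded = []
    for tag, attrs in definitions:
        out = []
        for key, value in attrs:
            if key == "":
                out.extend(by_name[value])    # copy all attributes of `value`
            else:
                out.append((key, value))
        expanded.append((tag, out))
    return expanded


if __name__ == "__main__":
    text = "test path=/a =public public mxrecl=20000 path=/b"
    for tag, attrs in expand_copies(parse_items(text)):
        print(tag, attrs)
```

After expansion, the tag "test" carries path=/a, mxrecl=20000 and a second path=/b, which mirrors the situation described for the streams "test" and "public" of Listing 10.4.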
The layout of configuration files is not fixed by the syntax. For example, the entire configuration file may consist of a single line, e.g.

    >COMMON Nstreams=3 >RECORDERS default_rec >OSTREAMS default_str path=/scratch

However, for the sake of readability, it is recommended to place tags at the beginning of separate lines, as in the examples quoted earlier. Indents, tabs, new lines and comments can be used freely to ease the reading.

Comments are introduced by the hash "#" and exclamation mark "!" characters. Their use is illustrated by Listing 10.4. The lines with "#" as the first non-blank character are purely commentary. An exclamation mark begins an inline comment spanning the rest of the line, while the combination "!\" begins a long comment which extends to the end of the line containing the terminating symbol "\!".

A backslash "\" can optionally be used as a line-continuation sign, though the syntax does not require it. The rest of the line after a backslash is ignored and can be used for inline comments, as with "!".

All meaningful items in the configuration files are case-sensitive.

10.5.2.5 The configuration file structure

All tags, together with their attributes, must be grouped into three sections similar to the "roles" in the DATE roles database:
• common section: the structural tag >COMMON itself and its attributes.
• recorder section: the structural tag >RECORDERS and the name tags listed after it. By a recorder we mean an instance of the MSR running on a given GDC. The name tag defining a specific recorder must be identical to the GDC name, defined in the DATE roles database (see, for example, Listing 4.3).
• output stream section: the structural tag >OSTREAMS and the name tags listed after it. All varieties of stream configurations needed for all GDCs are described here. These configurations are instantiated by stream attributes appearing in recorder tags.
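The comment rules and the three-section structure can be sketched in Python as follows. This is an illustrative reading of the rules stated above rather than the MSR parser; the function names are hypothetical, and rejecting unknown structural tags is an assumption of the sketch.

```python
SECTIONS = (">COMMON", ">RECORDERS", ">OSTREAMS")


def strip_comments(text):
    """Remove "#" lines, "!" inline comments, "!\\ ... \\!" long comments
    and everything following a line-continuation backslash."""
    kept = []
    in_long = False
    for line in text.splitlines():
        if in_long:
            if "\\!" in line:          # the long comment ends with this line
                in_long = False
            continue
        if line.lstrip().startswith("#"):
            continue                   # purely commentary line
        start = line.find("!")
        if start >= 0:
            # "!\" opens a long comment unless "\!" closes it on the same line
            if line.startswith("!\\", start) and "\\!" not in line[start + 2:]:
                in_long = True
            line = line[:start]
        cut = line.find("\\")
        if cut >= 0:
            line = line[:cut]          # rest of line after "\" is ignored
        kept.append(line)
    return "\n".join(kept)


def split_sections(text):
    """Group the items of a configuration text by structural tag."""
    sections = {}
    current = None
    for item in strip_comments(text).split():
        if item.startswith(">"):
            if item not in SECTIONS:
                raise ValueError(f"unknown structural tag {item!r}")
            current = sections.setdefault(item, [])
        elif current is None:
            raise ValueError(f"item {item!r} appears before any section")
        else:
            current.append(item)
    return sections


EXAMPLE = r"""
!\ a long comment
   spanning several lines \!
# a full-line comment
>COMMON Nstreams=3   ! an inline comment
>RECORDERS
default_rec
>OSTREAMS
default_str path=/scratch \ the rest of this line is ignored
    fsize=1024
"""

if __name__ == "__main__":
    print(split_sections(EXAMPLE))
```

The items returned for each section can then be split into tags and attributes on the "=" sign.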
Within each section, the tags may appear in any order. The sections >COMMON, >RECORDERS and >OSTREAMS may also appear in any order within the configuration file, but the important requirement is that they must all be present. For the common section this simply means that the tag >COMMON must be present (with or without attributes). As to the recorder and stream sections, each of them must contain the corresponding structural tag (without any attributes) and at least one default tag definition:
• default_rec tag, describing the default recorder, must appear in the recorders section. All rules for recorder definitions apply to it. Its main purpose is to provide the configuration for the GDCs which are not explicitly defined in the recorders section. In particular, if all GDCs are equal, default_rec can be the only recorder defined, as shown in Listing 10.1, Listing 10.2 and Listing 10.3.
• default_str tag, describing the default stream, must appear in the streams section. The attribute "stream=default_str" is automatically added to any recorder tag having no stream attributes specified explicitly or implicitly (via copying). This feature is illustrated by Listing 10.1. If all streams in the system have the same properties, default_str can be the only stream defined.

Thus, there are five tags which must appear in any MSR configuration file.

10.5.2.6 Scopes of attributes and rules of precedence

All attributes used in MSR are listed in Table 10.1 and described in more detail in Section 10.5.3. Each of them (except the syntax-level copy attribute) defines a certain property or parameter of the object it qualifies. The attributes appearing in >COMMON apply to all recording streams and all recorders. The attributes appearing in a specific (recorder or stream) tag definition apply only to that definition and the ones derived from it.
For example, the stream attribute fsize in the tag pcaldXXgdc in Listing 10.4 applies only to streams created for this recorder, including the instance of the default stream; it is not propagated to instances of the default stream for other recorders. The exceptions are the default recorder and stream, defined by the default_rec and default_str tags: their attributes provide default values for other recorders and streams.

Table 10.1 Attributes in MSR configuration files

| Attribute key | Type | Default value (a) | Property of (b) | Description |
|---|---|---|---|---|
| path | char* | none (obligatory) | S(R,C) | path of output file |
| stager | char* | "stagepublic" | S(R,C) | CASTOR stager host name |
| pool | char* | "public" | S(R,C) | CASTOR pool name |
| fsize | int | 256 | S(R,C) | max file size (Mbytes) |
| nevents | int | 0 (no limit) | S(R,C) | max number of events per file |
| timing | int | 0 (no timing) | S(R,C) | timing logging |
| timer_log | char* | "Stream_time.%R" | S(R,C) | timing log file |
| sleep | int | 1 (minimal) | S(R,C) | stream polling latency |
| mxrecl | int | 0 (no buffering) | S(R,C) | record length for buffered writing (non-ROOT only) |
| filename | char* | "%h_%R_%s_%f.data" | S(R,C) | output file name template |
| root | int | -1 (no transcoding) | S(R,C) | ROOT recording mode |
| compress | int | 0 (no compression) | S(R,C) | compression level |
| filtermode | int | 0 (no filtering) | S(R,C) | 3rd level filtering |
| maxtagsize | double | 2.e8 | S(R,C) | max size of tag DB (bytes) |
| runDBFS | char* | "/tmp/meta%s" | S(R,C) | run DB path name |
| tagDBFS | char* | "/tmp/tags%s" | S(R,C) | tags DB path name |
| alienHost | char* | NULL (no AliEN DB) | S(R,C) | AliEN host [17], reserved for future use |
| alienDir | char* | NULL (no AliEN DB) | S(R,C) | AliEN directory [17], reserved for future use |
| method | int | 1 (equal load) | R(C) | dispatching method |
| Nstreams | int | none (optional) | R(C) | forced number of streams |
| stream | char | default_str | R | creates an instance of the named stream |
| (empty) | char | none (optional) | R,S | only copies attributes from another name |
| use | int | none (optional) | C | forced recorder name |
| loglevel | int | 1 (minimal) | C | log level |
| dump | int | 0 (no dump) | C | to debug the config file |
| run | int | none (optional) | C | forced run number in stand-alone mode |
| source | char | none (optional) | C | full name of the data source file in stand-alone mode |
| Nev | float | none (optional) | C | event limit in stand-alone mode |

a. The built-in default value. The attributes without defaults are either obligatory (must appear in the configuration file) or optional (no effect, if absent).
b. The tag which an attribute qualifies (C = ">COMMON", R = recorder, S = stream). Alternative attribute placements are indicated in parentheses. The recommended attribute placement is the one outside the parentheses (shown in bold in the original layout). Any attribute that can be placed in >COMMON may also appear among command-line arguments.

Most of the attributes qualifying streams may also appear in recorder and common definitions. Similarly, the recorder attributes (except stream) may also appear in the common section. An alternative placement of an attribute changes its scope and significance. The attribute value assignment rules are given in Table 10.2.
When a "lower-level" attribute appears in a "higher-level" definition, its value overrides all lower-level definitions. This feature makes the configuration file grammar more flexible, making it possible to affect an entire group of objects without touching the original low-level definitions.

Table 10.2 Rules of precedence for MSR attribute values

| Priority | The effective value of the attribute A for the stream S on recorder R is retrieved from: | The effective value of the attribute A for the recorder R is retrieved from: |
|---|---|---|
| 1 (highest) | command-line or >COMMON section (e.g., sleep attribute in Listing 10.4) | command-line or >COMMON section |
| 2 | R tag in the >RECORDERS section: the last occurrence of A attribute preceding the corresponding stream=S (e.g., filename for the streams of recorder pcaldXXgdc in Listing 10.4) | R tag in the >RECORDERS section: the first occurrence of A attribute in the tag description |
| 3 | The first occurrence of A in the tag S in >OSTREAMS section (e.g., mxrecl and path for the stream public in Listing 10.4) | The first occurrence of A in default_rec |
| 4 | The first occurrence of A in default_str (e.g., pool and stager for all streams in Listing 10.4) | The built-in default value from Table 10.1 |
| 5 (lowest) | The built-in default value listed in Table 10.1 (e.g., the timing attribute for all streams in Listing 10.4) | |

The built-in defaults (Table 10.1) have the lowest priority and are applied only if the corresponding attributes are not specified, directly or indirectly, for a given stream. The copy attributes are exercised at the syntax parsing stage, so the copied properties are regarded as if they were explicitly specified.

The order of attributes within the tag definition matters only for stream-related attributes in recorder definitions. In that case the last value preceding the stream=X is applied to the instance of X. For example, fsize=255 in Listing 10.3 has no effect, as there are no stream attributes after it.
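The stream column of Table 10.2 can be read as a five-step lookup. The Python sketch below applies it to data transcribed by hand from Listing 10.4 (with the copy attribute of the stream "test" already expanded). It is an illustration of the rules, not MSR code: the function name, the data layout and the small excerpt of built-in defaults are assumptions of the sketch.

```python
# Data transcribed from Listing 10.4; attribute order is preserved.
COMMON = {"method": "1", "loglevel": "1", "sleep": "1"}
RECORDER = [  # ordered items of the recorder tag pcaldXXgdc
    ("fsize", "255"), ("Nstreams", "4"),
    ("filename", "%h_%r_%s_%f.data"),
    ("stream", "default_str"), ("stream", "public"),
    ("filename", "Test_%r_%f.data"), ("stream", "test"),
]
STREAMS = {
    "default_str": [("sleep", "2"), ("path", "/local"), ("mxrecl", "0"),
                    ("pool", "public"), ("stager", "stagepublic"),
                    ("fsize", "2047")],
    "public": [("mxrecl", "20000"),
               ("path", "/castor/cern.ch/user/d/developer")],
    "test": [("path", "/castor/cern.ch/user/d/developer/test_dir"),
             ("mxrecl", "20000"),
             ("path", "/castor/cern.ch/user/d/developer")],
}
BUILTIN = {"timing": "0", "fsize": "256", "sleep": "1", "mxrecl": "0"}  # excerpt


def stream_value(key, stream_pos, recorder, common, streams, builtin):
    """Effective value of attribute `key` for the stream instantiated by
    the item recorder[stream_pos] (which must be a stream=S item)."""
    if key in common:                              # 1: >COMMON or command line
        return common[key]
    stream_name = recorder[stream_pos][1]
    preceding = [v for k, v in recorder[:stream_pos] if k == key]
    if preceding:                                  # 2: last occurrence before stream=S
        return preceding[-1]
    for name in (stream_name, "default_str"):      # 3: tag S, then 4: default_str
        for k, v in streams.get(name, []):
            if k == key:                           # first occurrence wins
                return v
    return builtin.get(key)                        # 5: built-in default


if __name__ == "__main__":
    for pos in (3, 4, 6):
        name = RECORDER[pos][1]
        print(name, {k: stream_value(k, pos, RECORDER, COMMON, STREAMS, BUILTIN)
                     for k in ("sleep", "filename", "fsize", "path", "pool")})
```

With this data, the stream "test" gets filename=Test_%r_%f.data from the recorder tag and keeps its own first path, while the copied second path is ignored.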
In all other cases, when the same attribute appears several times within a tag, the first value is taken.

10.5.2.7 Summary

In summary, the MSR configuration file consists of tags and tag attributes, grouped into three sections. The tags in the >RECORDERS section describe individual GDCs. The tags in the >OSTREAMS section describe abstract output streams, instantiated by stream attributes of recorder tags. The obligatory tags default_str and default_rec describe the default stream and recorder, respectively. The >COMMON section contains the attributes whose values are enforced globally. Almost all attributes have built-in defaults which can be modified at different scope levels. Multiple definitions are resolved using rules of precedence. The format of the configuration file is free and may contain inline comments.

10.5.3 Description of the MSR configuration attributes

This sub-section describes how to specify the values of the MSR configuration attributes, summarized in Table 10.1. Initially stored as text strings, these values are interpreted according to their type at the semantic parsing stage. Most of them have meaningful built-in default values. All attributes, except stream, can be specified in the command line. In that case, they are prepended to the >COMMON section and, therefore, have the highest precedence.

• path
  The full pathname (without terminating "/") of the directory which will contain the output file. For CASTOR files it has to start with "/castor/cern.ch". This attribute is obligatory and must be specified for any stream created by MSR. The specified pathname must have write permissions for the owner of the MSR executables disp and stream, stored in the directory ${DATE_MSTREAM_BIN} (they have SUID and SGID bits set).
• stager
  The CASTOR [9] stager host name. MSR assigns the value of this attribute to the CASTOR environment variable STAGE_HOST. By default, the CERN public stager is used.
• pool
  The CASTOR disk pool name. MSR assigns the value of this attribute to the CASTOR environment variables STAGE_POOL and STAGE_SVCCLASS. By default, the CERN public pool is used.
• fsize
  The file size limit, in MB. The output file is closed when the actually written size exceeds this limit (less a safety margin, for ROOT files).
• timing and timer_log
  The detailed stream timing can be enabled for each output stream by specifying "timing=1". The log will be written via infoLogger to the log stream specified by the timer_log attribute (by default, "Stream_time.%R", common for all streams). The statistics (minimal, maximal and mean values, the accumulated sums) are recorded at each file close for the following time intervals: time spent while waiting for events from disp, write operations, time between consecutive writes, file open and close latencies.
• sleep
  The stream processes poll the dispatcher FIFO while waiting for the next event. Whenever the FIFO is found to be empty, the stream executes usleep(s), where "s" is the value of the sleep attribute. "sleep=0" turns sleeping off (not recommended!) and any negative value enables the minimal possible non-CPU consuming wait interval (10 ms or 20 ms, depending on the Linux platform).
• mxrecl
  A non-zero value enables buffering for non-ROOT recording. The optimal value should be found experimentally, as it strongly depends on running conditions and the average event size. By default, buffering is disabled.
• filename
  The output file name, see Section 10.5.2.3 for details. The filenames should contain symbols identifying streams and sequential file numbers, to avoid clashes. Such clashes are not detected by MSR and might not even cause run-time errors, but the data will be silently overwritten.
• root
  A non-negative value enables ROOT recording and specifies the access protocol for the raw DB created by the corresponding stream.
  The currently supported values are: "0" (writing to a local filesystem, using class AliRawDB) and "3" (writing to CASTOR via rootd daemon, using class AliRawCastorDB). The values "1" and "2" are reserved for RFIO/CASTOR (using class AliRawRFIODB) and plain ROOTd (using class AliRawRootdDB). For further details about ROOT-related classes, refer to their description in the AliROOT documentation [16]. The ROOT recording must also be enabled at the compilation level, by defining the ROOTsys macro in the MSR GNUmakefile (Section 10.5.4). A non-ROOT version of MSR will abort if a non-negative value of root is specified. Using different ROOT recording modes for different streams, though possible with MSR, is discouraged.
• compress, filtermode, maxtagsize, runDBFS, tagDBFS, alienHost and alienDir
  For the streams with the ROOT recording enabled, these attributes qualify the ROOT recording mode and are transferred to the class AliMDC constructor (the corresponding API function is reproduced in Listing 10.5). The special values are:
  • leading "-" in runDBFS and/or tagDBFS: the corresponding DB creation is suppressed and its pathname is reset to NULL.
  • "maxtagsize=0": the tag DB creation is suppressed, tagDBFS is reset to NULL.

Listing 10.5 An API function used by the MSR to create an AliMDC object

    #include "AliMDC.h"

    // creating AliMDC object for ROOT recording with MSR
    void *alimdcCreate(
        int compress,
        int filtermode,
        const char* runDBFS,
        Bool_t rdbmsRunDB,      // =0 (MySQL run DB is disabled)
        const char* alienHost,
        const char* alienDir,
        double maxtagsize,
        const char* tagDBFS) {
      return new AliMDC(compress, kFalse, AliMDC::EfilterMode(filtermode),
                        runDBFS, rdbmsRunDB, alienHost, alienDir,
                        maxtagsize, tagDBFS);
    }

• method
  This attribute defines the dispatching method used by the corresponding recorder to distribute the events between the streams.
  The value "1" corresponds to the "equal-load" method and is assumed by default. The value "2" corresponds to the "first-available" method which bypasses the busy streams (having their buffer FIFOs almost full). Both methods are protected by an internal time-out which may temporarily disable the overloaded stream(s). The difference in performance for the two methods is marginal and strongly depends on the running conditions.
• Nstreams
  This attribute enforces the specified number of streams for the corresponding recorder. By default, MSR creates as many streams as there are stream attributes in the recorder tag. If no stream attributes are present in that tag, one default stream (defined by the obligatory default_str tag) is assumed. If the value of Nstreams is less than the actual number of stream attributes listed in the tag, then the trailing streams are discarded. Otherwise, the entire list is iterated until the number of streams requested by Nstreams is reached.
• stream
  This attribute creates an instance of the named stream for a given recorder. The order in which stream attributes are listed within the same recorder tag may be relevant for two reasons. First, if the effective value of Nstreams for that recorder is less than the number of its stream attributes, only the leading ones are retained. Second, if it is desirable to modify the properties of streams instantiated for a given recorder, all modifying attributes must precede the corresponding stream attribute (see, for example, the use of the filename templates for the recorder pcaldXXgdc in Listing 10.4).
• use
  This attribute can be placed in the >COMMON section to enforce the specified recorder tag on all recorders. It overrides the default behavior of MSR, which determines the GDC hostname and picks the recorder tag with that name (or the default_rec tag, if such a tag is missing).
• loglevel
  Defines the volume of status and diagnostic messages. When running in the DATE mode, all messages from all disp and stream processes are sent to the infoLogger, with the recorder and stream names prepended. Note that the detailed and debug levels of infoLogger are enabled only in a special debugging version of MSR produced with the "_d_" macro defined in the MSR GNUmakefile (see Section 10.5.4). Each recorder sends a single-line starting message to the runLog stream. All subsequent messages go to the common dedicated log stream ("LOGNAME = mStreamRecorder"). Fatal errors are reported with the FATAL macro. The timing logging (see the description of the timing and timer_log attributes) is not affected by loglevel.
• dump
  This attribute is useful when composing or testing configuration files, especially in the stand-alone mode. Its only effect is to produce a detailed dump of the configuration tree structure immediately after the syntax parsing and write it to stdout.
• run
  This attribute can be placed in the >COMMON section to override the run number coming with the data. It is effective only in the stand-alone mode. If "run=0" is specified, MSR will stop after parsing the configuration file, without creating any streams.
• source
  The full name of the local source raw data file for the stand-alone ("replay") mode. This file can be created either directly by a GDC, or by MSR running with DATE in the non-ROOT mode. If the source attribute is omitted in the stand-alone mode, MSR will generate some meaningless dummy events which are good only to test MSR in the non-ROOT mode. For consistent testing, a short sample raw data file ${DATE_MSTREAM_DIR}/sample_source.dat can be used.
• Nev
  Specifies the number of events to process in the stand-alone mode. By default, MSR stops after processing all events in the source file (see source attribute).
  If Nev is given, the source file is "replayed" (in a loop, if needed) until the required number of events is reached.

10.5.4 How to build and run MSR

In order to build the MSR components, run gmake in the ${DATE_MSTREAM_DIR} directory containing the MSR source code. The makefile GNUmakefile puts the MSR libraries and executables into the directory ${DATE_MSTREAM_BIN}. Check that the executable files disp, stream and cleanup have the "s" bits set and that the owner of these files has write access to the directories that MSR will be writing to.

The makefile has internal switch variables to control the makefile execution and/or to create functionally different versions of MSR:
• EB=1 produces the "DATE" version reading from the eventBuilder and reporting to infoLogger; EB=0 produces the "stand-alone" version reading from a pre-recorded file, reporting to standard output and independent of any DATE components, except simpleFifo.
• ROOT=1 produces the ROOT-aware version using the AliROOT API. The pathnames ${ROOTSYS} (the ROOT installation directory) and ${ALIROOT} (the AliROOT directory), required for the ROOT version, are defined within the makefile. ROOT=0 produces the version stripped of all ROOT-related features and independent of any ROOT resources.
• DEBUG=1 produces the debugging version of MSR. Currently, its only difference from the standard version (produced with DEBUG=0) is that the code to produce the detailed and debug level of logging messages (see the loglevel attribute) is included only in the debugging version of MSR. In future, the standard version can be further optimized by delegating some error checks to the debugging version.
• HC=1 for the DATE version replaces the default reporting to infoLogger with printing to the standard output, like in the stand-alone version (this applies also to timing logs, see the timer_log attribute).
• SHARED=1 instructs the makefile to create shared MSR libraries and link MSR correspondingly. With SHARED=0, static linking is performed. Note that all ROOT libraries are always dynamically linked, independently of that switch.
• TRACE simply controls the makefile verbosity. TRACE=-1 retains the default gmake output. TRACE=0 is equivalent to the "-s" option of gmake (any output except for errors is suppressed). TRACE=1 prints a one-line header for all actions the makefile performs (including the intrinsic compilation rules) and, finally, TRACE=2 adds the action headers to the default gmake output.

These variables are assigned at the beginning of GNUmakefile as follows: EB=1, ROOT=1, DEBUG=0, HC=0, SHARED=1 and TRACE=1. To modify the default value(s), either edit the GNUmakefile and re-make MSR, or specify the alternative assignment(s) as gmake arguments, for example:

    gmake -W GNUmakefile DEBUG=1 SHARED=0

Note that GNUmakefile itself is in the list of common dependencies for MSR, so editing GNUmakefile (or giving its name in the "-W" option) will force gmake to re-compile the entire MSR.

Before running MSR, one has to prepare the configuration file. By default, MSR uses the name ${DATE_SITE_CONFIG}/mStreamRecorder.config (or ./mStreamRecorder.config, if the DATE_SITE_CONFIG environment variable is undefined). Using command-line options, an alternative filename can be specified as shown below. At Point 2, a special syntax is used to handle global templates and partition-specific configuration files (see Section 10.5.2).

The standard way to run MSR together with DATE is by using the DATE runControl Human Interface (Section 14.5). Before starting processes:
• set recordingDevice to ":" in the "GDC configuration" menu.
• check the "Recording enable" and "Recording to PDS" options in the main menu.

To run the stand-alone version of MSR, enter a fully specified name of the disp executable, with optional arguments.
Apart from any number of the configuration attributes, two options, "-v" and "-f", can be specified in the command line. The "-v" option will cause disp to get loaded and print the MSR version number and properties. The "-f" option followed by a full filename specifies the configuration file to be used. A few examples are given in Listing 10.6. Note that the configuration attributes specified in the command line have the highest priority and override the corresponding values in the configuration file (see Table 10.2).

Listing 10.6 Examples of starting MSR in the stand-alone mode

     1: ./Linux/disp -f my.config Nev=1.e6 run=287 source=/scratch/%h.dat
     2: ${DATE_MSTREAM_BIN}/disp source=${DATE_MSTREAM_DIR}/sample_source.dat
     3: ./Linux/disp -v
     4: disp build IM 050505; debug version; EB;
     5: ROOT (linked with ROOTSYS=/adcRoot/ROOT/Linux/CurrentRelease/root)
     6: ./Linux/stream -v
     7: stream build IM 050505; EB;
     8: ROOT (linked with ROOTSYS=/adcRoot/ROOT/Linux/CurrentRelease/root)
     9: ./Linux/disp -f test.config use=thatGDC dump=1 loglevel=2 run=0
    10: ./Linux/test_config test.config use=thatGDC

Line 1 shows an example of a "replay" run with a custom configuration file. It shows that the data source filenames may contain meta-characters.

Line 2 shows a replay of the standard MSR sample file.

Line 3 tests the version of the disp program; lines 4 and 5 show its output.

Line 6 shows that the stream program version can also be tested (output in lines 7 and 8). This is also a way to test-load the stream process.

Line 9 shows how to test a configuration file. The file will be fully parsed by disp and the streams for the recorder running on thatGDC defined in this file will be prepared, with all their parameters printed out. However, the actual streams will not be created and the execution will stop at that stage, because of the "run=0" attribute. The MSR tool application test_config (see Line 10) performs the same actions.
Its advantage is that it does not depend on any external resources, like DATE or ROOT libraries.

11 The infoLogger system

A data-acquisition setup can consist of many nodes, each of them possibly running several DATE processes. For development and operation, one needs to know how the distributed components behave, and what happens on the different machines. The DATE infoLogger package provides facilities to generate, transport, collect, store and consult log messages. This chapter describes how the infoLogger works and how to use it.

11.1 Introduction
11.2 infoLogger configuration
11.3 The infoLogger processes
11.4 Log messages repository
11.5 Injection of messages

11.1 Introduction

The infoLogger system provides facilities to send, collect and browse log messages created by DATE components and user processes on different machines.

Figure 11.1 shows the overall architecture of the infoLogger system. A process calling a function of the infoLogger library sends the message to the local infoLoggerReader daemon. This daemon collects all the messages of the node where it runs, and sends them to a central infoLoggerServer daemon, which stores the received messages in a MySQL database. If at some point the transmission chain is broken, the messages are written to disk to avoid losing them. The infoBrowser user interface allows messages to be read, either from the database or as they are received online by the central server.
Figure 11.1 The DATE infoLogger architecture
[Figure labels: Process, API, Unix socket, infoLoggerReader, HD, Local host, TCP/IP, infoLoggerServer, DB, Central server, infoBrowser, Console]

11.2 infoLogger configuration

The connection parameters between the different processes are stored in the MySQL configuration database and should be defined with editDb (see Section 4.5) in the environment variables section. Table 11.1 describes the variables required.

Table 11.1 infoLogger configuration parameters - environment variables

Variable name                 Meaning
DATE_INFOLOGGER_LOGHOST       Name of the host running the infoLoggerServer process.
DATE_INFOLOGGER_MYSQL_DB      Name of the database to be used to store log messages.
DATE_INFOLOGGER_MYSQL_HOST    Host running the database server. It is recommended to have it on the same machine as the infoLoggerServer.
DATE_INFOLOGGER_MYSQL_USER    MySQL username to connect to the database.
DATE_INFOLOGGER_MYSQL_PWD     MySQL password to connect to the database (with above user).

These environment variables are stored in the database configuration class named infoLogger. They are created by default when installing the DATE database, and loaded at runtime by the components (readers and server). They are not loaded by the DATE setup procedure in the environment; the values are queried only when starting the infoLogger processes.

Default socket port numbers for the communication between the processes are defined in $(DATE_ROOT)/commonDefs/shellParams.common, with the global environment variables DATE_SOCKET_INFOLOG_RX (reader to server) and DATE_SOCKET_INFOLOG_TX (browser to server). They do not need to be redefined. The Unix named socket used to communicate between a log client and the local reader is based on the value of $(DATE_SITE) and does not need to be defined.

DATE messages are stored in the specified database.
The MySQL infoLogger table structure should be created, once the configuration parameters are defined, with the command /date/infoLogger/newDateLogs.sh -c.

In addition, some infoLogger related log files are created in ${DATE_SITE_LOGS} when needed, as described in Section 11.3. This variable is set by default to ${DATE_SITE}/logFiles.

Please note that the verbosity of some runtime processes is defined by the runControl run parameter named loglevel. The higher this integer, the higher the verbosity of the processes. When used, this variable is handled by the processes before any call to the infoLogger. Details about this parameter are described in Chapter 14.

11.3 The infoLogger processes

The infoLogger components create some log files in the ${DATE_SITE_LOGS} directory in addition to the DATE log repository. These files contain status information, daemon error messages, and DATE log messages that could not be transmitted.

For performance reasons, each infoLogger component opens these log files only once and keeps them open. Therefore, you should first stop the infoLogger daemons before removing these files. Because of the operating system implementation, no error occurs on the daemons side if you remove the files while the daemons run. They will continue to write to the same files, which are no longer accessible to the user and which will be destroyed when closed.

11.3.1 infoLoggerReader

The infoLoggerReader daemon is started automatically by the first process using the DATE infoLogger library on a specific host. It listens to a Unix named socket whose name is based on ${DATE_SITE} (no configuration required), and receives all log messages created on the node by the infoLogger library. The messages are then sent to the central server, as defined by the variable DATE_INFOLOGGER_LOGHOST, on TCP port DATE_SOCKET_INFOLOG_RX using a special protocol.
A control script, ${DATE_INFOLOGGER_BIN}/infoLoggerReader.sh, is provided to start, stop, or restart it. DATE_SITE must be defined when invoking the script.

The infoLoggerReader processes associated with a specific DATE_SITE can be started and stopped remotely with the dateSiteDaemons script. It should be used to restart infoLoggerReader after a change in the configuration (for example a change of the host name where the infoLoggerServer is running).

11.3.2 infoLoggerServer

The infoLoggerServer daemon runs on a central node, defined by the variable DATE_INFOLOGGER_LOGHOST, and receives log messages sent by remote infoLoggerReader processes. It stores messages in a MySQL database. It also accepts connections on TCP port DATE_SOCKET_INFOLOG_TX where clients can connect to get log messages as soon as they are received by the server, without querying the repository.

The order in which messages are inserted and delivered is not guaranteed: only the message timestamp can be relied upon to order messages coming from a given machine. The accuracy of clock synchronization is critical when correlating events from different nodes, and it is not under the control of the DATE system.

A control script, ${DATE_INFOLOGGER_BIN}/infoLoggerServer.sh, is provided to start, stop, or restart it. DATE_SITE must be defined when invoking the script.

The infoLoggerServer process associated with a specific DATE_SITE can be started and stopped remotely with the dateSiteDaemons script. It should be used to launch infoLoggerServer before using DATE, or to restart infoLoggerServer after a change in the configuration (e.g. database parameters, node name).

11.3.3 infoBrowser

The infoBrowser process is a user interface to extract and display log messages from the infoLogger system. A full description of it is given in the operations section of the ALICE DAQ WIKI.
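The ordering caveat stated for the infoLoggerServer in Section 11.3.2 (timestamps are only reliable for ordering messages from one machine) could be applied by a tool that post-processes extracted messages roughly as follows. This is a minimal sketch, not part of DATE: the struct log_entry layout and the function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative record (hypothetical layout): only what ordering needs. */
struct log_entry {
    const char *host;     /* host where the message was created */
    long        seconds;  /* timestamp, whole seconds */
    long        micros;   /* timestamp, microsecond part */
    const char *text;     /* message text */
};

/* qsort comparator: group by host, then order by timestamp within a host.
 * Timestamps of different hosts are never compared with each other. */
int compare_by_host_then_time(const void *a, const void *b)
{
    const struct log_entry *x = a;
    const struct log_entry *y = b;
    int by_host = strcmp(x->host, y->host);
    if (by_host != 0) return by_host;
    if (x->seconds != y->seconds) return (x->seconds < y->seconds) ? -1 : 1;
    if (x->micros != y->micros)   return (x->micros < y->micros) ? -1 : 1;
    return 0;
}

/* Sort entries in place using the comparator above. */
void sort_log_entries(struct log_entry *entries, size_t count)
{
    assert(entries != NULL);
    qsort(entries, count, sizeof entries[0], compare_by_host_then_time);
}
```

Grouping by host first avoids implying an order between machines whose clocks may not be synchronized.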
11.4 Log messages repository

Each DATE log message is made of several attributes:
• Severity: the information level of each message. This can be one of:
  • Information: used for messages concerning normal running conditions.
  • Error: when an abnormal situation has been encountered but execution can somehow continue.
  • Fatal: when an unrecoverable situation has been detected and normal execution cannot be guaranteed. It usually causes the end of the current run.
• Timestamp: time of the message creation, with microsecond resolution, as provided by the local operating system where the message is created.
• Host name: host where the message was created.
• Process ID: identifier of the process creating the message.
• User name: user running the process creating the message.
• System: system originating the message. For all DATE processes, this is set to DAQ. This field is useful if the infoLogger facilities are shared with other systems, like the ECS. It is defined by the value of the environment variable DATE_INFOLOGGER_SYSTEM.
• Facility: the activity family, usually the DATE package name creating the message. This can be for example readout, recorder, runControl, or operator in case of a message coming from the command line.
• Stream: the log stream name the message belongs to. It is usually the name of a detector (standalone operation) or of a partition (global runs).
• Run number: when available, the run number associated to the message. This field can be undefined (empty value or -1). Most processes related to the runControl set this attribute when logging messages.
• Message: the log information. It is a text string. End of line characters are not allowed in a message. Multiple-line messages are split into different messages (with the same other attributes).

Log messages, with all the associated information as described above, are centrally collected for the whole DATE system, and stored in a repository.
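The rule for the Message attribute above (no end-of-line characters; multiple-line text becomes several messages) can be pictured with the following sketch. It is an illustration only, not the infoLogger implementation: the names split_message, emit_line, struct line_list and MAX_LINE are hypothetical, and skipping empty lines is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINE 512   /* arbitrary limit chosen for this sketch */

/* Records the single-line messages produced by the split. */
struct line_list {
    int  count;            /* number of lines recorded so far */
    char last[MAX_LINE];   /* most recent line, for inspection */
};

/* Stand-in for "send one message": it only records the line,
 * truncating it to MAX_LINE - 1 characters if necessary. */
void emit_line(struct line_list *out, const char *line, size_t length)
{
    size_t copy = (length < MAX_LINE) ? length : MAX_LINE - 1;
    memcpy(out->last, line, copy);
    out->last[copy] = '\0';
    out->count++;
}

/* Split text at end-of-line characters; empty lines are skipped.
 * Returns the number of single-line messages emitted by this call. */
int split_message(const char *text, struct line_list *out)
{
    assert(text != NULL && out != NULL);
    int emitted = 0;
    while (*text != '\0') {
        size_t length = strcspn(text, "\r\n");  /* up to next line end */
        if (length > 0) {
            emit_line(out, text, length);
            emitted++;
        }
        text += length;
        text += strspn(text, "\r\n");           /* skip the line ends */
    }
    return emitted;
}
```

In a real logger each emitted line would carry the same severity, facility, stream and other attributes as the original text.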
Two implementations of the repository are available: the infoLoggerServer can either write to a flat file or to a MySQL database.

Some tags have a maximum string length allowed in the MySQL version of the repository. These limits are defined in the file ${DATE_INFOLOGGER_DIR}/newDateLogs.sh.

11.4.1 MySQL database

Messages are stored in a table named messages, whose columns correspond to the log fields previously described. The MySQL database connection parameters and infoLogger tables should be initially created as described in Section 11.2.

11.4.2 Archiving

Messages received by the infoLoggerServer are stored in the messages table in the MySQL database. It is important to archive older messages to optimize the resources used to insert, browse or process log information in the main table. The quantity of information stored in the main table is also limited by the maximum file size allowed by the file system.

A utility is provided to manage the amount of logs over time:
• ${DATE_INFOLOGGER_DIR}/newDateLogs.sh -d
  Deletes all messages in the main table (but not the archives).
• ${DATE_INFOLOGGER_DIR}/newDateLogs.sh
  This script, launched without option, creates an archive from the main table. A new table (or file) is created, whose name includes the current date and time. All messages from the main table are moved to this archive.

It is recommended to regularly delete or archive the messages from the main log table, for example with a regular cron job calling the newDateLogs.sh script. Be sure that DATE_SITE is defined when this script is called.

Log information stored in archived tables is still available to the infoBrowser.

11.4.3 Retrieving messages from repository

The infoBrowser interface is the best tool to browse and display information stored in the log repository. It includes filtering and searching capabilities, and allows data to be exported to a text file.
The full description of the infoBrowser is given in the operations section of the ALICE DAQ WIKI.

A simple data extraction tool, ${DATE_INFOLOGGER_DIR}/getLog.sh, is also provided to extract information from the main table to stdout, independently of the repository type. This script can also output logging information in the format of the DATE system version 4, and select data from a specific stream. Call it with the -h option to get details on usage. This output can then be piped to awk or grep, when looking for specific messages.

Users familiar with SQL can also query the messages table directly.

11.5 Injection of messages

Any process running on any DATE host can use the infoLogger system to transfer debug, information and error logs. Messages can be injected into the logging system using the command line tools, or with the APIs provided (C and Tcl). The DATE setup procedure must have been executed before launching any process using the infoLogger library.

All inserted messages must be native strings. If the message tag contains carriage returns, the message is split into several log messages with the same remaining tags.

When necessary, it is possible to copy messages injected in the infoLogger system to stdout by setting the environment variable DATE_INFOLOGGER_STDOUT to TRUE. It can be useful for interactive tools.

Note that the stream is set automatically at run time by the runControl, therefore it should not be set manually to other values.

11.5.1 Logging from the command line

A set of executables allows messages to be injected from the command line or from scripts. These tools are located in the ${DATE_INFOLOGGER_BIN} directory. For details on their usage, invoke them with the -? command line option. The DATE setup procedure must be executed before using these programs.
• log
  Log a given message. Severity and facility may be provided.
• logTo
  Log a given message to a given stream.
  Severity and facility may be provided.
• logFromStdin
  Read messages from standard input. Messages are strings delimited by the end of line character. It is possible to pipe the output of a program into this utility to have it injected into the log system. Severity, facility and stream may be provided.

Unless otherwise specified, the facility tag is set to operator, and the destination stream tag is set to defaultLog.

11.5.2 Logging with the C API

This section describes the macros and the methods available for programs written in C. A set of primitives is defined for the programmer in the header file ${DATE_INFOLOGGER_DIR}/infologger.h.

A C program by default sends messages tagged with Facility set to the package name (for DATE packages) or with the image filename (for non-DATE packages). This behavior can be changed by setting the C preprocessor variable LOG_FACILITY to the desired value. This must be done before including the infoLogger.h file, e.g.:

Listing 11.1 Setting the Facility name in C programs
#define LOG_FACILITY "myFacility"
#include "infoLogger.h"

Unless otherwise specified, the destination stream tag is set to runLog.

Compilation of C programs requires the inclusion of the file infoLogger.h. Linking with the library libInfo.a is also needed. Once the DATE setup procedure has been executed, a command to build the program is

gcc myProgram.c -I${DATE_INFOLOGGER_DIR} \
    ${DATE_INFOLOGGER_BIN}/libInfo.a

By default, a connection to the infoLoggerReader process is opened (and the daemon launched if necessary) when the first log message is created. It remains open during the life of the process. To explicitly open/close this connection, the functions infoOpen() and infoClose() can be used.

Additional function calls exist which accept printf-like arguments, thus avoiding the need for a local buffer to prepare the message when variable values have to be included.
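As an illustration of the printf-like calls mentioned above, the sketch below logs a formatted message through infoLog_f, whose signature is taken from the reference entries of this section. Several things are assumptions and not part of DATE: the HAVE_DATE_INFOLOGGER macro, the stand-in definitions used when the real header is not available (including the severity values 'I' and 'E'), and the helper names report_buffer_usage and log_buffer_usage_once.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#ifdef HAVE_DATE_INFOLOGGER          /* hypothetical build switch */
#include "infoLogger.h"              /* real DATE header */
#else
/* Stand-ins so that the sketch compiles without DATE. The severity
 * values and the recording variables are assumptions for illustration. */
#define LOG_INFO  'I'
#define LOG_ERROR 'E'
static int  sent_count = 0;          /* number of messages "sent" */
static char last_severity = 0;       /* severity of the last message */
static char last_message[256];       /* formatted text of the last message */
static void infoOpen(void)  {}
static void infoClose(void) {}
static void infoLog_f(const char *const facility, const char severity,
                      const char *const message, ...)
{
    va_list args;
    va_start(args, message);
    vsnprintf(last_message, sizeof last_message, message, args);
    va_end(args);
    last_severity = severity;
    sent_count++;
    printf("[%c] %s: %s\n", severity, facility, last_message);
}
#endif

/* Hypothetical helper: report buffer occupancy, as an error above 90 %.
 * No local buffer is needed: the values are formatted by infoLog_f. */
void report_buffer_usage(int used_pages, int total_pages)
{
    assert(total_pages > 0);
    char severity = (used_pages * 10 > total_pages * 9) ? LOG_ERROR : LOG_INFO;
    infoLog_f("myFacility", severity, "buffer usage: %d/%d pages",
              used_pages, total_pages);
}

/* Explicit open/close is optional; shown here for completeness. */
void log_buffer_usage_once(int used_pages, int total_pages)
{
    infoOpen();
    report_buffer_usage(used_pages, total_pages);
    infoClose();
}
```

With the real library, the program would be built with the gcc command given above and the stand-in branch would not be compiled.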
infoOpen

C Synopsis
#include "infoLogger.h"
void infoOpen(void)

Description
Opens the connection to infoLoggerReader. Its usage is optional: this function is called automatically when logging the first message. If the socket is not found, the infoLoggerReader is started.

infoClose

C Synopsis
#include "infoLogger.h"
void infoClose(void)

Description
Closes the connection to the infoLoggerReader. Its usage is optional.

infoLog_f

C Synopsis
#include "infoLogger.h"
void infoLog_f(const char* const facility, const char severity, const char* const message, ...)

Description
The specified message is injected into the infoLogger system, in the default log stream, with the given severity and facility. message is a string as accepted by printf. Additional parameters can be provided for string formatting. This avoids buffering a message that includes variable values.

infoLogTo_f

C Synopsis
#include "infoLogger.h"
void infoLogTo_f(const char* const stream, const char* const facility, const char severity, const char* const message, ...)

Description
The specified message is injected into the infoLogger system, in the given log stream, with the given severity and facility. message is a string as accepted by printf. Additional parameters can be provided for string formatting. This avoids buffering a message that includes variable values.

LOG

C Synopsis
#include "infoLogger.h"
void LOG( char severity, char *message )
severity can be one of:
LOG_INFO
LOG_ERROR
LOG_FATAL

Description
The specified message is injected into the infoLogger system, in the default runLog stream, with the given severity.

LOG_TO

C Synopsis
#include "infoLogger.h"
void LOG_TO( char *stream, char severity, char *message )
severity can be one of:
LOG_INFO
LOG_ERROR
LOG_FATAL

Description
The specified message is injected into the infoLogger system, in the given log stream, with the given severity.
LOG_ALL

C Synopsis
#include "infoLogger.h"
void LOG_ALL( char *stream, char severity, char *message )
severity can be one of:
LOG_INFO
LOG_ERROR
LOG_FATAL

Description
The specified message is injected into the infoLogger system, both in the default runLog stream and in the given log stream, with the given severity.

INFO

C Synopsis
#include "infoLogger.h"
void INFO( char *message )

Description
The specified message is injected into the infoLogger system, in the default runLog stream, with Info severity.

ERROR

C Synopsis
#include "infoLogger.h"
void ERROR( char *message )

Description
The specified message is injected into the infoLogger system, in the default runLog stream, with Error severity.

FATAL

C Synopsis
#include "infoLogger.h"
void FATAL( char *message )

Description
The specified message is injected into the infoLogger system, in the default runLog stream, with Fatal severity.

INFO_TO

C Synopsis
#include "infoLogger.h"
void INFO_TO( char *stream, char *message )

Description
The specified message is injected into the infoLogger system, in the given log stream, with Info severity.

ERROR_TO

C Synopsis
#include "infoLogger.h"
void ERROR_TO( char *stream, char *message )

Description
The specified message is injected into the infoLogger system, in the given log stream, with Error severity.

FATAL_TO

C Synopsis
#include "infoLogger.h"
void FATAL_TO( char *stream, char *message )

Description
The specified message is injected into the infoLogger system, in the given log stream, with Fatal severity.

INFO_ALL

C Synopsis
#include "infoLogger.h"
void INFO_ALL( char *stream, char *message )

Description
The specified message is injected into the infoLogger system, both in the default runLog stream and in the given log stream, with Info severity.
ERROR_ALL

C Synopsis
#include "infoLogger.h"
void ERROR_ALL( char *stream, char *message )

Description
The specified message is injected into the infoLogger system, both in the default runLog stream and in the given log stream, with Error severity.

FATAL_ALL

C Synopsis
#include "infoLogger.h"
void FATAL_ALL( char *stream, char *message )

Description
The specified message is injected into the infoLogger system, both in the default runLog stream and in the given log stream, with Fatal severity.

LOG_NORMAL_TH
LOG_DETAILED_TH
LOG_DEBUG_TH

Description
The symbols define the thresholds for message generation. Messages may be injected into the infoLogger system depending on the logLevel run parameter. Refer to Section 14.11 for more details on this convention. The corresponding values are defined in infoLogger.h.

11.5.3 Logging with the Tcl API

A subset of the C API can be called directly from Tcl scripts. The list of accessible functions is given in ${DATE_INFOLOGGER_DIR}/infologger.i. It includes, in particular, the infoLog and infoLogTo functions described in Section 11.5.2.

To use the library, load the module at the beginning of the Tcl script:

load $env(DATE_INFOLOGGER_BIN)/libInfo_tcl.so infoLogger

Then, call the infoLogger functions with the same arguments as defined in the C interface:

infoLog "my facility" "I" "This is an information message"

12 The eventBuilder

The DATE eventBuilder is the software package running on a Global Data Collector (GDC), receiving data from several Local Data Concentrators (LDC), assembling the data into single events and recording them to the output stream. This chapter includes a description of the event-builder architecture and describes how sub-events are identified as belonging to the same event and how they are built as a single event.
This chapter also describes how the eventBuilder uses some of the other DATE packages such as the runControl and the infoLogger.

12.1 Overview
12.2 The event-builder architecture
12.3 Data buffers
12.4 Consistency checks on the data
12.6 The control of the eventBuilder
12.7 Information and error reporting
12.8 Configuration

12.1 Overview

The DATE eventBuilder is a software package responsible for merging several streams of sub-event data originating from different readout subsystems into a single stream of events. This stream can be directed to the appropriate recording device or, using a memory-mapped scheme, to the next processing stage (filtering, compression, special recording).

A DATE data-acquisition system is composed of one or several parallel readout streams. Each of these streams carries the data produced by the front-end electronics of one detector or part of it. The front-end electronics of each stream is controlled and read out by one processor called the Local Data Concentrator (LDC). The event building is performed by processors called Global Data Collectors (GDC).

The sub-events are transferred from the LDCs to the GDCs using the socket library of TCP/IP. The transfer is executed by the DATE recorder process running on the LDC when it is configured to use a GDC as output device. The sub-events are received by the eventBuilder which, according to the directives given by the event-building database, creates the appropriate event and forwards it to the following processing stage (recording or online transfer).
The output of the eventBuilder can be either recorded directly to one or more local devices or sent to a further processing stage using fifos and memory buffers. Details on the different recording schemes and their description can be found in Section 10.4.

The eventBuilder runs under the control of the DATE runControl system, from which it receives the commands to start and stop the run and the parameters needed for a run. The infoLogger functions are used for normal and exceptional messages and to report run statistics and descriptions.

12.2 The event-builder architecture

12.2.1 The data transfer from the LDC to the GDC

The LDC recorder program (see Section 10.3) writes data onto an output stream whose name is given by the runControl. Amongst several possibilities, the output stream can be a GDC machine where the eventBuilder is running. By defining the output stream to be a GDC, the LDC becomes part of a multiple-host DAQ system.

When the run is starting, the LDC recorder program opens the output stream on one TCP/IP socket of the GDC. The eventBuilder accepts the connection, negotiates the socket parameters and, when the run is declared started by the runControl system, begins to poll the channel for incoming data. Whenever new data is available, it is accepted and stored in the event-builder data buffer. When the event is completely read out, the eventBuilder takes the appropriate action. Once the event is completed, it is moved either to the recording stage or to the next processing stage, according to the configuration database.

If the eventBuilder runs out of memory, it stops accepting data from the LDC(s). Thanks to the backpressure applied by the TCP/IP socket library, this stops the recording process on the LDC(s), at least as far as this particular eventBuilder is concerned (if the recorder process has multiple channels active, recording on the LDC can continue on other free channels).
12.2.2 The communication protocol between the LDC and the GDC

The communication protocol between the LDC recorder and the GDC eventBuilder is based on the DATE data format (see Section 3.5). For each event the following operations are performed:
• the eventBuilder reads the event header. On the basis of the event header, the eventBuilder knows the event type, the event number and the event length.
• the validity of the header is checked:
  • the magic word field, and the number of bytes actually read compared to the standard header length. If the event header is incorrect, an error is issued to the infoLogger and a special event header is created with the type EVENT_FORMAT_ERROR.
• statistics are accumulated on the different event types.
• data is read into the eventBuilder data buffer. If the data is truncated, an error bit (EVENT_DATA_TRUNCATED) is added into the event type.

The cycle is repeated until the run is declared closed and either the LDC closes its channel or an abort (quick exit) sequence is started.

12.2.3 The communication protocol between the eventBuilder and the edm

The communication protocol between the eventBuilder and the edm is unidirectional, from the eventBuilder to the edm. Two messages are exchanged: EDM_MESSAGE_NEARLY_FULL and EDM_MESSAGE_NEARLY_EMPTY. Both messages must be terminated by the character EDM_MESSAGE_SEPARATOR. All these constants are defined in ${DATE_EDM_DIR}/edm.h. The communication channel is set in asynchronous mode, therefore it is impossible to block the eventBuilder (if the channel between the eventBuilder and the edm becomes busy, the eventBuilder queues the messages, which will be sent whenever the channel becomes free again).

12.2.4 The event-building process

The process of building the event is based on the header of the incoming event and on the directives recorded in the event-building control database.

The decision is always based on the eventType field of the event header.
It can be refined by the eventDetectorPattern field (the set of detector(s) selected to read out the given event) or by the eventTriggerPattern field (the set of trigger(s) active for the given event). The first rule that matches the event is selected. The rule decides if the event should be built (the event includes all LDCs contributing to a given event) or not built (one LDC event per GDC event). If no rule can be applied to a given event, the eventBuilder requests an end of run with error condition.

If the detector pattern stored in the event header enables a subset of the detectors included in the run, the eventBuilder will expect data only from those detectors. Therefore a “full build” rule may become a “partial build” action whenever a partial readout was performed.

When running with the HLT, the HLT response may be used to change the building rule as the HLT response can enable or disable individual LDCs. The eventBuilder takes the HLT response into consideration and changes the event-building policy accordingly, expecting data only from those LDCs whose readout has been enabled.

12.2.5 SOR/EOR records, files and scripts

The eventBuilder handles SOR and EOR records, files and scripts using the same method used by readout. Please note that hosts running both as LDC and as GDC transfer the same files and execute the same scripts (common and specific) twice, once as LDC and once as GDC. SOR and EOR scripts can differentiate their actions by testing the environment variable ${DATE_HOST_ROLE} (which is set to “ldc” if the script is being called by the readout process and to “gdc” if the script is being used by the eventBuilder process).

12.3 Data buffers

The eventBuilder must be able to perform its main function: to put together sub-events belonging to the same event. To do this, it must ensure that enough memory is available for each LDC to build at least one event.
The eventBuilder is statically given one memory bank. This bank is dynamically partitioned into two sections: the per LDC section and the public section. The per LDC section is further partitioned into one sub-section for each of the LDCs connected at start of run. The public section is available to all LDCs. The partition between public and private pools is done using compilation-time configuration parameters and run-time dynamic parameters. The actual values and the usage of all pools are available as a statistics record sent via the logBook stream, as described in Section 12.7.3.

The eventBuilder refuses to run if the available memory is too small to allocate a minimum number of events. The check is made at start of run time using the maximum event size as declared by the run control for the LDCs connected to the eventBuilder. If the available memory is not sufficient, the eventBuilder sends an appropriate error message and requests an end of run with error. It is then up to the data-acquisition system administrator to provide bigger memory buffers, as requested by the eventBuilder via the eventBuilder stream.

The event-builder data buffer can be implemented using any of the available supports. This includes the process HEAP; note that, in this case, allocation is performed once at start of run time (and not on demand) and that the only possible recording option is direct recording (events cannot be shared with post-processing stages as memory transfer is impossible). BIGPHYS/PHYSMEM are also allowed and may actually perform better than IPC, given the different way the operating system handles the two methods (less overhead, no swapping, fewer conflicts, different resources).

When post-processing is requested, the eventBuilder must be given space for its ready fifo.
This can be done either by declaring a bank dedicated to all eventBuilder resources or by using two banks, one for eventBuilderReadyFifo and the second for eventBuilderDataPages. The second method allows a better tuning of the two resources and, if needed, the use of two different support methods (e.g. IPC for the eventBuilderReadyFifo and PHYSMEM for the eventBuilderDataPages).

12.4 Consistency checks on the data

The eventBuilder checks the data that is sent to it. This operation can reveal fatal errors originating in the readout electronics or readout software. In case of error, this is signaled through the infoLogger and the eventBuilder requests the run control to stop the run.

Furthermore, a rule of the event-building database must match every given event. This rule can specify a subset of detectors (either directly or indirectly via the eventTriggerPattern and eventDetectorPattern fields), in which case all LDCs belonging to the subset must contribute to the event. If no subset is specified, all LDCs are supposed to contribute to the event with a (possibly empty) sub-event.

12.5 ALICE events emulation mode

The eventBuilder can run in a special ALICE events emulation mode. The aim of this mode is to emulate as closely as possible the behavior of a production GDC when running in ALICE production.

When this mode is selected, the data from each LDC is taken individually and unpacked as if it came from several LDCs. The payload of the event must contain a real ALICE event, with one event header (coming from a GDC) and one or more event headers, equipment headers and payload. All this can be either built or extracted as is from an event recorded during a previous run.
The result of this operation is an event that closely resembles its equivalent produced by the original LDCs (the only differences are the event ID, the event timestamp and the fact that the event data comes from consecutive memory blocks rather than from several scattered memory locations). The event-builder refuses payloads that do not match this structure and aborts the run with error as soon as this happens.

This special running mode can be selected by setting the environment variable DAQ_EMULATE_ALICE_EVENTS to the value 111. This can be done either by asserting a DAQ-wide variable or by going machine by machine. Special care must be taken not to use this mode in production setups (even though the event-builder will most likely abort the run due to the inconsistent format of the event payload).

12.6 The control of the eventBuilder

The eventBuilder runs under the control of the runControl system. In the GDC, the rcServer process is responsible for maintaining the control shared segment and for allocating the required memory buffers at start of run. The eventBuilder cannot control detached processes (such as a post-processing stage). As for any other DATE process, this function must be delegated to the DATE runControl system.

12.7 Information and error reporting

12.7.1 Usage of the infoLogger

The eventBuilder uses the DATE infoLogger package (see Chapter 11) to report statistics, information and error messages.

12.7.2 Run statistics update

The eventBuilder regularly updates the information stored in the ALICE LogBook using the statistics update routines which are part of the DATE LogBook package (see Section 24.3).

12.7.3 End-of-run messages

At the end of the run, the eventBuilder updates the run log with run statistics, warning and error messages. Run statistics include memory usage, timers, counters, event-building rules usage, per-LDC counters and run-time performance.
12.8 Configuration

The eventBuilder should be configured statically (event-building rules, memory banks) and dynamically (run control). The event-building static configuration is described in Section 4.3.5.

For the memory banks, the eventBuilder must have a data buffer large enough to perform its function. The declaration should be done as described in Section 4.3.6.

Please note that the maxEventSize parameter for the eventBuilder does not apply to the events coming from the LDCs: here the individual maxEventSize (as declared for each of the LDCs) applies both for configuration purposes and for run-time checks. Only the events created by the eventBuilder itself (e.g. SOR records, SOR files, etc.) use the maxEventSize parameter.

13 The event distribution manager

This chapter describes the DATE Event Distribution Manager software package (EDM). Event distribution is the process of sending all the sub-events produced by the same trigger to a single destination machine (GDC). It allows smooth GDC load balancing and consists of three different processes: two run on each LDC and one runs on a machine called edmHost. The next pages include a description of the event distribution manager architecture and describe the edmAgent and the edmClient processes in the LDC, as well as the edm process in the edmHost. These processes are needed to activate the event distribution mechanism towards the event builder software running in the GDC machines.

13.1 Overview
13.2 The EDM architecture
13.3 The synchronization with the run control
13.4 Information and error reporting

13.1 Overview

The DATE event distribution manager (EDM) is a software package responsible for the distribution of the parallel readout streams coming from the different LDCs and belonging to the same trigger, called sub-events, to a single destination machine, called GDC. On the GDCs, the event builder package is responsible for assembling the sub-events into a single event, defined as the collection of data pertaining to the same particle collision. In a DAQ system there may be several GDCs performing event building functions, all connected to the same switching network, supporting the TCP/IP protocol, to which all the LDCs and the edmHost machine are also connected.

The EDM system allows the distribution of sub-events across the GDCs, all the physics and calibration events being randomly distributed to all GDCs regardless of the trigger class they belong to. Special event types, such as START_OF_RUN, START_OF_RUN_FILES, END_OF_RUN, END_OF_RUN_FILES, START_OF_BURST, END_OF_BURST, are always sent to the first GDC which declared itself to the edm process. No sub-events are broadcast to all the GDCs.

The actual transfer of sub-events is executed by the recorder process (see Section 10.3) in the LDC, when it is configured to use one or multiple GDCs as output device. The choice of the destination GDC is taken following the instructions given by the edm process. This mechanism permits smooth GDC load balancing and makes it possible to adapt the data flow to the capabilities of the GDCs at run time. It excludes overloaded GDCs from the system and puts them back as soon as they are free. In a similar way, if a GDC goes down for whatever reason, the data acquisition does not stop, but simply removes this GDC from the list of possible destinations for the sub-events, thus avoiding hang-ups in the data-acquisition system.
As soon as the GDC is up and running again, it is re-inserted in the list of possible destinations for the sub-events.

The user can choose to perform a run with or without the EDM software by means of the checkbutton labelled EDM in the run control main window. In a system where only one GDC is available, there is no point in activating the EDM software. If the EDM checkbutton is not selected, the run control does not start any edm-related process. In this case, the event distribution algorithm is performed by the readout process (see Section 6.1) running in the LDCs through a simple round-robin scheme of sub-event distribution, independent of any distributed knowledge about the GDC status. The destination GDC for each event is set in the field eventGdcId of the event header, based on the total number of GDCs and on the eventId of the sub-event. The dispatch algorithm uses the eventId field, which is mapped to an actual GDC by means of a hash table, whose function is to avoid periodicities introduced by a non-uniform distribution of the eventId field. The same mechanism, combined with the availability of the GDCs, is also used by the EDM via a library shared between the readout and the EDM packages.

The EDM software runs under the control of the runControl system (see Chapter 14), from which it receives the commands to start and stop the run and the parameters needed for a run. The infoLogger functions (see Chapter 11) are used to report messages.

13.2 The EDM architecture

The EDM architecture is shown in Figure 13.1.

Figure 13.1 The EDM architecture.

The EDM software includes the three following processes, launched by the run control:
• the edm in the edmHost.
• the edmAgent on each LDC.
• the edmClient on each LDC.

The edm process keeps track of the list of available GDCs. It receives the status of each GDC from the eventBuilder process, which sends the following messages to the edm:
• nearly full: this message declares the GDC on which the eventBuilder is running as unavailable, therefore to be removed by the edm from the list of available GDCs.
• nearly empty: this message declares the GDC on which the eventBuilder is running as available, therefore to be added by the edm to the list of available GDCs.

The percentage thresholds below which the eventBuilder sends the message “nearly empty” and above which it sends the message “nearly full” can be chosen for each run via the ebNearlyEmpty and ebNearlyFull run parameters.

The edm builds a GDC availability mask, which contains the list of available GDCs and the first and last eventId for which the mask is valid, called firstEventId and lastEventId respectively. To calculate these two eventIds, the edm uses the edmInterval run parameter, which indicates the size of the validity range for a mask, in units of eventId. The structure of the GDC availability mask is the following:

    struct edmMask_t {
      eventIdType firstEventId;
      eventIdType lastEventId;
      V32 mask [GDC_AVMASK_NGROUPS];
    };

where mask is an array of 32-bit integers, as many as needed to accommodate the highest bit corresponding to the maximum GDC identifier that has been declared in the configuration database (see Chapter 2).

The user has to configure the minimum fraction of GDCs that should be free for the system to continue the run without waiting. This is done by means of the edmQuorumPercent run parameter, which indicates the percentage of GDCs that should be available before sending the GDC availability mask to all the LDCs.
With the exception of the first mask, sent at start of run, the edm sends the GDC availability mask to all the LDCs only if the number of available GDCs is bigger than the minimum quorum requested.

Since the edm process has no knowledge of the eventId of the sub-event being processed in the LDC, it must be told by the LDCs when the time has come to send a new GDC availability mask. In order to reduce the dead time, the LDC tells the edm to send a new mask in advance with respect to the last event ID for which the previous mask is valid. The LDC signals to the edm that a new GDC availability mask is needed when it reaches the bottom range of the validity for a mask, which is set by the user as the edmDelta run parameter. In practice the LDC issues a request for a new mask when it is processing an event whose eventId is equal to or higher than lastEventId - edmDelta.

All the LDCs tag the sub-events with an increasing event ID, which is the same for all the sub-events belonging to the same trigger and is recorded in the field eventId of the event header. The monotonicity of the event ID is checked for all sub-events of type PHYSICS_EVENT: each event ID must be higher than the event ID of the previous sub-event. If this is not the case, an error message is issued to the infoLogger and the readout process asks to stop the run.

In order to avoid TCP/IP socket connections in the LDC processes responsible for the main data flow of the sub-events, the software in the LDC has been organized in two separate processes: the edmClient and the edmAgent. The edmClient is responsible for the TCP/IP communication with the edm. It receives the GDC availability mask from the edm and it sends to the edm the request to get a new GDC availability mask.
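The mask handling described above can be illustrated with a minimal sketch. It is not taken from the DATE sources: the typedefs, the value of GDC_AVMASK_NGROUPS, the bit layout (GDC identifier n stored in bit n % 32 of word n / 32), the helper names, and the use of a non-strict comparison for the quorum are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DATE definitions (illustration only). */
typedef uint32_t V32;
typedef uint64_t eventIdType;   /* assumption: a plain integer event ID */
#define GDC_AVMASK_NGROUPS 4    /* assumption: room for GDC ids 0..127 */

struct edmMask_t {
  eventIdType firstEventId;
  eventIdType lastEventId;
  V32 mask[GDC_AVMASK_NGROUPS];
};

/* Assumed layout: GDC id n lives in bit (n % 32) of word (n / 32). */
int gdcIsAvailable(const struct edmMask_t *m, unsigned gdcId) {
  assert(gdcId < 32u * GDC_AVMASK_NGROUPS);
  return (int)((m->mask[gdcId / 32u] >> (gdcId % 32u)) & 1u);
}

/* Mark a GDC as available (nearly empty) or unavailable (nearly full). */
void gdcSetAvailable(struct edmMask_t *m, unsigned gdcId, int available) {
  assert(gdcId < 32u * GDC_AVMASK_NGROUPS);
  V32 bit = (V32)1u << (gdcId % 32u);
  if (available) m->mask[gdcId / 32u] |= bit;
  else           m->mask[gdcId / 32u] &= ~bit;
}

/* Count the GDCs whose bit is set in the mask. */
unsigned gdcCountAvailable(const struct edmMask_t *m) {
  unsigned count = 0;
  for (unsigned g = 0; g < GDC_AVMASK_NGROUPS; g++)
    for (V32 w = m->mask[g]; w != 0; w &= w - 1)  /* clear lowest set bit */
      count++;
  return count;
}

/* Quorum test against edmQuorumPercent; ">=" is an assumption. */
int edmQuorumReached(const struct edmMask_t *m, unsigned nGdcs,
                     unsigned edmQuorumPercent) {
  return gdcCountAvailable(m) * 100u >= nGdcs * edmQuorumPercent;
}

/* An LDC asks for a new mask once eventId >= lastEventId - edmDelta.
   Written as an addition to avoid unsigned underflow. */
int ldcNeedsNewMask(const struct edmMask_t *m, eventIdType eventId,
                    eventIdType edmDelta) {
  return eventId + edmDelta >= m->lastEventId;
}
```

With a mask valid up to event 200 and edmDelta set to 20, ldcNeedsNewMask starts returning 1 at event 180, which leaves the edm 20 event IDs of margin to deliver the next mask.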
The edmAgent process running in the LDC takes a sub-event from the readout FIFO (where it has been inserted by the readout process), reads the GDC availability mask from the edm FIFO located in the shared control region in the LDC and inserts the destination GDC in the header of the physics sub-event. Then it passes the sub-event on to the recorder process, which dispatches it to the eventBuilder process on the destination GDC. Only event descriptors are read and passed on: there is no memory-to-memory copy involved.

The decision on the destination GDC is taken by each LDC independently of the others, using a data-driven algorithm based on the eventId field of the sub-event header. The algorithm forbids sending sub-events to any GDC declared as unavailable by the eventBuilder during the validity range of the GDC availability mask.

The communication between the edmClient and the edmAgent process running on the same LDC happens through flags in the shared control region and the edm FIFO.

13.2.1 The edm process

The edm process is responsible for creating and maintaining the GDC availability mask, as well as for sending it to all the LDCs participating in the run.

When the run is starting, the edm process waits for connection declarations. LDCs can only connect once, because the run cannot continue if one LDC disconnects or breaks down, while the GDCs can connect and disconnect at any time during the run. When receiving the first connection request from an LDC or GDC, the edm negotiates some socket parameters for the connections, adds the GDC to the list of available GDCs and sets the first validity range of event IDs in the GDC availability mask.

Once all the LDCs and all the GDCs are connected, the edm begins to poll as many channels as the number of GDCs and LDCs which participate in the run, with a timeout specified by the user as the edmPollTimeOut run parameter (expressed in milliseconds).
This allows the edm to periodically check for the arrival of end of run or abort run commands.

The edm updates the GDC availability mask when:
• a GDC connects or sends the nearlyEmpty message: the edm adds it to the list of available GDCs.
• a GDC disconnects or sends the nearlyFull message: the edm removes it from the list of available GDCs.

Before sending the GDC availability mask, the firstEventId and the lastEventId fields of the structure of type edmMask_t are updated as follows:
• firstEventId = lastUpperBoundSent + oneEventId
  where lastUpperBoundSent is the lastEventId field of the last mask sent, and oneEventId is 1 event number (nbInRun) or 1 bunch crossing depending on the mode of running (i.e. on the colliderMode run parameter).
• lastEventId = max (firstEventId, maxWakeUpId) + edmInterval
  where firstEventId is the firstEventId field, calculated above, and maxWakeUpId is the highest event ID for which a request for a new mask has been received. Each request for a new mask is accompanied by the event ID of the sub-event at which the LDC issuing the request discovers that it needs a new mask.

With the exception of the first mask, which is sent when all the machines participating in the run have connected to the edm, the edm sends the GDC availability mask when:
• a wakeUp message is received from LDC(s) accompanied by the event ID for which the request has been issued.
• a GDC connects and there was already a request for mask pending and the quorum, not reached before, is now reached.
• a GDC sends nearlyEmpty and there was already a request for mask pending and the quorum, not reached before, is now reached.
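The two update formulas for firstEventId and lastEventId given above translate directly into a few lines of C. The sketch below is only an illustration: the struct and function names are hypothetical, and event IDs are simplified to plain 64-bit integers, which is an assumption about the real eventIdType.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the validity range of one mask. */
struct edmRange {
  uint64_t firstEventId;
  uint64_t lastEventId;
};

/* Compute the validity range of the next mask:
     firstEventId = lastUpperBoundSent + oneEventId
     lastEventId  = max(firstEventId, maxWakeUpId) + edmInterval
   Event IDs are plain integers here (assumption). */
struct edmRange edmNextRange(uint64_t lastUpperBoundSent,
                             uint64_t maxWakeUpId,
                             uint64_t edmInterval,
                             uint64_t oneEventId) {
  assert(oneEventId > 0);
  struct edmRange r;
  r.firstEventId = lastUpperBoundSent + oneEventId;
  uint64_t base = r.firstEventId > maxWakeUpId ? r.firstEventId
                                               : maxWakeUpId;
  r.lastEventId = base + edmInterval;
  return r;
}
```

Taking the maximum of firstEventId and maxWakeUpId means that, if an LDC is already processing events beyond the start of the new range when its request arrives, the new mask still covers a full edmInterval from that point onwards.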
After sending the GDC availability mask, the following variables are saved in the shared control region, so that they can be displayed by the run control status display:
• lastThresholdSent = lastEventId - edmDelta
• lastUpperBoundSent = lastEventId

In order to avoid sending the mask again for multiple wakeUp messages received from different LDCs, which would increase the network traffic, the mask is actually sent only if the event ID for which the request is made is higher than or equal to the last threshold sent, i.e. if the requested mask has not already been sent.

The edm asks to stop the run if not all the LDCs are connected; this check is made when the run is starting.

13.2.2 The edmClient process

In the initialization phase, the edmClient process connects to the edm, after getting the port for the connection from the environment variable DATE_SOCKET_EDM, negotiates the socket parameters for the connection, and declares itself to the edm. It then polls the input channel to receive the GDC availability mask from the edm. After having performed some checks on its validity, it writes the GDC availability mask in the LDC shared control region. It periodically checks on one side for run control commands and on the other side for the value of the flag wakeUpRequestFlag in the LDC shared control region to know whether it has to send a request for a new mask to the edm.

The possible values of the wakeUpRequestFlag are:
• EDM_REQUEST_FLAG_REQ: set by the edmAgent when a new mask is needed.
• EDM_REQUEST_FLAG_SENT: set by the edmClient after sending the request for a new mask.

Each request for a new mask is accompanied by the event ID for which the request is issued. This allows the edm to discard multiple requests for a GDC availability mask for the same validity range coming from several LDCs, which may happen since not all the LDCs participate in all the events.
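The way the edm discards repeated requests for the same validity range can be pictured with the following sketch. The struct and function names are hypothetical and event IDs are simplified to 64-bit integers (an assumption); only the comparison against lastThresholdSent and the two saved variables come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the two variables the edm saves after a send. */
struct edmSentState {
  uint64_t lastThresholdSent;   /* lastEventId - edmDelta of last mask */
  uint64_t lastUpperBoundSent;  /* lastEventId of last mask            */
};

/* A wakeUp request is served only if its event ID has reached the
   threshold of the last mask sent; otherwise the mask it asks for has
   already gone out and the request is discarded. */
int edmRequestIsNew(const struct edmSentState *s, uint64_t requestEventId) {
  return requestEventId >= s->lastThresholdSent;
}

/* Record the bookkeeping values after a mask valid up to lastEventId
   has been sent. */
void edmRecordMaskSent(struct edmSentState *s, uint64_t lastEventId,
                       uint64_t edmDelta) {
  assert(edmDelta <= lastEventId);  /* keeps the subtraction non-negative */
  s->lastThresholdSent = lastEventId - edmDelta;
  s->lastUpperBoundSent = lastEventId;
}
```

For example, if the last mask ended at event 100 with edmDelta equal to 20, a request issued at event 85 is served. Once the new mask (say, valid up to event 201) has been recorded, a second request issued by another LDC at event 90 falls below the new threshold of 181 and is ignored.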
13.2.3 The edmAgent process

In the initialization phase, the edmAgent reads the first GDC availability mask from the edmFifo in the control region. When the run is started, it takes every sub-event descriptor from the readout process and performs some actions depending on the event ID. There are three main cases:

1. The event ID is higher than the lastEventId field of the GDC availability mask:
• the edmAgent tries to get the new GDC availability mask from the edmFifo.
• if a new mask is available in the edmFifo, it uses it to set the destination GDC and calculates the threshold of the current mask as currentThreshold = lastEventId field - edmDelta.
• if a new mask is not available in the edmFifo, the edmAgent checks if the request for a new mask is already pending, in which case it waits for the new mask to be written in the edmFifo by the edmClient. The waiting time, expressed in microseconds, is given by recMaskSleepTime.
• if there is no pending request for a new mask, the edmAgent checks whether the LDC on which it is running is the one which has to issue the request. In order to avoid that all the LDCs participating in the same trigger issue a request for a new GDC availability mask, the following algorithm has been implemented: only the LDC whose identifier is the lowest identifier involved in the event instructs the edmClient to issue the request.
• in that case, the request for a new mask is issued by setting the wakeUpRequestFlag to EDM_REQUEST_FLAG_REQ in the shared control region, so that the edmClient process running in the same LDC can make the actual request to the edm via the socket library.
• when the mask is available, it fills the eventGdcId field of the header with the destination GDC returned by the distribution algorithm.

2. The event ID is between the current threshold and the lastEventId field of the GDC availability mask:
• if there is already a pending request, it simply fills the eventGdcId field of the header with the destination GDC returned by the distribution algorithm.
• if there is no request for a new mask pending and no mask is available in the edmFifo, the edmAgent checks whether the LDC on which it is running is the one which has to issue the request; if that is the case, it issues the request for a new mask just as in the previous case.

3. The event ID is smaller than the current threshold: it just fills the eventGdcId field of the header with the destination GDC returned by the distribution algorithm.

The cycle is repeated until the run is declared as stopped and either the edm closes its channel or an abort (quick exit) sequence is started. The edmAgent performs various checks on the GDC identifiers of the GDC availability mask (for example, whether the GDC identifiers are compatible with the ones declared in the configuration database) and, in case of error, asks to stop the run.

13.3 The synchronization with the run control

The EDM software runs under the control of the runControl system. The run is declared as started only after the completion of the following sequence of operations:
• all the GDCs and the LDCs declare themselves to the edm.
• the edm sends the first GDC availability mask to all the edmClient processes running on the LDCs.
• the edmClient writes it into the edmFifo in all the LDCs.
• the edmAgent reads it from the edmFifo and sets it as the current GDC availability mask in all the LDCs.

13.4 Information and error reporting

The EDM uses the DATE infoLogger package (see Chapter 11) to report statistics, information and error messages.

14 The runControl

This chapter describes the architecture of the runControl system, its various components, and their interactions.
14.1 Introduction
14.2 Architecture
14.3 The runControl process
14.4 The runControl interface
14.5 The runControl Human Interface
14.6 The Logic Engine
14.7 The rcServers
14.8 The RCS interface
14.9 Run parameters
14.10 Run-time variables
14.11 Control of the log messages
14.12 Log Files

14.1 Introduction

Within a DAQ system, several data acquisitions can be performed at the same time: this is the case, for example, of several detectors independently collecting calibration data. Every data acquisition requires a configuration and the definition of parameters and options. Moreover it is performed by several processes that must be started and stopped at the right moment on many machines. The runControl system handles the configuration and synchronization issues. It is based on Finite State Machines and it uses packages external to DATE: DIM [3], a Distributed Information Manager and SMI++ [4] are, in particular, heavily used.

14.2 Architecture

Every data acquisition, performed for one single detector or a group of detectors defined by the Experiment Control System, is controlled by a runControl process that steers the data acquisition according to operator commands. Several runControl processes with different names can run at the same time and control different data acquisitions.
Every runControl process has a runControl interface based on Finite State Machines. This interface receives all the commands sent to the runControl process and rejects those incompatible with the current status of the process. The interface also guarantees that, at any time, the source of commands is unique. It may be a given runControl Human Interface or a component of the Experiment Control System (ECS), described in the second part of this manual.

For the same runControl process, many runControl Human Interfaces can coexist, but at most one at a time can have the mastership of the runControl process: this one can be used to send active control commands, whereas the others can only be used to get information. When the authorized source of commands is the ECS, none of the runControl Human Interfaces can send active commands: this possibility is restricted to the ECS.

When the list of machines to be used for a given data acquisition is defined, the runControl process spawns a Logic Engine process. The Logic Engine contains all the logic about starting and stopping the different processes on the different machines. The Logic Engine translates operator commands into sequences of commands that are then sent, in parallel, to the remote machines.

On every remote machine a process, called rcServer, can start and stop processes according to commands received from the Logic Engines. The rcServer also performs some local error handling and returns various counters and information to the other DAQ machines.

An rcServer can be used, at different times, by different runControl processes and can therefore receive commands from different Logic Engines in the context of different data acquisitions. An interface, common to all the rcServers, guarantees that every rcServer is used at any time by at most one runControl process and receives commands from one Logic Engine in the context of one and only one data acquisition.
This interface is called the RCS interface.

The architecture of the runControl system is shown in Figure 14.1.

Figure 14.1 The runControl system architecture.

14.3 The runControl process

At startup time, a runControl process with name defined by the symbol RCNAME performs the following operations:
• reads the detectors and roles DATE database, applying the following restriction:
  • if ${RCNAME} does not start with the string "ALL", then ${RCNAME} is treated as a detector name and the runControl process loads from the DATE database the information about that detector only. For example, if the name assigned to the runControl process is TPC, then the runControl process loads information about the TPC and ignores the other detectors.
  • if ${RCNAME} starts with the string "ALL", then ${RCNAME} is treated as the potential name of an Experiment Control System partition prefixed with ALL. In this case, if an ECS partition with that name exists, then the runControl process loads information about the detectors belonging to the partition and ignores the other detectors. If an ECS partition with that name does not exist, then the runControl process loads information about all the detectors. For example, suppose the name assigned to the runControl process is “ALLITS”: if an ECS partition named ITS exists, then the runControl process loads information about the detectors of the ITS partition; if an ECS partition named ITS does not exist, then the runControl process loads information about all the detectors.
  • in both cases (i.e. ${RCNAME} starting or not starting with the string “ALL”), if the DATE database contains information about detectors named “HLT” and “TRIGGER”, then the information about these detectors is loaded in memory, since HLT and TRIGGER play special roles within the data acquisition.
• sets the current DAQ configuration to an empty one, sets all the run parameters to their DATE hardcoded defaults, and resets all the run options.
• if a default DAQ configuration has been saved in the DATE database, then the default DAQ configuration is loaded and replaces the empty one. The name of the default DAQ configuration in the MySQL database is DEFAULT.
• if a set of customized, default run parameters has been saved in the DATE database, then the set of customized, default run parameters is used to overwrite the DATE hardcoded defaults. The name of this set in the MySQL database is DEFAULT.
• if a set of customized, default run options has been saved in the DATE database, then the run options are set accordingly. The name of this set of run options in the MySQL database is DEFAULT.

Having completed the above sequence of operations, the runControl process sets its status to DISCONNECTED and waits for operator commands. The main commands are the following:
• CONNECT(name): loads from the DATE database the DAQ configuration with the given name and generates a new current configuration. It then creates the Logic Engine with the logic for the machines selected in the new current configuration. If the name of the DAQ configuration is NONE, then the first step is skipped and the runControl process continues with the existing current configuration. If the name of the DAQ configuration is NEW, then the runControl process gets the new DAQ configuration from the runControl Human Interface having the mastership of it (obviously this possibility does not exist when the source of active commands is the ECS). If the CONNECT command completes successfully, the runControl process locks the rcServers referenced by the newly created Logic Engine and sets its status to CONNECTED.
• LOCK_PARAMETERS(name): loads from the DATE database a set of run parameters with the given name. If the name is NONE, then the runControl process continues with the current run parameters.
If the name is NEW, then the runControl process gets the new run parameters from the runControl Human Interface having the mastership of it (obviously this possibility does not exist when the source of active commands is the ECS). If the LOCK_PARAMETERS command completes successfully, the runControl process sets its status to READY.
• START_PROCESSES(name): loads from the DATE database a set of run options with the given name. It then tells the Logic Engine to start the required data-acquisition processes on all the selected machines. If the given name is NONE, then the first step is skipped and the runControl process continues with the current run options. If the name is NEW, then the runControl process gets the run options from the runControl Human Interface having the mastership of it (obviously this possibility does not exist when the source of active commands is the ECS). If the source of active commands is a runControl Human Interface, the runControl reads the current run number from the DAQ database, increments it and saves it back. If the source of active commands is the ECS, then the run number is defined by the ECS and transmitted to the runControl with the START_PROCESSES command. If the START_PROCESSES command completes successfully (i.e. the runControl gets from the Logic Engine a feedback confirming that all the required processes are running), the runControl process sets its status to STARTED.
• START_DATA_TAKING: sends to all the selected machines (via the Logic Engine) the authorization to start the data taking. The runControl process then sets its status to RUNNING.
• STOP_DATA_TAKING: tells the Logic Engine to stop the data-acquisition processes on all the selected machines. When all the processes are stopped, the runControl process sets its status to READY.
• STOP_PROCESSES: has the same effect as STOP_DATA_TAKING. The only difference is that it can be issued when the actual data taking has not yet been started.
• ABORT_PROCESSES: has the same effect as STOP_PROCESSES. The ABORT_PROCESSES command is however stronger than the STOP_PROCESSES command and may actually kill the processes that fail to respond. The ABORT_PROCESSES command can be sent from a runControl Human Interface when an error condition has activated the Abort button.
• UNLOCK_PARAMETERS: unlocks the run parameters and sets the status of the runControl process to CONNECTED.
• DISCONNECT: stops the Logic Engine and unlocks the rcServers referenced by the stopped Logic Engine. The runControl process then sets its status to DISCONNECTED.

In addition to the operator commands, the runControl process gets some feedback from the Logic Engine. This feedback allows the handling of requests, such as EndOfRun requests issued by processes running on the remote machines and transmitted by the rcServers to the Logic Engine.

The runControl process also fills the eLogbook with information about the different runs (start time, end time, list of detectors).

Finally the runControl process acts as a DIM server and provides the following DIM services to the subscribing clients:
• ${DAQ_ROOT_DOMAIN_NAME}${RCNAME}_DAQ::${RCNAME}_CONTROL_MESS: clients subscribing to this service receive the information and error messages issued by the runControl process with name ${RCNAME}. The DAQ_ROOT_DOMAIN_NAME environment variable is defined in the DATE database.
• ${DAQ_ROOT_DOMAIN_NAME}${RCNAME}_DAQ::${RCNAME}_CONTROL_RUNNUMBER: clients subscribing to this service receive the run number being used by the runControl process with name ${RCNAME}. The DAQ_ROOT_DOMAIN_NAME environment variable is defined in the DATE database.
• ${DAQ_ROOT_DOMAIN_NAME}${RCNAME}_DAQ::${RCNAME}_CONTROL_EOR: clients subscribing to this service are notified when the run being controlled by the runControl process with name ${RCNAME} is finished. The DAQ_ROOT_DOMAIN_NAME environment variable is defined in the DATE database.

14.4 The runControl interface

Every runControl process has a runControl interface with two main functions:
• it rejects commands incompatible with the current status of the runControl process.
• it guarantees that, at any time, the source of commands is unique: a given runControl Human Interface or the ECS.
The runControl interface is implemented as an SMI domain containing the objects required to perform the two main functions described above. It also contains a few other objects associated with minor functions, such as enabling the Abort button in the runControl Human Interfaces or keeping track of the final state of the last performed data taking. The name of this SMI domain is ${DAQ_ROOT_DOMAIN_NAME}${RCNAME}_DAQ, where RCNAME is a variable containing the name assigned to the runControl process at start time and DAQ_ROOT_DOMAIN_NAME is the environment variable defined in the DATE database.

14.5 The runControl Human Interface

The runControl Human Interface may be used to send all the commands described in Section 14.3 to the runControl process. In addition, it allows database operations that can be performed without commands to the runControl process. Examples of operations of this second type are the definition of a default DAQ configuration, the creation of default sets of run parameters and run options, and the creation of named sets of run parameters and run options to be used during special runs. A detailed description of the runControl Human Interface is given in the ALICE DAQ WIKI.
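The command handling of Section 14.3, together with the command filtering performed by the runControl interface (Section 14.4), can be pictured as a small state machine. The following Python sketch is purely illustrative and is not DATE code. The resulting statuses follow the command descriptions in the text; the set of statuses in which each command is accepted, and the assumption that CONNECT leads from DISCONNECTED to CONNECTED, are choices made for this example only.

```python
# Illustrative sketch only, not part of DATE.
# Resulting statuses follow Section 14.3; the "accepted in" sets are assumptions.
TRANSITIONS = {
    # command: (statuses in which it is accepted, resulting status)
    "CONNECT": ({"DISCONNECTED"}, "CONNECTED"),
    "LOCK_PARAMETERS": ({"CONNECTED"}, "READY"),
    "START_PROCESSES": ({"READY"}, "STARTED"),
    "START_DATA_TAKING": ({"STARTED"}, "RUNNING"),
    "STOP_DATA_TAKING": ({"RUNNING"}, "READY"),
    "STOP_PROCESSES": ({"STARTED", "RUNNING"}, "READY"),
    "ABORT_PROCESSES": ({"STARTED", "RUNNING"}, "READY"),
    "UNLOCK_PARAMETERS": ({"READY"}, "CONNECTED"),
    "DISCONNECT": ({"CONNECTED"}, "DISCONNECTED"),
}


def apply_command(status, command):
    """Return the new status, or raise ValueError if the command is rejected."""
    if command not in TRANSITIONS:
        raise ValueError(f"unknown command {command}")
    accepted_in, new_status = TRANSITIONS[command]
    if status not in accepted_in:
        raise ValueError(f"{command} rejected in status {status}")
    return new_status


def run_sequence(commands, status="DISCONNECTED"):
    """Apply a list of commands in order and return the final status."""
    for command in commands:
        status = apply_command(status, command)
    return status
```

For instance, run_sequence(["CONNECT", "LOCK_PARAMETERS", "START_PROCESSES"]) ends in STARTED, whereas sending START_DATA_TAKING to a process that is only CONNECTED raises an error, which mirrors the rejection of incompatible commands.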
14.6 The Logic Engine

The Logic Engine is based on the list of machines selected to play a role in the data acquisition and therefore it cannot exist as long as the DAQ configuration is not defined. The Logic Engine is created by the runControl process when it receives the CONNECT command.
The Logic Engine receives commands from the runControl process, creates from these commands and from the StartOfRun and EndOfRun logic the sequences of commands to be executed on every remote machine, and sends these commands to the rcServers running on the remote machines. It also returns to the runControl process the feedback about requests issued by processes running on the remote machines, such as EndOfRun requests.
The Logic Engine could be handled by a single SMI domain. However, in practice it is implemented as a set of SMI domains, where every domain controls a group with a limited number of remote machines. An additional SMI domain coordinates these domains and therefore the groups of remote machines. This implementation allows better usage of CPU resources and, if necessary, the cooperation of more than one PC. The name of the top level domain is ${DAQ_ROOT_DOMAIN_NAME}${RCNAME}_CONTROL, where RCNAME is a variable containing the name assigned to the runControl process at start time and DAQ_ROOT_DOMAIN_NAME is the environment variable defined in the DATE database. The other domains dealing with groups of remote machines are named ${DAQ_ROOT_DOMAIN_NAME}${RCNAME}_CONTROL_1, ${DAQ_ROOT_DOMAIN_NAME}${RCNAME}_CONTROL_2, etc.

14.7 The rcServers

An rcServer process must run on all the machines of the DAQ system where the Logic Engines need to start processes. At startup time, every rcServer creates on the local machine a shared memory control region. This region contains various flags and counters and is used as an interprocess communication object by the rcServer and its children.
The rcServer then waits for a runControl process needing its services.
When a runControl process executes a CONNECT command, it creates a Logic Engine, locks the rcServers running on the remote machines that are part of its DAQ configuration, and starts using them. These rcServers provide to the runControl process, to its Logic Engine, and to its runControl Human Interfaces the following services:
• start and stop processes according to commands from the Logic Engine. Two types of processes are started and stopped by the rcServer: processes controlled through the shared memory control region and processes that, once started, interact with the Logic Engine directly. The DATE processes are processes of the first type: the shared memory control region is used by the rcServer to send commands and parameters to the processes and by the processes to issue requests, such as EndOfRun requests, and to update various flags and counters. Once started, the processes of the second type interact with the Logic Engine and are ignored by the rcServer. Examples of processes of this type are the DDL Data Generators and some synchronous processes required by calibration procedures.
• perform local error handling. The rcServer continuously checks that the started DATE processes are alive. When it finds a missing process, it issues an EndOfRun request.
• handle EndOfRun requests issued by the running processes.
• provide the information required by the Status Display as a DIM service.
• provide error and information messages as a DIM service.
When the runControl process executes the DISCONNECT command, it unlocks the previously locked rcServers. These rcServers then go back to waiting for a runControl process needing their services. At any time, the crash of an rcServer being used by a runControl process forces the runControl process to execute a DISCONNECT command.
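The local error handling described above (check that the started processes are alive, issue an EndOfRun request when one is missing) can be sketched as follows. This is a hypothetical Python illustration, not the rcServer implementation: the manual does not say how the rcServer detects a missing process, so the POSIX "signal 0" probe used here is an assumption, and request_end_of_run is an invented callback standing in for the real request mechanism.

```python
import os


def is_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_processes(pids, request_end_of_run, alive=is_alive):
    """Probe every PID once; if any is missing, issue one EndOfRun request.

    request_end_of_run is a hypothetical callback, not a DATE function.
    Returns the list of missing PIDs.
    """
    missing = [pid for pid in pids if not alive(pid)]
    if missing:
        request_end_of_run(missing)
    return missing
```

In a real server such a check would run periodically; the alive argument is only there so that the logic can be exercised without starting real processes.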
14.8 The RCS interface

This interface guarantees that the rcServers receive only valid commands and that every rcServer is used by at most one runControl process in the context of one and only one data acquisition. The RCS interface is implemented as an SMI domain whose name is ${DAQ_ROOT_DOMAIN_NAME}_RCSERVERS.

14.9 Run parameters

This section describes the Common and role-specific run parameters used by the runControl and by the processes started on the various machines. The Common RunParameters are described in Table 14.1.

Table 14.1 Common RunParameters
Name | Parameter name | Description | Default | Range
Collider mode | colliderModeFlag | Defines the event ID field of the base event header: 0 = Fixed target, 1 = Collider mode. readout sets the event type attribute field in the event header accordingly. | 1 | 0,1
Common Data Header Present | cdhPresentFlag | Common data header in raw data: 0 = Not present, 1 = Present. | 1 | 0,1
Burst structure in Cole | burstPresentFlag | When Cole is used: 0 = No burst structure, 1 = Burst mode. | 0 | 0,1
No. of events in burst | simBurstLength | When Cole is used, defines the number of events per burst. | 100 | >= 0

The LDC, GDC, and EDM RunParameters are described in Table 14.2, Table 14.3, and Table 14.4.

Table 14.2 LDC RunParameters
Name | runParameter name | Description | Default | Range
Max. number of sub-events | maxEvents | Maximum number of sub-events in a run. Zero (0) means no limit. If set, when the LDC hits the limit, an end of run request is issued. | 0 | >= 0
Max. bytes to record | maxBytes | Maximum number of bytes to be collected in a run. Zero (0) means no limit. If set, when the LDC hits the limit, an end of run request is issued. | 0 | >= 0
Max. number of bursts | maxBursts | Maximum number of bursts to be collected in a run. Zero (0) means no limit. If set, when the LDC hits the limit, an end of run request is issued. | 0 | >= 0
Max. number of errors | maxErrors | Maximum number of allowed non-fatal errors. | 10 | >= 0
Max. event size | maxEventSize | Maximum size in bytes of a sub-event. | 2 000 000 | >= 0
Max. file size | maxFileSize | Each run may be recorded on multiple files. This is the maximum size of each file in bytes. Zero (0) means no limit. A positive value should be used only when the RecordingDevice is a disk file. This parameter is ignored if the Recording disabled run option is selected or when the recording is done only in the GDCs. | 0 | 0-2.e9
Logging level | logLevel | Controls the generation of messages by all the DATE processes running on an LDC machine. The possible values are described in Section 14.11. | 10 | 0 - 100
Local Recording device | localRecordingDevice | The setting must be done according to Section 13.2. This parameter is ignored if the Recording disabled run option is selected or when the recording is done only in the GDCs. | /dev/null |
SOR in Separate File | sorSeparateFile | If set to 1, the SOR event is stored in a separate file. | 0 | 0-1
Paged data flag | pageDataFlag | Defines the event data structure. Possible values: streamlined events (0) (see Section 5.3.1), paged events (1) (see Section 5.3.2). | 1 | 0,1
Monitor enable flag | monitorEnableFlag | Switch to enable and disable the possibility of monitoring. It may introduce a penalty on the data rate performance. Zero (0) means disabled, one (1) enabled. | 1 | 0,1
LDC socket size | ldcSocketSize | Defines the size of the socket used by the recording library (described in Section 20.4). Possible values: 0: system default; >0: socket size set to the given value; <0: socket size = MIN (-ldcSocketSize, maxEventSize). | 0 | Integer
Max. time for SOR/EOR phases | phaseTimeoutLimit | Maximum duration (in seconds) of any phase of the start and stop procedures executed on the LDC. The run is aborted if any LDC process does not complete the phase in due time. If zero (0), a value of 30 s is used by default. | 30 | 0 - 600
Recorder sleep time | recorderSleepTime | The recorder goes to sleep while events are arriving to give priority to readout. The time interval (expressed in microseconds) is picked up from this parameter. If zero (0) or negative, a value of 10 microseconds is used. | 0 | Integer
Completion sleep time | checkCompletionSleepTime | Sleep time (in milliseconds) when checking for I/O completion in recorder. | 1 | 1 - 1000
Rec sleep for mask | recMaskSleepTime | Polling loop interval (in microseconds) when waiting for edm masks. If zero (0) or negative, a value of 500 000 microseconds is used. | 0 | Integer
Max. # of sleeps for mask | recMaskSleepCntLimit | Maximum number of consecutive polling loops when waiting for an edm mask, before aborting. | 500 | >1
startOfData/endOfData event enabled | sodEodEnabled | START_OF_DATA/END_OF_DATA event flag. If set to 0, the SOD event is disabled. If set to 1, the SOD event is enabled. If set to 2, the SOD event is enabled and all the events received before it are discarded. | 1 | 0, 2
startOfData timeout | startOfDataTimeout | Timeout when waiting for START_OF_DATA events. | 10 | >= 1
endOfData timeout | endOfDataTimeout | Timeout when waiting for END_OF_DATA events. | 10 | >= 1
EDM agent enable | edmEnabled | EDM agent flag. | 1 | Locked
HLT agent enable | hltEnabled | HLT agent flag. | 1 | Locked
Real Hostname | realHostname | IP hostname. | | Locked

Table 14.3 GDC RunParameters
Name | runParameter name | Description | Default | Range
Max. bytes to record | maxBytes | Maximum number of bytes to be collected in a run. Zero (0) means no limit. If set, when the GDC hits the limit, an end of run request is issued. | 0 | >= 0
Max. number of errors | maxErrors | Maximum number of allowed non-fatal errors. | 10 | >= 0
Max. SOR/EOR file size | maxEventSize | Maximum size in bytes of a SOR/EOR file event. Please note that this parameter applies to SOR/EOR file events only. | 4 000 000 | >= 0
Max. file size | maxFileSize | Each run may be recorded on multiple files. This is the maximum size of each file in bytes. Zero (0) means no limit. A positive value should be used only when the recordingDevice is a disk file. This parameter is ignored if the Recording disabled or the Recording on PDS run option is selected. | 0 | 0-2.e9
Logging level | logLevel | Controls the generation of messages by all the DATE processes running on a GDC machine. The possible values are described in Section 14.11. | 10 | 0 - 100
Local Recording device | localRecordingDevice | The setting must be done according to Section 13.2. This parameter is ignored if the Recording disabled or the Recording on PDS run option is selected. | /dev/null |
SOR in Separate File | sorSeparateFile | If set to 1, the SOR event is stored in a separate file. | 0 | 0-1
Monitor enable flag | monitorEnabledFlag | Switch to enable and disable the possibility of monitoring. It may introduce a penalty on the data rate. Zero (0) means disabled, one (1) enabled. | 1 | 0,1
Max. time for SOR/EOR phases | phaseTimeoutLimit | Maximum duration (in seconds) of any phase of the start and stop procedures executed on the GDC. The run is aborted if any GDC process does not complete the phase in due time. If zero (0), a value of 30 s is used by default. | 30 | 0 - 600
Nearly empty | ebNearlyEmpty | Threshold (in %) below which the eventBuilder changes its state from ebNearlyFull to ebNearlyEmpty. | 10 | 0 - 100
Nearly Full | ebNearlyFull | Threshold (in %) above which the eventBuilder changes its state from ebNearlyEmpty to ebNearlyFull. | 90 | 0 - 100
Max. number of events | maxEvents | This parameter is kept for future development and is not used in the present version. | 0 | Locked
EDM agent enable | edmEnabled | EDM agent flag. | 1 | Locked
HLT agent enable | hltEnabled | HLT agent flag. | 1 | Locked
Real Hostname | realHostname | IP hostname. | | Locked

Table 14.4 EDM RunParameters
Name | runParameter name | Description | Default | Range
Max. number of errors | maxErrors | Maximum number of allowed non-fatal errors. | 1 | >= 0
Logging level | logLevel | Controls the generation of messages by all the DATE processes running on an EDM machine. The possible values are described in Section 14.11. | 10 | 0 - 100
Max. time for SOR/EOR phases | phaseTimeoutLimit | Maximum duration (in seconds) of any phase of the start and stop procedures executed on the EDM. The run is aborted if any EDM process does not complete the phase in due time. If zero (0), a value of 30 s is used by default. | 30 | 0 - 600
Poll time out | edmPollTimeOut | Sleep time (in milliseconds) when polling the input channels. | 100 | >= 0
Quorum percent | edmQuorumPercent | Percentage of GDCs required to be available before new masks are sent. | 50 | 0 - 100
edm delta | edmDelta | Size of the bottom range of validity for a mask, where a request for a new mask must be issued (in units of eventId). | 200 | >= 0
GDCs validity mask interval | edmInterval | Size of the range of validity for a mask (in units of eventId). | 5000 | >= 0
Real Hostname | realHostname | IP hostname. | | Locked

14.10 Run-time variables

The run-time variables that can be seen via the runControl are described in Table 14.5, Table 14.6, and Table 14.7.

Table 14.5 LDC run-time variables
Name | Description
Number of equipments | Number of equipments in the 1st level vector. Set by the routine ArmHw.
Number of triggers | Number of triggers. Incremented by readout at each physics event (not for the other types of events) after calling ReadEvent, and then stored in eventHeader.triggerNb.
Current Trigger rate | Triggers per second since the last Status Display update. Computed by rcServer.
Average Trigger rate | Triggers per second since the SOR. Computed by rcServer.
Number of sub-events | Number of processed sub-events. Incremented by readout for all types of events before calling ReadEvent.
Sub-event rate | Sub-events per second since the last Status Display update. Computed by rcServer.
Sub-events recorded | Number of sub-events recorded by the LDC on disk, or sent to the GDCs. Set by recorder.
Sub-event recorded rate | Sub-events recorded per second since the last Status Display update. Computed by rcServer.
Bytes injected | Number of KB received by the LDC. Set by readout.
Byte injected rate | Number of bytes injected per second since the last Status Display update. Computed by rcServer.
Bytes recorded | Number of KB recorded by the LDC on disk or sent to the GDCs. Set by recorder.
Byte recorded rate | Bytes recorded per second since the last Status Display update. Computed by rcServer.
Bytes in buffer | Difference between Bytes injected and Bytes recorded. Computed by rcServer.
Nb evts w/o HLT decision | Number of events waiting for HLT decision.
inBurst flag | This variable is set to one (1) during the burst and is reset to zero (0) otherwise. Set by readout, only when the burstPresentFlag Common RunParameter is set to 1.
Recorder pid | PID of the recorder process.
Number of bursts | Set by readout at each event, after calling ReadEvent, to the value found in eventHeader.burstNb.
Number of sub-events in burst | Set by readout at each event, after calling ReadEvent, to the value found in eventHeader.nbInBurst.
recMaskSleepLoopCnt | Copy of internal counter of consecutive sleeps waiting for edm mask. Set by edmAgent.
recMaskSleepCnt | Number of events for which an edm mask was not available. Set by edmAgent.
Nb. of Readout FIFO full | Counts how many times readout has been waiting for space of the readoutReady FIFO. Set by readout.
edmClient SOR/EOR phases | A number indicating the phase of the start or stop procedure executed by the edmClient process. Zero (0) means completion.
wakeUpReqFlag | Flag of communication between edmAgent and edmClient to indicate the status of a request for an edm mask. Used by edmAgent and edmClient.
EventID for wakeup request | Event identifier for which a request of wake up has been triggered. Used by edm, edmClient, edmAgent.
edmMaskValidityRange | Validity range for the edm mask. Set by edm.
edmMask | edm mask.
lastEventId | EventId monotonically increasing for all the special event types. Set in readout.
readout SOR/EOR phases | A number indicating the phase of the start or stop procedure executed by the readout process. Zero (0) means completion.
edmAgent SOR/EOR phases | A number indicating the phase of the start or stop procedure executed by the edmAgent process. Zero (0) means completion.
hltAgent SOR/EOR phases | A number indicating the phase of the start or stop procedure executed by the hltAgent process. Zero (0) means completion.
recorder SOR/EOR phases | A number indicating the phase of the start or stop procedure executed by the recorder process. Zero (0) means completion.
Machine identifier | Machine identifier as defined in the DATE roles database.
Spare variable 1 - 10 | Reserved for DATE developers.
Spare String 1 - 5 | Reserved for DATE developers.
Site Spec 1 - 5 | Reserved for DATE developers.
Site Spec String 1 - 2 | Reserved for DATE developers.

Table 14.6 GDC run-time variables
Name | Description
Number of sub-events | Number of processed sub-events. Incremented by eventBuilder.
Sub-event rate | Sub-events per second since the last Status Display update. Computed by rcServer.
Events recorded | Number of events recorded on disk by the GDC. Set by eventBuilder.
Event recorded rate | Events recorded per second since the last Status Display update. Computed by rcServer.
Bytes recorded | Number of KB recorded on disk by the GDC. Set by eventBuilder.
Byte recorded rate | Bytes recorded per second since the last Status Display update. Computed by rcServer.
File count | Number of files recorded in the current run. Set by the eventBuilder.
Nb. times nearly full | Number of times the eventBuilder has declared itself nearly full. Set by the eventBuilder (internal counter).
Nb. times nearly empty | Number of times the eventBuilder has declared itself nearly empty. Set by the eventBuilder (internal counter).
Nb. of incomplete events | Number of incomplete events. Set by the eventBuilder (internal counter).
eventBuilder pid | PID of the eventBuilder process.
Status of EVB vs. EDM | String describing the status of the eventBuilder sent to the edm (edm active only). Set by eventBuilder.
eventBuilder SOR/EOR phases | A number indicating the phase of the start or stop procedure executed by the eventBuilder process. Zero (0) means completion.
Machine identifier | Machine identifier as defined in the DATE roles database.
Spare variable 1 - 10 | Reserved for DATE developers.
Spare String 1 - 5 | Reserved for DATE developers.
Site Spec 1 - 5 | Reserved for DATE developers.
Site Spec String 1 - 2 | Reserved for DATE developers.

Table 14.7 EDM run-time variables
Name | Description
FIFO size | Size of edm mask FIFO.
EDM FIFO full count | Number of times the edm mask FIFO is full. Set by edmClient.
EventID for wakeup request | Event identifier for which a request of wake up has been triggered. Used by edm, edmClient, edmAgent.
maxWakeUpId | The highest event identifier for which a request of wake up has been triggered. Set by edm.
Last validity range threshold | Last validity range threshold sent. Set by edm.
Last validity range upper bound | Last validity range upper bound sent. Set by edm.
edmMaskValidityRange | edm mask validity range.
edmMask | edm mask.
Unavailable GDCs | List of GDCs not present in the edm mask. Computed by rcServer.
edm SOR/EOR phases | A number indicating the phase of the start or stop procedure executed by the edm process. Zero (0) means completion.
Machine identifier | Machine identifier as defined in the DATE roles database.
Spare variable 1 - 10 | Reserved for DATE developers.
Spare String 1 - 5 | Reserved for DATE developers.
Site Spec 1 - 5 | Reserved for DATE developers.
Site Spec String 1 - 2 | Reserved for DATE developers.

14.11 Control of the log messages

All DATE packages create log messages for information, error reporting and statistics purposes. It is possible to control, on a site-by-site and role-by-role basis, the amount of information generated during DATE operation. The run parameter logLevel is used to control the amount and the level of detail of the messages sent via the DATE infoLogger. The following conventions are commonly followed by all the DATE processes:
• logLevel equal to 0: no statistics and no logging (even in case of error).
• logLevel above 0: run statistics are appended to the logBook stream and error messages are sent to the runLog and package-specific streams.
• logLevel between 10 and 19 (LOG_NORMAL_TH and LOG_DETAILED_TH-1): normal level of logging. Run statistics, errors and information messages are created. This is the level which should be used in normal conditions and during data taking.
• logLevel of 20 (LOG_DETAILED_TH) and above: level of logging used for the development of the DATE software. A lot of messages are produced and, in particular, some messages are produced for each event. This level has a dramatic effect on the performance and should therefore not be used during normal data taking.
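The logLevel conventions above amount to a classification by thresholds. The Python sketch below illustrates it; it is not DATE code. The threshold values 10 and 20 and the 0 - 100 range come from the text and from the RunParameters tables, while the function name and the returned labels are invented for this example.

```python
# Threshold values as given in Section 14.11.
LOG_NORMAL_TH = 10
LOG_DETAILED_TH = 20


def describe_log_level(log_level):
    """Map a logLevel value to the logging behaviour described in the text.

    The labels are hypothetical names chosen for this sketch.
    """
    if not 0 <= log_level <= 100:
        raise ValueError("logLevel must be in the range 0 - 100")
    if log_level == 0:
        return "off"                    # no statistics, no logging at all
    if log_level < LOG_NORMAL_TH:
        return "statistics and errors"  # above 0 but below the normal level
    if log_level < LOG_DETAILED_TH:
        return "normal"                 # recommended during data taking
    return "detailed"                   # development only, per-event messages
```

With the default logLevel of 10 this returns "normal", the level recommended for data taking.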
14.12 Log Files

Error and information messages are logged by all the components of the runControl system using the infoLogger (Section 11.5). However, the stdout and stderr streams of the processes are stored as temporary files in the ${DATE_SITE_TMP} directory. These files are usually read only by the DATE developers and are overwritten when the associated processes are restarted. In some cases, the previous version of the log file is renamed to allow more debugging. All these temporary log files have a .stdout postfix and a name that follows the conventions listed below:
• log files created by SMI domains have a name in upper case characters, equal to the name of the SMI domain.
• log files created by runControl processes have a name that is the concatenation of the runControl process name and the fixed string '_control'. The name is in lower case characters.
• log files created by runControl Human Interfaces have a name that is the concatenation of the runControl process name, the fixed string '_controlhi', and the process identifier (PID) of the runControl Human Interface. The name is in lower case characters.
• log files created by rcServers have a name that is equal to the symbolic name assigned in the DATE database to the component of the DAQ system where the rcServer runs. If the symbolic name of an LDC is pcxxldc, then the rcServer running on it creates a 'pcxxldc.stdout' log file. The name is in lower case characters.

15 The physmem package

This chapter describes the DATE package physmem. The package contains a Linux kernel module to support shared access to a large contiguous block of non-paged physical memory which has been reserved at boot time. The installation procedure, a description of the utility programs, and some information about the kernel module implementation are presented.
15.1 Introduction
15.2 Installation of the physmem driver
15.3 Utility programs for physmem
15.4 Internals of the physmem driver
15.5 Physmem application library

15.1 Introduction

The supporting mechanism of a memory bank required by each DATE host can be of type IPC, PHYSMEM, BIGPHYS, or HEAP (see Chapter 4). If a memory bank needs to be large and non-paged, the best choice on the Linux operating system is PHYSMEM. Furthermore, a memory bank of type PHYSMEM makes it possible to obtain the physical base address of this memory, which is mandatory for the RORC readout software (see Chapter 7).
Since Linux does not provide a mechanism to allocate and deallocate large amounts of non-paged memory, this chapter describes the physmem approach that was developed for DATE. It exploits the fact that a separate block of memory, not seen by Linux, can be reserved at boot time. An additional kernel module driver is needed to access this physmem memory. In comparison to the bigphysarea approach, which is based on the same principles, the physmem approach does not rely on specific Linux kernel patches.
The DATE package physmem provides all the necessary software to make the physmem memory available. This package is self-contained and resides in the ${DATE_ROOT}/physmem directory. The following documentation describes the installation procedure of the physmem kernel module (see Section 15.2), the usage of the utility programs (see Section 15.3), and gives some information about the physmem kernel module internals (see Section 15.4).

15.2 Installation of the physmem driver

This section provides a guide to install and configure the software in order to access the physmem memory on a machine that runs a Linux operating system with a 2.4 or 2.6 kernel.
The installation of the physmem driver is part of the basic DATE installation.

15.2.1 Configuring the boot loader

As a first step, the physical memory of a machine must be partitioned into a region for Linux and a region for physmem. This can be achieved during the boot process of a Linux operating system by passing the mem parameter to the kernel. This parameter defines the size of the Linux memory region, whereas the remaining memory can be used for physmem. For example, if a machine has 4.5 GB of memory installed and the mem parameter is set to 1024M, then the Linux memory region covers 1 GB and the physmem memory region gets the remaining 3.5 GB. However, the Linux operating system does not always set the memory boundary precisely as requested by the mem parameter.
The boot loader of a Linux operating system can be used to pass the mem parameter to the kernel. For the GRUB boot loader, Listing 15.1 shows an example of its configuration file /etc/grub.conf that trims the memory region for Linux to 1 GB (line 3). For the LILO boot loader, Listing 15.2 shows an example of the configuration file /etc/lilo.conf that trims the memory region for Linux to 1 GB (line 4).

Listing 15.1 Example of GRUB to trim the Linux memory region to 1 GB
    1: title Red Hat Linux (2.4.21-4)
    2: root (hd0,0)
    3: kernel /boot/vmlinuz-2.4.21-4 ro root=/dev/hda1 mem=1024M
    4: initrd /boot/initrd-2.4.21-4.img

Listing 15.2 Example of LILO to trim the Linux memory region to 1 GB
    1: image=/boot/vmlinuz-2.4.21-4
    2: label=linux2421-4
    3: initrd=/boot/initrd-2.4.21-4.img
    4: append="mem=1024M"
    5: read-only
    6: root=/dev/hda1

After rebooting the machine, the memory region for Linux will be reduced to the value given in the respective configuration file.
This can be verified by issuing the following command:
    > cat /proc/meminfo

15.2.2 Setting up the physmem driver

The physmem memory is represented by the two device files /dev/physmem0 and /dev/physmem1, both with major device number 122. Device file /dev/physmem0 with minor device number 0 is exclusively used by the RORC utility programs (see Chapter 20) and the assigned physmem memory has a default size of 96 MB. Device file /dev/physmem1 with minor device number 1 is used for a memory bank of type PHYSMEM. The assigned memory for each physmem device is a separate block of the physical memory.
The physmem software package is installed together with the DATE kit. However, to use the stand-alone DDL and RORC software (described in Chapter 20), the physmem driver and library have to be installed separately. For a stand-alone installation, follow the procedure below:
• The header, source, object and executable files of the physmem driver, library and test programs are in the common AFS area:
    /afs/cern.ch/alice/daq/ddl/physmem/
This directory contains the different versions of the software as separate sub-directories. It also contains the different versions in compressed formats.
• The compressed file names show the version number and the time of archiving. Always use the latest date of a given version. The latest distributed version can be found on the following Web page:
    http://cern.ch/ddl/rorc_support.html
• Copy the compressed file to a local directory and uncompress it. Use the following command for extracting the files:
    gtar -xvzf physmem_vx.y_year.month.day.tgz vx.y/
where x.y is the version number of the package.
• To compile the driver, library and utilities, type the following commands:
    cd vx.y
    make -f Makefile clean
    make -f Makefile
• To create the device files and prepare the driver to be loaded at boot time, type as root the following command:
    make -f Makefile dev
This command creates the device files, installs the driver module in the appropriate place and edits the /etc/rc.modules file for automatic loading of the module at boot time. To pass a parameter to the driver, modify this file.
• To load the physmem driver kernel module without rebooting, type as root:
    make -f Makefile load
• In case an older version of the physmem driver is already loaded, type as root:
    make -f Makefile reload
• To check if the driver is loaded, type:
    ./check_driver
This script shows if the driver is loaded (calling /sbin/lsmod) and the driver messages during load time (calling dmesg).
Listing 15.3 gives an example dialog in which the memory regions for physmem and for Linux are 3220504576 bytes (786256 pages) and 1073741824 bytes (262144 pages) respectively. In this example the memory assigned to /dev/physmem0 is 100663296 bytes (the default 96 MB) starting at physical address 0x40000000, and the memory assigned to /dev/physmem1 is 3119841280 bytes (2975 MB), starting at physical address 0x46000000. There is a "gap" in the physmem1 memory between the physical addresses 0xcff50000 and 0x100000000, which is assigned to other devices by the BIOS, dividing the physmem1 memory into 2 zones.
Listing 15.3 Example to list the physmem physical base addresses and sizes

    > ./check_driver
    lsmod:
    Module                  Size  Used by
    physmem                44680  0
    dmesg:
    physmem: loading driver version 4.13 with physmemsize=0 and physmemsize0=0 (all in MB)
    physmem: linux version code = 2.6.18 (SLC 5.4), instruction set = 64 bit
    physmem: physical base address: 0x40000000
    physmem: physmem total size: 3220504576 bytes (786256 pages)
    physmem: Linux total size: 1073741824 bytes (262144 pages)
    physmem: device physmem0 starts at 0x40000000 with 100663296 bytes (96 MB)
    physmem: device physmem0 uses 1 mem zone(s)
    physmem: physmemMapZones [ device ] [ zone ] [ physZone ]
    physmem: physmemMapZones [ 0 ] [ 0 ] [ 0 ]
    physmem: device physmem1 starts at 0x46000000 with 3119841280 bytes (2975 MB)
    physmem: device physmem1 uses 2 mem zone(s)
    physmem: physmemMapZones [ device ] [ zone ] [ physZone ]
    physmem: physmemMapZones [ 1 ] [ 0 ] [ 0 ]
    physmem: physmemMapZones [ 1 ] [ 1 ] [ 1 ]
    physmem: physZone0: start 0x40000000, end 0xcff50000
    physmem: physZone1: start 0x100000000, end 0x130000000

On a 64-bit architecture, if the memory size is larger than 4 GB, the BIOS can assign some memory areas just below the 4 GB limit to devices such as video memory. This part of the memory cannot be used by the physmem devices. As a result, one of the physmem devices (generally /dev/physmem1) is divided into two zones. Each zone has contiguous physical addresses, but there is a “gap” in the physical addresses between the two zones (even if the mapped user addresses are contiguous across the two zones). Listing 15.3 shows an example of such zones. Programs using physmem devices must take care not to use the memory between the zones.
The physmem library helps with this problem: the routine physmemBlockIsNotContinuous() (see its description in Section 15.5) checks whether a given memory area contains unusable addresses.

While the physmem kernel module is being loaded, the entire memory beyond the Linux region is claimed as physmem memory. However, a specific total size of the physmem memory region can be enforced by passing the parameter physmemsize to the physmem kernel module. This parameter is optional and specifies the total size of the physmem memory region in MB. If it is 0 or not present, the whole memory beyond the Linux region is claimed. As an example, the following commands load the physmem kernel module with the physmem memory limited to 256 MB:

    > /sbin/rmmod physmem
    > /sbin/insmod Linux/physmem.ko physmemsize=256

The size of the memory assigned to device /dev/physmem0 can be controlled by passing the parameter physmemsize0 to the physmem kernel module. This parameter is optional and specifies the size in MB. If it is 0 or not present, the default of 96 MB is used. As an example, the following commands load the physmem kernel module with the memory of device /dev/physmem0 limited to 16 MB:

    > /sbin/rmmod physmem
    > /sbin/insmod Linux/physmem.ko physmemsize0=16

The size of the memory assigned to device /dev/physmem1 is the total size of the physmem memory minus the size of the memory assigned to device /dev/physmem0, but not more than 2 GB on a 32-bit system. Both physmem module parameters can be passed to the physmem kernel module on one line. To have the driver loaded with parameters at the next boot, modify the /etc/rc.modules file accordingly. Figure 15.1 in Section 15.4 shows the physical memory layout.
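The sizing rules of this section can be checked with a little arithmetic. The sketch below is not driver code: the names physmem_split and physmem_split_sizes are invented for illustration, and the function only models the rules stated above (96 MB default for /dev/physmem0, the remainder for /dev/physmem1, a 2 GB cap on 32-bit systems). How the driver itself rounds or validates these values is an open assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Hypothetical result type, not part of the physmem package. */
struct physmem_split {
    uint64_t size0;   /* bytes assigned to /dev/physmem0 */
    uint64_t size1;   /* bytes assigned to /dev/physmem1 */
};

/*
 * total_bytes     : total physmem region (everything beyond the Linux
 *                   region, or the physmemsize limit if one was given)
 * physmemsize0_mb : value of the physmemsize0 parameter, 0 means default
 * is_32bit        : non-zero to apply the 2 GB cap on /dev/physmem1
 */
struct physmem_split physmem_split_sizes(uint64_t total_bytes,
                                         unsigned physmemsize0_mb,
                                         int is_32bit)
{
    struct physmem_split s;
    const uint64_t cap = 2048ULL * MB;

    assert(total_bytes > 0);

    /* physmemsize0 = 0 (or absent) selects the 96 MB default. */
    s.size0 = (physmemsize0_mb ? physmemsize0_mb : 96) * MB;
    if (s.size0 > total_bytes)    /* assumption: never exceed the region */
        s.size0 = total_bytes;

    /* /dev/physmem1 receives whatever is left, capped on 32-bit systems. */
    s.size1 = total_bytes - s.size0;
    if (is_32bit && s.size1 > cap)
        s.size1 = cap;
    return s;
}
```

With the values of Listing 15.3 (a total of 3220504576 bytes, no module parameters, 64-bit system) the function yields 100663296 bytes for /dev/physmem0 and 3119841280 bytes for /dev/physmem1, which agrees with the driver messages.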
15.2.3 Testing the physmem driver

The DATE package physmem includes a utility program, physmemTest, which writes a pattern to the first and to the last 1000 bytes of the memory assigned to each physmem device and reads it back. It can be used to test the functions of the physmem driver. Listing 15.4 shows how to start this utility program and a successful response. For the description of the program see Section 15.3. No other application should use the physmem memory during this test, in order to avoid data corruption.

Listing 15.4 Example of testing the physmem driver with utility physmemTest

    > /date/physmem/Linux/physmemTest -m 0
    Opening /dev/physmem0
    Physical address = 0x40000000
    Physical usable size of device (no hole) : 0x6000000
    + Number of memory zones : 1
    + Mem zone 1 : 0x40000000 -> 0x46000000 (size = 0x6000000)
    Mmap done to 0x2accfcbda000 -> 0x2acd02bda000, trying to access physmem
    + test write: from 0x2accfcbda000, 1000 bytes written
    + test read: from 0x2accfcbda000, all 1000 bytes are ok
    Writing just before the memory end
    + test write: from 0x2acd02bd9c18, 1000 bytes written
    + test read: from 0x2acd02bd9c18, all 1000 bytes are ok
    unmapp'ing
    physical full size of device (including hole) = 0x6000000
    re-mapp'ing
    Mmap done to 0x2accfcbda000 -> 0x2acd02bda000, trying to access physmem
    Reading from the start of memory
    + test read: from 0x2accfcbda000, all 1000 bytes are ok
    Reading just before the memory end
    + test read: from 0x2b2f31e25c18, all 1000 bytes are ok
    >
    > /date/physmem/Linux/physmemTest -m 1
    Opening /dev/physmem1
    Physical address = 0x46000000
    Physical usable size of device (no hole) : 0xb9f50000
    + Number of memory zones : 2
    + Mem zone 1 : 0x46000000 -> 0xcff50000 (size = 0x89f50000)
    + Mem zone 2 : 0x100000000 -> 0x130000000 (size = 0x30000000)
    Mmap done to 0x2b0e21b11000 -> 0x2b0edba61000, trying to access physmem
    + test write: from 0x2b0e21b11000, 1000 bytes written
    + test read: from 0x2b0e21b11000, all 1000 bytes are ok
    Writing just after the 'hole' in memory
    + test write: from 0x2b0eaba61000, 1000 bytes written
    + test read: from 0x2b0eaba61000, all 1000 bytes are ok
    unmapp'ing
    physical full size of device (including hole) = 0xea000000
    re-mapp'ing
    Mmap done to 0x2b0e21b11000 -> 0x2b0f0bb11000, trying to access physmem
    Reading from the start of memory
    + test read: from 0x2b0e21b11000, all 1000 bytes are ok
    phys_zones[1].start = 0x100000000
    Reading just after the 'hole' in memory
    + test read: from 0x2b0edbb11000, all 1000 bytes are ok
    Writing just before the phys_zone[1].end
    + test write: from 0x2b0f0bb10c18, 1000 bytes written
    + test read: from 0x2b0f0bb10c18, all 1000 bytes are ok

15.3 Utility programs for physmem

The DATE package physmem includes several utility programs for the physmem memory. They are useful for debugging purposes. The program physmemTest performs a simple read/write test on the first and last 1000 bytes of the memory assigned to each physmem device. The program physmemFill writes a pattern (which can also consist of zeros, for cleaning purposes) into a section of the physmem memory. The program physmemDump displays a section of the physmem memory. Their synopses are presented below.

physmemTest

Synopsis
    physmemTest [{-m|-M} DEVICE_NUMBER]

Description
The physmemTest program opens the device defined by the DEVICE_NUMBER program parameter, finds out the physical base address and the size of the assigned memory, and maps the whole assigned physmem memory using both mapping methods. For the details of the mapping methods see Section 15.4, “Internals of the physmem driver”.
If the assigned physmem memory has only one memory zone, the program writes a pattern (incremental data words with starting value 0) to the first 1000 bytes, reads back and checks the first 1000 bytes, writes a pattern to the last 1000 bytes, and reads back and checks the last 1000 bytes. Then it remaps the assigned memory using the second mapping method, reads back and checks the first and last 1000 bytes, and closes the corresponding device.

If the assigned physmem memory has two or more memory zones, the program writes a pattern (incremental data words with starting value 0) to the first 1000 bytes, reads back and checks the first 1000 bytes, writes a pattern to the first 1000 bytes of the second zone, and reads them back and checks them. Then it remaps the assigned memory using the second mapping method, and reads back and checks the first 1000 bytes of the first and second zones. At the end it writes the pattern to the last 1000 bytes of the second zone, reads them back and checks them, and closes the corresponding device.

Parameters
• DEVICE_NUMBER: this parameter defines the minor device number. The default value is 1, indicating device /dev/physmem1.

Example
See Listing 15.4, which tests the /dev/physmem0 and /dev/physmem1 memory.

physmemFill, physmemFillWithAddress

Synopsis
    physmemFill {{-o|-O} OFFSET | {-p|-P} PHYSICAL_ADDRESS}
                [{-l|-L} LENGTH]
                [{-s|-S} START] [{-i|-I} INCREMENT]
                [{-m|-M} DEVICE_NUMBER]
                [{-d|-D} WORD_WIDTH]

    physmemFillWithAddress [{-m|-M} DEVICE_NUMBER]
                           [{-d|-D} WORD_WIDTH]

Description
The physmemFill program opens the chosen physmem device, finds out the physical base address and the size of the assigned memory, maps the whole assigned physmem memory, writes a pattern into a section of the memory according to the parameters, unmaps the assigned memory, and closes the corresponding device.
The physmemFillWithAddress program fills each word of the whole chosen physmem device with its own physical address. It can be useful when one wants to check whether a program really changed the contents of the physmem memory.

Parameters
• OFFSET: this parameter defines the start of the section to be filled. It is given as an offset in bytes (flag -o) or in words (flag -O) relative to the user base address of the assigned memory. Values starting with 0x or 0X are interpreted as hexadecimal values.
• PHYSICAL_ADDRESS: this parameter gives the physical address of the start of the section to be filled. Either OFFSET or PHYSICAL_ADDRESS must be specified.
• LENGTH: this parameter defines the length of the section to be filled. It is given in bytes (flag -l) or in words (flag -L). The size of the word is given by the WORD_WIDTH parameter. Values starting with 0x or 0X are interpreted as hexadecimal values.
• START: the section is filled with incremental data words. This parameter defines the starting value. The default value is 0.
• INCREMENT: the section is filled with incremental data words. This parameter defines the increment value. The default value is 1.
• DEVICE_NUMBER: this parameter defines the minor device number. The default value is 0, indicating device /dev/physmem0.
• WORD_WIDTH: this parameter specifies whether the fill values are 4 or 8 bytes long. This size is used when offsets and lengths are given in words (flags -O and -L). On a 32-bit architecture only the 4-byte length is allowed. The default value is 4.

Example
    > /date/physmem/Linux/physmemFill -o 0 -l 1000 -m 1 -i 0
Fills the first 1000 bytes of the /dev/physmem1 memory with 0.
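The incremental pattern controlled by START and INCREMENT can be pictured with the following sketch. It is not the source of physmemFill: the function names are hypothetical, it works on any ordinary buffer rather than on a mapped physmem device, and it covers only the 4-byte word width. Dropping a trailing partial word when a byte length is converted to words is an assumption, not documented behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill nwords 32-bit words with start, start+inc, start+2*inc, ...
 * This mirrors the START / INCREMENT description for WORD_WIDTH 4.
 */
void fill_incremental32(volatile uint32_t *dst, size_t nwords,
                        uint32_t start, uint32_t inc)
{
    uint32_t value = start;
    size_t i;

    assert(dst != NULL || nwords == 0);
    for (i = 0; i < nwords; i++) {
        dst[i] = value;
        value += inc;   /* unsigned arithmetic wraps modulo 2^32 */
    }
}

/*
 * Convert a LENGTH given in bytes (flag -l) into whole words.
 * Assumption: a trailing partial word is ignored.
 */
size_t length_bytes_to_words(size_t length_bytes, size_t word_width)
{
    assert(word_width == 4 || word_width == 8);
    return length_bytes / word_width;
}
```

In these terms, the example command above (-l 1000 -i 0 with the default START of 0 and the default word width of 4) corresponds to writing 250 words that all hold the value 0.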
physmemDump

Synopsis
    physmemDump {{-o|-O} OFFSET | {-p|-P} PHYSICAL_ADDRESS}
                [{-l|-L} LENGTH]
                [{-m|-M} DEVICE_NUMBER]
                [{-d|-D} WORD_WIDTH]

Description
The physmemDump program opens the chosen physmem device, finds out the physical base address and the size of the assigned memory, maps the whole assigned physmem memory, prints out the device properties (number of zones, size of zones and gaps, physical and user offsets, etc.), reads a section of the memory according to the parameters and prints it in a formatted way, unmaps the assigned memory, and closes the corresponding device. If the chosen physmem device is divided into separate zones, the program skips the area between zones.

Parameters
• OFFSET: this parameter defines the start address of the section to be read. It is given as an offset in bytes (flag -o) or in words (flag -O) relative to the user base address of the assigned memory. Values starting with 0x or 0X are interpreted as hexadecimal values.
• PHYSICAL_ADDRESS: this parameter gives the physical address of the start of the section to be read. Either OFFSET or PHYSICAL_ADDRESS must be specified.
• LENGTH: this parameter defines the length of the section to be read. It is given in bytes (flag -l) or in words (flag -L). The size of the word is given by the WORD_WIDTH parameter. Values starting with 0x or 0X are interpreted as hexadecimal values. If a length of 0 is given, the program only prints the device properties: the number of zones, the size of zones and gaps, and the physical and user offsets.
• DEVICE_NUMBER: this parameter defines the minor device number. The default value is 0, indicating device /dev/physmem0.
• WORD_WIDTH: this parameter specifies whether the memory is dumped as 4-byte or 8-byte words. This size is used when offsets and lengths are given in words (flags -O and -L). On a 32-bit architecture only the 4-byte length is allowed. The default value is 4.

Example
    > /date/physmem/Linux/physmemDump -o 0 -l 1000 -m 1
Dumps the first 1000 bytes of the /dev/physmem1 memory.

15.4 Internals of the physmem driver

The physmem kernel module services the module initialization and cleanup requests as well as the file operations on the device files with major number 122. The source code of this kernel module can be found in the file physmem.c, together with physmem.h, in the physmem package.

During the loading of the physmem kernel module, the device is registered, the end of the Linux memory region is determined, and the size of the memory beyond this boundary is obtained by gradually mapping 4 KB pages until a write/read test fails on them. Finally, the physmem memory is assigned to the devices /dev/physmem0 and /dev/physmem1 (see Figure 15.1). During the unloading of the physmem kernel module, the device is unregistered.

The implemented file operations are open(), close(), mmap(), munmap(), and ioctl(). These are standard Linux system routines; their synopses with reference to the physmem kernel module are presented below. The usage of these routines is illustrated by the file physmemTest.c in the physmem package.

On a 64-bit architecture the physmem memory can be divided into separate areas, so-called memory zones. The reason is that the BIOS assigns some memory areas just below the 4 GB limit to devices such as video memory. This part of the memory cannot be used by the physmem devices. As a result, one of the physmem devices (generally /dev/physmem1) can be divided into two zones. Each zone has contiguous physical addresses, but the “gap” addresses between the zones cannot be used.

The file operation mmap() can map the physmem devices in two different ways: usable or full mapping. In the first case the user addresses are contiguous for the usable part of the device.
This means that the last user address of a memory zone is followed by the first user address of the next zone; there is no gap in the user's address space. To calculate the physical address of a given user address, one has to take into account the physical gap between the zones. In the case of full mapping, the user address space is contiguous over the whole device, i.e. the gap area has user addresses as well. This simplifies the calculation of the physical address of a given user address. On the other hand, the user can accidentally read from or write to the forbidden gap area, which can lead to a system crash. The mapping method is selected by an ioctl() call made before calling the mmap() operation.

Figure 15.1 Memory layout of physmem. [Diagram of the physical memory, from top to bottom: the physmem region (total size given by module parameter physmemsize, or determined automatically by default), consisting of /dev/physmem1, which may contain a “gap area” used by the system, and /dev/physmem0 (size given by module parameter physmemsize0, or 96 MB by default); below it, the Linux region (size given by kernel parameter mem=xxxxM via the boot-loader).]

open

C Synopsis
    #include <sys/types.h>
    #include <fcntl.h>
    int open( const char *pathname, int flags )

Description
Converts the parameter pathname, which has to be the absolute path of the physmem device file, into a file descriptor to handle subsequent I/O operations. Several processes can open this device file. The kernel module counter is always 0.

Parameters
• pathname: pointer to the physmem device file name.
• flags: this parameter must include one of the access modes O_RDONLY, O_WRONLY or O_RDWR. These request opening the physmem area read-only, write-only, or read/write, respectively.

Returns
A new file descriptor (non-negative integer), or -1 if an error occurred.

close

C Synopsis
    #include <unistd.h>
    int close( int fd )

Description
Closes the file descriptor fd that was created by function open(). The kernel module counter is always 0.

Parameter
• fd: the file descriptor of the physmem device.

Returns
Zero on success, or -1 if an error occurred.
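The difference between usable and full mapping, described at the start of this section, comes down to how a user offset (user address minus the address returned by mmap()) translates into a physical address. The following sketch makes the two rules explicit. It is an illustration only: struct zone and both function names are hypothetical and are not the definitions found in physmem.h or physmem_lib.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zone descriptor covering physical addresses [start, end). */
struct zone {
    uint64_t start;
    uint64_t end;     /* address of the byte following the last byte */
};

/* Usable mapping: the zones are packed back to back in user space. */
int usable_offset_to_phys(const struct zone *z, size_t nzones,
                          uint64_t user_off, uint64_t *phys)
{
    size_t i;

    assert(z != NULL && phys != NULL);
    for (i = 0; i < nzones; i++) {
        uint64_t size = z[i].end - z[i].start;
        if (user_off < size) {
            *phys = z[i].start + user_off;
            return 0;
        }
        user_off -= size;   /* move on; the gap takes no user space */
    }
    return -1;              /* beyond the usable size of the device */
}

/* Full mapping: one linear range, but offsets inside a gap are forbidden. */
int full_offset_to_phys(const struct zone *z, size_t nzones,
                        uint64_t user_off, uint64_t *phys)
{
    uint64_t addr;
    size_t i;

    assert(z != NULL && phys != NULL && nzones > 0);
    addr = z[0].start + user_off;
    for (i = 0; i < nzones; i++) {
        if (addr >= z[i].start && addr < z[i].end) {
            *phys = addr;
            return 0;
        }
    }
    return -1;              /* inside a gap or past the last zone */
}
```

Applied to the /dev/physmem1 zones of Listing 15.4 (0x46000000 to 0xcff50000 and 0x100000000 to 0x130000000), physical address 0x100000000 sits at user offset 0x89f50000 with usable mapping and at user offset 0xba000000 with full mapping. These are the offsets of the “just after the hole” accesses in that listing relative to the mapping base 0x2b0e21b11000.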
mmap

C Synopsis
    #include <unistd.h>
    #include <sys/mman.h>
    void *mmap( void *start, size_t length, int prot, int flags, int fd, off_t offset )

Description
Maps the physmem memory for the address range specified by parameter offset and parameter length in bytes. The memory region is mapped in multiples of the page size, 4096 bytes. Several processes are allowed to create a mapping. No initialization of the specified memory region is done. Before calling mmap() it is mandatory to call function ioctl() with parameter value PHYSMEM_GETSIZE or PHYSMEM_GETFULLSIZE (see the description of ioctl below). The ioctl() call informs the driver of the mapping mode: usable or full mapping (see the introductory part of Section 15.4).

Parameters
• start: defines a preferred value for the pointer to be returned, but it is usually left at 0. It must be a multiple of the page size.
• length: size of the mapped memory area in bytes.
• prot: this argument describes the desired memory protection (which must not conflict with the open mode of the device). It is either PROT_NONE or the bitwise OR of one or more of the other PROT_* flags:
    PROT_EXEC : Pages may be executed.
    PROT_READ : Pages may be read.
    PROT_WRITE : Pages may be written.
    PROT_NONE : Pages may not be accessed.
• flags: this parameter specifies the type of the mapped object (bits MAP_FIXED, MAP_SHARED, or MAP_PRIVATE, where the latter two bits are mutually exclusive).
• fd: this parameter is the file descriptor that was produced by the function open().
• offset: start address in bytes of the mapped physmem memory range. It should be a multiple of the page size.

Returns
A pointer to the mapped area, or -1 (MAP_FAILED) if an error occurred. The physmem memory can be accessed with this returned pointer.
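Because mmap() works in whole pages of 4096 bytes, a caller usually rounds the requested length up to a page multiple and checks that the offset is page aligned before making the call. The helpers below sketch that arithmetic; their names and the macro SKETCH_PAGE_SIZE are hypothetical, and the page size is simply the value quoted in the description above rather than one queried from the system.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PAGE_SIZE 4096UL   /* page size quoted in the mmap description */

/* Round a byte count up to the next multiple of the page size. */
unsigned long page_round_up(unsigned long nbytes)
{
    /* Guard against overflow in the addition below. */
    assert(nbytes <= ULONG_MAX - (SKETCH_PAGE_SIZE - 1));
    return (nbytes + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Non-zero if value is a multiple of the page size. */
int is_page_aligned(unsigned long value)
{
    return (value & (SKETCH_PAGE_SIZE - 1)) == 0;
}
```

For instance, a request for 1000 bytes occupies one full page, and the 96 MB default size of /dev/physmem0 (0x6000000 bytes) is already page aligned.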
munmap

C Synopsis
    #include <unistd.h>
    #include <sys/mman.h>
    int munmap( void *start, size_t length )

Description
Releases the mapping of the physmem memory for the address range specified by the parameter start and the parameter length in bytes. The region is also automatically unmapped when the process terminates. On the other hand, closing the file descriptor does not unmap the region.

Parameters
• start: start address of the mapped physmem area, which is the pointer returned by the function mmap().
• length: size of the mapped memory area in bytes.

Returns
Zero on success, or -1 if an error occurred.

ioctl

C Synopsis
    #include <sys/ioctl.h>
    #include “physmem.h”
    int ioctl( int fd, int request, void *argument )

Description
The sizes and the physical base addresses of the physmem memory and of the usable memory zones can be obtained with this device-specific function. It is mandatory to call this function with parameter request set to PHYSMEM_GETSIZE or PHYSMEM_GETFULLSIZE before calling mmap(). The call not only returns the requested value, but also informs the driver of the mapping mode: usable or full mapping (see Section 15.4).

Parameters
• fd: the file descriptor that was created by function open().
• request: this parameter defines which value is returned in the out parameter argument. It can be set to the values PHYSMEM_GETADDR, PHYSMEM_GETSIZE, PHYSMEM_GETFULLSIZE, PHYSMEM_GETNUMMEMZONES and PHYSMEM_GETMEMZONES.
• argument: pointer to the out parameter. Its type, size and value depend on the request parameter. According to the request, the returned values are:
    - PHYSMEM_GETADDR: the parameter argument returns the physical address of the beginning of the physmem device. Its type is unsigned long.
    - PHYSMEM_GETSIZE: the parameter argument returns the usable size of the physmem device, where the (possible) holes are not taken into account. Its type is unsigned long.
    - PHYSMEM_GETFULLSIZE: the parameter argument returns the total size of the physmem device, where the (possible) holes are taken into account even though the memory located there cannot be used. Its type is unsigned long.
    - PHYSMEM_GETNUMMEMZONES: the parameter argument returns the number of memory zones used by the device. Its type is unsigned int.
    - PHYSMEM_GETMEMZONES: to be called with an allocated array of elements of type struct memZoneStruct, one element per memory zone used by the device. The call populates the given array with the physical start addresses, physical end addresses and sizes of the memory zones in use.

Returns
Zero on success, or -1 if an error occurred.

15.5 Physmem application library

This section presents the synopses of the C routines and functions that are useful for building programs using the physmem area. The files physmem_lib.c, physmem_lib.h and physmem_lib.o are part of the physmem package. Before calling any of the following routines one has to include a header file:

    #include “physmem_lib.h”

This file contains the type definitions of the structures referred to in the routines, and the prototypes of the routines. It also includes the physmem.h header file, which contains the definitions specific to the physmem driver.

The physmem memory area can be divided into memory zones as described in Section 15.4. Only the memory inside the zones can be used. A zone is characterized by its start and end addresses. The start address is the address of the first byte of the zone; the end address is the address of the byte following the last byte of the zone. Several structures are defined in the header files to hold the physical and user addresses and offsets of the zones. (An offset is the difference between a given address and the start address of the physmem area.)

All of the following routines use the physmem handler.
It is defined by the open routine and contains the information necessary for the correct execution of the other routines.

physmemOpen

Synopsis
    #include “physmem_lib.h”
    int physmemOpen(physmemHandler_t *device,
                    int minor,
                    int full_size)

Description
The routine opens the physmem device selected by minor (/dev/physmem0 or /dev/physmem1) and maps the area according to the value of full_size. It also gets the size, the user address and the physical address of the physmem area and the coordinates (start and end addresses, and size) of the zones and gaps, and fills this information into the handler structure pointed to by device.

Parameters
• device: this parameter points to the device handler to be filled by the routine in case of a successful open.
• minor: this parameter defines the minor number of the physmem device. Its value can be 0 or 1.
• full_size: this parameter defines the mapping method of the area.
  A value of 0 means usable mapping: the user addresses are contiguous for the whole device, i.e. the last user address of a memory zone is followed by the first user address of the next zone. There is no gap in the user's address space.
  A value of 1 means full mapping: the user address space is contiguous over the whole device, i.e. the gap area has user addresses as well.

Return value
0 if the open and the mapping were successful, -1 in any other case.

physmemClose

Synopsis
    #include “physmem_lib.h”
    int physmemClose(physmemHandler_t device)

Description
The routine removes the memory mapping of the physmem area referred to by the handler device and closes the device.

Parameters
• device: the handler of the physmem device, defined by a previous call of the physmemOpen() routine.

Return value
0 if the unmapping and the close were successful, -1 in any other case.

physmemPrintZones

Synopsis
    #include “physmem_lib.h”
    void physmemPrintZones(physmemHandler_t device)

Description
The routine displays the start and end offsets of all the memory zones of the physmem area referred to by the handler device.
An offset is the difference between an address in the given zone and the start address of the physmem area.

Parameters
• device: the handler of the physmem device, defined by a previous call of the physmemOpen() routine.

physmemBlockIsNotContinuous

Synopsis
    #include “physmem_lib.h”
    int physmemBlockIsNotContinuous(
            volatile unsigned int *block_start,
            volatile unsigned int *block_end,
            physmemHandler_t device,
            int *start_status,
            int *end_status)

Description
The routine checks whether the start and the end of a memory block are in the same physmem memory zone. It returns the number of the zone (or gap) in which the block starts and the number of the zone (or gap) in which it ends. The memory zones are numbered starting from 0, while the gaps are numbered with negative numbers starting from -1. An example of the memory zone and gap numbering:

    gap -1 [--zone 0--] gap -2 [--zone 1--] gap -3 [--zone 2--]

i.e. memory zone N is surrounded by gaps -(N+1) and -(N+2).

Parameters
• block_start: user address of the block start.
• block_end: user address of the block end (last byte of the block + 1).
• device: the handler of the physmem device, defined by a previous call of the physmemOpen() routine.
• start_status: returns the number of the memory zone (>= 0) or the number of the gap (< 0) where the block starts.
• end_status: returns the number of the memory zone (>= 0) or the number of the gap (< 0) where the block ends.

Return value
0 if the start and the end of the block are in the same zone, -1 in any other case.

physmemPhysOffset

Synopsis
    #include “physmem_lib.h”
    unsigned long physmemPhysOffset(unsigned long user_offset,
                                    physmemHandler_t device)

Description
This utility routine calculates the physical memory offset from the user offset.

Parameters
• user_offset: user memory offset value.
• device: the handler of the physmem device, defined by a previous call of the physmemOpen() routine.
Return value
The physical offset, or -1 if no physical address in the physmem area belongs to the given user offset. This happens if the user address belonging to the user offset is outside the physmem memory zones.

physmemUserOffset

Synopsis
    #include “physmem_lib.h”
    unsigned long physmemUserOffset(unsigned long phys_address,
                                    physmemHandler_t device)

Description
This utility routine calculates the user memory offset from the physical address.

Parameters
• phys_address: physical memory address value.
• device: the handler of the physmem device, defined by a previous call of the physmemOpen() routine.

Return value
The user offset, or -1 if no user offset in the physmem area belongs to the given physical address. This happens if the physical address is outside the physmem memory zones.

16 Utility packages

This chapter describes the DATE utility packages banksManager, bufferManager, simpleFifo and recordingLib, their architecture, the API and the associated procedures and utilities. These packages are included in the standard DATE kit and are mainly used internally by DATE.

16.1 The banks manager package
16.2 The bufferManager package
16.3 The simpleFifo package
16.4 The recording library package

16.1 The banks manager package

16.1.1 Introduction

All DATE actors need some support for memory banks. These are handled by the DATE banksManager package, which provides a common, configurable and flexible interface that includes features such as dynamic sizing and automatic initialization.

16.1.2 Architecture

The base of the banksManager package is the banks database as defined by the DATE database package.
The database lists, role by role, all the banks that are needed to run, together with their characteristics (size, type, support). At each start of run, the rcServer daemon creates (if needed) and initializes the banks required by the role according to the runtime configuration of the DATE system. Banks are mapped in strict order in each actor running on the given machine. A unique ID is given to each bank. Different actors may see the same banks mapped at different virtual addresses: for this reason, the exchange of absolute pointers to entities contained in the banks is forbidden. Inter-actor exchanges should always be done using the offset between the beginning of the bank and the pointer, plus the identifier of the bank itself. These two values are guaranteed to map to the same object from any process using the banksManager package. A set of macros and symbols is provided to facilitate the procedure.

Code using the banksManager package should be compiled in a DATE environment, using the DATE symbols and the DATE makefile rules. The include file ${DATE_BANKS_MANAGER_DIR}/banksManager.h should be included and the library ${DATE_BANKS_MANAGER_BIN}/libBanksManager.a should be used for linking. The library makes intensive use of the central definition file ${DATE_COMMON_DEFS}/event.h, which is included automatically if the DATE compilation rules are used.

16.1.3 Entries and symbols

dateInitControlRegion

C Synopsis
    #include “banksManager.h”
    int DATE_INIT_CONTROL_REGION( hostRole )
    int dateInitControlRegion( hostRole, eventId, rcShmId )

Description
Initialize the control region. This entry is used only by rcServer. The hostRole parameter is an ID for the role assumed by the machine. eventId and rcShmId are the unique IDs for the DATE event and run control block as defined by the common DATE definitions: they are used to validate the structures of the various modules wishing to map to the control buffer.
The two IDs are written in the control structure and are verified whenever an attempt is made to connect to it. On mismatch (code compiled at different stages or using incompatible DATE distribution kits) the whole operation is refused. This entry does not allocate memory banks other than the one needed for the run control block. The macro DATE_INIT_CONTROL_REGION can be used to call dateInitControlRegion with the appropriate parameters.

Returns
TRUE for success, FALSE on error.

dateMapBanks

C Synopsis
    #include “banksManager.h”
    int DATE_MAP_BANKS( hostRole )
    int DATE_MAP_AND_INIT_BANKS( hostRole )
    int dateMapBanks( hostRole, eventId, rcShmId, initTheBanks )

Description
Map all banks needed for the given role. If initTheBanks is TRUE, the banks are also initialized (the rcServer is the process that does this automatically at each start of run). eventId and rcShmId are symbols defined by the DATE central include file and are used to match the IDs of the event control structure and of the run control structure as defined during the compilation phase. This ensures that all actors accessing the banks share the same data structures. Mapping can fail if these two IDs do not match the values found in the control block (as defined by dateInitControlRegion). The macro DATE_MAP_BANKS can be used to call dateMapBanks with the appropriate IDs. Similarly, DATE_MAP_AND_INIT_BANKS can be used to do the same and to request the initialization of the banks.

Returns
TRUE for success, FALSE on error.

BO2V/BV2O

C Synopsis
    #include “banksManager.h”
    void *BO2V( int bankId, int offset )
    int BV2O( int bankId, void *address )

Description
Macros used to manipulate virtual addresses and their offsets. BO2V takes a bank ID plus an offset and returns the corresponding virtual address. BV2O takes a bank ID plus a virtual address and returns the corresponding offset.
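The rule that actors exchange a bank ID and an offset instead of a pointer can be illustrated with a small model. The code below is not the DATE implementation of BO2V and BV2O: struct bank_table, sketch_bo2v() and sketch_bv2o() are hypothetical stand-ins, and a plain per-process table of base addresses is an assumption made only to show why the (bank, offset) pair stays valid when the same bank is mapped at different virtual addresses.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_BANKS 4

/* Per-process table: where each bank happens to be mapped in this process. */
struct bank_table {
    char *base[SKETCH_MAX_BANKS];
};

/* Same idea as BO2V: (bank ID, offset) -> address valid in this process. */
void *sketch_bo2v(const struct bank_table *t, int bank_id, int offset)
{
    assert(t != NULL && bank_id >= 0 && bank_id < SKETCH_MAX_BANKS);
    assert(t->base[bank_id] != NULL && offset >= 0);
    return t->base[bank_id] + offset;
}

/* Same idea as BV2O: (bank ID, address in this process) -> offset. */
int sketch_bv2o(const struct bank_table *t, int bank_id, const void *address)
{
    assert(t != NULL && bank_id >= 0 && bank_id < SKETCH_MAX_BANKS);
    assert(t->base[bank_id] != NULL);
    return (int)((const char *)address - t->base[bank_id]);
}
```

A sender would convert its local pointer with the BV2O-style call and pass on the bank ID and the offset; the receiver turns the pair back into a pointer with the BO2V-style call against its own table, so neither side ever depends on the other's virtual addresses.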
rcShm/readoutReady/readoutFirstLevel/readoutSecondLevel/readoutData/edmReady/eventBuilderReady/eventBuilderData/hltReady/hltDataPages

C Synopsis
    #include "banksManager.h"

    Pointer             Offset               Size                    Bank
    rcShm               rcShmO               rcShmSize               rcShmBank
    readoutReady        readoutReadyO        readoutReadySize        readoutReadyBank
    readoutFirstLevel   readoutFirstLevelO   readoutFirstLevelSize   readoutFirstLevelBank
    readoutSecondLevel  readoutSecondLevelO  readoutSecondLevelSize  readoutSecondLevelBank
    readoutData         readoutDataO         readoutDataSize         readoutDataBank
    edmReady            edmReadyO            edmReadySize            edmReadyBank
    eventBuilderReady   eventBuilderReadyO   eventBuilderReadySize   eventBuilderReadyBank
    eventBuilderData    eventBuilderDataO    eventBuilderDataSize    eventBuilderDataBank
    hltReady            hltReadyO            hltReadySize            hltReadyBank
    hltDataPages        hltDataPagesO        hltDataPagesSize        hltDataPagesBank

Description
Pointers, offsets, sizes and banks for the entities handled by the banksManager package. For the definition of the banks, please refer to the DATE database package. If the entity is not present, the pointer will be NULL, offset and bank will be set to -1, and the size will be set to 0.

edmInput/recorderInput

C Synopsis
    #include "banksManager.h"
    void *edmInput
    void *recorderInput

Description
FIFOs used as input to the corresponding actor. Their actual value depends on the runtime configuration of DATE.

physmemAddr/physmemBank

C Synopsis
    #include "banksManager.h"
    void *physmemAddr
    int physmemBank
    int physmemNumZones
    struct memZonesStruct *physmemZones

Description
Physical address, bank ID, number of zones and descriptions of the memory zones used for the PHYSMEM driver. Respectively NULL, -1, 0 and NULL if not loaded. When physmemNumZones is zero or one, there are no gaps in the address range of the PHYSMEM zone. When physmemNumZones is > 1 (e.g.
when the system has more than 4 GB of RAM and the starting physical address of the PHYSMEM memory block is below the 4 GB memory limit), special care must be taken to avoid the gaps in the address range of PHYSMEM: in this case, the description of the memory zones within PHYSMEM is given in the array physmemZones (see Section 15.4).

16.1.4 Internals

The banksManager package is self-contained in the ${DATE_BANKS_MANAGER_DIR} directory. This directory contains the buffer manager module, a utility to dump the runtime configuration and the makefile for the package. No special setup is required.

The banksManager package needs the database, runControl, physmem, bufferManager and simpleFifo packages to compile and link.

When running in environments with multiple PHYSMEM zones (e.g. when the system has more than 4 GB of RAM and the starting physical address of the PHYSMEM memory block is below the 4 GB memory limit), special care must be taken to handle the memory hole created by the BIOS between the 3.3 GB and the 4 GB marks. In these cases, the variable physmemNumZones contains a value > 1 and the array physmemZones contains the description of each zone.

Compilation of the package is straightforward and requires no additional setup. The compilation symbol XTRA_CHECKS can be used to run more intensive run-time checks on the parameters and on the buffer itself. This symbol should not be defined for production.

The dumpBanks utility (see Section 4.3.6) can be used to inspect the runtime configuration of the DATE banks.

16.2 The bufferManager package

16.2.1 Introduction

The DATE bufferManager package supports the allocation and deallocation of memory from a common buffer via a lightweight protocol. The DATE architecture needs an efficient single-producer, multiple-consumer buffer manager. The DATE bufferManager package fulfills these requirements.
The basic element of a bufferManager entity is a shared memory block that supports the whole mechanism. No other resources (from DATE or from the Operating System) are needed. Inter-process synchronization is done via this shared memory block and is based on linear test-and-set procedures. The package itself never spin locks: this is left - if needed - to the caller.

16.2.2 Architecture

The bufferManager package uses the resources provided by the caller, namely a memory block to be managed. This block must be shared between all actors that need access to it. A buffer can have only one producer (allocating memory blocks) and multiple consumers (deallocating memory blocks). A producer process can also act as a consumer process, although this feature is not part of the user requirements and may therefore be dropped in future releases of the package. External packages (e.g. the DATE simpleFifo package) must be used to transfer pointers to the memory blocks allocated by the bufferManager package between processes.

A memory block allocated by this package can be used as conventional dynamic memory (same features as for memory allocated via the malloc library call). Any kind of data can be stored in these memory blocks. The only constraint is on memory pointers. As the Operating System may map the same memory block at different virtual addresses in different processes, special care must be taken to avoid sharing virtual pointers. If the exchange of pointers is required, memory offsets (the difference between the virtual address to exchange and the start virtual address of the memory block) must be specified, together with a protocol to indicate the proper start virtual address. Here DATE assumes that all shared memory blocks are allocated via the banksManager package, and a set of standard calls and macros is available to facilitate the task. If this is not the case, it is up to the caller to establish an alternate protocol to exchange virtual pointers.
When standard DATE buffers are used as support for the bufferManager package, the dateBanks package will automatically initialize them at run time as soon as they are created. If the bufferManager package has to work on non-standard DATE buffers, special care must be taken to initialize the buffer in the appropriate sequence and to avoid out-of-sequence accesses to the same object, which may give unpredictable results.

With the exception of bmInit, all entries where a buffer pointer is given as a parameter assume that this pointer has been used for a previous, successful call to bmInit. Few or no run-time checks are made on the validity of this and other parameters.

The routine bmInit handles memory blocks allocated in PHYSMEM in such a way as to avoid gaps in the address range (if any). The procedure is to initialise the buffer and immediately allocate a memory block that spans the memory gap(s) possibly present in the memory block. DATE actors will therefore never receive a memory block that includes any location belonging to these gap(s).

The library implemented by the bufferManager package provides three sets of entries: one set for the producer process (which allocates blocks), one set for the consumer processes (which deallocate blocks) and one set common to all classes of processes. As the producer may also act as a consumer process, all calls relative to consumers are also available to the producer. Special care must be taken to use the set of entries appropriate to the role of each process.

Due to the way compilers work on several architectures, it is recommended to keep all sizes (memory blocks, memory buffers) aligned to 32 bits. Failure to do so may raise signals (SIGSEGV, SIGBUS) in the calling process.

Code using the bufferManager package should be compiled in a DATE environment, using the DATE symbols and the DATE makefile rules.
The include file ${DATE_BM_DIR}/dateBufferManager.h should be included and the library ${DATE_BM_BIN}/libDateBufferManager.a should be used for linking.

16.2.3 Common entries

bmGetVersionId

C Synopsis
    #include "dateBufferManager.h"
    char *bmGetVersionId( void )

Description
Inquiry for the version ID of the package.

Returns
Pointer to a static string.

bmGetError

C Synopsis
    #include "dateBufferManager.h"
    char *bmGetError( void )

Description
Inquiry for the string describing the last error that occurred in the library. This string is shared between all buffers handled by the package and is overwritten at each call made to the library.

Returns
Pointer to a static string.

16.2.4 Producer entries

bmInit

C Synopsis
    #include "dateBufferManager.h"
    int bmInit( void *buffer, int sizeOfBuffer )

Description
The memory block pointed to by buffer and of size sizeOfBuffer is initialized to be used as support for a buffer entity. The buffer pointer should contain the address of a memory block of at least sizeOfBuffer bytes. This entry should not be called when the same buffer is already in use as a memory buffer by other processes. For standard DATE buffers, this entry is called automatically when the buffer itself is created by the dateBanks package. No checks are made on the validity of the pointer or on the availability of the memory block.

Returns
TRUE on successful completion, FALSE on error.

bmValidate

C Synopsis
    #include "dateBufferManager.h"
    int bmValidate( void *buffer )

Description
Check the structure of a buffer for correctness. This entry should be used to guarantee (up to a minimal level) the good status of the buffer in case memory corruption is suspected. The check uses system resources and should therefore be avoided in environments where efficiency is an issue.

Returns
TRUE if the buffer looks OK, FALSE otherwise.
bmAllocate

C Synopsis
    #include "dateBufferManager.h"
    void *bmAllocate( void *buffer, int sizeOfBlock )

Description
A block of at least sizeOfBlock bytes is allocated (if possible) from the given buffer. The library may fail to allocate a given size - even if the buffer itself has enough space in total - whenever the free space is split into smaller, non-contiguous blocks.

Returns
-1 for fatal error, NULL if there is not enough free space in the buffer. Otherwise a pointer to the allocated memory block.

bmResize

C Synopsis
    #include "dateBufferManager.h"
    int bmResize( void *block, int newSize )

Description
The block pointed to by block is - if possible - resized to newSize bytes. The block must have been previously allocated to contain at least newSize bytes (see the entry bmAllocate).

Returns
TRUE for success, FALSE on error.

bmDefragment

C Synopsis
    #include "dateBufferManager.h"
    int bmDefragment( void *buffer )

Description
The buffer pointed to by buffer is thoroughly defragmented. This procedure can be used during quiescent phases - when the buffer is not in use - to repack scattered blocks of memory. The process is linear, does not move memory around the buffer and does not affect any blocks currently allocated.

Returns
TRUE for success, FALSE on error.

bmGetBlocksInUse

C Synopsis
    #include "dateBufferManager.h"
    int bmGetBlocksInUse( void *buffer )

Description
Count the number of blocks currently allocated in the given buffer.

Returns
Number of allocated blocks on success, -1 on error.

bmGetTotalSpace

C Synopsis
    #include "dateBufferManager.h"
    int bmGetTotalSpace( void *buffer )

Description
Count the maximum number of bytes available in the given buffer.

Returns
Maximum number of available bytes on success, -1 on error.
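Two conventions from this section lend themselves to a short sketch: the recommendation to keep sizes aligned to 32 bits, and the three-way result documented for bmAllocate (-1, NULL, or a valid pointer). The helpers below are hypothetical and do not call the bufferManager library; alignTo32Bits and classifyAllocation are names introduced here purely for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Round a size in bytes up to the next multiple of 4 (32 bits). */
static int alignTo32Bits( int size ) {
  assert( size >= 0 && size <= INT_MAX - 3 );
  return ( size + 3 ) & ~3;
}

enum allocOutcome { ALLOC_OK, ALLOC_FULL, ALLOC_FATAL };

/* Classify a result that follows the documented bmAllocate convention:
 * -1 (as a pointer) for a fatal error, NULL when the buffer has no room,
 * any other value is a usable block. */
static enum allocOutcome classifyAllocation( const void *block ) {
  if ( block == (const void *)(intptr_t)-1 ) return ALLOC_FATAL;
  if ( block == NULL ) return ALLOC_FULL;
  return ALLOC_OK;
}
```

A producer would typically retry later on ALLOC_FULL (consumers may free blocks in the meantime) and stop on ALLOC_FATAL; since the package never spin locks, any waiting policy is up to the caller.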
bmGetAvailableSpace

C Synopsis
    #include "dateBufferManager.h"
    int bmGetAvailableSpace( void *buffer )

Description
Count the number of bytes currently available in the given buffer. Due to possible fragmentation of the buffer, this may not be the maximum size that can be allocated via the bmAllocate call but only an upper bound.

Returns
Number of available bytes on success, -1 on error.

bmGetNumAllocations

C Synopsis
    #include "dateBufferManager.h"
    int bmGetNumAllocations( void *buffer )

Description
Count the number of allocation requests made to the buffer since the last call to bmInit.

Returns
Number of allocations on success, -1 on error.

bmGetNumFulls

C Synopsis
    #include "dateBufferManager.h"
    int bmGetNumFulls( void *buffer )

Description
Count the number of allocation requests made to the buffer that were rejected due to lack of space since the last call to bmInit.

Returns
Number of failed allocations on success, -1 on error.

16.2.5 Consumer entries

bmDeallocate

C Synopsis
    #include "dateBufferManager.h"
    int bmDeallocate( void *block )

Description
The block pointed to by block is deallocated. The package assumes that the block has been previously allocated via a bmAllocate call.

Returns
TRUE on success, FALSE on error.

16.2.6 Internals

The package is self-contained in the ${DATE_BM_DIR} directory. This folder contains the buffer manager module, a validation program and the makefile for the package. No special setup is required.

Compilation of the package is straightforward and requires no additional setup. The compilation symbol XTRA_CHECKS can be used to run more intensive run-time checks on the parameters and on the buffer itself. This symbol should not be defined for production.

The bufferManager package needs the database, runControl, physmem, banksManager and simpleFifo packages to compile and link.

The dateBufferManagerValidate program can be used to validate the dateBufferManager library.
It should be run without parameters to execute an extensive suite of tests and exit with an appropriate status message. See Listing 16.1 for an example run.

Listing 16.1 Example of dateBufferManagerValidate run
    > DATE buffer manager validator starting
    dateBufferManager.c: single-producer multiple-consumer buffer handler V 1.2 compiled Sep 20 2002 17:15:38
    DATE buffer manager validator completed

16.3 The simpleFifo package

16.3.1 Introduction

Multi-process systems need inter-process synchronization tools. The DATE package requires a fast, lightweight exchange of data between process pairs (one data producer and one data consumer). The simpleFifo package fulfills this requirement.

The basic element of a simpleFifo is a memory block that supports the whole mechanism. No other resources (from DATE or from the Operating System) are needed. Inter-process synchronization is done via this shared memory block and is based on linear test-and-set procedures. The package itself never spin locks: this is left - if needed - to the caller.

16.3.2 Architecture

The simpleFifo package implements a simpleFifo entity using a shared memory block provided by the caller. This block is partitioned into a control block and a data block. The simpleFifo entity can then be used to exchange blocks of arbitrary size in a first-in first-out fashion.

A simpleFifo allows exactly one data producer and one data consumer. It provides a set of calls for the data producer, a set for the consumer and a third set common to producer and consumer.

With the exception of fifoDeclare, all entries where a buffer pointer is given as a parameter assume that this pointer has been used for a previous, successful call to fifoDeclare. Few or no run-time checks are made on the validity of this parameter.
Due to the way compilers work on several architectures, it is recommended to keep all sizes (memory block, FIFO head, FIFO tail) aligned to 32 bits. Failure to do so may raise signals (SIGSEGV, SIGBUS) in the calling process.

Code using the simpleFifo package should be compiled in a DATE environment, using the DATE symbols and the DATE makefile rules. The include file ${DATE_SIMPLEFIFO_DIR}/simpleFifo.h should be included and the library ${DATE_SIMPLEFIFO_BIN}/libFifo.a should be used for linking.

16.3.3 Common entries

fifoGetVersionId

C Synopsis
    #include "simpleFifo.h"
    char *fifoGetVersionId( void )

Description
Inquiry for the version ID of the package.

Returns
Pointer to a static string.

fifoDeclare

C Synopsis
    #include "simpleFifo.h"
    int fifoDeclare( void *buffer, int sizeOfBuffer )

Description
The memory block pointed to by buffer and of size sizeOfBuffer is initialized to be used as support for a simpleFifo entity. The buffer pointer should contain the address of a memory block of at least sizeOfBuffer bytes. This entry should not be called when the same buffer is already in use as a simpleFifo by other processes. No checks are made on the validity of the pointer or on the availability of the memory block.

Returns
0 on successful completion, -1 on error.

fifoGetSize

C Synopsis
    #include "simpleFifo.h"
    int fifoGetSize( void *buffer )

Description
Inquiry for the size of the data partition of the simpleFifo pointed to by buffer.

Returns
Size in bytes of the data block of the FIFO.

fifoCheck

C Synopsis
    #include "simpleFifo.h"
    int fifoCheck( void *buffer )

Description
Check the structure of a simpleFifo for correctness. This entry should be used to guarantee (up to a minimal level) the good status of the simpleFifo in case memory corruption is suspected. The check uses system resources and should therefore be avoided in environments where efficiency is an issue.

Returns
TRUE if the FIFO is OK, FALSE otherwise.

fifoGetOccupancy

C Synopsis
    #include "simpleFifo.h"
    int fifoGetOccupancy( void *buffer )

Description
Get the occupancy of the given FIFO.

Returns
Occupancy in percent (100: completely full, 0: completely empty).

fifoIsEmpty

C Synopsis
    #include "simpleFifo.h"
    int fifoIsEmpty( void *buffer )

Description
Test the given FIFO for presence of data.

Returns
TRUE if the FIFO is empty, FALSE otherwise.

16.3.4 Producer entries

fifoGetFree

C Synopsis
    #include "simpleFifo.h"
    void *fifoGetFree( void *buffer, int neededSize )

Description
Get a pointer to a data block of the given size from the head of the FIFO. This block can be used for whatever purpose the caller may need, as long as its maximum size is respected. When done, the producer should validate the block in order to make it available to the consumer.

Returns
NULL if there is no available data block for the requested size, -1 if the request is not valid for the FIFO, otherwise a pointer to a memory block. If the entry returns NULL, it is up to the caller to take appropriate action (retry, sleep and retry, etc.). It is possible for this entry to return NULL followed by -1.

fifoValidate

C Synopsis
    #include "simpleFifo.h"
    void fifoValidate( void *buffer, int actualSize )

Description
The head of the FIFO previously allocated with fifoGetFree is made available to the consumer. This block can be resized to the given actualSize, which must be at most as big as the size allocated by fifoGetFree. The calling procedure must take into account that, from now on, all accesses to this location may conflict with other processes, including itself.

16.3.5 Consumer entries

fifoHasData

C Synopsis
    #include "simpleFifo.h"
    int fifoHasData( void *buffer )

Description
Poll the FIFO for data.

Returns
TRUE if data is available, FALSE otherwise.
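The producer-side pattern described for fifoGetFree and fifoValidate (reserve a block of worst-case size, fill it, then publish only the bytes actually used) can be sketched with a toy model. The code below does not use simpleFifo.h: struct toyFifo, toyGetFree and toyValidate are hypothetical, the toy buffer is linear rather than circular, and there is no consumer or inter-process synchronization. It only illustrates the reserve-then-shrink idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAPACITY 32  /* hypothetical data area size in bytes */

struct toyFifo {
  char data[TOY_CAPACITY];
  int  committed;  /* bytes already handed over to the consumer */
  int  reserved;   /* size of the block currently reserved, 0 if none */
};

/* Reserve neededSize bytes at the head; NULL if they do not fit. */
static void *toyGetFree( struct toyFifo *fifo, int neededSize ) {
  assert( neededSize > 0 && fifo->reserved == 0 );
  if ( fifo->committed + neededSize > TOY_CAPACITY ) return NULL;
  fifo->reserved = neededSize;
  return fifo->data + fifo->committed;
}

/* Publish the reserved block, shrunk to the bytes actually written. */
static void toyValidate( struct toyFifo *fifo, int actualSize ) {
  assert( actualSize >= 0 && actualSize <= fifo->reserved );
  fifo->committed += actualSize;
  fifo->reserved = 0;
}

/* Reserve 16 bytes, use only 3, then check how much room is left.
 * Returns the number of committed bytes, or -1 on unexpected behaviour. */
static int reserveAndCommitDemo( void ) {
  struct toyFifo fifo = { {0}, 0, 0 };
  char *block = (char *)toyGetFree( &fifo, 16 );  /* worst-case size */
  if ( block == NULL ) return -1;
  memcpy( block, "abc", 3 );                      /* only 3 bytes needed */
  toyValidate( &fifo, 3 );
  /* 29 bytes remain: a 30-byte request fails, a 29-byte one succeeds. */
  if ( toyGetFree( &fifo, 30 ) != NULL ) return -1;
  if ( toyGetFree( &fifo, 29 ) == NULL ) return -1;
  return fifo.committed;
}
```

Because the block is shrunk at validation time, over-estimating the needed size costs nothing in this model once the block is published.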
fifoGetFirst

C Synopsis
    #include "simpleFifo.h"
    void *fifoGetFirst( void *buffer )

Description
Get the pointer to the tail of the FIFO. This pointer points to a memory block of arbitrary size. The actual size of this element must be worked out by the caller.

Returns
Pointer to the tail of the FIFO, NULL if the FIFO is empty.

fifoSetFree

C Synopsis
    #include "simpleFifo.h"
    void fifoSetFree( void *buffer, int size )

Description
The tail of the FIFO is made available to the data producer. The block is assumed to have the given size, as specified by the caller.

16.3.6 Internals

The package is self-contained in the ${DATE_SIMPLEFIFO_DIR} directory. This folder contains the FIFO handler module, a validation package, a simple performance measurement program and the makefile for the package. No special setup is required: the package is entirely self-contained and does not require other packages to compile and link.

Compilation of the package is straightforward and requires no additional setup. The compilation symbol DEBUG can be used to produce some output on stdout in case of error. This symbol should not be used for production.

The simpleFifoValidate program can be used to validate the simpleFifo library. It should be run on the same machine in two copies, one producer and one consumer. The procedure will run the given number of loops with basic simpleFifo operations (including some runtime checks) and exit with an appropriate status message. See Listing 16.2 for an example run.

Listing 16.2 Example of simpleFifoValidate run
    > ${DATE_SIMPLEFIFO_BIN}/simpleFifoValidate p 100&
    [1] 9606
    FIFO V 1.03 validation producer side starting
    > ${DATE_SIMPLEFIFO_BIN}/simpleFifoValidate c 100
    FIFO V 1.03 validation consumer side starting
    Producer: consumer started
    Consumer: producer started
    Producer: test completed OK. Sleeps: 2050836.000000. Checks: 3.
    Consumer: test completed OK.
Sleeps: 2459685.000000. Checks: 2.
    [1]+ Done ${DATE_SIMPLEFIFO_BIN}/simpleFifoValidate p 100

16.4 The recording library package

16.4.1 Introduction

Actors running in a DATE environment may need to record raw events on local or remote systems. DATE provides two standard recording libraries. They both allow efficient and tailored output of raw DATE events on a wide set of data channels (raw files, named pipes and network channels) and for all types of DATE events (streamlined, paged, fully-built events).

16.4.2 The low-level recording library

The low-level recording library is used by processes that need full control over their output channels. This is the case, for example, for the eventBuilder process on the GDC. The low-level recording library is able to handle a set of channels (for multiple parallel output streams) and allows both synchronous and asynchronous output.

16.4.2.1 The callable interface

The callable interface is defined by the C include file ${DATE_RECORDLIB_DIR}/recordingLib.h. This file must be used within a standard DATE environment in order to compile the calling program.

The library is capable of handling multiple channels. All operations must indicate which channel they are related to. This is done with an index in the range [0 .. maxChannel-1], where maxChannel is the value returned by the library call recordingLibDeclareDevice. For some entries, it is possible to use the pre-defined symbol ALL_CHANNELS: in this case the entry operates on all channels that have been declared.

The library allows a completion handler routine to be declared. This will run as a coroutine as soon as any I/O completes.

Each outstanding output can have an optional user pointer. This is the address of an anonymous block of data associated to the channel, usually describing a data structure related to the pending output. The caller program can - at any time - set or get the user pointer associated to any channel.
Normal usage is to declare the user pointer during the setup of a write operation and to retrieve it at the end of the operation.

This library makes use of the DATE database package. The corresponding library must be included in the linking stage of the user code.

The library makes use of the DATE infoLogger package. The appropriate library must be included in the linking stage of the user code. Normal log messages are issued using the recordingLib stream. Error and fatal messages are also recorded onto the runLog stream.

The shared memory control region is updated by the library. The run parameters runNumber, maxFileSize, ldcSocketSize and maxEventSize are used for setup and run-time checks. The counters fileCount, bytesRecorded and eventsRecorded are updated at run-time.

All sizes used in this library are expressed in bytes.

recordingLibDeclareDevice

C Synopsis
    #include "recordingLib.h"
    int recordingLibDeclareDevice( char *recordingDevice )

Description
Declare the output channel(s) according to the given recordingDevice. The syntax of the recording device follows the conventions described in Section 10.2.

Returns
The number of channels corresponding to the recording device (zero for error).

recordingLibOpenChannel

C Synopsis
    #include "recordingLib.h"
    int recordingLibOpenChannel( int channel )

Description
Open the channel with the given ID (ALL_CHANNELS: all channels are opened). If the call fails, the channel(s) is/are left in closed state.

Returns
TRUE on success, FALSE on error.

recordingLibCloseChannel

C Synopsis
    #include "recordingLib.h"
    int recordingLibCloseChannel( int channel )

Description
Close the given channel (or all the channels if ALL_CHANNELS is used as channel ID). On error, the channel(s) is/are left in an undefined state (all channels that can be closed are closed).

Returns
TRUE on success, FALSE on error.
recordingLibSetCallback

C Synopsis
    #include "recordingLib.h"
    int recordingLibSetCallback( void callback( int channel ) )

Description
Declare the routine to be called on completion of each output. The routine will receive the channel ID as the input parameter.

Returns
TRUE on success, FALSE on error.

recordingLibStartSOR

C Synopsis
    #include "recordingLib.h"
    int recordingLibStartSOR( void )

Description
Declare the beginning of the "start of run" phase.

Returns
TRUE on success, FALSE on error.

recordingLibEndSOR

C Synopsis
    #include "recordingLib.h"
    int recordingLibEndSOR( void )

Description
Declare the end of the "start of run" phase.

Returns
TRUE on success, FALSE on error.

recordingLibSetupWrite

C Synopsis
    #include "recordingLib.h"
    int recordingLibSetupWrite( int channel, void *buffer, int length, void *uptr )

Description
Set up the write, on the given channel, of the data in the given buffer for the given size. The given user pointer can be retrieved at any time during and after the output.

Returns
TRUE on success, FALSE on error.

recordingLibSetupWriteV

C Synopsis
    #include
    #include "recordingLib.h"
    int recordingLibSetupWriteV( int channel, struct iovec *iov, int iovcnt, void *uptr )

Description
Set up the write, on the given channel, of the data in the given I/O vector of the specified length. The given user pointer can be retrieved at any time during and after the output. For more details on the iovec structure, see the man page of the writev Unix system call.

Returns
TRUE on success, FALSE on error.

recordingLibWriteNext

C Synopsis
    #include "recordingLib.h"
    int recordingLibWriteNext( int channel )

Description
Write the next data on the given channel. If the channel is set as non-blocking, the call writes only what can be written without blocking the operation. If the channel is set as blocking, the call will stall until the write is completed.
Returns
The number of bytes written, zero if none, -1 on error.

recordingLibSetUptr

C Synopsis
    #include "recordingLib.h"
    int recordingLibSetUptr( int channel, void *uptr )

Description
Set the user pointer associated to the given channel.

Returns
TRUE on success, FALSE on error.

recordingLibGetUptr

C Synopsis
    #include "recordingLib.h"
    void *recordingLibGetUptr( int channel )

Description
Get the user pointer associated to the given channel.

Returns
The user pointer associated to the channel (either via recordingLibSetUptr or during the recordingLibSetupWrite/recordingLibSetupWriteV call), NULL if not set or on error.

recordingLibSetPortNumber

C Synopsis
    #include "recordingLib.h"
    int recordingLibSetPortNumber( int port )

Description
Set the TCP/IP port number to be used for network channels. This call affects only the channels that have not yet been opened.

Returns
TRUE on success, FALSE on error.

recordingLibSetBlocking

C Synopsis
    #include "recordingLib.h"
    int recordingLibSetBlocking( int blockingMode )

Description
Set the channels as blocking (blockingMode TRUE) or non-blocking (blockingMode FALSE).

Returns
TRUE on success, FALSE on error.

recordingLibGetChannelName

C Synopsis
    #include "recordingLib.h"
    char *recordingLibGetChannelName( int channel )

Description
Get the name of the given channel.

Returns
String containing the name of the channel on success, NULL on error.

recordingLibGetFd

C Synopsis
    #include "recordingLib.h"
    int recordingLibGetFd( int channel )

Description
Get the file descriptor associated to the given channel, as returned by the Unix open system call.

Returns
File descriptor number on success, -1 if the channel is closed or on error.

recordingLibGetNumWrites

C Synopsis
    #include "recordingLib.h"
    int recordingLibGetNumWrites( int channel )

Description
Get the number of write operations requested on the given channel.
Returns
Number of operations requested, -1 on error.

recordingLibGetNumBytes

C Synopsis
    #include "recordingLib.h"
    long64 recordingLibGetNumBytes( int channel )

Description
Get the number of bytes written through the given channel.

Returns
Number of bytes written, -1 on error.

recordingLibGetChannelByGdcId

C Synopsis
    #include "event.h"
    #include "recordingLib.h"
    int recordingLibGetChannelByGdcId( eventGdcIdType gdcId )

Description
Get the ID of the channel associated to the given GDC ID.

Returns
ID of the channel, -1 on error.

recordingLibDumpDatabase

C Synopsis
    #include "recordingLib.h"
    void recordingLibDumpDatabase( void )

Description
Dump the content of the library database via an infoLogger stream.

16.4.3 The high-level recording library

The high-level recording library provides an abstract access layer to the low-level recording library. The recorder process running on the LDCs uses the high-level recording library.

The high-level recording library can handle only DATE raw events and uses an approach similar to the one implemented in the low-level recording library. A set of channels is handled together, and many events can be sent simultaneously and asynchronously to any of the open channels. The library then handles the interaction with the host Operating System to queue and perform in parallel all the outstanding transfers. The dynamic resources associated to the calling process are used to adapt to the operating conditions. The library also handles the association of each event to the appropriate output channel.

16.4.3.1 The callable interface

The callable interface is defined by the C include file ${DATE_RECORDLIB_DIR}/dateRec.h. This file must be included in a standard DATE environment in order to compile the calling program.

All conventions valid for the low-level recording library (see Section 16.4.2 above) are also valid for the high-level recording library.
This library makes use of the DATE database package, the DATE infoLogger package, and the low-level recording library. The file ${DATE_RECORDINGLIB_BIN}/libDateRec.a should be used to link the calling code.

The library issues log messages using the infoLogger dateRec stream. Errors are also sent to the infoLogger runLog stream.

The run parameter recordingDevice is used by this library (in addition to all the parameters and counters handled by the low-level recording library).

dateRecInit

C Synopsis
    #include "dateRec.h"
    int dateRecInit( void )

Description
Initialize the library. Can be called only once in the lifetime of the calling process.

Returns
0 on success, non-zero on error.

dateRecSetup

C Synopsis
    #include "dateRec.h"
    int dateRecSetup( void *event, void *userPtr )

Description
Initialize the transfer of the given DATE event. The given user pointer will be returned whenever the transfer is either completed or aborted.

Returns
0 on success, non-zero on error.

dateRecGetCompleted

C Synopsis
    #include "dateRec.h"
    int dateRecGetCompleted( int timeout, void **event, void **userPtr )

Description
Get the data relative to the next completed transfer (if any). The timeout can be equal to 0 (do not wait), -1 (wait forever) or the minimum number of milliseconds to wait for the next available completed transfer (the actual wait time could exceed the given value for fragmented outputs or on heavily loaded systems). The event pointer must be provided while the userPtr pointer can be NULL.

Returns
0 on success, non-zero on error. If an output has been completed, the event pointer will contain the address of the data written, otherwise NULL will be returned. The event pointer will always be overwritten using whatever value was given in the transfer setup routine.
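The setup/get-completed cycle described above can be sketched as a drain loop. The following toy model does not link against the recording library: stubSetup and stubGetCompleted are hypothetical stand-ins that only mimic the documented return conventions (0 on success, event set to NULL when nothing has completed, userPtr optional). In this model every transfer "completes" immediately and in submission order, which the real library does not promise.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8  /* hypothetical queue depth for this toy model */

struct transfer { void *event; void *userPtr; };

static struct transfer pending[MAX_PENDING];
static int numPending = 0;

/* Stand-in for a transfer setup call: remember event and user pointer.
 * Returns 0 on success, non-zero when the toy queue is full. */
static int stubSetup( void *event, void *userPtr ) {
  if ( numPending == MAX_PENDING ) return 1;
  pending[numPending].event = event;
  pending[numPending].userPtr = userPtr;
  numPending++;
  return 0;
}

/* Stand-in for a "get completed" call with timeout 0 (do not wait):
 * returns 0 and sets *event to NULL when nothing has completed.
 * userPtr may be NULL, event must not be. */
static int stubGetCompleted( void **event, void **userPtr ) {
  assert( event != NULL );
  if ( numPending == 0 ) { *event = NULL; return 0; }
  *event = pending[0].event;
  if ( userPtr != NULL ) *userPtr = pending[0].userPtr;
  for ( int i = 1; i < numPending; i++ ) pending[i - 1] = pending[i];
  numPending--;
  return 0;
}

/* Queue three events, then drain the completions.
 * Returns how many completions carried the expected user pointer. */
static int drainDemo( void ) {
  static int events[3];
  static int tags[3] = { 10, 20, 30 };
  int matched = 0;
  for ( int i = 0; i < 3; i++ )
    if ( stubSetup( &events[i], &tags[i] ) != 0 ) return -1;
  for ( ;; ) {
    void *event = NULL;
    void *userPtr = NULL;
    if ( stubGetCompleted( &event, &userPtr ) != 0 ) return -1;
    if ( event == NULL ) break;  /* nothing left to collect */
    int index = (int)( (int *)event - events );
    if ( userPtr == &tags[index] ) matched++;
  }
  return matched;
}
```

The user pointer is what lets the caller recover its own bookkeeping for each event (here a tag) without keeping a separate lookup table.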
dateRecGetNumPendings
C Synopsis
    #include "dateRec.h"
    int dateRecGetNumPendings( void )
Description
    Get the number of outstanding write operations.
Returns
    Number of outstanding operations (completed, in progress or pending), 0 if none.

dateRecShutdown
C Synopsis
    #include "dateRec.h"
    int dateRecShutdown( int forceShutdown )
Description
    Close all channels in a graceful way (forceShutdown FALSE) or by aborting all outstanding operations (forceShutdown TRUE): aborted writes can be retrieved via the dateRecGetAborted call described below.
Returns
    0 on success, non-zero on error (or on the impossibility of shutting down the system due to pending operations and forceShutdown set to FALSE).

dateRecGetNumAborted
C Synopsis
    #include "dateRec.h"
    int dateRecGetNumAborted( void )
Description
    Get the number of write operations aborted due to errors that can be retrieved via the dateRecGetAborted call.
Returns
    Number of aborted operations, 0 if none.

dateRecGetAborted
C Synopsis
    #include "dateRec.h"
    int dateRecGetAborted( void **event, void **userPtr )
Description
    Get the data associated with the next aborted operation. This call can be iterated to get, one by one, all outstanding aborted operations.
Returns
    0 on success, non-zero on error.

dateRecLastError
C Synopsis
    #include "dateRec.h"
    char *dateRecLastError( void )
Description
    Get a string describing the last error encountered by the library.
Returns
    Pointer to a zero-terminated static string.

16.4.4 Internals

The two recording libraries (high-level and low-level) are self-contained in the ${DATE_RECORDLIB_DIR} directory. This folder contains the low-level and the high-level recording libraries, the two include files and a small validation program.

The validator.c program tests the capability of the low-level recording library to handle a set of parallel output local files.
It can be called with several optional parameters, the most important of which is the target directory (default: "/tmp/recordingLibValidation") used to create the test files. This area must be big enough to store the several megabytes of raw data that are created by the validation program and removed upon termination. Run the program with the parameter "-?" for a complete list of the available options.

17 Interfaces

This chapter discusses the interfaces of DATE with other systems.

17.1 Interface with the Trigger System
17.2 Interface to the High-Level Trigger
17.3 Interface to AliEn and the Grid
17.4 File Exchange Server
17.5 Interface to the Shuttle

17.1 Interface with the Trigger System

The trigger system provides the synchronization between the experiment and the data acquisition. It identifies the events that are deemed worth reading out and activates the data-acquisition system. This role of the Trigger is documented in Chapter 8.

The Trigger system is also a source of data that is read out by the DAQ. The ALICE Central Trigger Processor will generate three data streams to the DAQ:

1. the CTP event fragment, sent for every Trigger Level 2 accept (L2a), consists of 8 words carrying the same information that is broadcast to all the participating sub-detectors through the TTC B-channel. The format of the data is given in Figure 17.1.

Figure 17.1 The format of the CTP event data.

2. the interaction record (see Figure 17.2) consists of:
• a two-word header, consisting of an orbit number (first orbit of the record) and an Err flag, asserted if there is a gap just before the record in the continuous sequence of interaction records (under normal circumstances, the DAQ should receive interaction records for all LHC orbits).
• a maximum of 250 words containing the bunch crossing numbers in which interactions have been detected, with the InT flag set to 0 for peripheral events or to 1 for semi-central interactions.
• an optional incomplete record word, present when there are more than 250 interactions, indicated by a virtual bunch crossing number equal to 4095.

Figure 17.2 The format of interaction records.

The CTP event fragments and the interaction record data shall be generated by the CTP and transmitted to the DAQ via the ALICE DDL. The hardware and the communication procedure shall be standard, identical to the channels that transmit the sub-detector readout. The nature of the data, and the timing and rate of their generation, on the other hand, differ significantly from the sub-detector readout; the data shall therefore follow a customized format, as indicated above.

The CTP Readout will contribute to the event-building task. It is a redundant channel that carries exactly the same information broadcast, at the time of an L2a decision, to all the participating sub-detectors (L2a Message). It will be used by the ALICE data-driven DAQ system to resolve error conditions.

The Interaction Record is an aid to the pattern recognition task. The generation of the record is continuous, rather than triggered by any CTP or DAQ action. The data do not interfere with any on-line operation: they only need to be archived for off-line use.
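The structure of the interaction record payload described above (up to 250 interaction words, an InT flag per word, and a virtual bunch crossing number of 4095 marking an incomplete record) can be illustrated with a small decoding sketch. The bit positions used below are assumptions made for this example only, since the authoritative layout is the one given in Figure 17.2; the two-word header is not decoded, and irSummary and irSummarize are hypothetical names.

```c
#include <stdint.h>

#define IR_MAX_WORDS     250      /* from the text: at most 250 interaction words */
#define IR_INCOMPLETE_BC 4095u    /* from the text: virtual BC of the incomplete record word */
#define IR_BC_MASK       0x0FFFu  /* ASSUMPTION: bunch crossing number in bits 0-11 */
#define IR_INT_BIT       0x1000u  /* ASSUMPTION: InT flag in bit 12 */

typedef struct {
  int peripheral;    /* interaction words with InT = 0 */
  int semiCentral;   /* interaction words with InT = 1 */
  int incomplete;    /* 1 if the incomplete record word was seen */
} irSummary;

/* Summarize the payload words of one interaction record (header excluded).
   Returns 0 on success, -1 if more than IR_MAX_WORDS interaction words
   are found. */
int irSummarize( const uint32_t *words, int nWords, irSummary *out ) {
  int i;

  out->peripheral = out->semiCentral = out->incomplete = 0;
  for ( i = 0; i < nWords; i++ ) {
    uint32_t bc = words[i] & IR_BC_MASK;

    if ( bc == IR_INCOMPLETE_BC ) {            /* marker word, not an interaction */
      out->incomplete = 1;
      continue;
    }
    if ( out->peripheral + out->semiCentral == IR_MAX_WORDS ) return -1;
    if ( words[i] & IR_INT_BIT ) out->semiCentral++;
    else out->peripheral++;
  }
  return 0;
}
```

A 12-bit field is the smallest one able to hold the value 4095, which is why it was chosen for the sketch; a real decoder must take the field widths and positions from the CTP documentation.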
17.2 Interface to the High-Level Trigger

The overall architecture of the Trigger, DAQ and High-Level Trigger (HLT) systems is illustrated in Figure 17.3.

Figure 17.3 TRIGGER-DAQ-HLT overall architecture.

The data-acquisition system takes care of the data flow from the DDL up to the storage of data on the PDS system. The task of the HLT system is to select the most relevant data from the large input stream and to reduce the data volume by well over an order of magnitude in order to fit the available storage bandwidth, while preserving the physics information of interest. This is achieved by a combination of event selection (triggering), data compression, or selection of Regions of Interest with partial detector readout. While executing any of these tasks, the HLT may also generate data to be attached to, or to partially replace, the original event.

Care has been taken not to impose any architectural constraints which could compromise the HLT filtering efficiency, knowing that event selection will become more and more elaborate during the lifetime of the experiment. This way, filtering may be introduced in progressively more sophisticated steps without affecting the performance and the stability of the data-acquisition system.

17.2.1 DAQ-HLT interface

A schematic view of the DAQ-HLT interface is illustrated in Figure 17.4.
Figure 17.4 DAQ-HLT interface schematic view.

The hardware interface is based on the DDL and its DIU/SIU cards, the same components used to transfer data from the detector electronics to the data-acquisition system. The DAQ system is implemented within a coherent hardware and software framework, with the HLT system operating as an external system, as shown in Figure 17.5.

Figure 17.5 DAQ-HLT Data Flow overview.

Every D-RORC sitting in the LDC can host two DIUs. These on-board DIUs can be used in two ways: both can be connected to the front-end electronics and serve as two readout links, or one DIU can be connected to the front-end electronics while the other transfers a copy of all the raw data to the HLT RORC (H-RORC) sitting in the HLT computers, through the standard DDL. The H-RORC receives all the raw data exactly as it was received from the front-end electronics. All the LDCs dedicated to the detectors which make use of the HLT system are equipped with D-RORCs working in the second mode. These are called Detector LDCs. The interface between the DAQ and the HLT system is the DIU output on the H-RORC.

17.2.2 HLT-DAQ interface

After running the HLT algorithms, the HLT computers transfer the result of the processing, the trigger decisions, and the compressed data to the DAQ system, again using standard DDLs. Here the interface is the SIU input. The GDCs receive the sub-events from the Detector LDCs and any additional data generated by the HLT computers from the LDCs dedicated to the HLT, called HLT LDCs.
The DATE software can accept as many data channels from the LDCs dedicated to the HLT as required, since it handles these channels as additional LDC data paths. The HLT LDCs will also receive messages specifying whether to discard or accept a given event. Furthermore, for accepted events, the HLT decision can specify the new pattern of sources for a given event, resulting in a partial readout of the raw data. A decision broker process, running on the HLT LDCs, will transfer the HLT information and decision to a decision agent process, running on the Detector LDCs, as shown in Figure 17.6.

Figure 17.6 Data flow in the LDC in the DAQ system with HLT active.

The functions of the decision broker and of the decision agent are implemented by the hltAgent process, started on all the LDCs of the DAQ system whenever the DATE resources are configured to run in an environment where the HLT system is active. This process runs a server loop, similar to those of the recorder and eventBuilder, waiting for incoming events, sending and receiving HLT decisions and forwarding the result to the recorder process.

17.2.3 Installation and operation

The hltAgent requires a system with a minimum of one LDC declared as a HLT LDC and at least one LDC declared as a Detector LDC. It needs the entity hltReadyFifo (of size equivalent to that of the readoutReadyFifo) and the entity hltDataPages, big enough to contain the headers of the accepted events (this entity, when declared via IPC, can be oversized as the real limiting factor will come from the readoutReadyFifo and the readoutDataPages).
When active, the hltAgent will use the infoLogger package (see Chapter 11) to create log messages with facility set to hltAgent. The standard output and standard error streams from the hltAgent are available in the file ${DATE_SITE_TMP}/${DATE_HOSTNAME}/hltAgent/hltAgent.log. A set of files is kept available to provide a historical record over a few runs.

The hltAgent must be started using the script ${DATE_HLT_AGENT_BIN}/hltAgent.sh. The script needs no parameters and must be run within an adequate DATE environment (normally set up by the rcServer process of the DATE run control).

The hltAgent can be configured in two modes: operation and emulation. When running in operation mode, the incoming events are considered as HLT decisions and/or HLT payloads and are treated as such. When running in emulation mode, the content of the incoming events is ignored: the hltAgent assumes that every event handled by the HLT LDCs must be considered as a potential HLT decision, and it takes decisions based on a static configuration file. This file, called ${DATE_SITE_CONFIG}/hltEmulation.config, describes the behavior of the hltAgent via two text lines containing the following information:

1. Relative ratios of HLT decisions taken between all the hltAgents running on all the HLT LDCs, expressed as a list of integer values with the number of decisions to be taken by each hltAgent running on the HLT LDCs, e.g.:

    10 2 1 1 1

This configures the hltAgent running on the first (as given by the host role ID) HLT LDC to create 10 decisions every 15 events (15 being the sum 10 + 2 + 1 + 1 + 1), the hltAgent running on the second HLT LDC to create 2 decisions every 15 events and the hltAgents running on the third, fourth and fifth HLT LDC to create one decision every 15 events. It is possible to give more ratios than the HLT LDCs actually in use (the extra ones will be ignored); however, every HLT LDC must have a corresponding entry.

2. For each trigger class, the percentage of events to be fully accepted, of events to be rejected and, for events to be partially accepted, the percentage of active Detector LDCs to accept with a +/- range, e.g.:

    CENTRAL_TRIGGER 20 30 10 2

This tells the hltAgent, for events marked with CENTRAL_TRIGGER, to accept 20% of the incoming traffic, to reject 30% of the incoming traffic and to accept the remaining 100 - 20 - 30 = 50% of the incoming events using 10% +/- 2% of the active Detector LDCs.

17.2.4 Synchronization between hltAgents

The hltAgents synchronize themselves using non-blocking TCP/IP channels. When the run starts, each hltAgent running on a HLT LDC connects to the next hltAgent using the order defined by the host ID. The hltAgent running on the HLT LDC with the highest host ID connects to the hltAgent running on the HLT LDC with the lowest host ID. The result is a circular path that connects sequentially all the HLT LDCs. Next, all hltAgents running on the HLT LDCs connect to a subset of the Detector LDCs (the full set of Detector LDCs is split into subsets of about the same size). The result is a tree of depth 1 with the HLT LDCs as root nodes and the Detector LDCs as leaves, as shown in Figure 17.7.

Figure 17.7 Interconnections between hltAgents.

At run-time, when an LDC receives an event, the event is given to the hltAgent, which determines whether a HLT decision is due or not. If not, the packet is passed as-is to the next element in the chain (recorder or edmAgent). If instead a HLT decision is due, the hltAgents running on the HLT LDCs check if the event is a HLT Decision or a HLT Payload. If the event is a HLT Decision, it is decoded and interpreted, and the result is sent both to the hltAgent running on the next HLT LDC and to all the connected Detector LDCs.
If instead the event is not a HLT decision or if the hltAgent runs on a Detector LDC, then the hltAgent checks whether a HLT decision for the given event has already been received, in which case the appropriate action is taken; if there is no decision yet, the event is added to a list of "pending" events. The hltAgents also wait for incoming HLT decisions and, when these are received, they are either applied (if the event is found in the list of "pending" events) or put aside for later use. hltAgents running on HLT LDCs also forward incoming HLT decisions to the next HLT LDC in the chain. All communications between hltAgents are handled asynchronously and do not block the agent itself.

17.3 Interface to AliEn and the Grid

Files written by DATE must eventually migrate to the Grid. This implies two separate operations: a copy of the file itself to Permanent Data Storage (PDS) and the registration of the file and its content in AliEn.

17.3.1 Transfer to PDS

The data files, which are written in ROOT format, are transferred to PDS using one of the available protocols. For the PDS currently in use at ALICE (CASTOR) the XROOTD protocol is used (the RFCP protocol, used heavily in the past, is now obsolete).

In PDS, all the files are stored below a common root. The first level of partitioning separates files written during global runs from files written during standalone runs. For the first class of files, the directory is named "global", while files of the second type are catalogued using the name of the detector. The second level of partitioning is meant to control the number of entries per directory (too many entries would pollute the directory catalogue, creating an unacceptable overload when querying the server). We achieved this objective by creating a tree sorted by year, month, day and hour (GMT) of creation of the individual data files.
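The year/month/day/hour partitioning just described can be sketched in a few lines of C. This is an illustration only, not the code used by the DAQ: the function name pdsDirectory and its parameters are hypothetical, and the zero-padded two-digit month, day and hour are an assumption about the directory naming.

```c
#include <stdio.h>
#include <time.h>

/* Build "<root>/<area>/YYYY/MM/DD/HH" from the GMT creation time of a file.
   "area" is "global" for global runs or the detector name for standalone
   runs. Returns 0 on success, -1 on error (including a too-small buffer). */
int pdsDirectory( char *buf, size_t size,
                  const char *root, const char *area, time_t created ) {
  struct tm *gmt = gmtime( &created );         /* the tree is sorted by GMT */
  int n;

  if ( gmt == NULL ) return -1;
  n = snprintf( buf, size, "%s/%s/%04d/%02d/%02d/%02d", root, area,
                gmt->tm_year + 1900, gmt->tm_mon + 1,
                gmt->tm_mday, gmt->tm_hour );
  return ( n < 0 || (size_t)n >= size ) ? -1 : 0;
}
```

Because gmtime is used rather than localtime, the resulting path does not depend on the time zone of the machine that runs the code.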
The information used to create this tree is not meant to be used for reference (for this we have the AliEn catalogue); it is there just to limit the number of entries in each directory (we could have used any other method for this purpose).

In Listing 17.1 we can see an example, with excerpts coming from a snapshot taken in April 2009. Below the root directory in line 3 we can see a separate directory for each detector plus a directory global for the global runs (line 7). Below the global directory we have two directories for 2008 and 2009 (lines 24 and 25). If we expand the 2009 directory and descend into 04 (month of April) and 02 (2nd of April) we see (lines 28 to 37) one directory for each hour of acquisition, where the data files (lines 40 and 41) are stored. An identical structure is shown for an example taken from the directory dedicated to ZDC standalone runs (lines 43 to 55) that took place on September 18th 2008.

Listing 17.1 Example of directory structure on CASTOR

     1: $ rfdir -R /castor/cern.ch/alice/raw
     2:
     3: /castor/cern.ch/alice/raw:
     4: acorde
     5: emcal
     6: fmd
     7: global
     8: hmpid
     9: muon_trg
    10: muon_trk
    11: phos
    12: pmd
    13: sdd
    14: spd
    15: ssd
    16: t0
    17: tof
    18: tpc
    19: trd
    20: v0
    21: zdc
    22: [...]
    23: /castor/cern.ch/alice/raw/global:
    24: 2008
    25: 2009
    26: [...]
    27: /castor/cern.ch/alice/raw/global/2009/04/02:
    28: 01
    29: 03
    30: 06
    31: 07
    32: 10
    33: 12
    34: 13
    35: 19
    36: 20
    37: 23
    38: [...]
    39: /castor/cern.ch/alice/raw/global/2009/04/02/23:
    40: 09000067672031.10.root
    41: 09000067672032.10.root
    42: [...]
    43: /castor/cern.ch/alice/raw/zdc/2008/09/18:
    44: 08
    45: 09
    46: 11
    47: 12
    48: 14
    49: 15
    50: 16
    51: 22
    52: 23
    53: [...]
    54: /castor/cern.ch/alice/raw/zdc/2008/09/18/23:
    55: 08000060012031.10.root

The filenames have been optimized in agreement with the AliEn file catalogue, for efficiency and manageability.
The syntax of the names is the following:
• two digits for the year
• nine digits for the run number
• three digits for the host ID
• dot
• file sequential ID (arbitrary number of characters, must be unique within the run: we use the file sequential number, which is unique within each stream, followed by the stream number)
• the file type: ".root" or ".tag.root"

The result of the above encoding is a filename which is guaranteed to be unique in the lifetime of ALICE (it merges information coming from the run number, the host ID, the stream ID and the sequential number within the stream) and, at the same time, allows searches based on various key parameters (to ease the operator's intervention on the CASTOR namespace). See Listing 17.1, lines 40, 41 and 55, for examples of the syntax.

During the ROOTification procedure, a file of type ".root.guid" containing the GUID information (a sort of unique identifier for the data stored in the events) is created by AliRoot for each ROOT file. The content of the GUID file is used during the registration procedure with AliEn and then the file itself is removed.

Another file created by the ROOTification procedure is the TAG file (file type ".tag.root"). A TAG file is created for each run. This file is copied to CASTOR using the same syntax as for the associated ROOT file. It is also registered in AliEn (without GUID, as this information does not apply to TAG files).

Once a file has been moved, it must be registered in the AliEn file catalogue.
For this procedure we need the following information:
• file size (in bytes)
• full filename in CASTOR
• file creation time
• LHC period associated with the run of the file
• MD5 checksum of the file (calculated either during the transfer of the file or manually after the transfer)
• GUID information of the file (ROOT files only)

All the above information is stored in a stamp file that can be handled in two different ways: via the AliEn registration gateway or via the alienspoold daemon. The two mechanisms can be used individually or together. The recommended one is the AliEn registration gateway.

The AliEn registration gateway is a machine that can be reached from the DAQ TDSM daemons via the SCP protocol and that has access to the network where the AliEn file catalogue is connected (usually GPN). At the end of the transfer of a file, the TDSM Mover copies the associated stamp file into a dedicated directory local to the AliEn registration gateway, where a dedicated software daemon, developed and maintained by the ALICE Offline project, handles it. It is possible to define multiple AliEn gateways: the TDSM Mover hot-switches to the first available node (an error message will be raised periodically to inform the operator of the abnormal status of the "broken" AliEn gateway(s)). Whenever no gateway is available, the stamp file is stored in a dedicated TDS area and its registration will be retried by a dedicated AliEn interface process via the same protocol used by the TDSM Movers. The DAQ/ECS operator handles all events up to the copy of the file into the gateway (including installation and backup of the gateway(s)), while the Offline operator must take care of all events related to the handling of the information stored within the stamp file and its registration in the AliEn catalogue.

The alienspoold daemon works via a dedicated directory on TDS. This directory is periodically polled by a dedicated process named alienspoold.
The daemon, developed and maintained by the ALICE Offline project, discovers the stamp file, takes care of the registration procedure using the information stored therein and removes the stamp file when done. The alienspoold process must therefore run on a machine that has access both to the TDS network (where the stamp file is created) and to whatever network is used to host the AliEn file catalogue (usually GPN). Files with incomplete or invalid syntax are rejected by alienspoold and stored in a special ".garbage" sub-directory, where they can be retrieved for post-mortem analysis. The health status of the alienspoold process is continuously verified by the TDSM package, which takes care of saving the logs and of restarting the daemon if necessary, up to the notification of the DAQ/ECS operator whenever needed. It is the responsibility of the ECS/DAQ operator to handle events such as rejected registrations or failures in the protocol from/to the AliEn catalogue server. Periodic warnings may be issued by the TDSM package whenever stamp files keep piling up, pointing to a failure somewhere in the registration chain.

17.4 File Exchange Server

The File Exchange Server (sometimes named FES or FXS) is a transient storage used to exchange information between the DAQ and other systems (e.g. Offline, DCS, HLT). Some DAQ processes may copy files to the server, which stores them until they are picked up (e.g. by an external system). Each file is supposed to be read only once after it has been stored, by a single consumer, and will then be removed. The File Exchange Server is not meant to publish a file to be used by multiple remote consumers. The paradigm used is: store from DAQ, read by a single remote consumer, and then delete.

The File Exchange Server consists of a server hosting the files and the related meta information stored in a MySQL database, and a client API to access the files from the DAQ. We describe in this chapter the DAQ File Exchange Server, i.e. a mechanism to export files from DAQ to other systems. DCS and HLT have implemented their own File Exchange Server based on this architecture to export their files.

The server part relies on:
• a local directory, which must be accessible remotely by password-less scp or sftp for a given user, from the DAQ nodes where File Exchange Server access is needed. To enforce maximum security in production areas, and to make sure the server is not used for any purpose other than copying files, this remote user login is usually set up with the restricted shell implemented by the rssh package. This is however not mandatory, and the DAQ user needed to store the files can be defined as a normal interactive user. The DAQ user should have write access to the storage area. The external systems reading the files are given only read access to the files.
• a database to store the metadata associated with the hosted files. This is used as a directory entry to know what files are available, and to flag the files once they have been used remotely. The information is stored in a single table, described in Table 17.1. The DAQ user should have write access to the table, whereas external systems need only read access, plus update access on the time_processed column to flag the files which they have retrieved.

The DAQ File Exchange Server is primarily used to export results from the Detector Algorithms (see Chapter 22) to the Offline Shuttle.

Each file stored in the File Exchange Server has a unique name, defined by the field filePath, which is derived from other identifiers to make it unique.

Table 17.1 File Exchange Server daqFES_files table

Field            Description
run              The run number during which the file was created.
detector         The code of the detector creating the file.
fileId           The local file Id, which should be unique for a given process (DAQsource) in a given run. However, similar processes running in parallel on different DATE roles may use the same fileId. For example, the different instances of the same pedestal DA running on each LDC of a given detector may all export a file with the same fileId, like pedestal.root.
DAQsource        The DATE role where the file was created, as defined by the DATE_ROLE_NAME environment variable set at runtime by the runControl launching the process.
filePath         The unique identifier of the file. It is built from a combination of run, detector, fileId and DAQsource to ensure uniqueness.
time_created     The time at which the file was stored in the File Exchange Server.
time_processed   The time at which the file is flagged to be deleted. When an external process reads a file from the File Exchange Server, it should update this field to notify the File Exchange Server that the file has been retrieved and can be removed.
time_deleted     The time at which the file has eventually been removed from the File Exchange Server, after it has been retrieved and flagged as such by a non-NULL time_processed field.
size             The file size.
fileChecksum     The MD5 checksum of the file.

Access to the File Exchange Server from a DAQ process requires that the following two environment variables are defined (in the environment section of the DATE configuration database, with editDb, see Chapter 4.4):
• DATE_FES_DB: access parameters to the File Exchange Server database. This variable should be defined with the syntax user:password@hostname:dbname where user / password are the credentials to access the File Exchange Server database named dbname running on hostname.
• DATE_FES_PATH: access parameters to the File Exchange Server file repository. This variable should be defined with the syntax user@hostname:path so that a command like scp myfile user@hostname:path would copy myfile to the File Exchange Server repository directory on hostname.

The database may be created using the ${DATE_INFOLOGGER_DIR}/daqFES_create.sql SQL command script.
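To make the DATE_FES_DB syntax concrete, the sketch below splits a string of the form user:password@hostname:dbname into its four components. It is not the parser used by the DAQ client API: the names fesDbParams and fesParseDb are hypothetical, and the sketch assumes that the user name contains no ':' character, that the password contains no '@' character, and that no field is empty.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
  char user[64];
  char password[64];
  char hostname[64];
  char dbname[64];
} fesDbParams;

/* Copy the characters in [begin, end) into dst as a C string.
   Returns 0 on success, -1 if the field is empty or does not fit. */
static int copyField( char *dst, size_t dstSize,
                      const char *begin, const char *end ) {
  size_t len = (size_t)( end - begin );

  if ( len == 0 || len >= dstSize ) return -1;
  memcpy( dst, begin, len );
  dst[len] = '\0';
  return 0;
}

/* Parse "user:password@hostname:dbname".
   Returns 0 on success, -1 on malformed input. */
int fesParseDb( const char *spec, fesDbParams *out ) {
  const char *colon1, *at, *colon2;

  if ( spec == NULL || out == NULL ) return -1;
  colon1 = strchr( spec, ':' );                /* end of user */
  if ( colon1 == NULL ) return -1;
  at = strchr( colon1 + 1, '@' );              /* end of password */
  if ( at == NULL ) return -1;
  colon2 = strchr( at + 1, ':' );              /* end of hostname */
  if ( colon2 == NULL ) return -1;

  if ( copyField( out->user, sizeof out->user, spec, colon1 ) != 0 ) return -1;
  if ( copyField( out->password, sizeof out->password, colon1 + 1, at ) != 0 ) return -1;
  if ( copyField( out->hostname, sizeof out->hostname, at + 1, colon2 ) != 0 ) return -1;
  return copyField( out->dbname, sizeof out->dbname,
                    colon2 + 1, colon2 + 1 + strlen( colon2 + 1 ) );
}
```

In a DAQ process the input string would typically come from getenv("DATE_FES_DB"); a NULL result (variable not defined) is rejected by the function.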
The File Exchange Server may be cleaned by periodically running (with appropriate rights) a script such as the provided example ${DATE_INFOLOGGER_DIR}/daqFES_clean, which should be adapted according to the needs (e.g. not destroying the files right away, but moving them to an archive of the most recent ones as disk space allows). The database table may also be used for consistency checks (file size and MD5 sum), or to identify orphan entries (e.g. a file on disk with no database entry, or a database entry with no corresponding file on disk). Upon deletion of a File Exchange Server file, the files on disk are removed, but the corresponding entries in the database should be left for logging and statistics purposes.

The following command line tools are available in ${DATE_INFOLOGGER_DIR} to access the File Exchange Server from a DAQ node:
• daqFES_ls: list the files available in the File Exchange Server. The environment variables DATE_RUN_NUMBER and DATE_DETECTOR_CODE may be defined (and are then used as filters).
• daqFES_store mylocalfile myId: store a local file named mylocalfile on the File Exchange Server using identification myId. The environment variables DATE_RUN_NUMBER, DATE_ROLE_NAME and DATE_DETECTOR_CODE should be defined.
• daqFES_get filePath: get the file named filePath from the File Exchange Server (or all of them if filePath is not provided) and mark them as used so that they may afterwards be deleted. The environment variables DATE_RUN_NUMBER and DATE_DETECTOR_CODE may be defined (and are then used as filters).

17.5 Interface to the Shuttle

The Shuttle is an Offline process that collects some information from the DAQ after a run is finished in order to populate the Offline Condition DataBase (OCDB). This is meant in particular to retrieve the calibration results produced by the Detector Algorithms (see Chapter 22).

The Shuttle is triggered either internally by a timeout (periodic checks for new runs), or by the DAQ service providing end of run notification by DIM.
This service (among others) is implemented in the logbookDaemon and named, as defined in DAQlogbook.h, /LOGBOOK/SUBSCRIBE/ECS_EOR. The service is updated with a run number each time a run is completed (ECS completion).

A dedicated logbook table named logbook_shuttle (defined in logbook_create.sql) is populated by the ECS to tell the Shuttle which detectors are active in a run, and to keep the status of Shuttle processing for each of them. The Shuttle gets the list of new runs from this table, and updates it accordingly when it processes them. A special flag test_mode may be set manually when needed to identify runs whose results should not be taken into account by the Shuttle to populate the OCDB. This may be the case when testing new Detector Algorithms.

The Shuttle accesses the logbook database and the DAQ File Exchange Server by direct SQL queries; there is no API for this purpose.

Part II
ALICE Experiment Control System
Reference Manual
November 2010
ALICE ECS Project
ECS & ACT

Preface

The ALICE Experiment Control System (ECS) coordinates the activities performed on the particle detectors when running the experiment. These activities concern the operation and control of the detectors from the hardware point of view, the acquisition of experimental or calibration data, the Trigger system, and the High Level Trigger. The systems controlling these activities are called 'online systems'.

The ECS has been designed and implemented as a layer of software on top of the existing 'online systems' controlling the different activities. The ECS imposes only one constraint on these systems: they must provide status information and, possibly, accept commands through interfaces based on Finite State Machines (FSMs). The FSM package used in ALICE is SMI++ [4].
The ECS heavily relies on it and on the DIM [3] communication package both for the implementation of the interfaces between the ECS and the 'online systems' and for the implementation of the major ECS components.
This part of the manual describes the ECS in its current state, which has the following limitations:
• The integration between the ECS and the different 'online systems' is not equally developed.
• Some of the detectors have not yet implemented a DCS based on FSMs: therefore the DCS states of these detectors are not yet included in the ECS.
• Some of the detectors have not yet developed their calibration procedures.
• Some information on the configuration of the ‘online systems’, such as the definition of the Trigger classes, is not yet available and therefore the ECS currently uses some temporary definitions.
The ECS will evolve to cover the above items as soon as they become available. Its architecture has been tested during beam tests, and proved to be solid and flexible enough to include all the future extensions.

18 ECS Overview
This chapter describes the architecture of the Experiment Control System (ECS), its various components and their interactions.
18.1 Introduction
18.2 Partitions
18.3 Stand-alone detectors
18.4 ECS architecture
18.5 Detector Control Agent (DCA)
18.6 The DCA Human Interface
18.7 Partition Control Agent (PCA)
18.8 The PCA Human Interface
18.9 ECS/DCS Interface
18.10 ECS/DAQ Interface
18.11 ECS/TRG Interface
18.12 ECS/HLT Interface
18.13 logFiles
18.14 Database
18.15 Interactions with other systems
18.16 Auxiliary processes

18.1 Introduction
The ALICE experiment consists of several particle detectors. Running the experiment implies performing a set of activities with these detectors. In ALICE these activities are grouped into four activity domains: Detector Control System (DCS), Data Acquisition (DAQ), Trigger (TRG) and High Level Trigger (HLT). Every activity domain requires some form of coordination and control: independent control systems have been developed for all of them. These systems, called ‘online systems’, are independent, may interact with all the particle detectors, and allow partitioning. Partitioning is the capability to concurrently operate groups of ALICE detectors called partitions.
Before being operated together to collect physics data in the final ALICE setup, detectors were prototyped, debugged, and tested as independent objects. While this operation mode, called ‘stand-alone mode’, was absolutely vital in the commissioning and testing phase, it is also required during the operational phase to perform calibration procedures on individual detectors. Therefore it remains essential during the whole life cycle of ALICE.
The Experiment Control System (ECS) coordinates the operations of the ‘online systems’ for all the detectors and within every partition. It permits independent, concurrent activities on parts of the experimental setup by the same or different operators. The components of the ECS receive status information from the ‘online systems’ and send commands to them through interfaces based on Finite State Machines.
The interfaces between the ECS and the ‘online systems’ contain access control mechanisms that manage the rights granted to the ECS: the ‘online systems’ can either be under the control of the ECS or be operated as independent systems. In the second case the ‘online systems’ provide status information to the ECS, but do not receive commands from it.

18.2 Partitions
A partition is a group of particle detectors. From the ECS point of view, a partition is defined by a unique name that distinguishes it from other partitions and by two lists of detectors: the list of detectors assigned to the partition and the list of detectors excluded from the partition.
The first list, called the assigned detectors list, contains the names of the ALICE detectors that can be active within the partition. This static list represents an upper limit for the partition: only the detectors listed in the assigned detectors list can be active in the partition, but they are not necessarily active all the time. The assigned detectors lists of different partitions may overlap: the same detector can appear in different assigned detectors lists. Assigned detectors lists cannot be empty.
The second list, called the excluded detectors list, contains the names of the ALICE detectors that have been assigned to the partition, but are currently not active in the partition. This dynamic list is a subset of the assigned detectors list. It can be empty.
Although a given detector may appear in the assigned detectors list of many partitions, it cannot be active in several partitions at the same time, but only in one or none of them: the excluded detectors list of a partition contains the names of the detectors that are not active in the partition because they are active in another one, or because they are operated in stand-alone mode, or because of an explicit operator request.
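The relation between the two lists can be sketched in C as follows. This is only an illustration under stated assumptions: the struct, the function names and the MAX_DETECTORS bound are hypothetical and are not part of the ECS code, which keeps this information in its database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DETECTORS 32 /* illustrative bound, not taken from the manual */

/* Hypothetical in-memory view of a partition. */
struct partition {
    const char *name;                    /* unique partition name */
    const char *assigned[MAX_DETECTORS]; /* static upper limit, never empty */
    size_t n_assigned;
    const char *excluded[MAX_DETECTORS]; /* dynamic subset of assigned */
    size_t n_excluded;
};

static bool list_contains(const char *const *list, size_t n, const char *det)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], det) == 0)
            return true;
    return false;
}

/* A detector is active when it is assigned and not currently excluded. */
bool partition_is_active(const struct partition *p, const char *det)
{
    return list_contains(p->assigned, p->n_assigned, det) &&
           !list_contains(p->excluded, p->n_excluded, det);
}

/* Exclude a detector. Detectors that are not assigned are refused, so the
 * excluded list always remains a subset of the assigned list. */
bool partition_exclude(struct partition *p, const char *det)
{
    assert(p->n_assigned > 0); /* assigned detectors lists cannot be empty */
    if (!list_contains(p->assigned, p->n_assigned, det))
        return false;
    if (list_contains(p->excluded, p->n_excluded, det))
        return true; /* already excluded, nothing to do */
    if (p->n_excluded == MAX_DETECTORS)
        return false;
    p->excluded[p->n_excluded++] = det;
    return true;
}
```

The sketch deliberately ignores the rule that a detector can be active in at most one partition; enforcing it would require a view across all partitions and stand-alone detectors.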
Explicit operator requests are subject to restrictions: the structure of a partition cannot be changed by the exclusion and inclusion of detectors during the data-taking phase.
Two types of operations can be performed in a partition: operations involving all the active detectors and operations involving only one active detector. The operations of the first type are called global operations; those of the second type are called individual detector operations.
The ECS handles a global operation by watching the DCS status of all the active detectors and by interacting with the runControl process that steers the data acquisition for the whole partition, with the Trigger Partition Agent (TPA) that links the partition to the Central Trigger Processor (CTP), and with the HLT proxy that controls the HLT operations for the partition. When a global operation starts, it inhibits all the individual detector operations.
The ECS handles an individual detector operation by watching the DCS status of the detector and by interacting with the runControl process that steers the data acquisition for that particular detector, with the Local Trigger Unit (LTU) associated to it, and with the HLT proxy that controls the HLT operations for that detector. When an individual detector operation starts, it inhibits the global operations, but it does not inhibit individual detector operations executed on other detectors: these individual detector operations, such as calibration procedures, can be performed concurrently within the partition.

18.3 Stand-alone detectors
A stand-alone detector is a detector operated alone and out of any partition.
The operations performed with a stand-alone detector are the same as the individual detector operations that can be performed on the detector when it is active in a partition: the ECS handles these operations by watching the DCS status of the detector and by interacting with the runControl process that steers the data acquisition for that detector, with the LTU associated to it, and with the HLT proxy that controls the HLT operations for the detector. The major difference between a stand-alone detector and a partition with a single detector is that the partition is linked to the CTP by a TPA, whereas the stand-alone detector only interacts with its LTU.

18.4 ECS architecture
Every detector operated in stand-alone mode or assigned to a partition is controlled by a process called Detector Control Agent (DCA) and every partition is controlled by a process called Partition Control Agent (PCA).
When a detector is operated in stand-alone mode, its DCA accepts commands from an operator who issues them from a DCA Human Interface. Several DCA Human Interfaces can coexist for the same DCA, but only one can send active commands at a given time: the others can only get information. When a detector is active in a partition, its DCA accepts commands only from the PCA controlling the partition. Operators can still open DCA Human Interfaces, but only to get information and not to send active commands.
A PCA Human Interface gives an operator full control of a partition. Many PCA Human Interfaces can coexist for the same PCA, but only one has the control of the partition at a given time and can be used to send active commands.
DCAs and PCAs get status information from the 'online systems' and possibly send commands to components of these systems through interfaces based on Finite State Machines. This chapter describes the components of the ECS and the interface between the ECS and the 'online systems'.
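The single-master rule described above (many interfaces may observe an agent, only one may send active commands) can be illustrated with a minimal C sketch. The struct and function names are hypothetical, and the first-come policy for granting mastership is an assumption made for the example; the manual does not specify how mastership is arbitrated.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a DCA or PCA seen from its Human Interfaces. */
struct control_agent {
    char master[64]; /* identifier of the current master, "" when free */
};

/* Try to obtain mastership. Assumed policy: granted only when nobody holds
 * it, or when the requester already holds it. */
bool agent_take_mastership(struct control_agent *a, const char *iface)
{
    assert(a != NULL && iface != NULL && iface[0] != '\0');
    if (a->master[0] != '\0')
        return strcmp(a->master, iface) == 0;
    if (strlen(iface) >= sizeof a->master)
        return false; /* identifier would not fit */
    strcpy(a->master, iface);
    return true;
}

/* Only the current master can release mastership. */
void agent_release_mastership(struct control_agent *a, const char *iface)
{
    if (strcmp(a->master, iface) == 0)
        a->master[0] = '\0';
}

/* Active commands are accepted from the master only; every other
 * interface is limited to reading information. */
bool agent_accepts_command(const struct control_agent *a, const char *iface)
{
    return a->master[0] != '\0' && strcmp(a->master, iface) == 0;
}
```

In this model a PCA controlling a partition would simply hold the mastership of the DCAs of its active detectors, which is why DCA Human Interfaces can then only read information.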
18.5 Detector Control Agent (DCA)
There is a DCA for every detector operated in stand-alone mode or assigned to a partition. The main tasks performed by this process are the following:
• It handles stand-alone data-acquisition runs for the detector working alone. This function requires the coordination and the synchronization of the detector's hardware controlled by the DCS, of the detector's Front End Read-Out (FERO), of a runControl process steering the data acquisition for the given detector only, of the HLT activities performed for the detector, and of the LTU associated to this detector. This function is implemented for all the detectors, but not in the same way, because detectors have specific requirements.
• It handles detector-specific procedures, such as calibration procedures. These procedures are by definition detector dependent and therefore their implementation is different for each detector.
The DCA is implemented as an SMI domain [4]. The name of the domain is given by the detector name suffixed with '_DCA'. For example, the DCA controlling the HMPID detector is implemented as an SMI domain whose name is HMPID_DCA.
In addition to the objects required to perform the main tasks described above, the SMI domain contains other objects that allow the following features:
• When the detector is active in a partition and as long as a global action is being executed in the partition, the PCA controlling the partition keeps the DCA informed about the global action going on: the DCA goes in an INHIBITED status and waits for the global action to terminate. The information flow goes from the PCA to the DCA.
• When the detector is active in a partition and as long as an individual detector operation is being executed for the detector, the DCA keeps the PCA controlling the partition informed about the action going on: the PCA goes in an INHIBITED status and waits for the action to terminate.
The information flow goes from the DCA to the PCA.
The DCA accepts commands from one master operator at a time: either a PCA or a DCA Human Interface.

18.6 The DCA Human Interface
An operator can control the detector in stand-alone mode with a DCA Human Interface that has obtained the mastership of a DCA. The operator can send commands to the DCA, can change the rights granted to the DCA, and can send commands directly to objects in the HLT, DAQ, and TRG 'online systems'. Without the mastership of the DCA, the DCA Human Interface can only get information and cannot issue active commands.
A detailed description of the DCA Human Interface can be found in the ALICE DAQ WIKI.

18.7 Partition Control Agent (PCA)
There is a PCA per partition. The main tasks performed by this process are the following:
• It handles PHYSICS and TECHNICAL runs using all the detectors active in the partition. This function requires the coordination of the status, from the hardware and FERO point of view, of all the active detectors, of a runControl process steering the data acquisition for the whole partition, of the TPA associated to the partition and of the HLT proxy controlling the HLT activities for the partition. This function is implemented in the same way for all the partitions.
• It delegates individual detector operations to the DCAs controlling the detectors active in the partition.
• It handles the partition structure, allowing the inclusion/exclusion of detectors in/from the partition whenever these operations are compatible with the data-taking going on for individual detectors or for the whole partition.
The PCA is implemented as an SMI domain [4]. The name of the domain is given by the partition name suffixed with '_PCA'. For example, the PCA controlling the ITS partition is implemented as an SMI domain whose name is ITS_PCA.
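The naming rule for the DCA and PCA domains lends itself to a small helper. The following C sketch is illustrative only: the function names are hypothetical and do not belong to the ECS or SMI++ code; only the '_DCA' and '_PCA' suffixes come from the text above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<name><suffix>" into out. Returns 0 on success, -1 if the name is
 * empty or the result does not fit into len bytes. */
static int ecs_suffixed(char *out, size_t len, const char *name,
                        const char *suffix)
{
    assert(out != NULL && name != NULL && suffix != NULL);
    if (strlen(name) == 0)
        return -1;
    int n = snprintf(out, len, "%s%s", name, suffix);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* SMI domain of the DCA controlling a detector, e.g. HMPID -> HMPID_DCA. */
int ecs_dca_domain(char *out, size_t len, const char *detector)
{
    return ecs_suffixed(out, len, detector, "_DCA");
}

/* SMI domain of the PCA controlling a partition, e.g. ITS -> ITS_PCA. */
int ecs_pca_domain(char *out, size_t len, const char *partition)
{
    return ecs_suffixed(out, len, partition, "_PCA");
}
```

Returning an error instead of silently truncating avoids addressing a domain with a mangled name when the caller's buffer is too small.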
In addition to the objects required to perform the main tasks described above, the SMI domain contains other objects that allow the following features:
• When a detector is active in a partition and as long as a global action is being executed in the partition, the PCA keeps the DCA controlling the detector informed about the global action going on: the DCA goes in an INHIBITED status and waits for the global action to terminate. The information flow goes from the PCA to the DCA.
• When a detector is active in a partition and as long as an individual detector operation is being executed for the detector, the DCA controlling the detector keeps the PCA informed about the action going on: the PCA goes in an INHIBITED status and waits for the action to terminate. The information flow goes from the DCA to the PCA.
The PCA accepts commands from one PCA Human Interface at a time.

18.8 The PCA Human Interface
An operator can control a partition with a PCA Human Interface that has obtained the mastership of a PCA. The operator can send commands to start global and individual detector operations, can change the rights granted to the PCA, can change the structure of the partition by excluding or including detectors, and can send commands directly to objects in the HLT, DAQ, and TRG 'online systems'. Without the mastership of the PCA, the PCA Human Interface can only get information and cannot issue active commands.
A detailed description of the PCA Human Interface can be found in the ALICE DAQ WIKI.

18.9 ECS/DCS Interface
The DCS describes the ALICE experiment as a hierarchy of particle detectors and of infrastructure services. Its model of ALICE is based on Finite State Machines and is implemented as a tree-structured set of SMI domains and objects. Within this tree every detector is represented by a different sub-tree of SMI objects.
The status of the root object of each of these sub-trees is the status of the corresponding detector as seen from the DCS point of view.
The interface between the ECS and the DCS mainly consists of one object per detector: the roots of the sub-trees described above, which represent the detectors within the DCS. These objects provide status information to the central DCS and, at the same time, to the ECS. A second object, called Run Control Unit, informs the ECS about the availability of the detector for running (even a READY detector may need to be excluded from runs).
Figure 18.1 is an example where two detectors, named 'y' and 'z', are active in an ECS partition named 'A'. The figure shows the double role of the SMI objects that provide status information for the two detectors both to the DCS and to the ECS.
Figure 18.1 ECS/DCS interface.

18.10 ECS/DAQ Interface
The interface between the ECS and the DAQ is made of SMI objects representing runControl processes:
• A runControl process per detector: every runControl process steers the data acquisition for a given detector and for that detector only. The name assigned to the process is equal to the detector name.
• A runControl process per partition: it steers the data acquisition for the whole partition with data produced by all the active detectors. The name assigned to the process is equal to the partition name prefixed with 'ALL'. If, for example, the partition name is ALICE, then the name of the runControl process is ALLALICE.

18.11 ECS/TRG Interface
An SMI domain named 'TRIGGER' contains the objects describing the basic Trigger components: the LTUs associated to the detectors and the CTP. These SMI objects are associated to processes, called proxies, that actually drive the LTUs and the CTP.
All the detectors active in a partition produce raw data when a global operation is performed; the generation of raw data by the detectors is done under the control of their associated LTUs. These LTUs are synchronized by the CTP.
There is one CTP, but many partitions can be operated at the same time and all of them need access to the CTP. The Trigger Partition Agents (TPAs) associated to the different partitions resolve the access conflicts. There is one TPA per partition. The TPAs are implemented as SMI objects in SMI domains. The name of each domain is the partition name suffixed with '_TRG'. The TPA for a partition named ALICE is an SMI object named TPA in an SMI domain named ALICE_TRG. The TPA interacts with the CTP and the LTUs.
When a detector is operated in stand-alone mode, the DCA controlling it interacts directly with the LTU associated to the detector. The CTP is ignored.
When a detector is active in a partition and an individual detector operation is executed on it, the PCA delegates the operation to the DCA controlling the detector. The DCA again interacts with the LTU associated to the detector. The CTP is ignored.
When a global operation is performed in a partition, the PCA controlling the partition interacts with the TPA, which in turn interacts with the CTP and the LTUs. The PCA has no direct interaction with the CTP and the LTUs.
Figure 18.2 shows the ECS/TRG interface.
Figure 18.2 ECS/TRG interface.

18.12 ECS/HLT Interface
The interface between the ECS and the HLT is made of SMI objects representing HLT proxy processes:
• An HLT proxy process per detector: every HLT proxy process steers the HLT activities for a given detector and for that detector only.
• An HLT proxy process per partition: it steers the HLT activities for the whole partition with data produced by all the active detectors.
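The trigger routing rules of Section 18.11 can be summarised in a few lines of C. The sketch below is only an illustration: the enumerations and function names are hypothetical, and the real routing is implemented with SMI objects rather than C functions. Only the '_TRG' suffix and the three cases (stand-alone, individual, global) are taken from the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum ecs_operation {
    ECS_OP_STANDALONE, /* detector operated outside any partition */
    ECS_OP_INDIVIDUAL, /* individual detector operation inside a partition */
    ECS_OP_GLOBAL      /* operation involving all the active detectors */
};

enum trg_peer {
    TRG_PEER_LTU, /* the DCA talks to the detector's LTU, the CTP is ignored */
    TRG_PEER_TPA  /* the PCA talks to the TPA, which drives CTP and LTUs */
};

/* Trigger component addressed by the ECS for a given kind of operation. */
enum trg_peer ecs_trigger_peer(enum ecs_operation op)
{
    return (op == ECS_OP_GLOBAL) ? TRG_PEER_TPA : TRG_PEER_LTU;
}

/* SMI domain hosting the TPA object of a partition, e.g. ALICE -> ALICE_TRG.
 * Returns 0 on success, -1 if the result does not fit into len bytes. */
int ecs_tpa_domain(char *out, size_t len, const char *partition)
{
    assert(out != NULL && partition != NULL);
    int n = snprintf(out, len, "%s_TRG", partition);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Stand-alone and individual detector operations share the same branch because in both cases the DCA drives the LTU directly.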
18.13 logFiles
The ECS components use the DATE infoLogger package to record error and information messages.
The stdout files created by DCAs and PCAs are stored in a directory pointed to by the environment variable ECS_LOGS. The file names are self-explanatory.
When working with dummy versions of HLT, DCS and Trigger (for debugging purposes), the stdout files created by the dummy processes are stored in the directories pointed to by the environment variables HLT_LOGS, DCS_LOGS, and TRG_LOGS.

18.14 Database
The ECS components require configuration data. This information is stored in a MySQL database (see the ALICE DAQ WIKI). The database also contains additional runtime information available through a Web interface (see the ALICE DAQ WIKI). In particular, this interface shows the activities being performed for each detector and for each partition.

18.15 Interactions with other systems
In addition to its main activity (i.e. the synchronization of the ‘online systems’ to perform runs), the ECS interacts with other components of the ALICE software. In particular, the ECS:
• Sends SOR and EOR commands to the central ALICE DCS at the beginning and at the end of every stand-alone or global run.
• Sends SOR and EOR commands to the LHC_MON process at the beginning and at the end of every global run.
• Sends to the ALICE Configuration Tool (ACT) requests to lock/unlock configuration items to prevent configuration changes during runs.
• Stores in the ALICE eLogbook information about all the performed runs.

18.16 Auxiliary processes
All the DCAs and all the PCAs require the presence of some auxiliary processes:
• ecs_timeout, used to interrupt SMI commands after reasonable delays.
• ecs_counter, to count the number of iterations performed during some calibration procedures or the number of elements in some sets.
• stringsProxy, required to compare SMI parameters.
• ecs_logger, to store infoLogger messages.
• ecs_operator, required to start some special operator commands, such as starting the migration of data.
• ecs_daq_db_handler, handling all the interactions with the DAQ and ECS databases.
The PCAs require more auxiliary processes:
• pca_updateDB, to keep track of the detectors excluded/included from/in partitions.
• pca_updateTIN, to update the list of detectors used as trigger detectors for a partition.

19 ALICE Configuration Tool
The ALICE Configuration Tool (ACT) is the first step to achieve a high level of automation, implementing automatic configuration of the different detectors and online systems. Having already contributed to the reduction of the size of the shift crew needed to operate the experiment, the ACT is a central actor in ALICE’s activities, allowing the Run Coordination and the Shift Leaders to operate the experiment in a global way.
This chapter describes the architecture of the ACT and its different components, the interfaces with the different online systems and the Web-based Graphical User Interface.
19.1 Architecture
19.2 Database
19.3 Application Programming Interface
19.4 Tools
19.5 Graphical User Interface

19.1 Architecture
19.1.1 Overview
The operation of the ALICE experiment over several years to collect billions of events acquired in well defined conditions requires repeatability of the experiment’s configuration. Appropriate software tools are therefore needed to automate daily operations, minimizing human errors and maximizing the data-taking time. The ALICE Configuration Tool (ACT) fulfills these requirements, allowing the automatic configuration of the different systems and detectors.
The base concept of the ACT is to serve as a configuration repository that the different ALICE systems can access to extract their currently selected configuration. As shown in Figure 19.1, the ACT is operated both by the Run Coordination and the different system experts via a Web-based Graphical User Interface (GUI). A relational database (DB) serves as a data repository and an Application Programming Interface (API), implemented in C, provides numerous functionalities to the different components.
Figure 19.1 The architecture of the ACT and its interfaces with the different online systems and detectors.
A publish/subscribe mechanism, based on the Distributed Information Management (DIM) system, is also available. Two dedicated modules, running as daemon processes, use this mechanism:
• ECS Dedicated Daemon (EDD): interacts with the Experiment Control System (ECS), pushing the selected configurations for the online systems.
• DCS Dedicated Daemon (DDD): interacts with the Run Control Tool (RCT), pushing the selected configurations for the different ALICE detectors. The RCT then makes the configurations available to each individual Detector Control System (DCS) where the detector configuration is executed.

19.1.2 Taxonomy
In order to define the different system and detector components to configure, the ACT introduces the following concepts:
• System: an ACT system represents a physical or logical element of the ALICE experiment. Each system normally has several configurable components. Examples of systems are: detectors, online systems, ECS partitions.
• Item: an ACT item corresponds to a configurable component of a specific ACT system. Each item normally has several possible configurations defined. Examples of an item are: ‘partition PHYSICS_1 HLT Mode’, ‘TPC DCS configuration’, ‘CTP L0 inputs’.
• Instance: an ACT instance defines a possible predefined configuration for a specific ACT item. At any given time, only one instance can be activated for each item.
Figure 19.2 ACT hierarchy.

19.1.2.1 Items Locking
A locking mechanism prevents the configuration of items that are either being configured or used by an online system (e.g. a detector being part of a running partition). For configuration, the items are locked by the corresponding daemon (EDD or DDD). If being used by an online system, the items are locked by that system.

19.1.2.2 Items Status Mismatch
The status mismatch flag identifies items whose configuration has changed outside the control of the ACT. It is the external tool’s responsibility to flag the changed item. An example of this behavior is the inclusion/exclusion of detectors from an ECS partition using the ECS’s human interfaces.

19.1.2.3 Items Activation Status
At a given time, each item is in a specific state, represented by its activation status. There are four possible values:
• ‘update requested’: a configuration has been requested for the item.
• ‘applying’: a configuration is being executed for the item.
• ‘active’: the item is configured as requested.
• ‘update failed’: an error occurred while configuring the item.
Figure 19.3 shows the state diagram for the items activation status.
Figure 19.3 Items activation status state diagram.

19.1.3 ACT Update Request Server
The ACT Update Request Server is a DIM server implementing the ACT_UPDATE DIM service and several DIM commands. The most important are:
• ACT_UPDATE_REQUEST: an update has been requested.
• ACT_TIMEOUT_REQUEST: an abort has been requested.
When a command is received, it updates the ACT_UPDATE service.

19.1.4 Interfaces
Below is a list of the different ACT interfaces.

19.1.4.1 ACT-ECS interface
The interface between ACT and ECS is implemented by the EDD daemon process.
When an update request is received by EDD (via the ACT_UPDATE DIM service), it checks which ECS items have been marked for update and propagates the corresponding configuration to the ECS.

19.1.4.2 ACT-DAQ interface
Communication between ACT and DAQ is implemented by the EDD daemon process. When an update request is received by EDD (via the ACT_UPDATE DIM service), it checks which DAQ items have been marked for update. Then, depending on the item, two different paths may be followed:
• for parameters which are also controlled by the ECS (e.g. recording mode), the configuration is propagated to the ECS.
• for parameters which are only controlled by the DAQ (e.g. number of GDCs), the configuration is propagated to the DAQ.
At Start of Run, several DAQ modules (e.g. TPCC) also download their selected configuration directly from the DB using the C API.

19.1.4.3 ACT-HLT interface
Communication between ACT and HLT is performed via the ECS, which then sends the relevant changes to the HLT system.

19.1.4.4 ACT-CTP interface
Communication between ACT and CTP is performed in two different ways.
For the CTP partition configuration, at Start of Run the selected configuration is downloaded directly from the DB by the CTP system using the C API.
For the CTP global configuration, when an update request is received by EDD (via the ACT_UPDATE DIM service), it is transmitted to the CTP, which downloads the selected configuration directly from the DB using the C API. The CTP is then restarted to load the new configuration.

19.1.4.5 ACT-Detector interface
The interface between ACT and the ALICE detectors is implemented by two modules: the DDD daemon process and the RCT. When an update request is received by DDD (via the ACT_UPDATE DIM service), it checks which items have been marked for update and propagates the corresponding information (via DIM) to the RCT.
The RCT then updates its internal PVSS datapoints for the corresponding detectors and sends them an FSM CONFIGURE command. When the detector finishes its configuration, it updates a dedicated datapoint with an acknowledgment message, which is then propagated back to DDD (via DIM). Based on this message, DDD then changes the activation status of the updated items to either ‘active’ or ‘update failed’.
More technical details concerning the RCT can be found in [20].

19.1.5 Workflow
As seen in Figure 19.4, the ACT workflow starts with the user (usually the Shift Leader) selecting the desired configuration via the GUI. When finished, the user submits an update request, which changes the activation status of the selected items to ‘update requested’ and triggers the execution of the ACT_UPDATE_REQUEST DIM command by the ACT Update Request Server. This command results in an update of the ACT_UPDATE DIM service, thus signaling to both EDD and DDD that an update was requested.
Items related to ECS, DAQ, HLT and CTP (item categories equal to ‘partition’, ‘DAQ config’, ‘HLT config’, and ‘CTP config’, respectively) are handled by EDD. Upon receiving the update request, EDD first locks the items and sets their activation status to ‘applying’. Then, depending on the item, changes are performed in the corresponding online systems to reflect the new configuration. Finally, the items are set to either ‘active’ (on success) or ‘update failed’ (on failure) and unlocked.
Items related to the DCS (item category equal to ‘DCS config’) are handled by DDD. Upon receiving the update request, DDD first locks the items and sets their activation status to ‘applying’. Then the items are grouped by detector, and their names and the values of their active instances are concatenated in a string which is passed via DIM to the RCT (by updating the ACT_RCT_CONF_DET DIM service, where DET is replaced by the corresponding 3-letter detector code).
RCT then decodes the received string and populates its internal PVSS datapoints, after which it sends a CONFIGURE command to the detector’s FSM. After executing this command, the detector’s FSM updates a datapoint with the reply to the configuration, which is sent back to DDD via DIM. Finally, the items are set to either ‘active’ (on success) or ‘update failed’ (on failure) and unlocked by DDD.
Figure 19.4 ACT workflow diagram.

19.2 Database
19.2.1 Overview
The DB, running on a MySQL Server, is used to store the definition of the different elements of the ACT. InnoDB is used as the storage engine for its support of both transactions and foreign key constraints.
Figure 19.5 ACT database schema.
Daily backups are performed to a RAID 6 disk array and the CERN Advanced STORage manager (CASTOR).

19.2.2 Table description
Below is a description of the ACT’s tables.

19.2.2.1 ACTsystems table
This table defines ALICE configurable systems, such as the online systems, the ECS partitions or the different detectors.
Table 19.1 ACTsystems table
Field               Description
system              System name
description         System description
ECScomponent        Name of the corresponding ECS detector or partition (if applicable)
systemCategory      System type, if any (‘partition’, ‘detector’)
enabled             Flag indicating if system is enabled in ACT
isTriggerDetector   Flag indicating if system is a trigger detector
isReadoutDetector   Flag indicating if system is a readout detector
updateTimeout       Update request timeout in seconds

19.2.2.2 ACTitems table
This table defines, for each system, the list of configuration items.
Table 19.2 ACTitems table

Field              Description
item               Item name
system             Item’s system
description        Item description
itemCategory       Item category, if any (‘CTP config’, ‘TRG config’, ‘DCS config’, ‘HLT config’, ‘DAQ config’, ‘partition’)
enabled            Flag indicating if item is enabled in ACT
activationStatus   Item’s activation status report (‘update requested’, ‘applying’, ‘’, ‘update failed’)
statusTimestamp    Timestamp of the latest activation status update
statusMismatch     Flag indicating if item is not in the requested configuration (only meaningful when the item’s activation status is equal to ‘active’)
activationComment  Comments of latest activation status update

19.2.2.3 ACTinstances table

This table defines, for each item, the list of possible (predefined) configurations.

Table 19.3 ACTinstances table

Field              Description
instance           Instance name
item               Instance’s item
version            Instance version number
description        Instance description
author             Instance author name
creationTime       Instance creation timestamp
isValidated        Flag indicating if instance is validated (marked as ready to be used)
changeLog          Instance change log
value              Instance value
isActive           Flag indicating if instance is selected (active) for the corresponding item
dependOnDetector   ECS name of detector this instance may depend on at runtime

19.2.2.4 ACTlockedItems table

This table stores the list of locked configuration items.

Table 19.4 ACTlockedItems table

Field              Description
item               Item name
lockSource         Name of element which is locking the item
runNumber          Run number which is locking the item (if applicable, zero otherwise)
eventCount         Number of subevents collected by readout
bytesInjected      Size of data collected by readout in bytes
time_update        Database row update date/time

19.2.2.5 ACTconfigurations table

This table defines reusable configurations of one or more configuration items, allowing users to configure several items in one action.
Table 19.5 ACTconfigurations table

Field              Description
id                 Configuration ID
name               Configuration name
target             Target to which the configuration can be applied (e.g. partition name)
wildcards          CSV list of wildcards to be applied
obsolete           Flag indicating if configuration is obsolete
author             Configuration author name
creationTime       Configuration creation timestamp
description        Configuration description

19.2.2.6 ACTconfigurationsContent table

This table stores the content of the reusable configurations defined in the ACTconfigurations table.

Table 19.6 ACTconfigurationsContent table

Field              Description
id                 Configuration ID
item               Item name
instance           Instance name
version            Instance version number

19.2.2.7 ACTinfo table

This table stores internal ACT information (e.g. version number).

Table 19.7 ACTinfo table

Field              Description
variable           Variable name
value              Variable value
description        Variable description

19.3 Application Programming Interface

19.3.1 Overview

Read/write access is available via a C API.

19.3.2 Environment variables

The following environment variables are available to configure the behavior of the API:

• ACT_DB: sets the credentials to access the DB. The format is “USERNAME:PASSWORD@HOSTNAME/DBNAME”.
• ACT_VERBOSE: sets the logging level. Possible values are:
    • 0: no messages.
    • 1: error messages.
    • > 2: same as 1 + all SQL queries.
  If not set, the default value is 0.

19.3.3 Data types

Below is a list of the available data types.

ACT_handle

Description
    Handle to an ACT DB connection.

Listing 19.1 ACT_handle type definition

    struct _ACT_handle {
        MYSQL *db;      /* Handle to MySQL connection */
        char verbose;   /* Flag set to 1 for verbose logs */
    };
    typedef struct _ACT_handle * ACT_handle;

ACT_system

Description
    Structure defining an ACT system.

Listing 19.2
ACT_system type definition

    typedef struct _ACT_system {
        char *system;                   /* system name */
        char *ECScomponent;             /* corresponding ECS component, if any (or NULL) */
        ACT_t_systemCategory category;  /* system category */
    } ACT_system;

ACT_t_systemCategory

Description
    Enumerated type defining the system categories to which a system can belong.

Listing 19.3 ACT_t_systemCategory type definition

    typedef enum {
        ACT_system_partition,
        ACT_system_detector,
        ACT_system_none,   /* category undefined */
        ACT_system_any     /* used for search, match any of the above */
    } ACT_t_systemCategory;

ACT_t_systemParams

Description
    Enumerated type defining the parameters available in a system.

Listing 19.4 ACT_t_systemParams type definition

    typedef enum {
        ACT_system_param_updateTimeout
    } ACT_t_systemParams;

ACT_item

Description
    Structure defining an ACT item.

Listing 19.5 ACT_item type definition

    typedef struct _ACT_item {
        char *item;                    /* item name */
        char *system;                  /* system it belongs to */
        ACT_t_itemCategory category;   /* item category */
        ACT_instance *activeInstance;  /* instance currently active, may be NULL */
    } ACT_item;

ACT_t_itemCategory

Description
    Enumerated type defining the item categories to which an item can belong.

Listing 19.6 ACT_t_itemCategory type definition

    typedef enum {
        ACT_item_CTPconfig,
        ACT_item_TRGconfig,
        ACT_item_DCSconfig,
        ACT_item_HLTconfig,
        ACT_item_DAQconfig,
        ACT_item_Partition,
        ACT_item_none,   /* category undefined */
        ACT_item_any,    /* used for search, match any of the above */
    } ACT_t_itemCategory;

ACT_t_itemActiveStatus

Description
    Enumerated type defining the activation status in which an item can be.
Listing 19.7 ACT_t_itemActiveStatus type definition

    typedef enum {
        ACT_activeState_updateRequested,
        ACT_activeState_applying,
        ACT_activeState_active,
        ACT_activeState_updateFailed,
        ACT_activeState_none,   /* undefined */
        ACT_activeState_any,    /* used for search, match any of the above */
    } ACT_t_itemActiveStatus;

ACT_instance

Description
    Structure defining an ACT instance.

Listing 19.8 ACT_instance type definition

    typedef struct _ACT_instance {
        char *item;                     /* item name */
        char *instance;                 /* instance name */
        void *value;                    /* value content (BLOB) */
        int size;                       /* size of value, in bytes */
        ACT_t_itemCategory category;    /* instance category */
        ACT_t_itemActiveStatus status;  /* instance activation status */
        char *dependOnDetector;         /* additional detector dependence, if any */
        char isActive;                  /* 1 if instance active, 0 otherwise */
    } ACT_instance;

19.3.4 Database connection functions

Below is a list of functions providing basic connection functionality to the ACT database.

ACT_open

Synopsis
    #include "act.h"
    int ACT_open(const char *cx_params, ACT_handle *h)

Description
    Open a MySQL connection. Credentials should be given via the cx_params parameter in the “USERNAME:PASSWORD@HOSTNAME/DBNAME” format. If an empty string is passed, the credentials are taken from the ACT_DB environment variable. If both are empty, an error will be returned.
    If successful, a handle to the DB connection will be stored in the h parameter.

Returns
    Upon successful completion, this function will return a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_close

Synopsis
    #include "act.h"
    int ACT_close(ACT_handle h)

Description
    Close a MySQL connection and release previously used resources.

Returns
    Upon successful completion, this function will return a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

19.3.5 API cleanup functions

Below is a list of functions providing memory cleanup. They should be used by client programs to ensure efficient memory usage.

ACT_destroySystem

Synopsis
    #include "act.h"
    int ACT_destroySystem(ACT_system *i)

Description
    Cleanup memory associated with a system.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_destroySystemArray

Synopsis
    #include "act.h"
    int ACT_destroySystemArray(ACT_system *i, int size)

Description
    Cleanup memory associated with an array of systems. The parameter size defines the number of systems to be destroyed.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_destroyItem

Synopsis
    #include "act.h"
    int ACT_destroyItem(ACT_item *i)

Description
    Cleanup memory associated with an item.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_destroyItemArray

Synopsis
    #include "act.h"
    int ACT_destroyItemArray(ACT_item *i, int size)

Description
    Cleanup memory associated with an array of items. The parameter size defines the number of items to be destroyed.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_destroyInstance

Synopsis
    #include "act.h"
    int ACT_destroyInstance(ACT_instance *i)

Description
    Cleanup memory associated with an instance.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_destroyInstanceArray

Synopsis
    #include "act.h"
    int ACT_destroyInstanceArray(ACT_instance *i, int size)

Description
    Cleanup memory associated with an array of instances. The parameter size defines the number of instances to be destroyed.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

19.3.6 ACT READ access functions

Below is a list of functions providing READ access to the ACT database. All functions should receive as parameter h a handle to the DB connection previously created by a call to the ACT_open function.

ACT_getSystems

Synopsis
    #include "act.h"
    int ACT_getSystems(ACT_handle h, ACT_t_systemCategory category, ACT_system **systemsArray, int *systemsNumber)

Description
    Retrieve the list of all systems. The systems are stored in the systemsArray parameter (NULL if no systems are found) and the number of retrieved systems in the systemsNumber parameter.
    The category parameter can be used to restrict the retrieved systems to a given system category. To retrieve all systems, use ACT_system_any.
    NOTE: After being used, the systems should be destroyed via the ACT_destroySystemArray function.

Returns
    Upon successful completion, this function will return a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_getSystemsToUpdate

Synopsis
    #include "act.h"
    int ACT_getSystemsToUpdate(ACT_handle h, ACT_t_itemCategory category, ACT_system **systemsArray, int *systemsFound)

Description
    Retrieve the list of systems with items for which an update has been requested.
    The systems are stored in the systemsArray parameter (NULL if no systems are found) and the number of retrieved systems in the systemsFound parameter.
    The category parameter can be used to restrict the considered items to a given item category. To query all item categories, use ACT_item_any.
    Disabled systems and items are ignored.
    NOTE: After being used, the systems should be destroyed via the ACT_destroySystemArray function.

Returns
    Upon successful completion, this function will return a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_getSystemParamInt

Synopsis
    #include "act.h"
    int ACT_getSystemParamInt(ACT_handle h, const char *system, ACT_t_systemParams param, int *value)

Description
    Retrieve an integer parameter of a given system. The corresponding value is stored in the value parameter.

Returns
    Upon successful completion, this function will return a value of zero. Otherwise, the following value will be returned:
    1: parameter’s value is NULL.
    < 0: error while retrieving the parameter’s value.

ACT_getItem

Synopsis
    #include "act.h"
    int ACT_getItem(ACT_handle h, const char *item, ACT_instance **instancesArray, int *instancesNumber)

Description
    Retrieve the list of defined instances of a given item. The instances are stored in the instancesArray parameter (NULL if no instances are found) and the number of retrieved instances in the instancesNumber parameter.
    Disabled systems and items are ignored.
    NOTE: After being used, the instances should be destroyed via the ACT_destroyInstanceArray function.

Returns
    Upon successful completion, this function will return a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
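Unlike most functions in this section, ACT_getSystemParamInt distinguishes three outcomes (0, 1 and negative values). The sketch below shows one way a client might turn that return code into an effective timeout. The helper name act_effective_timeout and the idea of a caller-supplied fallback are assumptions made for illustration; they are not part of act.h.

```c
#include <stddef.h>

/* Hypothetical helper, not part of act.h.
 * Interprets the return code rc of ACT_getSystemParamInt():
 *   rc == 0 : the retrieved value is valid and is used as is
 *   rc == 1 : the parameter is NULL in the DB, so the caller's fallback is used
 *   rc <  0 : the value could not be retrieved
 * Stores the effective value in *out and returns 0, or returns -1 on error
 * (including any return code not listed above). */
int act_effective_timeout(int rc, int value, int fallback, int *out)
{
    if (out == NULL) {
        return -1;
    }
    if (rc == 0) {
        *out = value;
        return 0;
    }
    if (rc == 1) {
        *out = fallback;
        return 0;
    }
    return -1;
}
```

A client would typically pass the code returned by ACT_getSystemParamInt(h, system, ACT_system_param_updateTimeout, &value) as rc; which fallback is appropriate is left to the caller.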
ACT_getActiveItem

Synopsis
    #include "act.h"
    int ACT_getActiveItem(ACT_handle h, const char *item, ACT_instance **instance)

Description
    Retrieve the active instance of a given item. The instance is stored in the instance parameter (NULL if no active instance is found).
    Disabled systems and items are ignored.
    NOTE: After being used, the instance should be destroyed via the ACT_destroyInstance function.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_getActiveItem_bySystem

Synopsis
    #include "act.h"
    int ACT_getActiveItem_bySystem(ACT_handle h, const char *system, ACT_t_itemCategory category, ACT_instance **instancesArray, int *instancesNumber)

Description
    Retrieve the list of active instances of a given system. The instances are stored in the instancesArray parameter (NULL if no active instances are found) and the number of retrieved instances in the instancesNumber parameter.
    If the system parameter is NULL, all systems are queried.
    The category parameter can be used to restrict the considered items to a given item category. To query all item categories, use ACT_item_any.
    Disabled systems and items are ignored.
    NOTE: After being used, the instances should be destroyed via the ACT_destroyInstanceArray function.

Returns
    Upon successful completion, this function will return a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_getItemsToUpdate

Synopsis
    #include "act.h"
    int ACT_getItemsToUpdate(ACT_handle h, const char *system, ACT_t_itemCategory category, ACT_item **itemsArray, int *itemsNumber)

Description
    Retrieve the list of items (ordered by system name and item name) for which an update has been requested.
    The items are stored in the itemsArray parameter (NULL if no items are found) and the number of retrieved items in the itemsNumber parameter.
    The system parameter can be used to restrict the considered items to a given system. To query all systems, use NULL.
    The category parameter can be used to restrict the considered items to a given item category. To query all item categories, use ACT_item_any.
    Disabled systems and items are ignored.
    NOTE: After being used, the items should be destroyed via the ACT_destroyItemArray function.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_isLockedItem

Synopsis
    #include "act.h"
    int ACT_isLockedItem(ACT_handle h, const char *item, int *countLocks)

Description
    Check if an item is locked. The countLocks parameter stores the number of existing locks for the item; if zero, the item is not locked.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

19.3.7 ACT WRITE functions

Below is a list of functions providing WRITE access to the ACT database. All functions should receive as parameter h a handle to the DB connection previously created by a call to the ACT_open function.

ACT_updateActivationStatus

Synopsis
    #include "act.h"
    int ACT_updateActivationStatus(ACT_handle h, const char *item, ACT_t_itemActiveStatus status, const char *comment)

Description
    Update the activation status of a given item (activationStatus field of the ACTitems table). The allowed status transitions are:
    • ‘update requested’ => ‘applying’
    • ‘applying’ => ‘active’
    • ‘applying’ => ‘update failed’
    The optional comment parameter will update the activationComment field of the ACTitems table.
    NOTE: If the given item is disabled, an error will be returned.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_updateStatusMismatch

Synopsis
    #include "act.h"
    int ACT_updateStatusMismatch(ACT_handle h, const char *item, int statusMismatch)

Description
    Update the mismatch flag of a given item (statusMismatch field of the ACTitems table). The statusMismatch parameter can have the following values:
    • 0: the item is in the desired configuration
    • 1: the item is not in the desired configuration
    NOTE: If the given item is disabled, an error will be returned.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_lockItem

Synopsis
    #include "act.h"
    int ACT_lockItem(ACT_handle h, const char *item, const char *source, unsigned int run)

Description
    Lock an item (create a new row in the ACTlockedItems table). The source and run parameters define the element which is locking the item. If a run number is not applicable, zero should be used.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

ACT_unlockItem

Synopsis
    #include "act.h"
    int ACT_unlockItem(ACT_handle h, const char *item, const char *source, unsigned int run)

Description
    Unlock an item (delete one row from the ACTlockedItems table). The source and run parameters define the element for which the lock should be removed.

Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
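The three status transitions accepted by ACT_updateActivationStatus form a small state machine. The sketch below encodes them as a check that a client could run before calling the API. The enumeration is a local copy of the type documented in this chapter so that the example compiles without act.h; the helper name act_transition_allowed is hypothetical, and whether the library performs the check in this way is an assumption.

```c
/* Local copy of the activation status enumeration documented in this
 * chapter, so that the sketch compiles without act.h. */
typedef enum {
    ACT_activeState_updateRequested,
    ACT_activeState_applying,
    ACT_activeState_active,
    ACT_activeState_updateFailed,
    ACT_activeState_none,
    ACT_activeState_any
} ACT_t_itemActiveStatus;

/* Hypothetical helper, not part of act.h.
 * Returns 1 if the transition from -> to is one of the three transitions
 * listed for ACT_updateActivationStatus, 0 otherwise. */
int act_transition_allowed(ACT_t_itemActiveStatus from,
                           ACT_t_itemActiveStatus to)
{
    switch (from) {
    case ACT_activeState_updateRequested:
        /* 'update requested' => 'applying' */
        return to == ACT_activeState_applying;
    case ACT_activeState_applying:
        /* 'applying' => 'active' or 'applying' => 'update failed' */
        return to == ACT_activeState_active ||
               to == ACT_activeState_updateFailed;
    default:
        return 0;
    }
}
```

In the workflow described in Section 19.1.5, a daemon would lock the item, move it to ‘applying’, apply the change, move it to ‘active’ or ‘update failed’, and then unlock it.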
19.4 Tools

Below is a list of the available command-line tools providing miscellaneous ACT functionalities.

act_check_daemons.csh

Synopsis
    act_check_daemons.csh

Description
    Check if the different ACT DIM services and the corresponding DIM Name Servers are available and reachable.
    Prior to execution, the following environment variables must be set:
    • DIMDIR: DIM root directory.
    • DIM_DNS_NODE: ECS DIM Name Server node.
    • DCS_DIM_DNS_NODE: DCS DIM Name Server node.
    Upon successful completion, the tool will print a list of semicolon-separated values indicating the status of each checked service:
    • -1: unknown
    • 0: not running/reachable
    • 1: running and reachable
    The order of the printed values corresponds to the following DIM services/Name Servers:
    • 1: ECS DIM Name Server
    • 2: ACT_UPDATE_REQUEST_SERVER (ACT Update Request Server)
    • 3: ecsDedicatedDaemon (EDD)
    • 4: ECS DIM Name Server
    • 5: dcsDedicatedDaemon (DDD)
    • 6: ACT_UPDATE (ACT Bridge)
    • 7: PVSSSys211Man4:DIMHandler (DCS Run Control Tool)

Returns
    Upon successful completion, this command returns a value of zero. Otherwise, 1 will be returned.

act_compare_partitions

Synopsis
    act_compare_partitions

Description
    Check if partition definitions in ECS and ACT are consistent.
    Prior to execution, the following environment variables must be set:
    • ECS_DB_MYSQL_HOST: ECS database host.
    • ECS_DB_MYSQL_DB: ECS database name.
    • ECS_DB_MYSQL_USER: ECS database username.
    • ECS_DB_MYSQL_PWD: ECS database password.
    • ACT_DB: ACT database credentials in the “USERNAME:PASSWORD@HOSTNAME/DBNAME” format.
    Upon successful completion, the tool will print the discrepancies found, if any.

Returns
    Upon successful completion, this command returns a value of zero. Otherwise, the following values will be returned:
    -1: a mandatory environment variable is not set.
    -2: an error occurred while retrieving information from the ACT DB.
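The ACT_DB format required by act_compare_partitions (and accepted by ACT_open) can be split into its four fields as sketched below. This is an illustration only, not the parser used by the ACT library: the act_credentials structure, the function names and the buffer size are hypothetical, and the sketch assumes that all four fields are non-empty, that the user name contains no ‘:’ and that the host name contains no ‘/’.

```c
#include <string.h>

#define ACT_CRED_FIELD 128  /* arbitrary field capacity chosen for this sketch */

/* Hypothetical container for the four credential fields (not part of act.h). */
typedef struct {
    char user[ACT_CRED_FIELD];
    char password[ACT_CRED_FIELD];
    char host[ACT_CRED_FIELD];
    char dbname[ACT_CRED_FIELD];
} act_credentials;

/* Copy the n bytes starting at src into dst and terminate the string.
 * Returns 0 on success, -1 if the field is empty or does not fit. */
static int copy_field(char *dst, const char *src, size_t n)
{
    if (n == 0 || n >= ACT_CRED_FIELD) {
        return -1;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Split "USERNAME:PASSWORD@HOSTNAME/DBNAME" into its four fields.
 * Returns 0 on success, -1 if the string does not match the format. */
int act_parse_credentials(const char *s, act_credentials *out)
{
    const char *colon, *at, *slash;

    if (s == NULL || out == NULL) {
        return -1;
    }
    colon = strchr(s, ':');            /* first ':' ends the user name */
    if (colon == NULL) {
        return -1;
    }
    at = strrchr(colon + 1, '@');      /* last '@' ends the password */
    if (at == NULL) {
        return -1;
    }
    slash = strchr(at + 1, '/');       /* first '/' after '@' ends the host */
    if (slash == NULL) {
        return -1;
    }
    if (copy_field(out->user, s, (size_t)(colon - s)) != 0 ||
        copy_field(out->password, colon + 1, (size_t)(at - colon - 1)) != 0 ||
        copy_field(out->host, at + 1, (size_t)(slash - at - 1)) != 0 ||
        copy_field(out->dbname, slash + 1, strlen(slash + 1)) != 0) {
        return -1;
    }
    return 0;
}
```

Searching for the last ‘@’ rather than the first is a design choice that lets the password itself contain ‘@’ or ‘:’ characters.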
act_ddd_dummy_dcs_rct

Synopsis
    act_ddd_dummy_dcs_rct

Description
    Emulates the RCT, allowing test setups to be fully functional.
    Prior to execution, the following environment variable must be set:
    • DCS_DIM_DNS_NODE: DCS DIM Name Server node.

Returns
    Upon successful completion, this command returns a value of zero. Otherwise, the following values will be returned:
    1: DCS_DIM_DNS_NODE environment variable is not set.
    2: DIM_DNS_NODE environment variable could not be set to the value provided in DCS_DIM_DNS_NODE.

19.5 Graphical User Interface

19.5.1 Overview

The ACT’s Web-based GUI was developed using modern Web technologies, including PHP5, JavaScript and Cascading Style Sheets (CSS). It uses the PHP Zend Framework to implement a Model-View-Controller (MVC) architecture.

It is hosted on an Apache web server and can be accessed from the experimental area (inside the experiment's technical network), the CERN General Purpose Network (GPN) and the internet.

19.5.2 Authentication and Authorization

Authentication is implemented via the CERN Authentication central service, which provides Single Sign On (SSO) and relieves the ACT software of the effort of authenticating users. When users try to access the GUI, they are redirected to the CERN Login page, where they must provide their credentials. If successful, they are then redirected back to the GUI.

Authorization is based on CERN’s egroups. In order to access the GUI, users must be members of the ALICE-ACT egroup.

19.5.3 Expert Mode

The Expert Mode section is used to populate the ACT DB, allowing system experts and ACT administrators to create and modify systems, items and instances. It also allows the execution of different actions and the display of different status tables.

19.5.3.1 Actions

The following actions are available:

• Send Configuration Request: send a configuration request by executing the act_update_request command.
  This will affect all items with activation status equal to ‘update requested’.
• Put all Detectors in Standalone: activate, for all detectors, the instance that defines the detector as being in standalone.
• Unlock all Items: remove all item locks (delete all entries of the ACTlockedItems table).
• Remove all Status Mismatch: remove all status mismatch flags (set the statusMismatch field of the ACTitems table to zero for all items).
• Put all Items in “Active”: change the activation status of all items to ‘active’.
• Disable all Partition CTP Configurations: deactivate, for all items defining a CTP configuration for a given partition, the currently active instance.

19.5.3.2 Status

The following status reports are available:

• Activation Status: display, for each item, the current activation status. Additional item information, such as the lock status and the currently active instance, is also displayed.
• Locked Items: display the list of locked items.
• Running Partitions: display the list of running partitions.
• Status Mismatch: display the list of items with the status mismatch flag set.

19.5.4 Run Coordination Mode

The Run Coordination Mode section is used to configure ALICE during data taking periods, allowing the Run Coordination and the Shift Leaders to globally configure the different ALICE sub-detectors and systems.

19.5.4.1 Partitions

The Partitions subsection allows users to configure the different ECS partitions using a graphical wizard. These configurations can be saved for later reuse, thus allowing for an easier and faster ACT usage. The following modes are available:

• Fully: all partition components (readout detectors, trigger detectors, CTP, DAQ and HLT).
• Readout Detectors: only the partition’s readout detectors (including the selected detectors’ DCS configuration).
• Readout Detectors (without DCS Configuration): only the partition’s readout detectors (without including the selected detectors’ DCS configuration).
• CTP: only the partition’s CTP configuration (including the corresponding trigger detectors’ DCS configuration).
• CTP (without Trigger Detectors): only the partition’s CTP configuration (without including the corresponding trigger detectors’ DCS configuration).
• HLT: only the partition’s HLT configuration.
• DAQ: only the partition’s DAQ configuration.

Additionally, the following actions can also be executed for each defined partition:

• Change CTP Configuration source to “ACT database”/“Local file”: toggle the CTP configuration mode between ACT and local.
• Enable/Disable “Ignore ACT Pending Actions”: enable/disable the “Ignore ACT pending actions” option in the partition’s PCA Human Interface.

NOTE: while a partition is running, neither configurations nor actions can be executed (a configuration in mode “Fully” can still be defined and saved).

19.5.4.2 Detectors

The Detectors subsection allows users to individually configure the different ALICE detectors. The following actions are available:

• Put in Standalone: put the detector in standalone (if included in an ECS partition, the detector will be excluded).
• Change Partition: include the detector in an ECS partition (or put it in standalone).
• Configure: configure the detector.

NOTE: while a detector is running or being configured, no actions can be executed.

19.5.4.3 CTP

The CTP subsection allows users to configure the CTP. At the end of the configuration the CTP processes are restarted; therefore, this configuration can only be performed when no runs are active.

NOTE: this subsection executes a global CTP configuration and is therefore different from the CTP partition configuration described in Section 19.5.4.1.
Part III

DDL and D-RORC software

Reference Manual
November 2010
ALICE DAQ Project

20 DDL and D-RORC stand-alone software

This chapter describes several software tools that allow using the RORC device in a stand-alone manner.

20.1 Introduction
20.2 Test programs for the RORC, DIU and SIU
20.3 Front-end Control and Configuration (FeC2) program
20.4 DDL Data Generator (DDG) program
20.5 Stand-alone installation

20.1 Introduction

The DATE kit provides the readout software to perform long-term high-volume data taking with several RORC devices in an LDC (see Chapter 6 and Chapter 7). Some software is also provided to use the RORC device in a stand-alone manner, which is useful to facilitate the installation procedure, to help debugging in case of problems, and to exploit the supplementary features of the RORC as a test device for DATE. This stand-alone software covers four areas:

• The various test programs for the DDL and the RORC allow the user to identify the DDL and RORC components, to reset them, to check their status, and to execute a simple data taking task. The most important utility programs are described in Section 20.2.
• The Front-end Control and Configuration (FeC2) interpreter program allows the user to utilize the “backward” channel of the RORC device to send commands and data blocks to the front-end electronics. A short description of this program, along with a review of the FeC2 script language, is given in Section 20.3.
• The DDL Data Generator (ddg) program allows the user to operate the RORC as a device to generate simulated event fragments as they would be produced by some front-end electronics.
  The handling of this program is described in Section 20.4. The DDG software is used for testing the DATE system.
• All functionalities of the stand-alone software are available as a C API as well. One application is to call these C routines to configure and control the front-end electronics instead of using FeC2 scripts. This might be the better choice when aiming at more complex and interactive software for testing detector electronics. The C API documentation is available in Chapter 21.

The precondition for running this stand-alone software is that the physmem and rorc_driver kernel modules are loaded. The installation procedure is explained in Section 20.5. When the DDL and RORC software is installed via the DATE kit, all the programs and scripts are located in the directories /date/rorc/Linux and /date/physmem/Linux.

If several RORC devices are in place, the DATE readout program and the stand-alone programs can run simultaneously on different channels. This is possible because two physmem devices (see Chapter 15) are used: DATE memory banks access this memory via the /dev/physmem1 device, and the stand-alone RORC programs via the /dev/physmem0 device. Some programs have an option to access the /dev/physmem1 device as well; this option can be used only when DATE is not running.

The different parts of DDL (RORC, DIU, SIU) are described in Chapter 7. Further documentation can be found at the Web site http://cern.ch/ddl/ddl_docu.html. In addition, each program prints a short explanation of its usage via the -h or --help options.

20.2 Test programs for the RORC, DIU and SIU

In this section the most important stand-alone test programs are presented. A very detailed description of all test programs can be found in the RORC Library User’s Manual at http://cern.ch/ddl/rorc_docu.html.
rorc_find and rorc_qfind

Synopsis
    rorc_find
    rorc_qfind

    Parameters: none

Description
    The rorc_find and rorc_qfind programs list the type and hardware identification of the RORC cards plugged into the machine and of the DIUs, either plugged onto or integrated in the RORCs.
    The rorc_find program tries to open all the RORC devices and reads the hardware identification from their configuration EPROM. If a RORC channel is in use, it cannot be opened, so its features will not be listed.
    The rorc_qfind program reads the /proc/rorc_map process file prepared by the rorc_driver at boot time. It shows all RORC channels, together with the process IDs if a channel is in use.
    The type of the RORC device can be pRORC (PCI revision: 1), D-RORC (PCI rev: 2), integrated D-RORC (PCI rev: 3), D-RORC version 2 (PCI-X version, rev: 4) or PCI Express D-RORC (PCI rev: 5).

Examples

Listing 20.1 Example of rorc_find program

    > rorc_find
    The following device(s) found:
    -----------------------------------------------------------------------
    Minor Channel Device type and HW identification
    -----------------------------------------------------------------------
    0     0       integ. DRORC2  DRORC2 2v1 INT. LD: EP2S30 S/N: 03034
                  embedded DIU
          1       embedded DIU
    1     0       integ. DRORC2  DRORC2 2v1 INT. LD: EP2S30 S/N: 04021
                  embedded DIU
          1       embedded DIU
    -----------------------------------------------------------------------
    4 RORC channel(s) not in use was found.
    RORC driver reported 2 RORC device(s).

Listing 20.2 Example of rorc_qfind program

    > rorc_qfind
    The following device(s) found:
    -------------------------------------------------------
    Minor PCI_rev Com/Status Speed Hw_s/n Fw_ID PID_0 PID_1
    -------------------------------------------------------
    0     4       0x04100147 100   03034  2.12  0     0
    1     4       0x04100147 100   04021  2.12  0     0
    -------------------------------------------------------
    2 RORC device(s) with 4 channel(s) was found.
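The PCI revision numbers given in the description above (and shown in the PCI_rev column of the rorc_qfind example) can be translated into device type names with a small lookup. The function name rorc_type_name and the exact wording of the returned strings are choices made for this sketch; they are not taken from the RORC library.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the RORC library.
 * Maps the PCI revision of a RORC device to the device type named in the
 * text. Returns NULL for a revision that the text does not list. */
const char *rorc_type_name(int pci_rev)
{
    switch (pci_rev) {
    case 1:
        return "pRORC";
    case 2:
        return "D-RORC";
    case 3:
        return "integrated D-RORC";
    case 4:
        return "D-RORC version 2 (PCI-X)";
    case 5:
        return "PCI Express D-RORC";
    default:
        return NULL;
    }
}
```

Returning NULL for unknown revisions leaves it to the caller to decide how to report a device that is newer than this table.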
rorc_reset

Synopsis
rorc_reset [-{M|m} ] [-{C|c} ] [-D|d|B|b|S|s|F|f|O|o|E|e|N|n]

Description
The rorc_reset program initializes a RORC and/or a DDL channel. Depending on the given program options, the program resets the different parts of the RORC, the DIU or the SIU. Resetting the RORC means emptying all its FIFOs, clearing all error bits, and putting all programmable features to their reset value. Resetting the DIU or the SIU means cutting the DDL link and putting all programmable features to their reset value; afterwards the DDL link automatically re-establishes itself.

Parameters and switches:
• the parameter RORC_minor defines the minor device number of the RORC in case there are several cards. The associated device file is /dev/prorcN, where N is the minor device number starting from 0. The default minor device number is 0.
• the parameter DDL_channel chooses the channel (0 or 1) in case of an integrated D-RORC. The default channel is 0.
• the switch -D or -d resets the DIU.
• the switch -B or -b resets both the RORC and the DIU.
• the switch -S or -s resets the SIU.
• the switch -F or -f clears the rorcFreeFifo.
• the switch -O or -o clears the other FIFOs of the RORC.
• the switch -E or -e clears the error bits of the RORC.
• the switch -N or -n clears the byte counters of the RORC.
• in case no reset switch is given, only the RORC is reset.

Before exiting, the program writes “RORC reset OK”. This means that the requested reset command has been sent; the success of the command is not checked. The user can test how successful the reset was by calling the rorc_status, diu_status, or siu_status program.
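When rorc_reset is driven from a test script, the switch list above can be encoded once and reused. The following Python sketch only builds an argument list and does not run the program. The function name rorc_reset_argv and the target keywords are hypothetical, and passing the minor number and channel as separate arguments after -m and -c is an assumption based on the rorc_receive examples in this chapter.

```python
# Hypothetical wrapper, not part of the DDL/RORC software.
# Each keyword maps to the reset switch listed above; None means that
# no reset switch is given, in which case only the RORC is reset.
RESET_SWITCHES = {
    "rorc": None,
    "diu": "-d",
    "rorc_and_diu": "-b",
    "siu": "-s",
    "free_fifo": "-f",
    "other_fifos": "-o",
    "error_bits": "-e",
    "byte_counters": "-n",
}


def rorc_reset_argv(target="rorc", minor=0, channel=0):
    """Build the argument list for one rorc_reset call (nothing is executed)."""
    if target not in RESET_SWITCHES:
        raise ValueError(f"unknown reset target: {target}")
    if channel not in (0, 1):
        raise ValueError("DDL channel must be 0 or 1")
    argv = ["rorc_reset", "-m", str(minor), "-c", str(channel)]
    switch = RESET_SWITCHES[target]
    if switch is not None:
        argv.append(switch)
    return argv


print(" ".join(rorc_reset_argv("siu", minor=1)))
```

Because rorc_reset does not check the success of the reset, a script using such a wrapper would still need to call rorc_status, diu_status or siu_status afterwards.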
rorc_id, diu_id, siu_id

Synopsis
rorc_id [-{M|m} ] [-{C|c} ] [-V -v -{P|p} -{S|s} -{N|n} ] [-{D|d}] [-{T|t} ]
diu_id [-{M|m} ] [-{C|c} ] [-V -v -P -B -S ]
siu_id [-{M|m} ] [-{C|c} ] [-V -v -P -B -S ]

Description
The rorc_id program reads and displays the type, the hardware identification and the firmware identification of the RORC. It can display the DIU or SIU firmware identification as well. Finally, the program reports whether the program library version and the RORC firmware are compatible.

The type of the RORC is written as “RORC revision number”. It is a number read from the PCI configuration space. If its value is 1, the device is a pRORC; if 2, a D-RORC having one DDL channel; if 3, a dual channel integrated D-RORC; if 4, a version 2 D-RORC (PCI-X version); and if 5, a PCI Express D-RORC.

The hardware identification word of the RORC contains the hardware release date and version number. The firmware identification word of the RORC contains the firmware release date, the firmware version, and the size of the rorcFreeFifo. The DIU or SIU firmware identification words contain the firmware release date and version number.

To get the DIU or SIU firmware identification, the program sends a command. The SIU firmware identification can be requested only if the link is up. After waiting for the answer for as many microseconds as specified by the time-out parameter, the program interprets and displays the reply.

The rorc_id program can be used for writing the RORC hardware identification into the RORC with the help of the -V, -v, -P, -S and -N switches. This feature is intended for RORC developers only. Writing the hardware identification requires a special resistor to be soldered onto the card. If this resistor is removed, the hardware identification cannot be changed. For this reason the usage of these switches is not explained here.
The diu_id program reads and displays the hardware identification words of the DIU. The DIU hardware identification word contains the card major and minor version numbers (e.g. 2.0), the PLD version code (e.g. 20K60E), the card speed version (e.g. 2125 Mbps), and the card serial number. If the major version number is 1, the card is a prototype (old) DDL card; if it is 2, the card is the final (new) card.

Note: An embedded DIU, i.e. a DIU integrated onto the D-RORC card, does not have a separate hardware identification. It is identified as one channel of the RORC card.

The siu_id program reads and displays the hardware identification words of the SIU. The SIU hardware identification word contains the card major and minor version numbers (e.g. 2.0), the PLD version code (e.g. 20K60E), the card speed version (e.g. 2125 Mbps), and the card serial number. If the major version number is 1, the card is a prototype (old) DDL card; if it is 2, the card is the final (new) card. The SIU hardware identification words can be read only if the link is up.

The diu_id and siu_id programs can be used for writing this information into the DIU’s or SIU’s memory as well. This feature is intended for DDL developers only. Writing the hardware identification requires a special resistor to be soldered onto the card. If this resistor is removed, the hardware identification cannot be changed. For this reason the usage of the -V, -v, -P, -B and -S switches is not explained here.

Parameters and switches:
• the parameter RORC_minor defines the minor device number of the RORC in case there are several cards. The associated device file is /dev/prorcN, where N is the minor device number starting from 0. The default minor device number is 0.
• the parameter DDL_channel chooses the channel (0 or 1) in case of an integrated D-RORC. The default channel is 0.
• the switch -D or -d displays in addition the hardware and firmware identifications of the DIU and SIU.
• the parameter time-out defines the waiting time in microseconds for the DIU and SIU responses. The default value is 1000 microseconds.

rorc_status, diu_status, siu_status

Synopsis
rorc_status [-{M|m} ] [-{C|c} ]
diu_status [-{M|m} ] [-{C|c} ] [-{T|t} ] [-{V|v} ]
siu_status [-{M|m} ] [-{C|c} ] [-{T|t} ] [-{V|v} ]

Description
The rorc_status program displays the same information as rorc_id. In addition it reads the type of the RORC and the control/status and error registers, and displays information about the RORC status (e.g. working mode, rorcFreeFifo status, link status, flow control status) and errors.

The type of the RORC is written as “RORC revision number”. It is a number read from the PCI configuration space. If its value is 1, the device is a pRORC; if 2, a D-RORC having one DDL channel; if 3, a dual channel integrated D-RORC; if 4, a version 2 D-RORC (PCI-X version); and if 5, a PCI Express D-RORC.

The diu_status program sends a command to the DIU, waits for its reply and displays the DIU status. The program displays the hardware and firmware identifications of the DIU as well.

The siu_status program sends a command to the SIU, waits for its reply and displays the SIU status. The SIU receives commands and replies only if the link is up. The program displays the SIU hardware and firmware identifications as well.

Parameters and switches:
• the parameter RORC_minor defines the minor device number of the RORC in case there are several cards. The associated device file is /dev/prorcN, where N is the minor device number starting from 0. The default minor device number is 0.
• the parameter DDL_channel chooses the channel (0 or 1) in case of an integrated D-RORC. The default channel is 0.
• the parameter time-out defines the waiting time in microseconds for the DIU response. The default value is 1000 microseconds.
• the parameter diu_version chooses the version of the DIU. The value can be 1 for the prototype version, or 2 for the final version (plugged or embedded). The default value is 2.

rorc_receive

Synopsis
rorc_receive [[{-M|-m|--minor} ] |{-r|--revision} {-n|--serial} ]
    [{-C|-c|--channel} ] [-v|--verbose] [{-G|-g|--generator} ]
    [-D|-d|--no_scatter] [{-R|--reset_lev} ] [{-X|-x|--check} ]
    [-Y|-y|--DDL_header] [-Z|-z|--no_RDYRX] [{-P|--phys_minor} ]
    [{-B|-b|--page} ] [{-U|-u|--physmem} ] [{-O|-o|--offset} ]
    [{-E|-e|--events} ] [{-I|-i|--init_word} ]
    [{-p|--pattern} {c|a|0|1|i|d| }] [{-S|-s|--stat_file} ]
    [{-L|-l|--length} ] [{-J|-j|--rand_len} ] [{-N|-n|--init_count} ]
    [{-F|--max_fifo} ] [{-f|--min_fifo} ] [{-T|--sleep_time} ]
    [{-t|--load_sleep} ] [{-Q|--byte_print} ] [{-q|--page_print} ]
    [{-K|--output_file} ] [{-k|--binary_output} ] [{-A|-a|--front_end} ]

Description
The rorc_receive program receives fragments from the DDL link or from the internal data generator of the RORC. It uses the physmem package for allocating the memory blocks where the data is stored. The program compares the received data word by word with the expected values. It also checks whether the fragment length in the DTSTW matches the actual fragment length. This program provides a functional test of the RORC and DDL hardware/software.

The rorc_receive program can be executed for several RORCs in parallel. In this situation a distinct region of the physmem memory must be assigned to each running rorc_receive process. The physmem memory can be assigned via program options (with the switches -U and -O).

Parameters and switches:
• the parameter RORC_minor defines the minor device number of the RORC in case there are several cards.
The associated device file is /dev/prorcN, where N is the minor device number starting from 0. The default minor device number is 0.
• the parameter revision is the RORC’s PCI revision number. It must be less than 6.
• the parameter serial is the RORC’s hardware serial number. If given, the RORC is identified by the revision and serial, not by the parameter RORC_minor.
• the parameter DDL_channel chooses the channel (0 or 1) in case of an integrated D-RORC. The default channel is 0.
• the switch -v prints details for debugging (verbose mode). Note that the switch -V is not implemented.
• the switch -G or -g enables the internal data generator of the RORC, and the parameter loop-back mode selects the loop-back location. The accepted values are:
    0: do not loop back, thus send the data via the link.
    1: set the loop-back inside the DIU.
    2: set the loop-back inside the SIU.
    any other value: set the loop-back inside the RORC.
  The default value for the parameter loop-back mode is 0.
• the switch -D or -d enforces that the received fragments are not scattered in the physmem memory. Every event page is written to the same physical address, hence each page overwrites the previous one.
• the parameter reset_level defines which RORC and DDL elements are reset. The accepted values are:
    0: reset neither the RORC, nor the DIU, nor the SIU.
    1: reset the RORC only.
    2: reset the RORC and the DIU, but not the SIU.
    3: reset the RORC, the DIU and the SIU before collecting data.
  The default value for the parameter reset_level is 3.
• the parameter check_level defines which parts of the received fragment are checked. The pattern for checking is given by the parameter pattern.
The accepted values are:
    -1: do not stop if a DTSTW problem occurs and do not check the fragment.
    0: do not check the fragment.
    1: check the first word of the fragment.
    2: check the fragment, except the first word.
    3: check the whole fragment.
  The default value for the parameter check_level is 3.
• the switch -Y defines that the received fragment contains the Common Data Header (see Section 3.9). The contents of this header are not checked.
• the switch -Z prevents sending the RDYRX and EOBTR commands. This switch is implicitly set when the -G or -A switch is used.
• the parameter physmem_minor defines the minor device number of the physmem memory device. It can be 0 for the /dev/physmem0 device, or 1 for /dev/physmem1. The latter can be used only when DATE is not running in the same RORC channel. The default physmem device is /dev/physmem0.
• the parameter page_length defines the size in bytes of the memory blocks (data pages). The default page size is 4096 bytes.
• the parameter usable_memory defines the amount in MB of the memory requested from the /dev/physmemN (N = 0 or 1) device. The default amount is 30 MB.
• the parameter memory_offset defines the offset in MB of the memory requested from the /dev/physmemN (N = 0 or 1) device relative to its base address. The default offset is 0 MB.
• the parameter events defines the number of fragments to be read, or to be generated if the switch -G is used. The value 0 specifies an unlimited number of fragments, which is also the default value.
• the parameter init_word sets the second word of each fragment when the switch -G or -g is used. The default value for this parameter is 0. The first word of each fragment is an incrementing counter value starting from 1.
• the switch -p or --pattern sets the data pattern of each fragment when the switch -G or -g is used. The accepted possibilities are:
    ‘c’: use constant data given by the parameter init_word.
    ‘a’: use alternating data.
    ‘0’: use flying 0 data starting from 0xfffffffe.
    ‘1’: use flying 1 data starting from 0x00000001.
    ‘i’: use incremental data starting with the parameter init_word.
    ‘d’: use decremental data starting with the parameter init_word.
    mif file name: Memory Initialization File (.mif). It can define a complete fragment. For the MIF description see e.g. http://www.mil.ufl.edu/4712/docs/mif_help.pdf
  The default character for the pattern is ‘i’.
• the parameter stat_file defines the name of the file where the number of bytes transferred is written. If it is given and the file already exists, the program adds the number of transferred bytes to the value already in the file.
• the parameter data_length defines either the size of the largest expected fragment, or the maximum size of the fragment generated by the internal data generator of the RORC when the switch -G is set. The size is given in words (4 bytes). The default size is 524287 words.
• the parameter random_seed defines the seed value for the generation of fragments of random length by the internal data generator of the RORC when the switch -G is set. If the parameter is set, the minimum length is 1 and the maximum length is the parameter data_length rounded down to the nearest power of 2. If the parameter value is 0, the fragments have constant size. This is also the default value.
• the parameter initial_count defines the event count (first data word) of the first fragment. The default value for this parameter is 1.
• the parameter max_FIFO controls the filling of the rorcFreeFifo. It defines the maximal number of entries, hence filling stops when the number of entries reaches this value. The maximal value is 128, which is also the default value.
• the parameter min_FIFO controls the filling of the rorcFreeFifo. It defines the minimal number of entries, hence filling starts when the number of entries is lower than this value. The default value for this parameter is 127.
• the parameter sleep_time defines the waiting period in milliseconds after each received fragment in order to simulate a loaded LDC. The default period is 0 milliseconds.
• the parameter load_sleep_time defines the waiting period in milliseconds each time before a new physmem address is loaded into the RORC’s Ready FIFO. The default period is 0 milliseconds.
• the parameter wait_time defines the waiting period for command responses in microseconds. The default period is 1000 microseconds.
• the parameter GBs_to_print specifies that whenever this number of GB has been received, the total number of received GB is printed. The default value is 1 GB.
• the parameter pages_to_print specifies that whenever this number of pages has been received, the total number of received pages is printed. The default value is 0, hence no page printout.
• the parameter output_file defines the name of the file where the received fragments are dumped as text. Each fragment starts with comment lines (indicated by the ‘#’ character) that contain the event number (i.e. event fragment number) followed by the block number and the block length in 4-byte words. Then follow lines that each show one data word in hexadecimal format. This sequence repeats for each block and event fragment. The data words are not checked, hence this parameter implies the switch -x 0.
• the parameter binary_output_file defines the name of the file where the received fragments are dumped in binary format. The format of the fragments is the same as that of the text file (see the previous parameter). The comment lines are dumped as ASCII (together with the new line characters), while the data is dumped in binary. The binary file is therefore much smaller than the text one, while the event fragments can still be found easily in a hexadecimal dump. The data words are not checked, hence this parameter implies the switch -x 0.
• the parameter FEE_address enforces that the data reading is carried out with the Start Block Read (STBRD) command. It defines the front-end address which is part of the STBRD command. This parameter is mandatory when the -A or -a switch is set.

Examples

    > rorc_receive -m 1 -c 0 -g 3 -p i -l 1000 -x 3

Uses the internal data generator of the RORC with minor device number 1 and channel 0. The generated fragments contain incremental data words and have a fixed size of 1000 words. The whole fragment is checked.

    > rorc_receive -m 2 -c 1 -o 60 -z -R 2 -K /tmp/data.raw

Reads fragments from channel 1 of the dual channel D-RORC with minor device number 2. The RORC and the DIU are reset, but not the SIU. The RDYRX is not sent. The data words are dumped to the file /tmp/data.raw without checks. The physmem memory is used between 60 MB and 90 MB relative to its base address.

20.3 Front-end Control and Configuration (FeC2) program

20.3.1 General description of the FeC2 program

FeC2

Synopsis
FeC2 [-{M|m} | -{R|r} -{N|n} ] [-{C|c} ] [-{P|p} ] [-{F|f} ] [-{L|l} ] [-{O|o} ] [-{U|u} ] [-{T|t} ] [-S|-s] [-v] [-H|-h]

Description
The FeC2 program can be used for controlling and configuring the Front-end Electronics (FEE) via the DDL. It downloads commands and data blocks to the FEE, and it reads status and data blocks from the FEE. The user needs to write a script file following the FeC2 syntax.

Parameters and switches:
• the parameter RORC_minor defines the minor device number of the RORC in case there are several cards. The associated device file is /dev/prorcN, where N is the minor device number starting from 0. The default minor device number is 0.
• the parameter revision is the RORC’s revision number. It must be less than 6.
• the parameter serial is the RORC’s hardware serial number.
If given, the RORC is identified by the revision and serial, not by the parameter RORC_minor.
• the parameter DDL_channel chooses the channel (0 or 1) in case of an integrated D-RORC. The default channel is 0.
• the parameter physmem_minor defines the minor device number of the physmem memory device. It can be 0 for the /dev/physmem0 device, or 1 for /dev/physmem1. The latter can be used only when DATE is not running in the same RORC channel. The default physmem device is /dev/physmem0.
• the parameter FeC2_script_file defines the name of the script file to be interpreted. The default name is FeC2.scr. The syntax of FeC2 script files is described in Section 20.3.2.
• the parameter log_file defines the name of the log file. If not given, the standard output (stdout) stream is used.
• the parameter mem_offset defines the offset in MB of the memory requested from the /dev/physmem0 device relative to its base address. The default offset depends on the parameters RORC_minor and DDL_channel in the following way:
    mem_offset = (RORC_minor * 2 + DDL_channel) * 8
  Hence a separate block of 8 MB of physmem memory is assigned to each channel, which allows several FeC2 scripts to run in parallel on the same machine.
• the parameter mem_size defines the size in MB of the memory requested from the /dev/physmem0 device. The default amount is 8 MB, in accordance with the above scheme for the default value of the parameter mem_offset.
• the parameter DDL_timeout defines the waiting time in microseconds for the DDL commands. The default value is 1000 microseconds.
• the switch -S or -s enables the use of shared memory to accelerate the download of data blocks that have been written beforehand into files. When this switch is set, each file is stored in shared memory, so that the next time the file is used it is retrieved from memory.
The usage of the shared memory is as follows:
    - Each DDL channel has its own shared memory segments.
    - The maximum number of channels per LDC is 16.
    - The maximum number of files per channel is 15420.
    - The maximum length of a file name is 255 characters.
    - The maximum number of shared memory segments per channel is 127.
    - Each shared memory segment can host 4 MB minus 8 bytes for administration.
  The following utility program can be used to remove the shared memory segments; with the switch -x it only scans them:
    clean_shm [-m ] [-c ] [-x]
• the switch -v prints details for debugging (verbose mode). Note that the switch -V is not implemented.
• the switch -H or -h prints a short help message.

20.3.2 Syntax of script files for the FeC2 program

An instruction and its parameters can be separated by space(s) or tabulator(s). Any parameter can be an environment variable. In this case the name must start with a $ character. Each instruction should be written on one line. Any number of empty lines is allowed. Lines starting with a ‘#’, ‘*’ or ‘;’ character are considered as comments. After the character ‘;’ or ‘//’ the remaining part of any line is considered as an in-line comment. Comment lines, in-line comments and empty lines can be used in data files as well. All instructions are executed sequentially up to the end of the script file, until a return/stop command is reached, or until an error occurs.

20.3.2.1 FeC2 instructions related to the DDL

reset
Synopsis
reset [RORC | DIU | SIU]
Description
Reset the given element of the DDL link. If no parameter is given, the RORC is reset.

read_DDL_status
Synopsis
read_DDL_status
Description
Reads and prints the DIU and SIU status. This command is executed only if the -v option is switched on.

write_RDYRX
Synopsis
write_RDYRX
Description
Send an RDYRX command to the FEE.

write_EOBTR
Synopsis
write_EOBTR
Description
Send an EOBTR command to the FEE.
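The comment rules stated in Section 20.3.2 can be illustrated with a small line filter. This is a minimal Python sketch and not the parser used by FeC2: the function name strip_fec2_comment is hypothetical, leading white space before a comment character is assumed to be allowed, and comment markers inside quoted format strings are not treated specially.

```python
def strip_fec2_comment(line: str) -> str:
    """Return the instruction part of one script line.

    An empty string is returned for empty lines and comment lines.
    Assumption: ';' or '//' inside a quoted string is not protected.
    """
    stripped = line.strip()
    # Lines starting with '#', '*' or ';' are comment lines.
    if not stripped or stripped[0] in "#*;":
        return ""
    # After ';' or '//' the rest of the line is an in-line comment.
    cut = len(stripped)
    for marker in (";", "//"):
        pos = stripped.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    return stripped[:cut].rstrip()


script = [
    "# FeC2 script",
    "",
    "reset SIU        ; reset the SIU first",
    "write_command 0x10f // initialize the FEE",
]
for text in script:
    instruction = strip_fec2_comment(text)
    if instruction:
        print(instruction)
```

Only the two instruction lines are printed; the comment line and the empty line are skipped.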
write_command
Synopsis
write_command
Description
Send a DDL command to the FEE.
Parameter:
• the parameter command_code is a hexadecimal number (maximum 19 bits).

write_block
Synopsis
write_block [ ]
Description
First send the address to the FEE, and then send the block of data to the FEE.
Parameters:
• the parameter address is a hexadecimal number (maximum 19 bits) within the address space of the FEE to which the data block is sent.
• the parameter file_name is the name of the file where the block of data is stored. The maximum length of this file is (2^19 -1 = 524287) words.
• the parameter format specifies in C style format (e.g. “%x”) the reading mode of the words from the file. If omitted, the binary mode is used.

write_block_multiple
Synopsis
write_block_multiple [ ]
Description
First read the data from the file called file_name and divide it into sub-blocks with a length of block_size words. For each sub-block send the incremented address to the FEE followed by the data: the first sub-block goes to FEE_address, the second sub-block to FEE_address + block_size, the third sub-block to FEE_address + 2 * block_size, and so forth. At the end of each sub-block send a status read request to the poll address and compare the reply (after applying mask as a bitwise AND operation) with the value status. Repeat the status read request until an exact match happens or the time-out expires. In the latter case stop looping and set the “check_fail” flag (see Section 20.3.2.2). The length of the file needs to correspond to the length expected for the given FEE address. The maximum length allowed is 2^19 -1 = 524287 words.
Parameters:
• the parameter poll_address is a hexadecimal number (maximum 19 bits) within the address space of the FEE to which the status read request command is sent.
• the parameter status is the compare value (maximum 19 bits) for the check.
• the parameter mask is applied as a bitwise AND operation to the value received from the FEE before the comparison against the parameter status is done.
• the parameter time-out defines the maximum duration in microseconds to repeat the polling operation.
• the parameter FEE_address is a hexadecimal number (maximum 19 bits) within the address space of the FEE to which the data block is sent.
• the parameter block_size is the number of words of the sub-blocks to be sent before the next status check.
• the parameter file_name is the name of the file where the block of data is stored. The maximum length of this file is (2^19 -1 = 524287) words.
• the parameter format specifies in C style format (e.g. “%x”) the reading mode of the words from the file. If omitted, the binary mode is used.

read_and_print
Synopsis
read_and_print [ ]
Description
Send a command to the FEE and print the received value.
Parameters:
• the parameter address is a hexadecimal number (maximum 19 bits) within the address space of the FEE to which the status read request command is sent.
• the parameter format specifies in C style format (e.g. “%x”) the printing mode of the received value.
• the parameter stream defines the file name where to append the output. If omitted, the parameter log_file from the FeC2 calling sequence is used, otherwise the standard output is used.

read_and_check
Synopsis
read_and_check
Description
Send a command to the FEE and check the received value. If the check fails, the “check_fail” flag is set (see Section 20.3.2.2).
Parameters:
• the parameter address is a hexadecimal number (maximum 19 bits) within the address space of the FEE to which the status read request command is sent.
• the parameter status is the compare value (maximum 19 bits) for the check.
• the parameter mask is applied as a bitwise AND operation to the value received from the FEE before doing the comparison with the parameter status.
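The masked comparison used by read_and_check, and the polling with time-out described for write_block_multiple, can be sketched as follows. This Python fragment only illustrates the logic and is not the FeC2 implementation: the names check_status and poll_until are hypothetical, and read_status is a hypothetical callable that stands in for the status read request sent over the DDL.

```python
import time


def check_status(reply: int, status: int, mask: int) -> bool:
    """Apply mask as a bitwise AND to the reply, then compare with status."""
    return (reply & mask) == status


def poll_until(read_status, status: int, mask: int, timeout_us: int) -> bool:
    """Repeat the status read until an exact match or until the time-out.

    Returns True on a match. False corresponds to the situation in which
    the "check_fail" flag would be set.
    """
    deadline = time.monotonic() + timeout_us / 1_000_000
    while True:
        if check_status(read_status(), status, mask):
            return True
        if time.monotonic() >= deadline:
            return False


# Simulated FEE replies: the second reply matches 0x0f under the mask 0xff.
replies = iter([0x100, 0x10F])
print(poll_until(lambda: next(replies), 0x0F, 0xFF, 10_000_000))
```

The status is always read at least once, so a time-out of 0 microseconds degenerates into a single check.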
read_until
Synopsis
read_until
Description
Send a command to the FEE and check the received value. This polling operation is repeated until the check is successful or the time-out is reached. In the latter case, the “check_fail” flag is set (see Section 20.3.2.2).
Parameters:
• the parameter address is a hexadecimal number (maximum 19 bits) within the address space of the FEE to which the status read request command is sent.
• the parameter status is the compare value (maximum 19 bits) for the check.
• the parameter mask is applied as a bitwise AND operation to the value received from the FEE before doing the comparison with the parameter status.
• the parameter time-out defines the maximum duration in microseconds during which the polling operation is repeated.

read_block
Synopsis
read_block [ ]
Description
First send the address to the FEE, and then read the block of data from the FEE. The received words are written to the file. The length of the block of data is under the control of the FEE.
Parameters:
• the parameter address is a hexadecimal number (maximum 19 bits) within the address space of the FEE from where the block of data is read.
• the parameter file_name is the name of the file where the received words are written.
• the parameter format specifies in C style format (e.g. “%x”) the writing mode of the words to the file. If omitted, the binary mode is used.

read_and_check_block
Synopsis
read_and_check_block [ ]
Description
First send the address to the FEE, and then read the block of data from the FEE. The received words are compared with the ones in the file. The length of the block of data is under the control of the FEE. If the check fails, the “check_fail” flag is set (see Section 20.3.2.2).
Parameters:
• the parameter address is a hexadecimal number (maximum 19 bits) within the address space of the FEE from where the block of data is read.
• the parameter file_name is the name of the file which contains the words for comparison.
• the parameter format specifies in C style format (e.g. “%x”) the reading mode of the words from the file. If omitted, the binary mode is used.

20.3.2.2 FeC2 instructions related to the program flow

define
Synopsis
define
Description
Whenever name occurs as a parameter of an FeC2 instruction, the value is used instead. The definition of name must appear before its first use. To distinguish between names and numbers, a name must start with a letter, whereas a hexadecimal constant must start with 0x.

loop_on, loop_off
Synopsis
loop_on
FeC2 instruction(s)
loop_off
Description
All FeC2 instructions (with some exceptions) between loop_on and loop_off are repeated loop_number times. The following FeC2 instructions are not repeated even if they are between loop_on and loop_off: reset, write_RDYRX, write_EOBTR, define, loop_on, loop_off, wait, call_file, return, stop_if_failed, stop. The loop_on instruction cannot be nested. A second call of loop_on overwrites the previous loop parameters.
Parameters:
• the parameter loop_number defines how many times the FeC2 instruction will be repeated. If its value is less than 2, the command is equivalent to a loop_off command.
• the parameter loop_uswait defines how many microseconds to wait at the end of each loop.
• the parameter loop_cont_if_error specifies whether the loop should be interrupted if a check in an instruction inside the loop fails. If its value is 0, the looping continues. For any other value the program jumps out of the loop. This concerns only the FeC2 instructions read_and_check, read_and_check_block, and read_until.
Example
    loop_on 2 0 0
    instruction 1
    instruction 2
    instruction 3
    loop_off
is equivalent to the following sequence:
    instruction 1
    instruction 1
    instruction 2
    instruction 2
    instruction 3
    instruction 3

wait
Synopsis
wait
Description
The execution is suspended for a period of usecs microseconds.

call_file
Synopsis
call_file
Description
The execution jumps to the FeC2 script file whose name is file_name. If this file is not found, the execution is stopped. Recursive calls are not allowed.

return
Synopsis
return
Description
The execution of the current FeC2 script is stopped and, if possible, control is returned one level higher.

stop_if_failed
Synopsis
stop_if_failed [ ]
Description
The execution is stopped with the given exit_code (the default one is 1) if the check in the previous instruction has failed. This concerns the following FeC2 instructions: read_and_check, read_and_check_block, read_until, and write_block_multiple.

stop
Synopsis
stop [ ]
Description
The execution is stopped with the given exit_code (the default one is 0).

20.3.2.3 Example of an FeC2 script

Listing 20.3 shows an FeC2 script used to carry out some basic tests on the FEE. Some symbolic names are defined (lines 15-18), and the RORC as well as the SIU are reset (lines 20-21) at the beginning. A command is sent (line 23) to initialize the FEE and the effect is verified (lines 24-25). The status of some registers is read and copied into a file (lines 27-28). Finally, a block of data is sent to the FEE (line 30) and read back (line 31) for the purpose of testing. If all tests are passed, the exit code of this script is 0 (line 34).
Listing 20.3 Example of an FeC2 script

  13: # FeC2 script
  14:
  15: define PATGEN 0x0
  16: define EVLEN 0x100
  17: define STATUS status.out
  18: define PROBA proba.hex
  19:
  20: reset RORC
  21: reset SIU
  22:
  23: write_command 0x10f
  24: read_until EVLEN 0x0f 0xff 10000000
  25: stop_if_failed -1
  26:
  27: read_and_print PATGEN “Patgen status: 0x%x” STATUS
  28: read_and_print EVLEN “Evlen status: 0x%x” STATUS
  29:
  30: write_block 0x600 PROBA “%x”
  31: read_and_check_block 0x600 PROBA “%x”
  32: stop_if_failed -2
  33:
  34: stop

20.4 DDL Data Generator (DDG) program

20.4.1 General description of the DDG program

ddg
Synopsis: ddg [-{F|f} ] [-{L|l} ] [-{P|p} ] [-{S|s} ] [-{T|t} ] [-{N|n}] [-v] [-H|-h]
Description: The ddg program supplies data (simulated events) to the dual channel D-RORC card, which acts as a data generator for the DDL channels. The program reads the fragments from data files, which need to be generated in advance. It can handle up to 12 DDL channels per machine. Files cannot be shared between the channels. The program can also generate the Common Data Header (see Section 3.9). If this header is enabled, then the event identifiers of all fragments will be synchronized. Several replicas of the ddg program can run in parallel on the same or on different machines. The CDHs of the corresponding fragments (generated with different program replicas) remain synchronized.

Parameters and switches:
• the parameter config_file defines the name of the configuration file, which contains all the parameters describing the fragments to be generated by the program (see Section 20.4.3). The default name is ddg.conf.
• the parameter log_file defines the name of the log file. If not given, the standard output (stdout) stream is used.
• the parameter physmem_minor defines the minor device number of the physmem memory device. It can be 0 for /dev/physmem0 device, or 1 for /dev/physmem1.
The latter can be used only when DATE is not running in the same RORC channel. The default physmem device is /dev/physmem0.
• the parameter SMI_object defines the associated SMI object in the form :: . The default SMI object is DDG::DDG.
• the parameter time-out defines the waiting time in microseconds for the DDL commands. The default value is 1000 microseconds.
• the switch -N or -n enforces that the fragments are not scattered in the physmem memory. Only one buffer is used.
• the switch -v prints details for debugging (verbose mode). Note that the switch -V is not implemented.
• the switch -H or -h prints a short help message.

20.4.2 Behavior of the DDG program

The ddg program works in the following way:
1. The program parses the configuration file. Then it initializes the physmem memory, the DIM and the SMI packages. It sets the SMI state to “IDLE”, opens the requested DDL channels, and reads the fragments from the data files into its buffer. Alternatively, it generates the fragments if they are not read from a file.
2. The program waits for the RDYRX command from the DDL channels. It should receive them only when the DATE system at the receiving side has been started and is ready to receive data. If configured, the program resets the specified channels.
3. The program starts sending data upon reception of the SMI command “START”. It sets the SMI state to “RUNNING”.
4. While sending data, the program pushes the fragment parameters (the buffer’s physical address and length) into the RORC. It checks the status of the RORC’s FIFOs. When a fragment is transmitted, the program reads the next one from file into the memory buffer and pushes the fragment’s parameters into the RORC.
5. After receiving the SMI command “STOP”, the program stops sending data and terminates.

20.4.3 Syntax of the DDG configuration file

Any number of empty lines is allowed.
Lines starting with a ‘#’, ‘*’ or ‘;’ character are treated as comments. After the character ‘;’ or ‘//’ the remaining part of any line is treated as an in-line comment. Each keyword must be written on a separate line. The keywords and their (optional) parameters can be separated by space(s), tabulator(s) or equal sign(s). No keyword is mandatory: each keyword has a default value, which is used when the keyword is not specified in the DDG configuration file. The configuration file may even be empty, in which case all the default values are used. If the same keyword occurs more than once in the configuration file, then the last value is used. This rule also applies to conflicting keywords.

20.4.3.1 Channel independent keywords

The following DDG configuration file keywords do not depend on the D-RORC channel.

PHYSMEM_OFFSET
Synopsis: PHYSMEM_OFFSET [ ]
Description: The parameter memory_offset defines the offset of memory in MB requested from the /dev/physmem0 device relative to its base address. The default offset is 0.

PHYSMEM_LENGTH
Synopsis: PHYSMEM_LENGTH [ ]
Description: The parameter usable memory defines the amount of memory in MB requested from the /dev/physmem0 device. The default amount is 32 MB.

DDL_COMMANDS
Synopsis: DDL_COMMANDS
Description: Wait for the RDYRX commands before sending data.

NO_DDL_COMMANDS
Synopsis: NO_DDL_COMMANDS
Description: Do not wait for the RDYRX commands before sending data. If neither the keyword DDL_COMMANDS nor the keyword NO_DDL_COMMANDS is present, then the configuration is done with the keyword NO_DDL_COMMANDS.

HEADER
Synopsis: HEADER
Description: Generate the Common Data Header (CDH) for each fragment. The configuration of the CDH is specified with a set of DDG keywords (see Section 20.4.3.3).

NOHEADER
Synopsis: NOHEADER
Description: Do not generate the Common Data Header (CDH) for the fragments.
If neither the keyword HEADER nor the keyword NOHEADER is present, then the configuration is done with the keyword NOHEADER.

BUNCH_CROSSING_START
Synopsis: BUNCH_CROSSING_START
Description: The parameter start_value is used to calculate the starting value for the orbit_number (24 bits) and the bunch_crossing_number (12 bits) for the CDH of the first fragment for all channels:
• orbit_number_first = start_value / 3564
• bunch_crossing_number_first = start_value % 3564
If not present, the parameter start_value is 1.

BUNCH_CROSSING_INCREMENT
Synopsis: BUNCH_CROSSING_INCREMENT
Description: The parameters min_value and max_value are used to calculate the range of the increment values for the bunch_crossing_number:
• bc_min_increment = 2^min_value - 1
• bc_max_increment = 2^max_value - 1
The range for these parameters is between 0 and 31. If not present, the value for the parameter min_value is 0, and the value for the parameter max_value is 20.

BUNCH_CROSSING_SEED
Synopsis: BUNCH_CROSSING_SEED
Description: The parameter seed_value is used to initialize the random number generator RANDOM for the calculation of the bunch_crossing_number and orbit_number for the CDH of the fragments (except the first one) for all channels:
• bc_increment = RANDOM (bc_min_increment, bc_max_increment)
• bunch_crossing += (bc_increment % 3564)
• orbit_number += (bc_increment / 3564)
• if (bunch_crossing >= 3564) { orbit_number++; bunch_crossing %= 3564 }
If not present, the value for the parameter seed_value is 1.

MAX_EVENT
Synopsis: MAX_EVENT
Description: The parameter max_event_number defines the maximum number of fragments per channel to be generated. No limitation is given when 0 is used, hence the DDG program terminates when the SMI command “STOP” is received. If not present, the value for the parameter max_event_number is 0.
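The arithmetic behind the three BUNCH_CROSSING keywords can be sketched in C as follows. This is only an illustration of the formulas given above, not the DDG source code: the type cdh_time_t and all function names are hypothetical, the RANDOM generator of the DDG program is not documented here and is replaced by a stand-in built on the C library rand(), and the wrap-around of the 24-bit orbit number is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BC_PER_ORBIT 3564u /* divisor used in the formulas above */

/* Hypothetical container for the two CDH counters (not a DDG type). */
typedef struct {
    uint32_t orbit_number;          /* 24-bit field in the CDH */
    uint32_t bunch_crossing_number; /* 12-bit field in the CDH */
} cdh_time_t;

/* BUNCH_CROSSING_START: counters for the first fragment. */
cdh_time_t cdh_time_first(uint32_t start_value)
{
    cdh_time_t t;
    t.orbit_number = start_value / BC_PER_ORBIT;
    t.bunch_crossing_number = start_value % BC_PER_ORBIT;
    return t;
}

/* BUNCH_CROSSING_INCREMENT: bound = 2^exponent - 1, exponent in 0..31. */
uint32_t bc_increment_bound(unsigned exponent)
{
    assert(exponent <= 31u);
    return (uint32_t)((UINT64_C(1) << exponent) - 1u);
}

/* Assumption: stand-in for RANDOM(min, max); the generator really used
 * by the DDG program is not described in this section. Seed it with
 * srand(seed_value) to mimic BUNCH_CROSSING_SEED. */
uint32_t bc_increment_pick(uint32_t lo, uint32_t hi)
{
    assert(lo <= hi && hi <= 0x7fffffffu);
    uint32_t span = hi - lo + 1u;
    uint32_t r = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
    return lo + r % span;
}

/* Counters for every fragment after the first one. */
void cdh_time_advance(cdh_time_t *t, uint32_t bc_increment)
{
    t->bunch_crossing_number += bc_increment % BC_PER_ORBIT;
    t->orbit_number += bc_increment / BC_PER_ORBIT;
    if (t->bunch_crossing_number >= BC_PER_ORBIT) {
        t->orbit_number++;
        t->bunch_crossing_number %= BC_PER_ORBIT;
    }
}
```

With min_value equal to max_value the increment is fixed, so the sequence is fully deterministic; for example, both set to 10 give an increment of 2^10-1 = 1023 bunch crossings per fragment.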
20.4.3.2 Channel dependent keywords

The following DDG configuration file keywords depend on the D-RORC channel. The group of keywords that characterize one channel is introduced by the keyword RORC_CHANNEL.

RORC_CHANNEL
Synopsis: RORC_CHANNEL
Description: The parameter minor defines the minor device number of the RORC in case there are several cards. The default minor device number is 0. The parameter channel chooses the channel (0 or 1) of a dual channel D-RORC. The default channel is 0.

DATA_FILE
Synopsis: DATA_FILE
Description: The parameter file name defines the DDG data file (see Section 20.4.4). If this keyword is given, the data words of the generated fragments are supplied from this file.

DATA_PATTERN
Synopsis: DATA_PATTERN
Description: The parameter pattern sets the data pattern for the generated fragments. The accepted characters of this parameter are the following:
- ‘c’: constant data pattern.
- ‘a’: alternating data pattern.
- ‘0’: flying 0 data pattern.
- ‘1’: flying 1 data pattern.
- ‘i’: incremental data pattern.
- ‘d’: decremental data pattern.
The default data pattern is ‘i’. If both keywords DATA_FILE and DATA_PATTERN are present, the later one is used. If neither is present, then the configuration is done with the keyword DATA_PATTERN.

INIT_WORD
Synopsis: INIT_WORD
Description: The parameter start value, in hexadecimal format, sets the second data word of each generated fragment when the keyword DATA_PATTERN is present. The default value depends on the selected pattern:
- 0xfffffffe: if the pattern is ‘d’.
- 0x00000001: if the pattern is ‘i’.
- 0x0: for the other patterns.
The first word of each generated fragment is an incrementing counter value starting from 1.

DATA_LENGTH
Synopsis: DATA_LENGTH
Description: The parameter maximum length defines the length in words (4 bytes) of the largest generated fragment when the keyword DATA_PATTERN is present. The range for this parameter is between 1 and 2^24-1.
If not present, the value is 2^19-1 = 524287.

RANDOM
Synopsis: RANDOM
Description: Generate fragments with random length. Their minimal length is 0, and their maximum length is given either by the parameter of the keyword DATA_LENGTH or by the specified length of the fragments in a DDG data file (see Section 20.4.4).

NORANDOM
Synopsis: NORANDOM
Description: Generate fragments with constant length. If neither the keyword RANDOM nor the keyword NORANDOM is present, then the configuration is done with the keyword RANDOM.

RESET
Synopsis: RESET
Description: Reset the DDL channel before generating fragments.

NORESET
Synopsis: NORESET
Description: Do not reset the DDL channel before generating fragments. If neither the keyword RESET nor the keyword NORESET is present, then the configuration is done with the keyword RESET.

20.4.3.3 Common data header keywords

The following DDG configuration file keywords control the generation of the Common Data Header (CDH) when the keyword HEADER is present.

BLOCK_LENGTH
Synopsis: BLOCK_LENGTH
Description: Fill the “block length” field in the CDH with the length for each generated fragment.

NO_BLOCK_LENGTH
Synopsis: NO_BLOCK_LENGTH
Description: Fill the “block length” field in the CDH with the value 0xffffffff for each generated fragment. If neither the keyword BLOCK_LENGTH nor the keyword NO_BLOCK_LENGTH is present, then the configuration is done with the keyword BLOCK_LENGTH.

MINI_EVENT_ID
Synopsis: MINI_EVENT_ID
Description: Fill the “mini-event ID” field in the CDH with the bunch crossing number for each generated fragment.

NO_MINI_EVENT_ID
Synopsis: NO_MINI_EVENT_ID
Description: Fill the “mini-event ID” field in the CDH with the bunch crossing number with some random errors for each generated fragment.
If neither the keyword MINI_EVENT_ID nor the keyword NO_MINI_EVENT_ID is present, then the configuration is done with the keyword MINI_EVENT_ID.

FORMAT_VERSION
Synopsis: FORMAT_VERSION
Description: Fill the “format version” field in the CDH for each generated fragment with the parameter version. It is a hexadecimal number (8 bits). If not present, the version number 1 is used.

L1_TRIGGER
Synopsis: L1_TRIGGER
Description: Fill the “L1 trigger message” field in the CDH for each generated fragment with the parameter L1 trigger message. It is a hexadecimal number (10 bits). If not present, the message is 0.

SUB_DETECTORS
Synopsis: SUB_DETECTORS
Description: Fill the “participating sub-detectors” field in the CDH for each generated fragment with the parameter participating sub-detectors. It is a hexadecimal number (24 bits). If not present, the value is 0.

ATTRIBUTES
Synopsis: ATTRIBUTES
Description: Fill the “block attributes” field in the CDH for each generated fragment with the parameter block attributes. It is a hexadecimal number (8 bits). If not present, the value is 0.

STATUS_BITS
Synopsis: STATUS_BITS
Description: Fill the “status and error bits” field in the CDH for each generated fragment with the parameter status and error bits. It is a hexadecimal number (8 bits). If not present, the value is 0.

TRIGGER_CLASS_LOW
Synopsis: TRIGGER_CLASS_LOW
Description: Fill the “trigger class low” field (bits 1-31 of the trigger class information) in the CDH for each generated fragment with the parameter trigger class low bits. It is a hexadecimal number (32 bits). If not present, the value is 0.

TRIGGER_CLASS_HIGH
Synopsis: TRIGGER_CLASS_HIGH
Description: Fill the “trigger class high” field (bits 32-49 of the trigger class information) in the CDH for each generated fragment with the parameter trigger class high bits. It is a hexadecimal number (18 bits). If not present, the value is 0.
ROI_LOW
Synopsis: ROI_LOW
Description: Fill the “ROI low” field (bits 0-3 of the region of interest information) in the CDH for each generated fragment with the parameter ROI low bits. It is a hexadecimal number (4 bits). If not present, the value is 0.

ROI_HIGH
Synopsis: ROI_HIGH
Description: Fill the “ROI high” field (bits 4-35 of the region of interest information) in the CDH for each generated fragment with the parameter ROI high bits. It is a hexadecimal number (32 bits). If not present, the value is 0.

20.4.3.4 Example of a DDG configuration file

Listing 20.4 shows a DDG configuration file used to generate fragments on channel 0 of a dual channel D-RORC with minor device number 1 (line 46). The physmem memory is utilized between its base address and 10 MB (lines 37-38). The generated fragments have a decremental data pattern and a random length of up to 1500 words (lines 47-49). A fixed length can be easily achieved, e.g. by removing the commenting semicolon (line 49). The CDH is part of the generated fragments (line 40) with a starting bunch crossing number of 1 (line 41) and a fixed increment of 2^10-1 for the bunch crossing and hence orbit number (line 42). The “mini-event ID” field is also set (line 51), but not the “block length” field (line 50). There is no limit on the number of fragments to be generated (line 43). The DDG program starts sending data only after receiving the RDYRX command (line 39).

Listing 20.4 Example of a DDG configuration file

  35: # DDG configuration file
  36:
  37: PHYSMEM_OFFSET 0
  38: PHYSMEM_LENGTH 10
  39: DDL_COMMANDS
  40: HEADER
  41: BUNCH_CROSSING_START 1
  42: BUNCH_CROSSING_INCREMENT 10 10
  43: MAX_EVENT 0
  44:
  45: # DDG pcddl01ldc 405 channel 0
  46: RORC_CHANNEL 1 0
  47: DATA_PATTERN d
  48: DATA_LENGTH 1500
  49: ;NORANDOM
  50: NO_BLOCK_LENGTH
  51: MINI_EVENT_ID

20.4.4 Syntax of the DDG data files

Any number of empty lines is allowed.
Lines starting with a ‘#’, ‘*’ or ‘;’ character are treated as comments. After the character ‘;’ or ‘//’ the remaining part of any line is treated as an in-line comment. The structure of the data files is as follows:
1. Put the maximum fragment size in words (4 bytes) on a separate line. It is a decimal number, which must be greater than 0 and less than 2^24-1 = 16777215.
2. Put the fragment size in words (4 bytes) of the first fragment on a separate line. It is a hexadecimal number, which must be greater than or equal to 0 and less than 2^24-1 = 0xffffff.
3. Put the data words of the first fragment. They can be separated by space(s), tabulator(s) or new line character(s).
4. Continue with the following fragments as described in points 2 and 3.

20.5 Stand-alone installation

The DDL and RORC library and test programs are installed together with the DATE kit. For a stand-alone installation, follow the procedure below:
• The header, source, object and executable files of the RORC and DDL test programs and library are in the common AFS area:
  /afs/cern.ch/alice/daq/ddl/rorc/
This directory contains the different versions of the software as separate sub-directories. It also contains the different versions in compressed formats.
• The compressed file names show the version number and the time of archiving. Always use the latest date of a given version. The latest distributed version can be found on the following Web page:
  http://cern.ch/ddl/rorc_support.html
• Copy the compressed file to a local directory and uncompress it. Use the following command for extracting the files:
  gtar -xvzf rorc_vers.x.y.z_year.month.day.tgz rorc/
• All test programs use physmem, which needs to be installed on the machine beforehand (see Chapter 15).
• To do the compilation, type the following commands:
  cd rorc
  make -f Makefile clean
  make -f Makefile
• To compile the driver, type:
  make -f Makefile driver
• To create the device files and prepare the driver to be loaded at boot time, type as root:
  make -f Makefile dev
• To load the rorc_driver kernel module without rebooting, type as root:
  make -f Makefile load
• In case an older version of the rorc_driver is already loaded, type as root:
  make -f Makefile reload
• To check whether the RORC card is plugged in and the driver is loaded, type ./check_driver. This script shows whether the RORC card is plugged in (calling /sbin/lspci), whether the driver is loaded (calling /sbin/lsmod), and the driver messages issued at load time (calling dmesg).

21 RORC Application Library

This chapter describes the API library that allows the development of programs using the RORC device in a stand-alone manner.

21.1 Introduction
21.2 Header files
21.3 The rorc_driver
21.4 Description of the routines and functions
21.5 Installation

21.1 Introduction

The DATE kit provides the readout software to perform long-term high-volume data taking with several RORC devices in an LDC (see Chapter 6 and Chapter 7). Some software is also provided to use the RORC device in a stand-alone manner (see Chapter 20), which is useful to facilitate the installation procedure, to help debugging and to use the features of the RORC as a test device for DATE. However, users who want to develop their own special program to exploit the capabilities of the RORC device can use an application library (written in C) provided with the DATE kit.
This chapter provides a description of the most important routines of this library.

21.2 Header files

Before calling any of the following routines, the user must include a header file:
  #include "rorc_ddl.h"
This file contains the necessary definitions for the use of the DDL. It has a reference to another header file, which contains the definitions of the RORC cards:
  #include "rorc_lib.h"
The header files contain the type definitions of the structures referred to in further descriptions. In addition, they contain the definitions of the macros described later on.

21.3 The rorc_driver

The programs and routines described in this document work under the LINUX operating system. Currently we have a RORC driver for CERN Scientific LINUX version 4 (SLC4, kernel version 2.6.9) and SLC5 (kernel version 2.6.18). Before using the described routines and programs, the RORC driver must be loaded (see Section 21.5 for details).

21.4 Description of the routines and functions

rorcFindAll
Synopsis:
  #include
  int rorcFindAll(rorcHwSerial_t *hw, rorcHwSerial_t *diu_hw, rorcChannelId_t *channel, int *rorc_revision, int *diu_vers, int max_dev)
Description: Find all RORC channels not in use. The rorcFindAll() routine returns the version, serial, revision, minor and channel numbers of all RORC cards plugged in the PC together with the same information regarding the plugged in or embedded DIUs. The routine tries to open all RORC devices and reads the hardware version and serial numbers from their configuration EPROM. It also sends a DDL command to the DIU to find out the DIU version and serial number. The routine needs to open the RORC channel to read the above data. If the RORC channel is used by some other program, then it cannot be opened and it will not be included in the list of “RORC channels found”. To get the list of all RORC devices, regardless of whether they are in use, use rorcQuickFind().
Parameters:
hw: pointer to an array of rorcHwSerial_t type structures. The routine loads into these structures the version and serial numbers of the RORC cards found. rorcHwSerial_t is defined in the header file rorc_lib.h. Besides the major and minor version and the serial numbers, it contains the full string found in the configuration EPROM of the RORC card. If there is no information in the EPROM about the hardware version and serial numbers, then the routine puts -1 into the structure as version and serial numbers.
diu_hw: pointer to an array of rorcHwSerial_t type structures. The routine loads into these structures the version and serial numbers of the DIU card found. If no DIU is plugged or there is no information in the DIU’s EPROM about the hardware version and serial numbers, then the routine puts -1 into the structure as version and serial numbers.
channel: pointer to an array of rorcChannel_t structures. Here the routine supplies the corresponding minor device numbers of the RORC cards and the channel numbers of the DIUs.
rorc_revision: pointer to an array of integers. Here the routine supplies the corresponding device revision numbers (1 for pRORC, 2 for D-RORC with connector for DIU, 3 for D-RORC with embedded DIUs, 4 for version 2 D-RORC and 5 for PCI express RORC) of the RORC cards.
diu_vers: pointer to an array of integers. Here the routine supplies the corresponding DIU version number (0 if no DIU, 1 if prototype DIU, 2 if final DIU plugged in version, and 3 if embedded DIU) of the DIU on the corresponding DDL channel.
max_dev: the size of the hw, diu_hw, channel, rorc_revision and diu_vers arrays.
Return value: number of DDL channels (not in use) found or 0.
See also:
rorcFind(), rorcQuickFind(), rorcSerial(), rorcOpenChannel()

rorcQuickFind
Synopsis:
  #include
  int rorcQuickFind(int *rorc_minor, int *rorc_revision, unsigned long *com_stat, int *pci_speed, int *rorc_serial, int *rorc_fw_maj, int *rorc_fw_min, int *max_chan, int *ch_pid0, int *ch_pid1, int max_dev)
Description: Find all RORC cards. The rorcQuickFind() routine returns the minor number, revision number, PCI command/status information, PCI speed, hardware serial number, firmware version major and minor numbers and the IDs of processes using RORC channels for all RORC cards plugged in the PC. The routine gets this information from the /proc/rorc_map process file. The RORC driver fills this file with data established at boot time, except for the process IDs.
Parameters:
rorc_minor: pointer to an integer array containing the corresponding minor device numbers of the RORC cards.
rorc_revision: pointer to an array of integers. Here the routine supplies the corresponding device revision number (1 for pRORC, 2 for D-RORC with connector for DIU, 3 for D-RORC with embedded DIUs, 4 for version 2 D-RORC and 5 for PCI express RORC) of the RORC cards.
com_stat: pointer to an array of unsigned long integers. Here the routine copies the value found in the device’s command/status register. This value reflects the PCI settings of the given slot. The settings are correct if the value is 0x04300107 for revision 2 or 3 cards, 0x04100107 for revision 4 cards, and 0x00100007 for revision 5 cards.
pci_speed: pointer to an array of integers. Here the routine supplies the speed settings of the cards. The speed can be 33, 66, 100 or 133 MHz. Note that the PCI-X type RORC card cannot work at 133 MHz.
rorc_serial: pointer to an array of integers. Here the routine loads the hardware serial number of the RORC cards found.
rorc_fw_maj: pointer to an array of integers. The routine interprets the device firmware version number and loads here the major part. E.g. for firmware ID 2.12 this value is 2.
rorc_fw_min: pointer to an array of integers. The routine interprets the device firmware version number and loads here the minor part. E.g. for firmware ID 2.12 this value is 12.
max_chan: pointer to an array of integers. Here the routine supplies the number of channels (DIUs) of the given RORC cards.
ch_pid0: pointer to an array of integers. Here the routine supplies the ID of the process currently using channel 0 of the given RORC cards or 0 if the channel is not in use.
ch_pid1: pointer to an array of integers. Here the routine supplies the ID of the process currently using channel 1 of the given RORC cards or 0 if the channel is not in use.
max_dev: the size of the rorc_minor, rorc_revision, com_stat, pci_speed, rorc_serial, rorc_fw_maj, rorc_fw_min, max_chan, ch_pid0, and ch_pid1 arrays.
Return value: number of DDL cards found on the PCI bus, or -1 if /proc/rorc_map cannot be opened.
See also: rorcFind(), rorcFindAll(), rorcSerial(), rorcOpenChannel()

rorcFind
Synopsis:
  #include
  int rorcFind(int revision, int serial, int *minor)
Description: Find the specified RORC card. The rorcFind() routine returns the minor number of a RORC card with the specified revision and serial numbers. The minor number is necessary to open a DDL channel with the rorcMapChannel() or rorcOpenChannel() routines. At boot time the rorc_driver module matches the revision and serial numbers with the minor numbers and puts this information into the /proc/rorc_map process file. The rorcFind() routine reads this file and finds the corresponding minor number. If the /proc/rorc_map file does not exist (this can happen if an older rorc_driver module is loaded, which does not create this file), then rorcFind() tries to open all the RORC devices plugged in the PC and reads the revision number from their PCI configuration space and the hardware serial number from their configuration EPROM.
If several cards have the same specified revision and serial numbers, then the routine returns the first one.
Parameters:
revision: device revision number (1 for pRORC, 2 for D-RORC with connector for DIU, 3 for D-RORC with embedded DIUs, 4 for version 2 D-RORC and 5 for PCI express RORC cards) of the RORC card to be found.
serial: the serial number (a 5 digit decimal number) of the RORC card to be found.
minor: pointer to an integer where the minor number of the specified card has to be returned.
Return value:
RORC_STATUS_OK = 0: the specified RORC was found. minor points to the minor number of the specified card.
RORC_STATUS_ERROR = -1: the specified RORC was not found, such a card is not plugged.
See also: rorcFindAll(), rorcSerial(), rorcMapChannel(), rorcOpenChannel()

rorcOpenChannel
Synopsis:
  #include
  int rorcOpenChannel(rorcHandle_t handle, int rorc_minor, int rorc_channel)
Description: Arm and reset the DDL channel. The rorcOpenChannel() routine should be called for every DDL channel at the start of a run. The routine checks the existence of the RORC channel. If it finds the channel, it opens it and fills a descriptor. The descriptor address can be used as a handle for every further use of the given channel. The rorcOpenChannel() routine resets the RORC device and sends a command to the DIU to find out whether there is any DIU plugged and, if so, what the DIU version is (prototype or final). The above information is written into the handle structure. If one does not want the RORC to be reset, use the rorcMapChannel() routine instead.
Parameters:
handle: address of a RORC descriptor structure. The rorcHandle_t type is a pointer to a rorcDescriptor_t structure, which contains all information about the PCI-based RORC. The structure type is defined in the rorc_lib.h file. The caller, before calling the rorcOpenChannel() routine, has to allocate a descriptor and supply its address to the routine. The routine fills the structure with data necessary for further calls.
rorc_minor: device file minor number of the RORC card. Multiple RORC cards can be supported (with device file names “/dev/prorcN”, where N is the minor number). The minor numbers start from 0.
rorc_channel: RORC channel number (0 or 1). For pRORCs or D-RORCs without embedded DIUs only channel 0 can be used.
Return value:
RORC_STATUS_OK = 0: no error, channel initialized and handle points to a valid RORC descriptor.
RORC_STATUS_ERROR = -1: the RORC channel couldn’t be opened. Either no card was found or another process uses it or its PCI memory cannot be mapped.
See also: rorcMapChannel(), rorcClose()

rorcMapChannel
Synopsis:
  #include

rorcClose
Synopsis:
  int rorcClose(rorcHandle_t handle)
Description: Close the RORC channel. The rorcClose() routine should be called for every DDL channel at the end of a run. The routine closes all resources set up by a previous call of the routines rorcOpenChannel() or rorcMapChannel().
Parameters:
handle: address of the RORC descriptor. When the routine returns, the handle points to an invalid descriptor.
Return value:
RORC_STATUS_OK = 0: no error, channel closed.
RORC_STATUS_ERROR = -1: the RORC channel couldn’t be closed properly.
See also: rorcOpenChannel(), rorcMapChannel()

rorcReset
Synopsis:
  #include
  void rorcReset(rorcHandle_t handle, int option)
Description: Reset the RORC channel. The rorcReset() routine initializes the RORC card and/or a DDL channel. According to the user request the routine resets the Free FIFO, the other parts of the RORC, the DIU or the SIU. Resetting the RORC channel means emptying all its FIFOs, including the Free FIFO, clearing the error bits, and then putting all programmable features to their reset values. Resetting the DIU or the SIU means cutting the DDL link; afterwards the DDL link rebuilds itself.
Parameters:
handle: address of the RORC descriptor.
option: the following values can be used:
  RORC_RESET_FF        clear Rx and Tx Free FIFOs
  RORC_RESET_FIFOS     clear RORC’s other FIFOs
  RORC_RESET_ERROR     clear RORC’s error bits
  RORC_RESET_COUNTERS  clear RORC’s byte counters
  RORC_RESET_RORC      reset RORC
  RORC_RESET_DIU       reset DIU
  RORC_RESET_SIU       reset SIU
  0                    reset RORC
See also: rorcOpenChannel(), rorcMapChannel(), rorc_reset

rorcEmptyDataFifos
Synopsis:
  #include
  void rorcEmptyDataFifos(rorcHandle_t handle, int timeout)
Description: Try to empty all data FIFOs of the RORC channel. The rorcEmptyDataFifos() routine tries to empty the RORC card’s receive (Rx) and transmit (Tx) data FIFOs by continuously sending the ‘clear RORC FIFOs’ command and checking the Rx FIFO status. It is not enough to send the command only once, because new data can arrive from the FIFOs of the FEE, SIU or DIU if these are not empty. The routine returns if the RORC’s Rx FIFO is empty or the time-out has expired.
Parameters:
handle: address of the RORC descriptor.
timeout: time-out value in seconds.
Return value:
RORC_STATUS_OK = 0: no error, data FIFOs emptied.
RORC_TIMEOUT = -64: there are still data in the RORC’s Rx FIFO after the time-out.
See also: rorcReset(), rorcArmDDL()

rorcArmDataGenerator
Synopsis:
  #include
  int rorcArmDataGenerator(rorcHandle_t handle, __u32 initEventNumber, __u32 initDataWord, int dataPattern, int eventLen, int seed, int *rounded_length)
Description: Initialize the RORC’s data generator. The rorcArmDataGenerator() routine should be called for every DDL channel where the RORC card will be used as data generator. The routine can be called after the call of rorcOpenChannel() and before the call of rorcStartDataGenerator(). It defines all the parameters needed for data generation. If rorcStartDataGenerator() is called without calling rorcArmDataGenerator(), then the data generator will use unpredictable values.
Parameters:
    handle             address of the RORC descriptor.
    initEventNumber    each event starts with the serial number of the given event (event count). This parameter defines its starting value.
    initDataWord       the first data word of the event (after the event count). It is used only for some of the test patterns. Note: for D-RORC, if the seed is not RORC_DG_NO_RANDOM_LEN, the first data word of each event is 0.
    dataPattern        an integer between 1 and 7:
                           RORC_DG_CONST:     all data words are initDataWord
                           RORC_DG_ALTER:     alternating pattern, starting from initDataWord
                           RORC_DG_FLY0:      flying 0 starting from 0xfffffffe
                           RORC_DG_FLY1:      flying 1 starting from 0x00000001
                           RORC_DG_INCR:      incrementing data starting from initDataWord
                           RORC_DG_DECR:      decrementing data starting from initDataWord
                           RORC_DG_RANDOM:    random data
    eventLen           length (from 1 to 2^19-1) of the generated events in 32 bit words, including the event count. Important: because of the special features of the random number generation, if random length is used (seed is not equal to RORC_DG_NO_RANDOM_LEN), the minimum generated event length is 1, and the maximum length will be eventLen rounded down to the nearest power of 2.
    seed               defines the seed value for random data length. If given, the event lengths will vary between 1 and eventLen. With the value RORC_DG_NO_RANDOM_LEN no random length will be generated.
    rounded_length     output parameter: in case of random length generation the maximum event length is rounded to the nearest power of two. The routine transfers this value to the RORC as the maximum length and returns it to the user in this variable.

Return value
    RORC_STATUS_OK = 0         no error, data generator initialized
    RORC_INVALID_PARAM = -2    error: some of the parameters are out of range.
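The length rule stated in the "Important" note of eventLen can be sketched in a few lines of C. This is a minimal illustration, assuming the rounding is a plain round-down to a power of two as that note states; the helper name round_down_pow2() is hypothetical and not part of the RORC library, and the card's actual computation may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the RORC library.
 * Returns the largest power of two that is <= len.
 * len must lie in the documented eventLen range 1 .. 2^19-1. */
uint32_t round_down_pow2(uint32_t len)
{
    assert(len >= 1 && len <= (1u << 19) - 1);

    uint32_t p = 1;
    while ((p << 1) <= len)   /* keep doubling while the result still fits */
        p <<= 1;
    return p;
}
```

Under this assumption, a request with eventLen = 1000 and a random-length seed would produce events of at most 512 words, and 512 is the value one would expect to find in rounded_length.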
See also
    rorcOpenChannel(), rorcStartDataGenerator()

rorcArmDDL

Synopsis
    #include
    int rorcArmDDL(rorcHandle_t handle, int options)

Description
    Arm the DDL. The rorcArmDDL() routine should be called for every DDL channel when the RORC card is not used as data generator but data come from the Front-end Electronics (FEE). The purpose of the routine is to establish or check the connection between the DIU and SIU, to reset all components and to clear any data that may remain in the channel from previous use of the link. Depending on the user request, the routine resets the Free FIFO, the other parts of the RORC, the DIU or the SIU. If several reset requests are "OR-d", the program first resets the SIU, then establishes the link, then resets the DIU and finally resets the RORC. Resetting the RORC card means emptying all its FIFOs, including the Free FIFO, and then setting all programmable features to their reset values. Resetting the DIU or SIU means cutting the DDL link (if it was on before the call of the routine). With the prototype version of the DDL cards, the link has to be re-established after the cut by calling rorcArmDDL() with the RORC_LINK_UP parameter. With the final DDL cards, the link is set up automatically.

Parameters
    handle     address of the RORC descriptor.
    options    the following values can be "OR-d":
                   RORC_RESET_FF      reset Free FIFO
                   RORC_RESET_RORC    reset RORC
                   RORC_RESET_DIU     reset DIU
                   RORC_RESET_SIU     reset SIU
                   RORC_LINK_UP       establish the DDL link

Return value
    RORC_STATUS_OK = 0            no error, requested task done.
    RORC_LINK_NOT_ON = -4         link initialization did not succeed
    RORC_CMD_NOT_ALLOWED = -8     routine called with a non-permitted option
    RORC_NOT_ACCEPTED = -16       unsuccessful SIU reset

See also
    rorcArmDataGenerator(), rorcReset(), rorcEmptyDataFifos()

rorcPushFreeFifo

Synopsis
    #include
    void rorcPushFreeFifo(rorcHandle_t handle, rorcMemAddres_t blockAddress, __u32 blockLength, int readyFifoIndex)

Description
    Push one entry into the RORC's Free FIFO. rorcPushFreeFifo() is an in-line function that should be called when the user has a free data page and wants to load its parameters into the Free FIFO. It loads the parameters directly into the RORC registers. The function does not check the range of the parameters: it masks them to the given range. Nor does it check the Free FIFO status. If the Free FIFO overflows, the new parameters are not loaded. The caller can check for this situation using rorcCheckFreeFifo().

Parameters
    handle            address of the RORC descriptor.
    blockAddress      physical address of the next free page in physmem memory.
    blockLength       length of the next free page in bytes (24 bit).
    readyFifoIndex    index of the Ready FIFO where the "data arrived" flag has to be put (8 bit).

See also
    rorcCheckFreeFifo()

rorcCheckFreeFifo

Synopsis
    #include
    int rorcCheckFreeFifo(rorcHandle_t handle)

Description
    Return the status of the RORC's Free FIFO. rorcCheckFreeFifo() should be called when the caller wants to know how many entries are in the Free FIFO. On a pRORC device, it returns the number of entries in "8 entry" units (i.e. 0 means 1 to 8 entries, 1 means 9 to 16 entries, etc.). FIFO full and FIFO empty statuses are signaled. On a D-RORC (RORC revision number > 1), the routine only signals whether the Free FIFO is not empty (returns a non-zero value).

Parameters
    handle    address of the RORC descriptor.
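The pRORC encoding described above (occupancy reported in units of 8 entries) can be unpacked as shown in the following sketch. ff_entry_range() is a hypothetical helper, not a library call, and it assumes that any value outside 0..15, in particular a negative one, is a status code rather than an occupancy.

```c
#include <assert.h>

/* Hypothetical helper, not part of the RORC library.
 * Converts a pRORC occupancy value (0..15) into the range of
 * Free FIFO entries it stands for.
 * Returns 0 on success, -1 if code is not an occupancy value
 * (for example a negative status code). */
int ff_entry_range(int code, int *min_entries, int *max_entries)
{
    assert(min_entries != 0 && max_entries != 0);

    if (code < 0 || code > 15)
        return -1;
    *min_entries = 8 * code + 1;   /* 0 -> 1, 1 -> 9,  ..., 15 -> 121 */
    *max_entries = 8 * code + 8;   /* 0 -> 8, 1 -> 16, ..., 15 -> 128 */
    return 0;
}
```

A caller that only needs to know whether at least one more page can be pushed would not need this conversion; it is mainly useful for monitoring displays.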
Return value
    In case of pRORC: a value between 0 and 15 specifying the number of non-empty Free FIFO entries in the following way:
        0     between 1 and 8 words
        1     between 9 and 16 words
        2     between 17 and 24 words
        3     between 25 and 32 words
        ...   ...
        13    between 105 and 112 words
        14    between 113 and 120 words
        15    between 121 and 128 words
    RORC_STATUS_OK = 0      Free FIFO is not empty (and not full).
    RORC_FF_EMPTY = -256    Free FIFO is empty.
    RORC_FF_FULL = -128     error: Free FIFO full.
    In case of D-RORC: 0: Free FIFO is empty; any other value: Free FIFO is not empty.

See also
    rorcPushFreeFifo()

Setting RORC parameters on/off

The following 6 routines can be used to set RORC internal control parameters on or off.

rorcLoopBackOn

Synopsis
    #include
    int rorcLoopBackOn(rorcHandle_t handle)

Description
    The rorcLoopBackOn() routine should be called when the user wants to set the operational control parameter "Internal Loop-back" bit. If this control bit is on, the data generated by the RORC's Data Generator are sent back to the RORC as if they had arrived from the link. This condition can be reset by the routine rorcLoopBackOff().

Parameters
    handle    address of the RORC descriptor.

Return value
    RORC_STATUS_OK = 0    no error

rorcLoopBackOff

Synopsis
    #include
    int rorcLoopBackOff(rorcHandle_t handle)

Description
    The rorcLoopBackOff() routine should be called when the user wants to reset the operational control parameter "Internal Loop-back" bit. This condition is automatically set after a RORC reset.

Parameters
    handle    address of the RORC descriptor.

Return value
    RORC_STATUS_OK = 0    no error

rorcHltSplitOn

Synopsis
    #include
    int rorcHltSplitOn(rorcHandle_t handle)

Description
    The D-RORC card with 2 integrated DIUs can be used in "split mode". It means that the data arriving on one channel can be transferred to the other channel.
    The rorcHltSplitOn() routine should be called when the user wants the given channel to be used as output channel. This condition can be reset by the routine rorcHltSplitOff().

Parameters
    handle    address of the RORC descriptor.

Return value
    RORC_STATUS_OK = 0           no error
    RORC_CMD_NOT_ALLOWED = -8    the routine cannot be called for pRORCs or D-RORCs without integrated DIU.

rorcHltSplitOff

Synopsis
    #include
    int rorcHltSplitOff(rorcHandle_t handle)

Description
    The rorcHltSplitOff() routine should be called when the user wants to switch off the data sending for the given channel. This condition is automatically set after a RORC reset.

Parameters
    handle    address of the RORC descriptor.

Return value
    RORC_STATUS_OK = 0           no error
    RORC_CMD_NOT_ALLOWED = -8    the routine cannot be called for pRORCs or D-RORCs without integrated DIU.

rorcHltFlctlOn

Synopsis
    #include
    int rorcHltFlctlOn(rorcHandle_t handle)

Description
    The D-RORC card with 2 integrated DIUs can be used in "split mode". It means that the data arriving on one channel can be transferred to the other channel. The rorcHltFlctlOn() routine should be called when the given channel is used as the output channel (rorcHltSplitOn() is called or will be called) and the user wants the flow control from the receiver side (probably the HLT farm) to be taken into account. This condition can be reset by the routine rorcHltFlctlOff().

Parameters
    handle    address of the RORC descriptor.

Return value
    RORC_STATUS_OK = 0           no error
    RORC_CMD_NOT_ALLOWED = -8    the routine cannot be called for pRORCs or D-RORCs without integrated DIU.

rorcHltFlctlOff

Synopsis
    #include
    int rorcHltFlctlOff(rorcHandle_t handle)

Description
    The rorcHltFlctlOff() routine should be called when the given channel is used as the output channel (rorcHltSplitOn() is called or will be called) and the user wants the flow control from the receiver side (probably the HLT farm) NOT to be taken into account.
    This condition is automatically set after a RORC reset.

Parameters
    handle    address of the RORC descriptor.

Return value
    RORC_STATUS_OK = 0           no error
    RORC_CMD_NOT_ALLOWED = -8    the routine cannot be called for pRORCs or D-RORCs without integrated DIU.

ddlSendCommandAndWaitReply

Synopsis
    #include
    int ddlSendCommandAndWaitReply(rorcHandle_t handle, __u32 feeCommand, __u32 feeAddress, long long timeout, stword_t *stw, int expected, int *n_reply)

Description
    Send a command and wait for the reply. The ddlSendCommandAndWaitReply() routine should be called when the user wants to send a command to the FEE via the DDL channel. The routine returns the received replies.

Parameters
    handle        address of the RORC descriptor.
    feeCommand    a value of at most 4 bits, which will be sent to the FEE as part of the command. The following FEE commands are allowed:
                      RDYRX = 1      Ready to Receive
                      EOBTR = 11     End of Block Transfer
                      STBWR = 13     Start of Block Write
                      STBRD = 5      Start of Block Read
                      FECTRL = 12    Front-end control
                      FESTRD = 4     Front-end status readout
    feeAddress    a value of at most 19 bits, which will be sent to the FEE as part of the command.
    timeout       the number of waiting cycles for receiving the SIU reply. If you want to specify the timeout value in microseconds, then use the value ( * handle->loop_per_usec).
    stw           pointer to an array of status word structures where the routine returns the received statuses.
    expected      number of expected reply words.
    n_reply       pointer to a variable where the routine returns the number of received statuses.
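The argument limits and the timeout conversion listed above can be checked before the call. The sketch below is illustrative only: the FEE_CMD_ names, fee_args_valid() and timeout_cycles() are hypothetical and not part of the RORC library, and loops_per_usec is a stand-in for the loop_per_usec field of the RORC descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* FEE command codes as listed in the parameter table above.
 * The FEE_CMD_ prefix is illustrative; the library's own names may differ. */
enum fee_cmd {
    FEE_CMD_RDYRX  = 1,    /* Ready to Receive */
    FEE_CMD_FESTRD = 4,    /* Front-end status readout */
    FEE_CMD_STBRD  = 5,    /* Start of Block Read */
    FEE_CMD_EOBTR  = 11,   /* End of Block Transfer */
    FEE_CMD_FECTRL = 12,   /* Front-end control */
    FEE_CMD_STBWR  = 13    /* Start of Block Write */
};

/* Returns 1 if cmd is one of the allowed commands and addr fits in 19 bits,
 * 0 otherwise. */
int fee_args_valid(uint32_t cmd, uint32_t addr)
{
    switch (cmd) {
    case FEE_CMD_RDYRX:
    case FEE_CMD_FESTRD:
    case FEE_CMD_STBRD:
    case FEE_CMD_EOBTR:
    case FEE_CMD_FECTRL:
    case FEE_CMD_STBWR:
        break;
    default:
        return 0;
    }
    return addr <= 0x7FFFFu;   /* 2^19 - 1 */
}

/* Converts a timeout in microseconds into waiting cycles.
 * loops_per_usec stands in for handle->loop_per_usec. */
long long timeout_cycles(long long usec, long long loops_per_usec)
{
    assert(usec >= 0 && loops_per_usec > 0);
    return usec * loops_per_usec;
}
```

With these helpers, a caller could reject an out-of-range front-end address up front instead of relying on how the library treats the extra bits, which this section does not specify.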
Return value
    RORC_STATUS_OK = 0                no error: the command was sent and the expected number of reply words was received
    RORC_LINK_NOT_ON = -4             error: the link is down
    RORC_TIMEOUT = -64                error: the command could not be sent within the time specified by timeout
    RORC_TOO_MANY_REPLY = -512        error: too many replies arrived, or the RORC's receive FIFO already contained words from a previous command before this command was sent
    RORC_NOT_ENOUGH_REPLY = -1024     error: fewer replies arrived than expected within the time specified by timeout

See also
    rorc_send_command

rorcStartDataGenerator

Synopsis
    #include
    int rorcStartDataGenerator(rorcHandle_t handle, __u32 maxLoop)

Description
    Set the RORC to start sending generated data. The rorcStartDataGenerator() routine should be called when the user wants to receive generated data. Normally the Data Generator sends the data to the DDL link. If the user wants the simulated data to arrive in the PC, the RORC has to be set to loop-back mode before starting the generator. This can be done with the routine rorcLoopBackOn(). Data will arrive only when the data receiver has been started by calling rorcStartDataReceiver() and the RORC's Free FIFO is not empty. Features of the generated data (data pattern, event length, event frequency) can be defined by a previous call to rorcArmDataGenerator(). To stop the data generator (in the case of an infinite number of events) call the routine rorcStopDataGenerator().

Parameters
    handle     address of the RORC descriptor.
    maxLoop    number of events to be generated. Possible values are from 1 to 2^32-1, or RORC_DG_INFINIT_EVENT (infinite number of events).
Return value
    RORC_STATUS_OK = 0    no error, data generator started

See also
    rorcArmDataGenerator(), rorcLoopBackOn(), rorcStartDataReceiver(), rorcStopDataGenerator()

rorcStopDataGenerator

Synopsis
    #include
    int rorcStopDataGenerator(rorcHandle_t handle)

Description
    Stop sending generated data. The rorcStopDataGenerator() routine should be called when the user wants to stop receiving generated data. The data generator stops sending events when the number of events set in rorcStartDataGenerator() is reached. However, rorcStopDataGenerator() still has to be called to set the RORC card back into its normal state. If data sending is in progress when this routine is called, the current event will be finished and no more data will be sent. If the transfer is stuck, one has to reset the RORC card.

Parameters
    handle    address of the RORC descriptor.

Return value
    RORC_STATUS_OK = 0    no error, data generator stopped

See also
    rorcStartDataGenerator()

rorcStartDataReceiver

Synopsis
    #include
    int rorcStartDataReceiver(rorcHandle_t handle, unsigned long readyFifoBaseAddress)

Description
    Set the DDL channel to data collecting state. The rorcStartDataReceiver() routine should be called when the user wants to receive data via the DDL channel.

Parameters
    handle                  address of the RORC descriptor.
    readyFifoBaseAddress    the physical memory address of the Ready FIFO. It must be a multiple of 2K, i.e. the lower 11 bits of the Ready FIFO address must be 0.

Return value
    RORC_STATUS_OK = 0    no error, data collection started.

See also
    rorcStopDataReceiver()

rorcStopDataReceiver

Synopsis
    #include
    int rorcStopDataReceiver(rorcHandle_t handle)

Description
    Stop data collecting. The rorcStopDataReceiver() routine should be called when the user wants to stop receiving data via the DDL channel.

Parameters
    handle    address of the RORC descriptor.

Return value
    no error, data collection stopped.
    (return code RORC_STATUS_OK = 0)

See also
    rorcStartDataReceiver()

ddlReadDataBlock

Synopsis
    #include
    int ddlReadDataBlock(rorcHandle_t handle, unsigned long bufferPhysAddress, unsigned long returnPhysAddress, rorcReadyFifo_t *returnAddr, __u32 feeAddress, long long timeout, stword_t *stw, int *n_reply, int *step)

Description
    Read a data block from the FEE. The ddlReadDataBlock() routine should be called when the user wants to read a data block from the FEE via the DDL channel. The routine performs the following 3 steps:
    1. Sends a Start Block Read (STBRD) command to the FEE, specifying the front-end address where the data is.
    2. Receives the data block.
    3. Sends an End Of Block Transfer (EOBTR) command to the SIU.
    If an error occurs in any of the above steps, the routine returns an error code, the step number and the received reply from the FEE or SIU.

Parameters
    handle               address of the RORC descriptor.
    bufferPhysAddress    the physical memory address of the data.
    returnPhysAddress    the physical memory address of a word where the number of transferred words and a status word will be put when the transfer has finished. When using a D-RORC the address must be 2K aligned, i.e. its lower 11 bits must be 0. The routine writes -1 to this address before sending the data and polls this address until the transfer is done.
    returnAddr           a pointer to the virtual address of the above physical memory.
    feeAddress           a value of at most 19 bits, which will be sent to the FEE in the STBRD command.
    timeout              the number of waiting cycles for receiving the SIU reply. If you want to specify the timeout value in microseconds, then use the value ( * handle->loop_per_usec).
    stw                  pointer to an array of status word structures where the routine returns the received statuses.
    n_reply              pointer to a variable where the routine returns the number of received statuses.
    step                 pointer to a variable where the routine returns the number of the step at which it returned.
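The 2K alignment required for returnPhysAddress (and likewise for the Ready FIFO base address passed to rorcStartDataReceiver()) is easy to verify before the call. The helpers below are hypothetical and only illustrate the bit test; how the physical address is obtained, for instance from physmem, is outside this sketch.

```c
#include <assert.h>
#include <limits.h>

/* Mask selecting the lower 11 bits; 2K = 2048 = 1 << 11. */
#define ALIGN_2K_MASK 0x7FFul

/* Hypothetical helper: returns 1 if the lower 11 bits of the address are 0. */
int is_2k_aligned(unsigned long phys_addr)
{
    return (phys_addr & ALIGN_2K_MASK) == 0;
}

/* Hypothetical helper: rounds an address up to the next 2K boundary
 * (an already aligned address is returned unchanged). */
unsigned long align_up_2k(unsigned long phys_addr)
{
    assert(phys_addr <= ULONG_MAX - ALIGN_2K_MASK);   /* avoid wrap-around */
    return (phys_addr + ALIGN_2K_MASK) & ~ALIGN_2K_MASK;
}
```

Rounding up only makes sense inside a buffer that was allocated with enough slack; the check alone is sufficient when the allocator already hands out aligned blocks.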
Return value
    RORC_STATUS_OK = 0                no error
    RORC_LINK_NOT_ON = -4             error: the link is down
    RORC_TIMEOUT = -64                error: the command could not be sent within the time specified by timeout
    RORC_TOO_MANY_REPLY = -512        error: too many replies arrived
    RORC_NOT_ENOUGH_REPLY = -1024     error: fewer replies arrived than expected within the time specified by timeout

See also
    ddlWriteDataBlock()

ddlWriteDataBlock

Synopsis
    #include
    int ddlWriteDataBlock(rorcHandle_t handle, unsigned long bufferPhysAddress, unsigned long bufferWordLength, unsigned long returnPhysAddress, volatile unsigned long *returnAddr, __u32 feeAddress, long long timeout, stword_t *stw, int *n_reply, int *step)

Description
    Send a data block to the FEE. The ddlWriteDataBlock() routine should be called when the user wants to send a data block to the FEE via the DDL channel. The routine performs the following 3 steps:
    1. Sends a Start Block Write (STBWR) command to the FEE, specifying the front-end address where the data has to be written.
    2. Sends the data block.
    3. Sends an End Of Block Transfer (EOBTR) command to the SIU.
    If an error occurs in any of the above steps, the routine returns an error code, the step number and the received reply from the FEE or SIU.

Parameters
    handle               address of the RORC descriptor.
    bufferPhysAddress    the physical memory address of the data.
    bufferWordLength     the length of the data block in 32 bit words. The maximum length is 512 K words - 1 word.
    returnPhysAddress    the physical memory address of a word where the number of transferred words will be put when the transfer has finished. When using a D-RORC the address must be 2K aligned, i.e. its lower 11 bits must be 0. The routine writes -1 to this address before sending the data and polls this address until the transfer is done.
    returnAddr           a pointer to the virtual address of the above physical memory.
    feeAddress           a value of at most 19 bits, which will be sent to the FEE in the STBWR command.
    timeout              the number of waiting cycles for receiving the SIU reply. If you want to specify the timeout value in microseconds, then use the value ( * handle->loop_per_usec).
    stw                  pointer to an array of status word structures where the routine returns the received statuses.
    n_reply              pointer to a variable where the routine returns the number of received statuses.
    step                 pointer to a variable where the routine returns the number of the step at which it returned.

Return value
    RORC_STATUS_OK = 0                no error
    RORC_LINK_NOT_ON = -4             error: the link is down
    RORC_TIMEOUT = -64                error: the command could not be sent within the time specified by timeout
    RORC_NOT_ABLE = -32               error: the previous download was not finished within the time specified by timeout
    RORC_TOO_MANY_REPLY = -512        error: too many replies arrived
    RORC_NOT_ENOUGH_REPLY = -1024     error: fewer replies arrived than expected within the time specified by timeout

See also
    ddlReadDataBlock()

rorcStartTrigger

Synopsis
    #include
    int rorcStartTrigger(rorcHandle_t handle, long long timeout, stword_t stword)

Description
    The rorcStartTrigger() routine sends a RDYRX command to the FEE.

Parameters
    handle     address of the RORC descriptor.
    timeout    the number of waiting cycles for receiving the FEE reply. If you want to specify the timeout value in microseconds, then use the value ( * handle->loop_per_usec).
    stword     the FEE reply, a DDL status word; stword.stw contains the full reply. For the details of a status word, see rorc_ddl.h.

Return value
    RORC_STATUS_OK = 0          the RDYRX command was sent successfully.
    RORC_STATUS_ERROR = -1      the RORC was not able to send the command.
    RORC_LINK_NOT_ON = -4       the link is down; the RORC is not able to send the command.
    RORC_NOT_ACCEPTED = -16     no reply arrived from the SIU within the specified timeout.

See also
    rorcStopTrigger()

rorcStopTrigger

Synopsis
    #include
    int rorcStopTrigger(rorcHandle_t handle, long long timeout, stword_t stword)

Description
    The rorcStopTrigger() routine sends an EOBTR command to the FEE.
Parameters
    handle     address of the RORC descriptor.
    timeout    the number of waiting cycles for receiving the FEE reply. If you want to specify the timeout value in microseconds, then use the value ( * handle->loop_per_usec).
    stword     the FEE reply, a DDL status word; stword.stw contains the full reply. For the details of a status word, see rorc_ddl.h.

Return value
    RORC_STATUS_OK = 0          the EOBTR command was sent successfully.
    RORC_STATUS_ERROR = -1      the RORC was not able to send the command.
    RORC_LINK_NOT_ON = -4       the link is down (the RORC is not able to send the command).
    RORC_NOT_ACCEPTED = -16     no reply arrived from the SIU within the specified timeout.

See also
    rorcStartTrigger()

rorcSerial

Synopsis
    #include
    rorcHwSerial_t rorcSerial(rorcHandle_t handle)

Description
    Reads the RORC's version and serial numbers. The rorcSerial() routine reads the hardware version and serial numbers from the card's configuration EPROM. The routine rorcInterpretSerial() interprets the relevant fields and prints them to standard output.

Parameters
    handle    address of the RORC descriptor.

Return value
    structure rorcHwSerial_t    The routine loads the version and serial numbers of the RORC card into this structure. rorcHwSerial_t is defined in rorc_lib.h. Besides the major and minor version and the serial numbers, it contains the full string retrieved from the configuration EPROM of the RORC card. If the EPROM holds no information about the hardware version and serial numbers, the routine writes -1 into the structure.

See also
    rorcFind(), rorcFindAll(), rorcReadFw(), ddlSerial()

rorcReadFw

Synopsis
    #include
    int rorcReadFw(rorcHandle_t handle)

Description
    The rorcReadFw() function reads the RORC's firmware identification word. The routine rorcInterpretFw(fw) interprets the relevant fields and prints them to standard output.
    The inline function rorcFFSize(fw) returns the number of Free FIFO entries of the card, while rorcFWVersMajor(fw) and rorcFWVersMinor(fw) return the major and minor version numbers of the card's firmware.

Parameters
    handle    address of the RORC descriptor.

Return value
    The returned word contains the RORC's firmware identification in the following format:
        bits 0-4:      day
        bits 5-8:      month
        bits 9-12:     year from 2000
        bits 13-24:    version number of the pRORC card's firmware
        bits 25-31:    Free FIFO size of the card in units of 64.

See also
    rorcSerial()

rorcReadRorcStatus

Synopsis
    #include
    int rorcReadRorcStatus(rorcHandle_t handle, rorcStatus_t *status)

Description
    The rorcReadRorcStatus() function fills a structure (defined in rorc_lib.h) containing information about RORC status and errors, such as the working mode of the RORC, Free FIFO status, link status, flow control status, etc. Before calling rorcReadRorcStatus(), the caller has to allocate a rorcStatus_t structure and supply its address to the routine. The routine fills this structure. The rorcStatus_t structure contains three members:
        ccsr:    the copy of the RORC's Operation Control and Status Register,
        cerr:    the copy of the RORC's Error Register,
        cdgs:    the copy of the RORC's Data Generator Status Register.
    The meaning of the status and error bits can be found in rorc_lib.h. The routines rorcInterpretStatus(ccsr) and rorcInterpretError(cerr) interpret the relevant register bits and print them to standard output.

Parameters
    handle    address of the RORC descriptor.
    status    address of a rorcStatus_t type structure. The routine fills the RORC status information into this structure.

Return value
    RORC_STATUS_OK = 0
        no error, RORC status structure filled

ddlSerial

Synopsis
    #include
    rorcHwSerial_t ddlSerial(rorcHandle_t handle, int destination, long long timeout)

Description
    Read the version and serial numbers of the DIU or SIU card. The routine sends a command to the DIU or SIU requesting the hardware version and serial numbers. It works only for plugged DIU and DDL cards of the final version. The routine rorcInterpretHwSerial() interprets the relevant fields and prints them to standard output.

Parameters
    handle         address of the RORC descriptor.
    destination    DIU or SIU.
    timeout        the number of waiting cycles for receiving the DDL card's reply. If you want to specify the timeout value in microseconds, then use ( * handle->loop_per_usec).

Return value
    structure rorcHwSerial_t    The routine loads the version and serial numbers of the DDL (DIU or SIU) card into this structure. rorcHwSerial_t is defined in rorc_lib.h. Besides the major and minor version and the serial numbers, it contains the full string received from the card. If no information is received, the routine writes -1 into the structure (this is the case for the prototype version DDL cards or integrated DIUs).

See also
    rorcSerial()

rorcHasData

Synopsis
    #include
    int rorcHasData(rorcReadyFifo_t readyFifoBaseAddr, int readyFifoIndex)

Description
    Check the Ready FIFO for a new data block. The calling program has to specify the Ready FIFO base address and index. The routine polls the Ready FIFO entry and signals whether a data block has arrived. This routine is an in-line function. It does not return values from the Ready FIFO; it only reports the arrival of a block. The caller can read the block length and the status from the FIFO. The memory address of the given block (and of the other blocks of the same event) has to be known by the caller.
Parameters
    readyFifoBaseAddr    base address of the Ready FIFO
    readyFifoIndex       index of the Ready FIFO where the checking has to be done

Return value
    RORC_DATA_BLOCK_NOT_ARRIVED = 0          no data block (data page) arrived (Ready FIFO status = -1)
    RORC_NOT_END_OF_EVENT_ARRIVED = 1        data block (data page) arrived but not the end-of-event block (Ready FIFO status = 0)
    RORC_LAST_BLOCK_OF_EVENT_ARRIVED = 2     end-of-event data block arrived (Ready FIFO status = DTSTW). If the continuation bit (bit 8) is set, the event will continue.

rorcCheckLink

Synopsis
    #include
    int rorcCheckLink(rorcHandle_t handle)

Description
    Check a status word of the RORC card which reflects the link status.

Parameters
    handle    address of the RORC descriptor.

Return value
    RORC_STATUS_OK = 0        the DDL link is on
    RORC_LINK_NOT_ON = -4     the DDL link is not on

See also
    rorcReadRorcStatus(), rorcArmDDL()

21.5 Installation

The DDL-RORC Library and Test Programs are installed together with DATE. For a stand-alone installation, follow the procedure below:

• The header, source, object and executable files of the RORC and DDL test programs and library are in a common AFS area: /afs/cern.ch/alice/daq/ddl/rorc/
• This directory contains the different versions of the software as separate subdirectories. These subdirectories also contain the different versions in compressed formats.
• The compressed file names show the version number and the time of archiving. Always use the latest date of a given version. The latest distributed version can also be found on the DDL home page: http://cern.ch/ddl/rorc_support.html.
• Copy the compressed file to your area, uncompress it and extract all directories and files from it. Use the following command for extracting files:
    > gtar -xvzf rorc_vers. _ .tgz
• You will get a directory structure with the following subdirectories:
    rorc/             source, header and make files
    rorc/Linux/       executables and compiled API libraries
    rorc/examples/    programs showing the usage of API libraries
    rorc/scripts/     some functional test scripts
• Some test programs use the physmem memory manager module (see Chapter 15). If DATE is installed then physmem is installed as well. For a stand-alone installation, the package can be found on the DDL home page at http://cern.ch/ddl/rorc_support.html.
• To compile it type the following commands:
    > cd rorc
    > make -f Makefile clean
    > make -f Makefile
• To compile the rorc_driver type:
    > make driver -f Makefile
• To register the driver and to create the device files type as root:
    > make dev -f Makefile
  The driver will be automatically loaded at boot time.
• If you want to load the rorc_driver kernel module without rebooting the machine, type as root:
    > make load -f Makefile
• If an older version of the RORC driver is already loaded then run:
    > make reload -f Makefile

Part IV
Detector Algorithms Framework
November 2010
ALICE DAQ Project

22 Detector Algorithms Framework

The online calibration tasks for the detectors are implemented using the Detector Algorithms (DA) Framework. The framework is available for download at: http://cern.ch/alice-daq/DA-framework

This chapter describes the architecture and interfaces to implement Detector Algorithms.

    22.1 Introduction
    22.2 The Detector Algorithms (DAs)
    22.3 DA framework architecture
    22.4 DA framework implementation
22.1 Introduction

The ALICE sub-detectors require specific calibration tasks to be performed regularly in order to achieve the most accurate physics measurements. These systems are sensitive to configuration settings, mechanical geometry, changes in environmental conditions, component aging and sensor defects. The corresponding set of procedures to calibrate the sub-detectors involves event analysis in a wide range of experimental conditions. These calibration tasks may be done either in dedicated runs, or in parallel to physics data taking. Typical examples of calibrations include pedestal and gain computation, dead and noisy channel mapping, etc.

Depending on the sub-detector and the calibration task, one has to define in particular:
• The trigger type, which can be a normal physics trigger, or specific events generated by a dedicated hardware device (e.g. laser, LED, pulser).
• The number of events to collect, from a hundred events for pedestal runs to millions of events for dead channel mapping.
• The event formatting, zero-suppressed or not, which affects the required throughput. Event size ranges from sub-events of a few kilobytes to 20 MB.
• The detector electronics settings, specific to the sub-detector operation mode.
• The calibration algorithm, i.e. the actual code to interpret the data and produce results.
• The type of run, standalone or global, depending on whether the task can be performed during normal data taking or requires a specific run. It has an impact on the operation mode and detector dead-time for physics.
• The frequency at which the calibration is required, from a few times per day to once a year.

The calibration results produced may be needed to configure the detector electronics for data taking, for example to produce zero-suppressed data or to mask noisy channels, in order to reduce the data volume.
Therefore, these results should be available right after the calibration data-taking procedure, in order to reconfigure the detector accordingly for the next physics run. In addition, the results are also used offline for event reconstruction. Both uses of the results impose a tight timing constraint on the way they are produced. Performing the full calibration analysis offline would be too heavy (a first pass over the data would be needed to produce calibration results), and sometimes too late (for calibrations required very frequently, or for which results are needed for the detector configuration). Only the most complex calibration data analysis should be done offline. Therefore, a dedicated framework has been designed and implemented to perform as much of the detector calibration as possible directly online, and to address the heterogeneous requirements specific to each calibration task.

22.2 The Detector Algorithms (DAs)

The ALICE online calibration framework is used to implement and run a set of detector algorithms (DAs), which are calibration tasks running online. DAs are provided by the sub-detector teams, using the global framework to develop detector-specific calibration procedures. Each DA grabs detector data and produces results online. These results can be reused directly online, e.g. to configure the detector, or shipped offline to be post-processed (if necessary) and used in event reconstruction.

To cover all the needs, we have defined two types of DAs, running either in exclusive mode (a dedicated run is required), or in the background (the task can be performed in a physics run):
• In the first case, called 'LDC DA', the data is recorded locally and in parallel on the LDCs, during a dedicated standalone run (single detector running), usually of short duration. At end of run, a DA process is launched on each LDC to analyze the data.
In this mode, parallelization is optimal, and results are readily available for further export to the FEE. A typical example is the pedestal run, with a few hundred large events. The temporary data files stored on local disk are useful to replay, tune or debug the DA behavior.
• In the second case, called ‘MON DA’, a single DA process of a given type is active during the run, on a dedicated monitoring machine. Data samples are picked up from the normal data flow in a non-intrusive way, and processed on the fly. The DA gets only what it can process: events may be dropped if the DA is busy. The DA selects the type of events it needs (calibration, physics) and the source to monitor (typically, a detector or a set of detectors). At the end of the run, the DA enters a post-processing phase to finalize the results. A generic example is to fill a histogram event by event and, at the end of the run, compute a fit and extract some key values. Another example of MON DA usage is dead channel mapping, where millions of events may be needed to cover the full detector. Many runs may be needed to collect such statistics, in which case intermediate results are saved at end of run, and reloaded at the next start of run.

22.3 DA framework architecture

The overall DA framework architecture, and in particular the interaction of the DA processes with the online components, can be seen in Figure 22.1. The DA process consists of detector code, written in C++ using the AliROOT framework, which is the ALICE offline code repository (therefore providing the same calibration algorithm implementation for both online and offline environments). This code uses the DAQ DA interface library in order to communicate with the other systems. The ECS is in charge of launching the DA where needed for the corresponding run type, depending on the experiment running mode selected by the operator.
The run type is propagated to the other online systems and to the detector, in order to make sure the corresponding settings are applied. This information is also stored in the experiment logbook (see Chapter 24) for bookkeeping and further reference.

Figure 22.1 DA framework architecture.

Upon startup, the DA connects to the main DAQ data flow (local files for an LDC DA, or remote data sources for a MON DA) in order to collect events. The DA may use some configuration information stored centrally in the detector database (see Section 4.4.6), which allows the operator to define the DA operation settings or algorithm parameters.

At run time, all output messages from the DA process are exported to the central DAQ/ECS log system (described in Chapter 11) for online operator display and archival. The health of the process is also monitored constantly by the ECS, and the return error code is checked upon exit.

While running, the DA may publish its results to the experiment Data Quality Monitoring (DQM) framework (see Chapter 23) for feedback to the operator, graphical display, consistency and quality checks, or reuse for monitored data reduction in the DQM agents.

The DA is notified of the end of the run, and then proceeds to the final post-processing and the saving of results. The control system checks that the DA completes its tasks within the required time, and aborts the process if necessary. The allowed DA end-of-run duration is kept short for global runs (typically less than one minute), to minimize data-taking dead-time. Whenever possible, demanding computation tasks exceeding this threshold are performed offline, where there is no such constraint.

The DA can save persistent files to a configuration database (local or central) in case results are to be reused in the DAQ (e.g. for a further DA run, for other online processes, for FEE configuration, etc).
Calibration results are also exported to the File Exchange Server (see Section 17.4), which is the system used to publish data from the DAQ to the other components. The DA result files may also be reused in the HLT or the DCS. Most importantly, the results are collected offline by the Shuttle framework (see Section 17.5), where the output data is post-processed and archived in the Offline Condition Database.

22.4 DA framework implementation

The framework relies on two main components:
• a programming interface for the DAs to interact with the outside world;
• a launch mechanism to start and control the DAs at runtime, so that they run when appropriate.

We describe below the DA Framework version 1.

22.4.1 DA interface API

The DAs are implemented in the AliROOT framework so that the calibration processing code is shared with the offline code and components are reused. However, the DAQ provides the API for the detector code input and output mechanisms. The main loop (subscription to events and their processing) of each DA program is implemented in the detector code, and not provided by the framework library. The framework distribution provides some examples of DA skeletons with a main loop reading and processing events.

The API described in the file daqDA.h provides the means to read configuration files, store and export results, and get some runtime information. In particular, it provides:
• functions to read and write data from/to the DAQ detector database (Section 4.4.6): daqDA_DB_getFile() and daqDA_DB_storeFile().
• a function to export result files to the File Exchange Server (Section 17.4): daqDA_FES_storeFile().
• a function to check if the DA is requested to terminate: daqDA_checkShutdown(). If this is the case, the DA should exit promptly (within 30-60 seconds at most), or it may be killed.
• functions to retrieve the ECS loop parameters in case of a calibration requiring multiple iterations with different settings: daqDA_ECS_getCurrentIteration() and daqDA_ECS_getTotalIteration().
• a function to store some results locally (e.g. partial results, or output to be used in another DA running locally): daqDA_localDB_storeFile(). This is useful to avoid storing large files in the database when they do not need to be used on other hosts. Files stored this way are then available at runtime in the directory $DAQ_DETDB_LOCAL.
• a function to convert a trigger class name into a trigger class id, which may then be used in the monitoring API (Chapter 5) to request events on a specific trigger class: daqDA_getClassIdFromName().

Output messages should simply be written to stdout: they are then redirected to the infoLogger log repository.

The DAs use the monitoring API described in Chapter 5 to retrieve events at runtime, either from a file (LDC DA) or from the online data stream (MON DA).

The DAs may export data to AMORE (DQM framework, see Chapter 23) for interactive display and results checking. This usually involves including the file AmoreDA.h provided by the AMORE distribution, and linking with the library libAmoreDA.a.

For runtime stability, DAs are required to be provided by detector teams as static executables having no dependency. This allows DAs of different detectors relying on different AliROOT versions to coexist on the same machine. Note that the DA build mechanism is provided by AliROOT and is outside the control of the DAQ. For reference, DAs may be built with the following make targets in $ALICE_ROOT:
• make daqDA-DETCODE-NAME: builds a DA executable.
• make daqDA-DETCODE-rpm: builds all the DAs for a given detector and packages them in RPM files.
The AMORE environment variable should be defined and point to the AMORE installation directory in order for AliROOT to link the DAs with AMORE support.

Some mandatory information should be provided in the RPM description tag. AliROOT takes care of packaging the DA in an RPM. However, the documentation fields must be completed in the DA source code (first comment of the file, /* ... */, with the syntax KEYWORD: VALUE). They are automatically extracted from the source code and copied to the RPM description. This information is used to validate and check the packages before deployment at the experimental area. The following fields should be filled in:
• Contact : E-mail of the persons responsible for the package (development and runtime).
• Link : External link to additional DA documentation, including some raw data test files and the necessary input configuration files.
• Reference Run : The run number of a reference run made at the experimental area in the appropriate conditions for this DA, with recording to CASTOR enabled. This will be used to validate the DA. Such a run should use a single LDC for a DA running on LDCs. It should contain a realistic number of events.
• Run type : The ECS run type(s) in which this DA should be running (PHYSICS for global runs, otherwise the detector-specific ECS run type in standalone operation).
• DA type : LDC or MON (for DAs running respectively on an LDC at end of run or on a monitoring node during the run).
• Number of events needed : The number of events needed to produce adequate results.
• Input files : Names of the files needed to run the DA (these are the files stored in the DAQ detector configuration database).
• Output files : Names of the files produced by this DA (including local files, FXS files, detDB files).
• Trigger types used : Trigger type of the events used by this DA.
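As an illustration of the KEYWORD: VALUE convention described above, the following minimal Python sketch extracts the fields from the first comment of a source file and reports the missing ones. It is not the AliROOT extraction code, whose exact rules are not described here: the function names, the handling of leading asterisks and all the values in the example header are assumptions made for this illustration.

```python
import re

# Field names taken from the list of documentation fields above.
REQUIRED_FIELDS = [
    "Contact", "Link", "Reference Run", "Run type", "DA type",
    "Number of events needed", "Input files", "Output files",
    "Trigger types used",
]

def extract_da_fields(source_text):
    """Return {keyword: value} read from the first /* ... */ comment.

    Assumption: one KEYWORD: VALUE pair per line, no multi-line values.
    """
    match = re.search(r"/\*(.*?)\*/", source_text, flags=re.DOTALL)
    if match is None:
        return {}
    fields = {}
    for raw_line in match.group(1).splitlines():
        line = raw_line.strip().lstrip("*").strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so that URLs stay intact.
        keyword, value = line.split(":", 1)
        fields[keyword.strip()] = value.strip()
    return fields

def missing_fields(fields):
    """Return the required field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]

# Hypothetical DA source header, for illustration only.
EXAMPLE_SOURCE = """/*
Contact: someone@example.org
Link: http://example.org/da-docs
Reference Run: 12345
Run type: PEDESTAL
DA type: LDC
Number of events needed: 500
Input files: config.txt
Output files: pedestals.root
Trigger types used: CALIBRATION_EVENT
*/
int main() { return 0; }
"""

if __name__ == "__main__":
    parsed = extract_da_fields(EXAMPLE_SOURCE)
    print(parsed["DA type"], missing_fields(parsed))
```

A check of this kind can be run on the source file before building the RPM, so that an incomplete header is noticed before the package reaches validation.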
These fields may be checked after the RPM is created with the command rpm -qip daqDA-....rpm

22.4.2 DA control mechanisms

The launching mechanism depends on the DA type, LDC or MON. In both cases, the DA executable is called with command line arguments giving the monitoring data source(s) from which to get the events (the name of an online monitoring data source for a MON DA, or a list of local data files for an LDC DA). No other command line arguments may be given to a DA executable. All configuration parameters should be read by the DA from the configuration database.

The DA processes are started through the LAUNCHER facility (identified as such in the infoLogger messages), implemented in the runControl package file named da.c. The DAs rely on a set of runtime parameters that must always be defined, because they are used by some of the I/O functions.

22.4.2.1 Runtime parameters

The following runtime parameters, defined as environment variables, are necessary in order to use the I/O functions of the DA library. Most of them are provided by the DATE standard setup procedure and completed by the calling launcher process for variable items.
• DATE_DETECTOR_CODE, DATE_RUN_NUMBER, DATE_ROLE_NAME, DATE_FES_DB, DATE_FES_PATH: access parameters to the File Exchange Server (see Section 17.4).
• DATE_RUN_NUMBER, DAQ_DB_LOGBOOK: access to the logbook, e.g. to retrieve information on trigger classes in order to filter on them.
• DAQDALIB_PATH: path to the installation directory of the DAQ DA library, typically /opt/daqDA-lib. This is needed to use the I/O functions (database and File Exchange Server).
• AMORE_DB_MYSQL_...: definition of the access parameters to the AMORE database, in case some data shall be exported to the DQM. One may also need to define AMORE_DA_NAME, used to identify the target AMORE output table (set to DATE_ROLE_NAME by default).
• DAQ_DETDB_LOCAL: location of a local directory that may be used as a persistent location to store data.
Usually, it is set to ${DATE_SITE}/${DATE_ROLE_NAME}/db.

At the moment, only DATE_DETECTOR_CODE is not set automatically (because it is specific to the DA) and should be defined in a wrapper script.

It might also be necessary to set ROOTSYS to a dummy string, e.g. NULL, because some ROOT routines called in some DAs may crash if this variable is undefined.

When testing a DA executable manually, one may define the variable DAQDA_TEST_DIR in order to use a dummy configuration database and File Exchange Server. The directory pointed to by this variable is used both to read configuration files and to export result files (in place of the File Exchange Server). DATE_DETECTOR_CODE, DATE_ROLE_NAME and DATE_RUN_NUMBER shall still be defined in this situation. Similarly, if a full AMORE setup is not available in the test environment, it is possible to set AMORE_NO_DB to true and to define an associated directory AMORE_NO_DB_DIR as a fake database.

At run time, the DAQ starts the DA in a directory named DATE_SITE_WORKING_DIR, and typically located in ${DATE_SITE}/${DATE_ROLE_NAME}/PARTITION-DETECTOR/work_DA-... This directory is meant to be a temporary working directory. Its content may be cleared after the DA exits. Persistent files should be saved to the configuration database or to the DAQ_DETDB_LOCAL local directory.

22.4.2.2 LDC DA launching

LDC DAs are started by the ECS at the end of the run, after data taking is finished. The ECS looks in the DAQ database (Files section in editDb) for a script based on the corresponding hardcoded SYNCHRONOUS action name in the ECS SMI file defining the state machine of this detector. Such database files (host field left empty) are named for example /ECS/FMD/FMD_COMPUTE_GAIN or /ECS/TPC/TPC_PULSER_DA. The content of this file should be an executable script (shell or other), defining the necessary variables and launching the corresponding DA executable (passing to it the provided arguments), e.g.

    export DATE_DETECTOR_CODE=TPC
    /opt/daqDA-TPC-PULSER/TPCPULSERda.exe $@

The script is copied to the machine and executed at end of run, after the DAQ has completed the run and the local data files are available.

22.4.2.3 MON DA launching

MON DAs are started by the runControl following the configuration file named after the runControl: /das/RCNAME.config, where RCNAME is the runControl name (with the host field left empty). Example names of such files are /das/ALLPHYSICS_1.config for the partition PHYSICS_1 or /das/FMD.config for a standalone FMD detector operation.

The file syntax is the following (as implemented in the runControl package). Every line contains a sequence of fields:
• the DA name;
• the name of the MON machine (DATE role) where the DA shall be executed;
• the name of the script to be executed (a DATE database File entry in editDb, to be named /das/scripts/... with the host field left empty, e.g. /das/scripts/FMD-Base);
• the input parameter to be given to the script (the name of a valid monitoring data source, e.g. ^FMD to monitor FMD data, or @* to monitor full events from all GDCs);
• a list of tags (i.e. run types) to activate the DA (typically, the name of the active runControl configuration, as saved from the runControlHI, e.g. DEFAULT or PEDESTAL).

An example entry looks like:

    DA-FMD-BASE mon-DA-FMD-0 FMD-Base ^FMD DEFAULT

In the case of a global run, it may be necessary to check in the DA script (by accessing the logbook) whether the corresponding detector belongs to the run, because at the moment the runControl starts the MON DA scripts based on the configuration name. Although some filtering is done on the monitoring data source, this might not be enough if the DA is not using monitoring by detector.

The content of the script itself is similar to the LDC DA script described above, and is a simple wrapper around the DA executable.

The name and host of the MON roles have to be defined in the DATE configuration database.
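To make the layout of a /das/RCNAME.config entry concrete, the sketch below splits one line into the five fields listed in this section and selects the entries active for a given tag. It is an illustration only and not the runControl implementation in da.c: it assumes that fields are separated by whitespace and that all tokens after the fourth one are tags, and the names MonDaEntry, parse_mon_da_line and entries_for_tag are hypothetical.

```python
from typing import List, NamedTuple

class MonDaEntry(NamedTuple):
    """One line of a MON DA configuration file (field order as in the text)."""
    da_name: str      # DA name
    mon_role: str     # DATE role of the MON machine running the DA
    script: str       # script entry, stored under /das/scripts/ in editDb
    data_source: str  # monitoring data source given to the script
    tags: List[str]   # run types / configuration names activating the DA

def parse_mon_da_line(line):
    """Split a configuration line into its fields.

    Assumption: whitespace-separated fields, tags being all remaining tokens.
    """
    tokens = line.split()
    if len(tokens) < 5:
        raise ValueError("expected at least 5 fields, got %d" % len(tokens))
    return MonDaEntry(tokens[0], tokens[1], tokens[2], tokens[3], tokens[4:])

def entries_for_tag(lines, active_tag):
    """Return the entries whose tag list contains the active configuration."""
    entries = [parse_mon_da_line(line) for line in lines if line.strip()]
    return [entry for entry in entries if active_tag in entry.tags]

if __name__ == "__main__":
    # Example entry quoted from the text above.
    entry = parse_mon_da_line("DA-FMD-BASE mon-DA-FMD-0 FMD-Base ^FMD DEFAULT")
    print(entry.mon_role, entry.data_source, entry.tags)
```

Since the selection relies on the configuration name only, a DA listed for DEFAULT is started in every run using that configuration, which is why the wrapper script may need the additional logbook check mentioned above.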
A simple ROLE entry with TOPLEVEL set to 0, with an associated IPC memory bank of size -1 for a single control pattern, is sufficient. Note that the data source and monitoring policy should be chosen with care for the MON DAs, or they may not be able to receive the necessary data at runtime.

The MON DA is started at the start of the run, and is left active until the data taking is over. At that point, it receives a QUIT command from the runControl launcher, and the DA process should exit after a reasonable time (the time for post-processing and exporting result files should be kept under control).

Part V Data Quality Monitoring
November 2010, ALICE DAQ Project

23 Automatic MOnitoRing Environment (AMORE)

The quality of the acquired data evolves over time depending on the status of the detectors, their components and the operating environment. To use the valuable bandwidth and the short data-taking period in an optimal way, the quality of the data actually being recorded must be continuously monitored. Data Quality Monitoring involves the online gathering of data, their analysis by user-defined algorithms, and the storage and visualization of the produced monitoring information.

This chapter describes the data quality monitoring framework AMORE, which is based on the DATE monitoring library in conjunction with ROOT. It is a distributed and modular system, where each detector team develops one or several plug-ins on top of the framework. This chapter also describes the generic modules that ease the development effort of the detector teams, such as the Generic GUI and the Quality Assurance agent.

23.1 Architecture
23.2 Database
23.3 Application flow
23.4 Features details
23.5 Application Programming Interface (API)
23.6 Tools

23.1 Architecture

AMORE (Automatic MOnitoRing Environment) is the Data Quality Monitoring (DQM) framework for ALICE. It is a flexible and modular software framework which is used to analyze data samples and to produce and visualize monitoring results. The data samples, i.e. events or sub-events, come either from LDCs or from GDCs. Raw data files can also be used as a data source. AMORE is founded on the widely-used data analysis framework ROOT and uses the DATE monitoring library (see Figure 23.1). In case the same analysis is needed online and offline, the use of the ALICE Off-line framework for simulation, reconstruction and analysis (AliRoot) is recommended at the level of the module.

Figure 23.1 Schema of the main dependencies of AMORE.

23.1.1 Overview

AMORE is based on a publisher-subscriber paradigm (see Figure 23.2) where a large number of processes, called agents, execute detector-specific decoding and analysis on raw data samples and publish their results in a pool. Clients can then connect to the pool and visualize the monitoring results through a dedicated user interface. The serialization of the published objects, which occurs on the publisher side before the actual storage in the database, is handled by the facilities provided by ROOT. The only direct communication between publishers and clients consists of notifications by means of DIM. The notifications coming from the outside world, especially from the Experiment Control System (ECS), use the same technology.

Figure 23.2 The publisher-subscriber paradigm in AMORE.
23.1.2 MonitorObjects

As illustrated in Figure 23.2, the monitoring results are encapsulated in so-called MonitorObjects that essentially contain additional metadata allowing a proper and coherent handling by the framework (see Section 23.5.1.1 for details).

23.1.3 AMORE taxonomy

AMORE uses a plug-in architecture to avoid any dependency of the framework on user code. The plug-in mechanism is implemented through the ROOT reflection feature. Users, usually detector teams, develop modules that are typically split into two main parts corresponding to the publishing and the subscribing sides of the framework (see Figure 23.3). The modules are built into dynamic libraries that are loaded at runtime by the framework if, and when, they are needed. There are typically 4 libraries produced (stacked boxes on the left of the figure), one for each package (the four boxes at the bottom of the figure): Common, Publisher, Subscriber and UI. A module’s publisher can be instantiated as many times as needed, to collect more statistics for instance, each instance being called an agent. The same is true for the subscriber part of the module; we call these instances clients or GUIs. Note that a module can contain several publisher and subscriber classes (not shown in the figure).

The shared libraries produced from the detector’s code are stored in a special directory called amoreSite. Its location is defined in the variable $AMORE_SITE. The directory also contains the file AMORE.params where the database credentials are stored.

Figure 23.3 Description of a module.

23.1.4 Publishers

The publishers must extend the class PublisherModule. They are meant to analyze the raw data they receive and to publish results in the form of MonitorObjects. However, not all publishers do the work directly. The Quality Assurance (QA) module, for example, delegates the processing to the AliRoot QA framework.
The High Level Trigger (HLT) module retrieves the objects from a private network and publishes them in the pool. The module amoreDB simply publishes data retrieved from a database, including from the AMORE pool itself. Whenever possible, one should avoid duplicating code and rather delegate the processing to existing frameworks and libraries.

23.1.5 Clients

The clients must extend the class VisualModule. They mainly consist of a ROOT GUI in which some MonitorObjects are displayed. Clients are usually tied to the corresponding publisher. There is no limit on what a client can do with the objects it retrieves, but it is in general not a good idea to deeply modify them. Indeed, these modifications will not be saved in the database and therefore will be lost to other clients. As a rule of thumb, only make “cosmetic” changes in the GUI. Even adding lines or boxes to a histogram should not be part of the client in most cases.

23.2 Database

23.2.1 Overview

The pool is implemented as a database. The open-source MySQL system was chosen as it proved to be reliable, fast and lightweight. Figure 23.4 shows a rough schema of the database; the detailed description of the tables follows. The database is used not only to keep the data published by the agents, but also to store the configuration of AMORE as a system. This includes information about the agents, such as the machine where they can run and the detector to which they belong (amoreconfig table), as well as the optional configuration files. When a new agent is created in the system, a row is added to the amoreconfig table. The table where published data will be stored is created or recreated when the agent is started.

23.2.2 Archives

Former versions of the MonitorObjects can be kept. This is the case for the recent values of the objects as long as the data table of the agent doesn’t exceed a certain size.
This size is specified in the table amoreconfig (see below the tables descriptions for details). To decide what objects must be removed, a First In First Out policy is applied. Snapshots of the MonitorObjects can also be archived for a longer term on user’s request or automatically, at SOR, EOR and every hour during a run. These data are stored in a table that is very similar to the agent’s data table but whose name ends with “_archives”. The objects will stay there for a week before being deleted, although they could remain indefinitely if they are marked as permanent. For details on the process which takes care of archiving, please refer to Section 23.4.3.

Figure 23.4 Schema of the database.

23.2.3 Tables descriptions

The main tables used by AMORE are described in this section.

amoreconfig
List of the agents.

Table 23.1 amoreconfig
Field              Description
host               Machine, specified by its role name, where the agent is allowed to run
agentname          Name of the agent
detector           Detector to which the agent belongs (3 letter code)
source             Default data source (format: see Chapter 5)
dimnode            Dim server
poolnode           Database server
defaultmodule      Default module (in case the library contains several modules)
configfile         Default configuration file name
fifo_size          Size of the data table in Bytes
image_generation   Flag indicating whether objects’ images must be produced
production         Flag indicating whether this agent must always be running
extra_flags        Any additional flag to pass to the agent at startup

amoreref
Configuration files table.

Table 23.2 amoreref
Field        Description
detector     Detector to which the file belongs (3 letter code)
filename     Name of the file
data         The file itself
updatetime   Time of last update of the file

Agents tables
Each agent has a table where its MonitorObjects are stored (one version of each object per row).
It is the “pool” where data transits between the publisher (agent) and the subscriber (client). The name of such a table is the name of the agent. Its size shouldn’t exceed the size specified in amoreconfig->fifo_size. This is enforced within the framework. However, its minimal size will be the sum of the sizes of all the agent’s MonitorObjects. If fifo_size is big enough, several versions of each MonitorObject will be kept in the table, the oldest objects being removed first.

Table 23.3 Agents tables fields description
Field        Description
moname       Name of the MonitorObject
updatetime   Time when the object has been stored
data         The serialized MonitorObject
size         Size of the data
run          Run active when the object was last updated
image        Summary image

latest_values
Pointers to the latest version (i.e. last published) of each MonitorObject of each agent. By querying it, one gets the time of the last update of a given object. This table speeds up the numerous queries made by the clients by avoiding searches in the large data tables and by being stored in RAM (in-memory table). It is kept up to date by triggers on the agents’ data tables.

Table 23.4 latest_values table
Field        Description
agentname    The name of the agent publishing the object
moname       The name of the object
updatetime   Time when the object was published

Archives tables
Each agent has an archive table where temporary and permanent copies of MonitorObjects are stored. The naming convention for these tables is the agent name followed by _archives. The structure is very similar to the agent’s data table.
Table 23.5 Archives tables
Field         Description
moname        Name of the MonitorObject
updatetime    Time when the object has been stored
data          The serialized MonitorObject
size          Size of the object
permanent     Flag to indicate if the object’s archive is permanent or not
run           Run active when the object was last updated
image         Image of the object (if generated)
description   Description of the archived object

globals
Global variables used by AMORE. For example, the version of the current database schema is stored here.

Table 23.6 globals table
Field      Description
variable   Variable name
value      Value

Roles
List of roles, i.e. names given to machines where agents can run.

Table 23.7 roles table
Field   Description
name    Role’s name
host    Hostname of the machine

Users
List of the users allowed in the system.

Table 23.8 users table
Field   Description
user    User’s name

Agents_access
Agents can only be manipulated (started or stopped) by certain users. This table contains, for each agent, the user(s) allowed to do so.

Table 23.9 agents_access table
Field       Description
agentname   Agent’s name, foreign key to amoreconfig
user        Refers to a user in the table “Users”

Agents_details
The clients might need to know what an agent is publishing. They use DIM to get a list of the objects being published, but this is often not enough. For example, the Generic GUI needs to know the quality or the type of an object without actually loading it from the database. The agents_details table is used for this purpose. It contains, for each agent, a long string providing details on the objects it publishes. The format is: = [ # # # :]*.

Table 23.10 agents_details table
Field     Description
agent     Agent’s name
details   Details string

23.3 Application flow

The agents and the clients are implemented as finite state machines (FSM).
The framework binaries, amoreAgent for the agents and amore for the clients, drive the FSM and call the methods of the user’s module at certain steps. The two FSMs are completely decoupled and the notion of monitor cycle is different on both sides. Thanks to this decoupling, a slow process doesn’t affect the others.

23.3.1 Agents and clients Finite State Machines

Figure 23.5 Left: the publisher Finite State Machine. Right: the client Finite State Machine.

23.3.2 Initialization

AMORE is a pluggable software where the detectors’ libraries are loaded and the proper class is initialized at runtime. When starting an agent with the binary amoreAgent, three main steps occur:
1. Look up amoreconfig for the agent, check that it exists and runs on the correct machine. Retrieve information about the agent (detector, class name, ...).
2. Load the detector library.
3. Instantiate the class.

The clients have no configuration table equivalent to amoreconfig for the agents. One must specify the detector name and the module name when starting the client. Therefore the startup is simpler and skips the first step described above.

23.3.3 Agents and clients inheritance and methods calls sequences

All the publisher modules must inherit from PublisherModule. Similarly, the subscriber modules must inherit from SubscriberModule. Their various methods will then be called by the framework depending on its current state. Figure 23.6 shows the detailed sequence of method calls made by the framework for the agents and the clients.

Figure 23.6 Sequence of methods calls on the agent and the client modules.

23.4 Features details

23.4.1 Quality

Each MonitorObject has a quality associated with it.
This quality is stored within the object and can take different values:
• kNULLFLAG : no quality. It can be used for objects such as error messages sent to the client or intermediate objects needed to build another object.
• kINFO : Good quality.
• kWARNING : Object should be checked.
• kERROR : Object is clearly out of the reference, there is an error.
• kFATAL : Object is so incorrect that measures must be taken quickly.

To set the quality of an object, usually at End Of Cycle, use the method SetQuality(flag) of the MonitorObject class. By default, if not explicitly set by the publisher, the quality of an object is kFATAL.

23.4.2 Expert/Shifter MonitorObjects

Each MonitorObject can be classified as shifter or expert, the former representing a subset of the latter. In order to publish a MonitorObject as shifter, use the following example:

    Publish(fMO, "MOname", "MOtitle", MonitorObject::kSHIFTER)

and as expert:

    Publish(fMO, "MOname", "MOtitle", MonitorObject::kEXPERT)

which is the default if nothing is specified.

There are two ways of exploiting this functionality:
1. Start the agent with the option “-f S”. In this way, only the shifter objects will be published in the database and available in the GUI.
2. Start the agent with the usual options and filter the histograms in the client to show only the shifter ones (check the expert flag of each MonitorObject). In this way, all the objects will be published and available in the Logbook and in the database, but only the shifter objects will be displayed in the GUI.

23.4.3 Archiver and FIFO

23.4.3.1 Purpose

When an expert is called by an operator, the expert may want to check and study the objects even though the run has stopped. Therefore, snapshots of the MonitorObjects must be saved, if not permanently at least for a week or longer. The archiver is meant to give a way to archive and recover interesting MonitorObjects for further study.
The archiver must always be running and available to receive new requests. It also performs a clean-up every night to erase temporary archives older than 7 days.

In addition to these mid- and long-term archives, it can also be useful to keep a very detailed short-term history to discover when a problem occurred or started. This is done through the so-called FIFO, which is implemented directly within the database. It consists in keeping former versions of the objects in a First In First Out queue (see Figure 23.7).

This chapter describes the general design of both features. For information about how to operate the archiver, please refer to the ALICE DAQ WIKI.

Figure 23.7 The archiving system in AMORE.

23.4.3.2 Implementation of the archiver

The archiver package in AMORE depends only on the core package. It produces a standalone binary that loads one or more ArchiverModule(s). It uses DIM to receive users’ requests and the SOR and EOR notifications; agents declare themselves automatically to the archiver at SOR and EOR. The tasks to be executed by the archiver are stored in an ordered queue. The complete class diagram of the package is shown in Figure 23.8. People interested in further details are encouraged to read the sources.

The archiver uses plug-ins, called ArchiverModules, to do the actual work of archiving and cleaning up. The archiver is configured by means of a configuration file (see Listing 23.1 for an example). The plug-in StdArchiverModule is currently used for both the cleaning and the archiving. It uses stored procedures to execute the archiving and to make archives permanent.

Figure 23.8 Class diagram (including some interaction information) of the package archiver.
Listing 23.1 Example of a configuration file for the archiver

    # Archiver config file
    # Define the archiver module to use to archive
    archiver_module_archive amore::archiver::StdArchiverModule
    # Define the archiver module to use to clean up obsolete archives
    archiver_module_clean amore::archiver::StdArchiverModule
    # The number of days an archive is kept before it is cleared
    std_archiver_obsolescence 7

Table 23.11 DIM commands

DIM command                   Description
amore/archiver/archive        Trigger the archiving. Parameter is ::[ ] If no object specified, all objects are archived.
amore/archiver/makePermanent  Make an archive permanent. The parameter is :: ::
amore/archiver/agentSOR       The command that agents must use to declare themselves as alive at SOR or when they are started. Parameter is ::
amore/archiver/agentEOR       The command that agents must use to declare themselves as alive at EOR or when they stop. Parameter is ::
amore/archiver/printTasks     Force the archiver to print a list of the tasks currently in the system.

23.4.3.3 Implementation of the FIFO

The recent versions of the objects are kept in the data table (named after the agent). The size of the FIFO, i.e. the size of the table, is defined in the table amoreconfig for every agent. A table size smaller than the sum of the sizes of all the published objects results in a default behaviour where only the latest version of each object is kept. The PoolConnection in the publisher determines whether the maximum size is exceeded and, if so, takes the appropriate action, namely deleting the oldest objects.

23.4.3.4 Access to the archives

The Logbook gives full access to the archives and the FIFO, with the possibility of creating archives and making them permanent. An archive request can also be sent from the Generic GUI.
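The FIFO policy of Section 23.4.3.3 can be pictured with the following minimal C++ sketch. It is not the PoolConnection code: the names StoredVersion and ObjectFifo, the byte-based budget and the in-memory queue are assumptions made for illustration, whereas the real FIFO lives in the agent's database table. The sketch deletes the oldest superseded versions when the budget is exceeded and, when the budget is too small to hold everything, falls back to keeping only the latest version of each object.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <string>

// Hypothetical stand-in for one stored version of a MonitorObject.
struct StoredVersion {
    std::string name;   // object name
    int version;        // publication counter, assumed to increase per object
    std::size_t bytes;  // size of the stored version
};

// Illustrative byte-bounded FIFO (assumption: not the AMORE implementation).
class ObjectFifo {
public:
    explicit ObjectFifo(std::size_t maxBytes) : maxBytes_(maxBytes) {}

    // Append a new version, then delete the oldest superseded versions
    // until the budget is respected or only latest versions remain.
    void Push(const StoredVersion& v) {
        queue_.push_back(v);
        used_ += v.bytes;
        latest_[v.name] = v.version;
        while (used_ > maxBytes_ && EvictOldestSuperseded()) {
        }
    }

    std::size_t Count() const { return queue_.size(); }
    std::size_t UsedBytes() const { return used_; }

private:
    // Remove the oldest entry that is not the latest version of its object.
    // Returns false when every remaining entry is a latest version.
    bool EvictOldestSuperseded() {
        for (auto it = queue_.begin(); it != queue_.end(); ++it) {
            if (it->version != latest_[it->name]) {
                used_ -= it->bytes;
                queue_.erase(it);
                return true;
            }
        }
        return false;
    }

    std::size_t maxBytes_;
    std::size_t used_ = 0;
    std::deque<StoredVersion> queue_;    // oldest entry at the front
    std::map<std::string, int> latest_;  // latest version per object name
};
```

With a budget of 100 bytes and three 40-byte pushes (two versions of one object, one of another), the first version of the duplicated object is dropped. With a budget of 10 bytes the same pushes leave exactly one version per object, which mirrors the default behaviour described above.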
23.4.4 Access Rights

One or more users from the table users are associated with every agent. They are the users allowed to start, stop and restart the agent. If the user "all" is associated with the agent, all users present in the table are allowed to start, stop and restart it. When creating a new agent, the operator must specify the allowed user. If the user is not present in the table or the field is left empty, the user "all" is associated with the agent.

23.4.5 ECS-AMORE interaction

23.4.5.1 Motivation

Agents must be able to react to the Start Of Run (SOR) and End Of Run (EOR) in order to re-initialize themselves accordingly and possibly to reset certain MonitorObjects.

23.4.5.2 Implementation

The class RunControl (core) and its sub-class RunSequence (publisher) inherit from DimClient and subscribe to the SOR and EOR DIM info provided by the Logbook daemon. One command exists for each runControl, i.e. for each detector and for each partition. When a standalone run is started for detector XXX, the corresponding command is received. In the case of a partition, the info for the partition is updated with the new run number, as well as the info of each included detector. The RunSequence constructor takes as argument the name of the RunControl to which it must listen. It is therefore the responsibility of the publisher to identify the runControl. It does so by using the detector code attached to the agent and/or the partition given at startup (parameter -p).

23.4.6 Logbook usage

23.4.6.1 Motivation

The Logbook contains a large amount of metadata about runs. It is thus a valuable source of information that AMORE needs to access. Moreover, AMORE takes advantage of the Logbook web interface to make statistics and objects available to users worldwide.

23.4.6.2 Usages

At SOR, AMORE uses the Logbook to retrieve information about the run, such as its type (PHYSICS, CALIBRATION,...)
or the detectors it includes. At SOR, the framework also stores data in the Logbook in order to have it listed in the corresponding page of the web interface (see Table 23.12).

Table 23.12 Data passed to the Logbook at SOR

Field                    Description
Run number               Which run the agent is running for.
Detector                 The code of the detector for which the agent is running.
Agent’s name             Name of the agent.
Version of the module    Version number of the module’s libraries.
Configuration            The configuration file specified at startup or by default (if any).

Every agent has one summary image that is generated in its method GetSummaryImage(). During a run, the agent regularly stores it in the Logbook. The update interval is currently set to 2 minutes and can be changed in the class ImagePublisher.

Finally, at every end of cycle and at EOR, the agent inserts the following statistics in the Logbook:

• Number of objects
• Total number of objects published (all versions of all objects)
• Total number of bytes published
• Average CPU time per cycle
• Average real time per cycle

23.4.7 Multi thread image production

The image production is the functionality that makes the images of the MonitorObjects available in the Logbook. To enable it, the flag image_generation of the agent in the table amoreconfig must be set to 1. The image generation can run in single- or multi-thread mode. To run it in multi-thread mode, the option "-i" must be specified in the launching parameters of the agent. This splits the data quality monitoring process into two independent threads, one for the analysis and one for the image production.

23.5 Application Programming Interface (API)

This section is dedicated to the API of AMORE. The classes and methods described below are grouped by package. Only the public interface is presented.
For details on how to develop a new module, please refer to the document "Modules’ developer’s guide" available on the AMORE website (http://ph-dep-aid.web.cern.ch/ph-dep-aid/amore/).

23.5.1 Core

23.5.1.1 MonitorObject

Any object published in AMORE is encapsulated in a MonitorObject data structure. To ensure type safety, a templated class hierarchy is used. AMORE provides classes derived from MonitorObject to handle scalars, histograms (1D and 2D) or TObjects, for example. Several of them are templated to specify the type of histograms or scalars being encapsulated.

The MonitorObject abstract class contains a set of members that can be accessed (read and/or write). The accessors follow the naming convention GetMyMember for a member fMyMember.

Table 23.13 Members of the class MonitorObject

Member                  Description
Name (read-only)        Name of the object, used as a unique id. Can contain only standard characters plus slashes ("/"), but no spaces.
Title                   Title of the object.
Description             Description of the object.
UpdateTime (read-only)  Last time the object was published in the database.
Quality                 Quality of the object. Variable of type QualityFlag_t. It can take 1 of 5 values: kNULLFLAG (no quality), kINFO, kWARNING, kERROR and kFATAL. See Section 23.4.1 for details.
ExpertFlag              Specifies if the object is for shifter and/or expert. Variable of type ExpertFlag_t. It can take 1 of 2 values: kEXPERT and kSHIFTER.
DefaultDrawOption       Default draw option to use when drawing this object.
DisplayHint             Hints about how to display the object in the best way. This is highly type-dependent. At present, the options 'logx' and 'logy' are accepted for any type of histogram.

The following methods are available for each type of MonitorObject. In addition, the interface of the MonitorObject subclasses contains type-specific methods, e.g. Fill(...) for MonitorObjectHisto.
Please look directly at the header file core/MonitorObject.h to find out which methods exist for each type.

Reset
Synopsis     #include "MonitorObject.h"
             void Reset()
Description  The Reset method must be called to reset an object.

Draw
Synopsis     #include "MonitorObject.h"
             void Draw(Option_t* option = "")
Description  Draw the object on the current pad.

23.5.1.2 Run

A class representing a run. At start of run, the publisher code receives an object of this type.

RunType
Synopsis     #include "Run.h"
             string RunType()
Description  Returns the run type.

RunNumber
Synopsis     #include "Run.h"
             RunNumberType RunNumber()
Description  Returns the run number.

RunDuration
Synopsis     #include "Run.h"
             int RunDuration()
Description  Returns the number of minutes elapsed since this object was created. This is either the number of minutes since the start of the run, if the SOR was received, or the number of minutes since this object was created, if the agent was started after the SOR. The number of minutes is rounded down.

23.5.1.3 ConfigFile

This class represents a configuration file (stored in the database or in the file system) and provides methods to access its content. It tries to parse the file during its initialization. If the format is not recognized (pairs separated by spaces), the user can still use the object to retrieve its content and do his own parsing. The user gets a reference to a ConfigFile if one is specified at startup; he can also instantiate such an object at any time. Please refer to the Modules’ developer’s guide for more details on how to use configuration files.

InitWithFile
Synopsis     #include "ConfigFile.h"
             void InitWithFile(string filePath)
Description  Initialize the object with the file specified by its path.
InitWithContent
Synopsis     #include "ConfigFile.h"
             void InitWithContent(string content)
Description  Initialize the object directly with the content of a file.

Exists
Synopsis     #include "ConfigFile.h"
             bool Exists()
Description  Call this method to know if this object has been initialized.
Returns      True if this object has been initialized, false otherwise.

Get
Synopsis     #include "ConfigFile.h"
             string Get(string key)
Description  If the parsing of the file was successful, and if the key was specified, the method returns the value associated with the key.
Returns      The value associated with the key.

Contains
Synopsis     #include "ConfigFile.h"
             bool Contains(string key)
Description  If the parsing of the file was successful, tells if the key was specified in the file.
Returns      True if the key was specified in the file, false otherwise.

GetContent
Synopsis     #include "ConfigFile.h"
             string GetContent()
Returns      The content of the file.

GetMap
Synopsis     #include "ConfigFile.h"
             map GetMap()
Description  If the parsing of the file was successful, returns the map of pairs created during initialization. In case the parsing failed, returns an empty map.
Returns      See description.

23.5.2 Publisher

23.5.2.1 PublisherModule

PublisherModule is an abstract class from which all the publisher classes must inherit. It contains a certain number of methods called by the publisher’s Finite State Machine. All of them, apart from Reset(), Version() and GetSummaryImage(), must be overridden by the sub-classes. See Figure 23.6 to know when each method is called by the FSM. Below is the description of the methods not shown in the figure.

GetSummaryImage
Synopsis     #include "PublisherModule.h"
             string GetSummaryImage()
Description  If implemented by the sub-classes, returns a so-called summary image.
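The split between mandatory and optional methods of a publisher module can be illustrated by the following C++ sketch. It is self-contained and does not include the real PublisherModule header: SketchRun, SketchPublisherModule and MyDetectorModule are hypothetical names, the method signatures are assumptions, and only a few of the mandatory methods (those named in this chapter) are shown. The meaning given to the string returned by GetSummaryImage() (here a file name) is also an assumption.

```cpp
#include <string>

// Hypothetical stand-in for the Run class of Section 23.5.1.2.
struct SketchRun {
    int number;
    std::string type;
};

// Illustrative base class (assumption: NOT the real PublisherModule).
class SketchPublisherModule {
public:
    virtual ~SketchPublisherModule() = default;

    // Methods every sub-class must provide (signatures are assumptions).
    virtual void BookMonitorObjects() = 0;
    virtual void StartOfRun(const SketchRun& run) = 0;
    virtual void EndOfRun(const SketchRun& run) = 0;

    // Optional methods: a default is provided, overriding is a choice.
    virtual void Reset() {}
    virtual std::string Version() { return "unknown"; }
    virtual std::string GetSummaryImage() { return ""; }
};

// A user module: implements the mandatory methods and one optional one.
class MyDetectorModule : public SketchPublisherModule {
public:
    void BookMonitorObjects() override { booked_ = true; }
    void StartOfRun(const SketchRun& run) override { currentRun_ = run.number; }
    void EndOfRun(const SketchRun&) override { currentRun_ = -1; }

    // Hypothetical: returns the name of an image produced by the module.
    std::string GetSummaryImage() override { return "summary.png"; }

    bool Booked() const { return booked_; }
    int CurrentRun() const { return currentRun_; }

private:
    bool booked_ = false;
    int currentRun_ = -1;  // -1 means "no run in progress" in this sketch
};
```

In this sketch the compiler rejects any sub-class that forgets one of the pure virtual methods, while Reset() and Version() silently keep their default behaviour when they are not overridden.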
23.5.2.2 PublicationManager

The PublicationManager provides the interface to the publication methods. It also gives access to a certain number of utility methods, for example to know the run number or the agent’s name. Each sub-class of PublisherModule (see above) has access to a global variable of type PublicationManager: gPublisher. It is the only reference to the framework that the user’s modules have.

Publish
Synopsis     #include "PublicationManager.h"
             int Publish(TYPE object, const char* name, const char* title,...)
Description  This method exists in many flavours depending on the type of the object one wants to publish. It can be a MonitorObject or a TObject, the latter being encapsulated in a MonitorObjectTObject within the method. When publishing scalars or histograms, the real type of the object (TH1F or TH1D for example) must be specified through templates. To be effective, a call to this method must occur within BookMonitorObjects or StartOfRun in the class PublisherModule. Publish() does not actually update the object in the data pool; it declares it as being part of the set that must be updated at every end of monitor cycle.
Returns      0 in case of success, 1 otherwise.

AgentName
Synopsis     #include "PublicationManager.h"
             string AgentName()
Description  Returns the agent’s name.

GetCurrentRun
Synopsis     #include "PublicationManager.h"
             Run* GetCurrentRun()
Description  Returns the current run, from which the run number can be obtained (see Section 23.5.1.2).

Unpublish
Synopsis     #include "PublicationManager.h"
             int Unpublish(MonitorObject*& mo)
Description  Undo the Publish, i.e. remove the object from the set of objects being updated at every end of cycle. To be effective, a call to this method must occur within the method PublisherModule::StartOfRun().
Returns      0 in case of success, 1 otherwise.
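The "declare now, update at end of cycle" semantics of Publish and Unpublish can be sketched as follows. This is a self-contained illustration and not the PublicationManager implementation: the class SketchPublicationSet, its int-valued objects, the in-memory pool and the rule that a duplicate name makes Publish fail are all assumptions.

```cpp
#include <map>
#include <string>

// Illustrative registry (assumption: not the AMORE PublicationManager).
class SketchPublicationSet {
public:
    // Declare an object as part of the set updated at each end of cycle.
    // Returns 0 on success, 1 otherwise (here: null pointer or duplicate name).
    int Publish(const std::string& name, const int* value) {
        if (value == nullptr || declared_.count(name) > 0) return 1;
        declared_[name] = value;
        return 0;
    }

    // Remove an object from the set. Returns 0 on success, 1 otherwise.
    int Unpublish(const std::string& name) {
        return declared_.erase(name) == 1 ? 0 : 1;
    }

    // What the framework would do at every end of monitor cycle:
    // copy the current state of each declared object into the pool.
    void EndOfCycle() {
        for (const auto& entry : declared_) {
            pool_[entry.first] = *entry.second;
        }
    }

    bool InPool(const std::string& name) const { return pool_.count(name) > 0; }
    int PoolValue(const std::string& name) const { return pool_.at(name); }

private:
    std::map<std::string, const int*> declared_;  // objects to update
    std::map<std::string, int> pool_;             // stand-in for the data pool
};
```

Declaring an object does not write it to the pool; only the next EndOfCycle() does. After Unpublish, later cycles leave the last stored value untouched.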
GetDbFileContent
Synopsis     #include "PublicationManager.h"
             string GetDbFileContent(string detector, string filename)
Description  Load the specified file from the table amoreref.
Returns      The content of the file if it exists, an empty string otherwise.

DownloadDbFile
Synopsis     #include "PublicationManager.h"
             bool DownloadDbFile(string detector, string filename)
Description  Load the specified file from the table amoreref and save its content in a file named filename in the current directory. If no file was found, an empty file is created.
Returns      True if the file existed and was successfully downloaded, false otherwise.

Quit
Synopsis     #include "PublicationManager.h"
             void Quit()
Description  Ask the framework to quit. It does so in a graceful way, executing first the end of run sequence.

GetMonitorObject
Synopsis     #include "PublicationManager.h"
             MonitorObject* GetMonitorObject(string name)
Description  Returns the MonitorObject published under the name name. If no object is found, returns NULL.

23.5.3 Subscriber

23.5.3.1 SubscriptionManager

The SubscriptionManager provides the interface to the subscription methods. It also gives access to a certain number of utility methods. Each sub-class of VisualModule (see below) has access to a global variable of type SubscriptionManager: gSubscriber. It is the only reference to the framework that the user’s visual modules have.

Subscribe
Synopsis     #include "SubscriptionManager.h"
             int Subscribe(const char* name)
Description  Subscribe to the object given by 'name' = " / / ". In the variant without parameters, it unsubscribes from all the objects.
Returns      In the first variant, with parameters, the return codes are:
• 0: success.
• 1: already unsubscribed.
• 2: object never subscribed.
• -1: the proxy doesn’t exist.
The variant without parameters returns the number of objects that were unsubscribed.
At
Synopsis     #include "SubscriptionManager.h"
             template MonitorObjectType* At(const char* key);
Description  Returns the MonitorObject for the key = " / ". The object must have been subscribed beforehand.

Last
Synopsis     #include "SubscriptionManager.h"
             template MonitorObjectType* Last(const char* agent, const char* object);
Description  Returns the MonitorObject specified by its agent and object name, NULL if none was found. The returned object must be deleted by the caller.

Reset
Synopsis     #include "SubscriptionManager.h"
             int Reset(string agentName)
Description  Send a reset command to the agent named agentName.
Returns
• 1: Success.
• 0: Reset could not be delivered (DIM issue).
• -1: Agent not found.

Stop
Synopsis     #include "SubscriptionManager.h"
             int Stop(string agentname)
Description  Send a command to the agent named agentname asking it to stop and exit.
Returns
• 1: Success.
• 0: Stop could not be delivered (DIM issue).
• -1: Agent not found.

Archive
Synopsis     #include "SubscriptionManager.h"
             void Archive(const char* agentname, const char* moname, const char* description)
Description  Archive the object moname of agent agentname and attach the description description.

ActiveAgents
Synopsis     #include "SubscriptionManager.h"
             string ActiveAgents(const char* const det = 0)
Description  Returns a string containing a list (colon separated) of the active agents in the system.

AllowedActiveAgents
Synopsis     #include "SubscriptionManager.h"
             vector AllowedActiveAgents(const char* name)
Description  Returns a list of agents the subscriber is allowed to act on. Users can use this method to know whether they are allowed to start, stop or reset a given agent. A client is allowed to act (stop/start) on an agent if one of the following is true:
a. The user (login name) who started the agent is the same as the user who started the client.
b. The detector code of the agent is the same as the detector code of the client.
c.
The user who started the client is equal to the detector code of the agent.

AgentPublications
Synopsis     #include "SubscriptionManager.h"
             string AgentPublications(const char* const agentname)
Description  Returns a list of all the objects published by the agent called agentname.

AgentPublicationsDetails
Synopsis     #include "SubscriptionManager.h"
             string AgentPublicationsDetails(const char* const agentname)
Description  Returns a list of all the objects published by the agent called agentname with details on their quality, their type and their visibility (expert/shifter).
Returns      A string with the format (repeated for each agent): agentName#type#quality#visibility:

GetDbFileContent
Synopsis     #include "SubscriptionManager.h"
             string GetDbFileContent(string detector, string filename)
Description  Load the specified file from the table amoreref.
Returns      The content of the file if it exists, an empty string otherwise.

DownloadDbFile
Synopsis     #include "SubscriptionManager.h"
             bool DownloadDbFile(string detector, string filename)
Description  Load the specified file from the table amoreref and save its content in a file named filename in the current directory. If no file was found, an empty file is created.
Returns      True if the file existed and was successfully downloaded, false otherwise.

StoreFile & StoreFileContent
Synopsis     #include "SubscriptionManager.h"
             int StoreFile(string filename, string pathToFile, bool overwrite)
             int StoreFileContent(string filename, string content, bool overwrite)
Description  Store a file in the database table amoreref. The first variant accepts a path name to a file, whereas the second directly takes the content of the file. The parameter filename is used to name the file in the database and can be different from the name of the file on disk.
Returns
• 0: Successful insertion.
• 1: File overwritten.
• -1: File already exists, no overwrite.
• -2: Failure.

GetFilesList
Synopsis     #include "SubscriptionManager.h"
             vector GetFilesList(string pattern="")
Description  Find the (configuration) files stored in the database for the user (login name) who started the client. If pattern is specified, only the files whose names contain the pattern are returned.
Returns      A vector of the names of the files matching the criteria.

GetDetector
Synopsis     #include "SubscriptionManager.h"
             string GetDetector(string agentname)
Description  Returns the detector code of the agent called agentname.

GetRun
Synopsis     #include "SubscriptionManager.h"
             int GetRun(const char* key)
Description  Each MonitorObject is stored with the run number during which it was published. This method returns the run number for the object specified by key = " / ".

23.5.4 User Interface (UI)

23.5.4.1 VisualModule

VisualModule is an abstract class from which all the UI classes must inherit. It extends SubscriberModule. It contains a certain number of methods called by the subscriber’s Finite State Machine. All of them, apart from Reset(), must be overridden by the sub-class, even though most of them are usually left empty. See Figure 23.6 to know when each method is called by the FSM. The methods are almost the same in PublisherModule and VisualModule (which inherits from SubscriberModule). This is done on purpose, to have the same structure and the same FSM on both sides of the framework. The drawback is that certain methods, especially StartOfRun and EndOfRun, are not meaningful on the client side. Below is a subset of the methods that are either especially important or need some explanation.

fConfigFile
Synopsis     #include "VisualModule.h"
Description  A pointer to the configuration file. It might be null if no configuration file was specified by the user at startup (using the flag -g).
Construct
Synopsis     #include "VisualModule.h"
             void Construct()
Description  Build the user interface within this method.

Update
Synopsis     #include "VisualModule.h"
             void Update()
Description  Get updates of the subscribed MonitorObjects in this method.

Process
Synopsis     #include "VisualModule.h"
             void Process()
Description  Process the MonitorObjects retrieved in Update() in this method.

InitAuto
Synopsis     #include "VisualModule.h"
             bool InitAuto()
Description  The user interface calls Construct at startup. This behaviour can be changed by overriding this method.
Returns      True if the module should initialize automatically, false otherwise. In the latter case, the user will have to press the button "Init".

StartAuto
Synopsis     #include "VisualModule.h"
             bool StartAuto()
Description  The user interface waits after it has been initialized. This behaviour can be changed in order to start updating the objects directly.
Returns      True if the module should initialize and start automatically, false otherwise. In the latter case, the user will have to press the button "Start".

UpdatePeriod
Synopsis     #include "VisualModule.h"
             int UpdatePeriod()
Description  The user interface updates the objects at regular intervals. By default, the duration of the intervals is set to 30 seconds. This value can be changed within the module by overriding the method UpdatePeriod() in the subclass of VisualModule. The end user can, of course, still change the duration. The method is called right after Construct() has been called.

23.5.5 Detector Algorithms (DA) library

AmoreDA
Synopsis     #include "AmoreDA.h"
             AmoreDA(EMode mode)
Description  Constructor of the class AmoreDA. mode should be kSender on the DA side. In a hypothetical client, it would be kReceiver.
Send
Synopsis     #include "AmoreDA.h"
             int Send(const char* objectName, const TObject* obj, const bool asMonitorObject=false)
Description  This function has to be used in the DA to send the object obj under the name objectName to the AMORE pool. It will be stored in a table named after the content of the environment variable $AMORE_DA_NAME. If this variable is not defined, $DATE_ROLE_NAME is used instead. The last parameter, asMonitorObject, stipulates whether the TObject must be encapsulated within a MonitorObject or not.
Returns      0 on success, 1 otherwise.

23.5.6 Archiver

23.5.6.1 ArchiverModule

Archive
Synopsis     #include "ArchiverModule.h"
             void Archive(string agentName, string moName, string updatetime, string desc)
Description  This method must implement the archiving of the object specified by its agent, its name and its updatetime.

Clean
Synopsis     #include "ArchiverModule.h"
             void Clean()
Description  This method must clean up too old, non-permanent archives.

Init
Synopsis     #include "ArchiverModule.h"
             void Init(PoolInterface* pi, ConfigFile cf)
Description  This method gives the opportunity to the module to set up its internal state, given a connection to the database (PoolInterface) and to the configuration file.

23.6 Tools

The AMORE package contains various tools in the form of binaries or scripts. Their purpose is to set up a machine, to configure or discover the environment, to launch agents and clients and finally to manage the infrastructure (agents or configuration files). Here is a list of these tools sorted by category. For details on their usage please refer to the AMORE section in the ALICE DAQ WIKI.
Categories (in order): Setup, Utilities, Launchers, Management

newAmoreSite             Create and set up a new AMORE_SITE
amoreMysqlSetup          Configure MySQL for AMORE (create the db, users and tables)
amore-config             Get all configuration parameters for AMORE
amoreSetup               Set up the environment according to AMORE.params
amoreUpdateDB.tcl        Update the database schema (after an update of AMORE)
amore                    Start a client
amoreAgent               Start an agent
amoreArchiver            Start the archiver
newAmoreAgent            Create a new agent
amoreConfigFilesBrowser  An interface to manage the configuration files
amoreAgentsManager.tcl   A UI to start/stop the agents and the archiver
amoreEditDb              An expert UI to browse and edit the database

Part VI The ALICE electronic logbook

November 2010
ALICE DAQ Project

24 The ALICE Electronic Logbook

ALICE needs a bookkeeping facility to record not only the activities at the experimental area but also all the non-physics metadata associated with each performed run. As shifters come and go, a central information repository is needed to store reports of incidents, configuration changes, achievements or planned operations. Furthermore, a historical record of data-taking conditions and statistics is needed not only to allow the selection of good run candidates for prioritized offline processing, but also to detect trends and correlations, create aggregated reports and assist the run coordination in fulfilling the scientific goals. The ALICE Electronic Logbook (eLogbook) fulfills this requirement, providing a repository for both shifters’/experts’ reports and run statistics/conditions. It also provides a modern Web-based Graphical Human Interface, allowing the members of the ALICE collaboration to access and filter this vast data record easily.

24.1 Architecture
24.2 Database
24.3 Application Programming Interface
24.4 Logbook Daemon
24.5 Tools
24.6 Graphical User Interface

24.1 Architecture

24.1.1 Overview

The ALICE Electronic Logbook (eLogbook) is the Data Acquisition bookkeeping facility in ALICE. It is based on a LAMP (Linux, Apache, MySQL and PHP) software stack, with the relational database (see Section 24.2) serving as a data repository and the Web-based Graphical User Interface (see Section 24.6) providing interactive access to members of the ALICE collaboration. An Application Programming Interface implemented in C (see Section 24.3) and several command-line tools (see Section 24.5) provide read/write access to the repository. A daemon process (see Section 24.4) collects data published by the DCS and inserts it in the DB. Some of this data is gathered by a dedicated process that extracts information published by the LHC via the DIP protocol and republishes it in DCS (see Chapter 25).

Figure 24.1 The architecture of the eLogbook and its interfaces with the other ALICE systems and the LHC.

24.2 Database

24.2.1 Overview

The DB, running on a MySQL Server, is used to store heterogeneous data related to the experiment's activities. InnoDB is used as the storage engine for its support of both transactions and foreign key constraints. As shown in Figure 24.2, the tables that compose this DB can be grouped into 4 different categories:

• RUN CENTRIC: related to a specific run.
• LOG ENTRY CENTRIC: related to a specific human or automatic text report with optional file attachment.
• USER CENTRIC: related to the GUI users.
• OTHER: tables that do not belong to the previous categories.

Figure 24.2
eLogbook’s database schema

A stored procedure, called update_logbook_counters, is executed every 60 seconds (and at the end of each run) to update the global counters in the logbook table, whose values depend on partial counters spread over several tables. Daily backups are performed to a RAID 6 disk array and to the CERN Advanced STORage manager (CASTOR).

24.2.2 Table description

Below is a description of the eLogbook’s tables.

24.2.2.1 logbook table

This table stores per-run information. It is populated by:

• PCA/DCA: run, time_created, time_completed, partition, detector, run_type, calibration, numberOfDetectors, detectorMask, ecs_success, daq_success, eor_reason, ecs_iteration_current and ecs_iteration_total fields.
• PCA Human Interface: runQuality field.
• runControl: HLTmode, DAQ_time_start, DAQ_time_end, runDuration, detectorMask, numberOfLDCs, numberOfGDCs, LDClocalRecording, GDClocalRecording, GDCmStreamRecording, eventBuilding and dataMigrated fields.
• logbookDaemon: beamEnergy, beamType, LHCFillNumber, LHCTotalInteractingBunches, LHCTotalNonInteractingBunchesBeam1, LHCTotalNonInteractingBunchesBeam2, L3_magnetCurrent and Dipole_magnetCurrent fields.
• CTP software: L2a and ctpDuration fields.
• TDSM: dataMigrated field.
• update_logbook_counters stored procedure: runDuration, totalSubEvents, totalDataReadout, totalEvents, totalDataEventBuilder, totalDataRecorded, averageDataRateReadout, averageDataRateEventBuilder, averageDataRateRecorded, averageSubEventsPerSecond and averageEventsPerSecond fields.

Table 24.1
logbook table (per run conditions and statistics)

Field    Description
run    Run number
time_created    Run number creation timestamp
DAQ_time_start    Start of data acquisition timestamp
DAQ_time_end    End of data acquisition timestamp
time_completed    End of run timestamp
time_update    Database row update date/time
runDuration    Duration of data acquisition
partition    ECS partition name (NULL if standalone run)
detector    Detector name (NULL if global run)
run_type    ECS run type
calibration    Flag indicating if run is a calibration run
ecs_success    Flag indicating if run finished successfully
daq_success    Flag indicating if run data acquisition finished successfully
eor_reason    End of run reason as declared by the ECS
beamEnergy    Single beam energy in GeV
beamType    Type of collisions ('p-p', 'Pb-Pb', 'p-Pb', NULL if no collisions)
LHCFillNumber    LHC fill number
LHCTotalInteractingBunches    Total number of interacting bunches
LHCTotalNonInteractingBunchesBeam1    Total number of non-interacting bunches in beam 1
LHCTotalNonInteractingBunchesBeam2    Total number of non-interacting bunches in beam 2
numberOfDetectors    Number of detectors participating in the run
detectorMask    Bitmask of detectors participating in run (LSB = detector ID zero)
log    NOT USED
totalSubEvents    Total number of subevents collected by readout
totalDataReadout    Total size of data collected by readout in bytes
averageSubEventsPerSecond    Average number of subevents per second
averageDataRateReadout    Average data rate collected by readout in bytes/second
totalEvents    Total number of events generated by eventBuilder
totalDataEventBuilder    Total size of data generated by eventBuilder in bytes
averageEventsPerSecond    Average number of events per second
averageDataRateEventBuilder    Average data rate generated by eventBuilder in bytes/second
totalDataRecorded    Total size of data recorded by mStreamRecorder in bytes
averageDataRateRecorded    Average data rate recorded by mStreamRecorder in bytes/second
numberOfLDCs    Total number of LDCs participating in the run
numberOfGDCs    Total number of GDCs participating in the run
numberOfStreams    NOT USED
LDClocalRecording    Flag indicating if local recording in the LDCs was enabled
GDClocalRecording    Flag indicating if local recording in the GDCs was enabled
GDCmStreamRecording    Flag indicating if mStreamRecording mode was enabled
eventBuilding    Flag indicating if Event Building was enabled
LHCperiod    LHC period ID (e.g. LHC09c)
HLTmode    High Level Trigger mode
dataMigrated    Status of the data migration to Tier 0
runQuality    Global run quality flag for the run as specified by the ECS shifter
L3_magnetCurrent    Current of the L3 magnet in Amperes (signed)
Dipole_magnetCurrent    Current of the Dipole magnet in Amperes (signed)
L2a    Total number of L2a trigger decisions generated
ctpDuration    Duration, in seconds since SOR, during which at least one trigger class time sharing group was active
ecs_iteration_current    Current ECS iteration number for detector calibration with several runs
ecs_iteration_total    Total ECS iterations expected for detector calibration with several runs

24.2.2.2 logbook_detectors table

This table stores per detector per run information. It is populated by:

• runControl: run_number, detector and run_type fields.
• GUI: run_quality field.
• CTP software: L2a field.

Table 24.2
logbook_detectors table

Field    Description
run_number    Run number
detector    Detector name
run_type    ECS run type
run_quality    Run quality for the detector/run pair as indicated by the ECS shifter ('No report', 'Good run', 'Bad run')
L2a    Number of L2a trigger decisions generated for this detector

24.2.2.3 logbook_stats_LDC table

This table stores per LDC per run information. The HLT counters are populated by hltAgent and the other fields by readout.

Table 24.3 logbook_stats_LDC table

Field    Description
run    Run number
LDC    LDC rolename
detectorId    Detector ID as specified by the id field of the DETECTOR_CODES table
eventCount    Number of subevents collected by readout
eventCountPhysics    Number of PHYSICS subevents collected by readout
eventCountCalibration    Number of CALIBRATION subevents collected by readout
bytesInjected    Size of data collected by readout in bytes
bytesInjectedPhysics    Size of PHYSICS data collected by readout in bytes
bytesInjectedCalibration    Size of CALIBRATION data collected by readout in bytes
hltAccepts    Number of HLT accept decisions for this LDC
hltRejects    Number of HLT reject decisions for this LDC
hltBytesRejected    Size of data rejected by HLT decisions for this LDC in bytes
time_update    Database row update date/time

24.2.2.4 logbook_stats_LDC_trgCluster table

This table stores per trigger cluster per LDC per run information. It is populated by readout.

Table 24.4 logbook_stats_LDC_trgCluster table

Field    Description
run    Run number
LDC    LDC rolename
cluster    Trigger cluster ID
eventCount    Number of subevents collected by readout
bytesInjected    Size of data collected by readout in bytes
time_update    Database row update date/time

24.2.2.5 logbook_stats_GDC table

This table stores per GDC per run information. It is populated by eventBuilder.
Table 24.5 logbook_stats_GDC table

Field    Description
run    Run number
GDC    GDC rolename
eventCount    Number of events generated by eventBuilder
eventCountPhysics    Number of PHYSICS events generated by eventBuilder
eventCountCalibration    Number of CALIBRATION events generated by eventBuilder
bytesRecorded    Size of data generated by eventBuilder in bytes
bytesRecordedPhysics    Size of PHYSICS data generated by eventBuilder in bytes
bytesRecordedCalibration    Size of CALIBRATION data generated by eventBuilder in bytes
time_update    Database row update date/time

24.2.2.6 logbook_stats_files table

This table stores per data file per run information covering the full data chain, from file creation up to migration to CASTOR. It is mostly populated by mStreamRecorder, with the status field (and the corresponding timestamps) also updated by TDSM.

Table 24.6 logbook_stats_files table

Field    Description
id    File ID
run    Run number
fileName    Filename (without path)
location    Path to current file directory
local    Flag indicating if file is local
rolename    Rolename of DAQ node writing the file
hostname    Hostname of DAQ node writing the file
pid    Process ID writing the file
time_update    Database row update date/time
time_write_begin    File writing start date/time
time_write_end    File writing end date/time
time_migrate_request    File migration request date/time
time_migrate_begin    File migration start date/time
time_migrate_end    File migration end date/time
status    File status ('Writing', 'Closed', 'Waiting migration request', 'Migration requested', 'Migrating', 'Migrated')
eventCount    Number of events written to file
size    File size in bytes

24.2.2.7 logbook_daq_active_components table

This table stores per run information related to the active DAQ components (DDLs, LDCs, GDCs). It is populated by runControl.
Table 24.7 logbook_daq_active_components table

Field    Description
run    Run number
LDC    Active LDC IDs (bitmask, LSB = ID zero)
GDC    Active GDC IDs (bitmask, LSB = ID zero)
DDL    Active DDL IDs (bitmask, LSB = ID zero)

24.2.2.8 logbook_shuttle table

This table stores per run information related to the Offline Shuttle processing. It is populated by ECS, with the different processing statuses updated directly by the Shuttle software.

Table 24.8 logbook_shuttle table

Each Shuttle processing status field takes one of the values 'UNPROCESSED', 'INACTIVE', 'FAILED', 'DONE'.

Field    Description
run    Run number
shuttle_done    Flag indicating if Shuttle processing is complete
test_mode    Flag indicating if Shuttle should run in test mode
update_time    Database row update date/time
SPD    Shuttle processing status for detector SPD
SDD    Shuttle processing status for detector SDD
SSD    Shuttle processing status for detector SSD
TPC    Shuttle processing status for detector TPC
TRD    Shuttle processing status for detector TRD
TOF    Shuttle processing status for detector TOF
PHS    Shuttle processing status for detector PHOS
CPV    Shuttle processing status for detector CPV
HMP    Shuttle processing status for detector HMPID
MCH    Shuttle processing status for detector MUON_TRK
MTR    Shuttle processing status for detector MUON_TRG
PMD    Shuttle processing status for detector PMD
FMD    Shuttle processing status for detector FMD
T00    Shuttle processing status for detector T0
V00    Shuttle processing status for detector V0
ZDC    Shuttle processing status for detector ZDC
ACO    Shuttle processing status for detector ACORDE
EMC    Shuttle processing status for detector EMCal
TST    Shuttle processing status for detector DAQ_TEST
HLT    Shuttle processing status for HLT
GRP    Shuttle processing status of the global run parameters
TRI    Shuttle processing status of the trigger parameters

24.2.2.9 logbook_DA table

This table stores per run per Detector Algorithm (DA) information.

Table 24.9 logbook_DA table

Field    Description
run    Run number
detector    Detector name related to the DA
DAname    DA name
DAversion    DA version number (from RPM)
DAstdout    Output of the DA
role    Rolename of DAQ node associated with the DA
exitCode    Exit code of the DA
commandLine    Executed command
workingDir    Working directory

24.2.2.10 logbook_AMORE_agents table

This table stores per run per AMORE agent information. It is populated by the AMORE framework.
Table 24.10 logbook_AMORE_agents table

Field    Description
run    Run number
detector    Detector name related to the AMORE agent
agentName    AMORE agent name
agentVersion    AMORE agent version number (from RPM)
runtimeParameters    AMORE agent runtime parameters
MOpublished    Number of published Monitoring Objects
MOversionsPublished    Number of published Monitoring Objects versions
bytesPublished    Total size of published Monitoring Objects in bytes
averageCPUtime    Average CPU time of a monitor cycle
averageRealTime    Average real time of a monitor cycle
QAsummaryImage    Quality Assurance summary image
QAsummaryImageSize    Size of Quality Assurance summary image in bytes
time_update    Database row update date/time

24.2.2.11 logbook_trigger_clusters table

This table stores per run per trigger cluster information. It is populated by the CTP software.

Table 24.11 logbook_trigger_clusters table

Field    Description
run    Run number
partition    ECS partition name
cluster    Trigger cluster ID (1-6)
detectorMask    24-bit detector IDs bitmask (LSB = detector ID zero) corresponding to the readout detectors of this cluster
inputDetectorMask    24-bit detector IDs bitmask (LSB = detector ID zero) corresponding to the trigger detectors of this cluster
triggerClassMask    50-bit trigger classes ID bitmask (LSB = trigger class ID zero) of the trigger classes triggering this cluster

24.2.2.12 logbook_trigger_classes table

This table stores per run per trigger class information. It is populated by:

• CTP software: run, classId, className, classGroupId, classGroupTime, L0b, L0a, L1b, L1a, L2b, L2a and ctpDuration fields.
• hltAgent: hltAccepts, hltPartialAccepts, hltOnly, hltRejects and hltBytesRejected fields.
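
The 50-bit triggerClassMask of the logbook_trigger_clusters table (Table 24.11) does not fit in 32 bits, so client code has to hold it in a 64-bit integer. The following is a minimal sketch of how such a mask could be built and tested in C. The helper names are hypothetical and are not part of the DAQlogbook API; the only facts taken from the manual are the mask width (50 bits) and the bit order (LSB = trigger class ID zero).

```c
#include <assert.h>

#define N_TRIGGER_CLASSES 50u  /* mask width as described in Table 24.11 */

/* Hypothetical helper (not part of the DAQlogbook API): returns 1 if the bit
 * of trigger class class_id is set in mask (LSB = trigger class ID zero),
 * 0 otherwise. IDs outside 0-49 are reported as not set. */
static int trigger_class_in_mask(unsigned long long mask, unsigned int class_id)
{
    if (class_id >= N_TRIGGER_CLASSES) {
        return 0;
    }
    return (int)((mask >> class_id) & 1ULL);
}

/* Hypothetical helper: returns mask with the bit of class_id set.
 * The shift is done on a 64-bit constant so that IDs above 31 work. */
static unsigned long long trigger_class_mask_add(unsigned long long mask,
                                                 unsigned int class_id)
{
    assert(class_id < N_TRIGGER_CLASSES);
    return mask | (1ULL << class_id);
}
```

The same pattern applies to the 24-bit detectorMask and inputDetectorMask fields, with the width reduced accordingly.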
Table 24.12 logbook_trigger_classes table

Field    Description
run    Run number
classId    Trigger class ID (0-49)
className    Trigger class name
classGroupId    Trigger class time sharing group ID
classGroupTime    Trigger class time sharing group duration in seconds
L0b    Number of L0b trigger decisions generated for this trigger class
L0a    Number of L0a trigger decisions generated for this trigger class
L1b    Number of L1b trigger decisions generated for this trigger class
L1a    Number of L1a trigger decisions generated for this trigger class
L2b    Number of L2b trigger decisions generated for this trigger class
L2a    Number of L2a trigger decisions generated for this trigger class
ctpDuration    Duration during which this trigger class was active since SOR in seconds
hltAccepts    Number of HLT accept decisions for this trigger class
hltPartialAccepts    Number of HLT partial accept decisions for this trigger class
hltOnly    Number of HLT only decisions for this trigger class
hltRejects    Number of HLT reject decisions for this trigger class
hltBytesRejected    Size of data rejected by HLT decisions for this trigger class in bytes

24.2.2.13 logbook_trigger_inputs table

This table stores per run per trigger input information. It is populated by the CTP software.

Table 24.13 logbook_trigger_inputs table

Field    Description
run    Run number
inputLevel    Trigger input level (0-2)
inputId    Trigger input Id (1-24 for level 0, 1-24 for level 1, 1-12 for level 2)
inputName    Trigger input name
inputCount    Number of trigger signals for this trigger input

24.2.2.14 logbook_trigger_config table

This table stores per run information related with the CTP configuration. It is populated by the CTP software.
Table 24.14 logbook_trigger_config table

Field    Description
run    Run number
configFile    Trigger configuration file
alignmentFile    Trigger alignment file

24.2.2.15 logbook_stats_HLT table

This table stores per run information related to the HLT decisions. It is populated by the update_logbook_counters stored procedure.

Table 24.15 logbook_stats_HLT table

Field    Description
run    Run number
hltAccepts    Total number of HLT accept decisions for this run
hltPartialAccepts    Total number of HLT partial accept decisions for this run
hltOnly    Total number of HLT only decisions for this run
hltRejects    Total number of HLT reject decisions for this run
hltBytesRejected    Size of data rejected by HLT decisions for this run in bytes
time_update    Database row update date/time

24.2.2.16 logbook_stats_HLT_LDC table

This table stores per run per HLT LDC information. It is populated by hltAgent.

Table 24.16 logbook_stats_HLT_LDC table

Field    Description
run    Run number
LDC    HLT LDC rolename
hltAccepts    Number of HLT accept decisions taken by this HLT LDC
hltPartialAccepts    Number of HLT partial accept decisions taken by this HLT LDC
hltOnly    Number of HLT only decisions taken by this HLT LDC
hltRejects    Number of HLT reject decisions taken by this HLT LDC
time_update    Database row update date/time

24.2.2.17 logbook_comments table

This table stores the Log Entries. It is populated by the GUI (human generated Log Entries such as End-Of-Shift reports), by runControl (automatic "start/end of run" Log Entries) and by the PCA Human Interface (ECS operator "end of run" Log Entries).
Table 24.17 logbook_comments table

Field    Description
id    Log Entry ID
run    Run number associated with the Log Entry
userid    User ID of the Log Entry author as specified in the id field of the logbook_users table
title    Log Entry title
comment    Log Entry body
class    Log Entry class ('HUMAN', 'PROCESS')
type    Log Entry type ('GENERAL', 'HARDWARE', 'CAVERN', 'SOFTWARE', 'NETWORK', 'EOS', 'OTHER')
time_created    Log Entry creation date/time
deleted    Flag indicating if the Log Entry is deleted
parent    Parent Log Entry (for threads)
root_parent    Root parent Log Entry (for threads)
dashboard    Flag indicating if the Log Entry is an announcement
time_validity    Log Entry validity date/time

24.2.2.18 logbook_comments_interventions table

This table extends the logbook_comments table with information related to on-call interventions. It is populated by the GUI.

Table 24.18 logbook_comments_interventions table

Field    Description
commentid    Log Entry ID
type    Intervention type ('REMOTE', 'ONSITE')

24.2.2.19 logbook_files table

This table stores the files attached to the Log Entries. It is populated by the GUI.

Table 24.19 logbook_files table

Field    Description
commentid    ID of the Log Entry to which the file is attached
fileid    File ID
filename    Filename
size    File size in bytes
title    File title
data    File binary data
thumbnail_small    File small-sized thumbnail (100x100, only filled if file is an image)
thumbnail_medium    File medium-sized thumbnail (320x320, only filled if file is an image)
content_type    File Content-Type
time_created    File creation date/time
deleted    Flag indicating if the file is deleted

24.2.2.20 logbook_threads table

This table extends the logbook_comments table with information related to threads. It is populated by the GUI.
Table 24.20 logbook_threads table

Field    Description
id    Log Entry ID (root parent Log Entry)
title    Thread title
ticket_status    Ticket status ('OPEN', 'CLOSED')

24.2.2.21 logbook_subsystems table

This table stores the definition of the Log Entry Subsystems.

Table 24.21 logbook_subsystems table

Field    Description
id    Subsystem ID
name    Subsystem name
text    Subsystem text (to be displayed in GUIs)
parent    Parent subsystem
email    Automatic email notification email address (single or multiple in CSV format)
notify_no_run_log_entries    Flag indicating if an automatic email notification should be sent for the Log Entries of this subsystem without run number
notify_run_log_entries    Flag indicating if an automatic email notification should be sent for the Log Entries of this subsystem with a run number
notify_quality_flags    Flag indicating if an automatic email notification should be sent for the Log Entries of this subsystem related to a change of the quality flags

24.2.2.22 logbook_comments_subsystems table

This table stores the relationship between the Log Entries and the Subsystems. It is populated by the GUI.

Table 24.22 logbook_comments_subsystems table

Field    Description
commentid    Log Entry ID
subsystemid    Subsystem ID

24.2.2.23 logbook_users table

This table stores the main information about the GUI's users. It is populated by the GUI at the time of the user's first login.

Table 24.23 logbook_users table

Field    Description
id    User ID
first_name    User's first name
full_name    User's full name
email    User's email address
group_name    User's CERN group name

24.2.2.24 logbook_users_privileges table

This table stores the privileges of the GUI's users. It is populated by the GUI.
Table 24.24 logbook_users_privileges table

Field    Description
id    User privilege ID
userid    User ID
time_start    Privilege start date/time
time_end    Privilege end date/time
privilege    Privilege set (one or more of 'NONE', 'READ', 'WRITE', 'ADMIN', 'SUPER')
revoked    Flag indicating if the privilege is revoked

24.2.2.25 logbook_users_profiles table

This table stores additional information of the GUI's users. It is populated by the GUI.

Table 24.25 logbook_users_profiles table

Field    Description
userid    User ID
name    Profile entry name
value    Profile entry value

24.2.2.26 logbook_filters table

This table stores the definition of the search filters predefined values (see Section 24.6.3.6).

Table 24.26 logbook_filters table

Field    Description
id    User filter ID
userid    User ID
content_name    Name of the GUI's content to which the filter applies
column_qs_var    Name of the GUI's URL query string variable to which the filter's value will be assigned
name    Filter's name
value    Filter's value or SQL code
print_order    Filter's order of appearance in the GUI
sql_flag    Flag indicating if this filter is SQL based
load_by_default    Flag indicating if this filter should be loaded by default
enabled    Flag indicating if this filter is enabled

24.2.2.27 DETECTOR_CODES table

This table stores the definition of the different ALICE detectors.

Table 24.27 DETECTOR_CODES table

Field    Description
id    Detector ID
name    Detector name
code    Detector 3-letter code
isVirtual    Flag indicating if the detector is virtual
description    Detector description

24.2.2.28 TRIGGER_CLASSES table

This table stores the definition of the different ALICE trigger classes.

Table 24.28 TRIGGER_CLASSES table

Field    Description
className    Trigger class name
description    Trigger class description
obsolete    Flag indicating if the trigger class is obsolete

24.2.2.29 logbook_config table

This table stores internal eLogbook information (e.g. version number).
Table 24.29 logbook_config table

Field    Description
Name    Configuration parameter name
Value    Configuration parameter value
Description    Configuration parameter description

24.2.3 Stored Procedures

Below is a list of the eLogbook's stored procedures.

update_logbook_counters

Synopsis
    CALL update_logbook_counters(run_number INT)
Description
    Updates the runDuration, totalSubEvents, totalDataReadout, totalEvents, totalDataEventBuilder, totalDataRecorded, averageDataRateReadout, averageDataRateEventBuilder, averageDataRateRecorded, averageSubEventsPerSecond and averageEventsPerSecond fields of the logbook table. It also updates the statistics of the logbook_stats_HLT table.
Returns
    No value is returned.

DAQlogbookSP_updateListTriggerClasses

Synopsis
    CALL DAQlogbookSP_updateListTriggerClasses()
Description
    Updates the list of trigger classes stored in the TRIGGER_CLASSES table, based on the distinct values of the className field of the logbook_trigger_classes table.
Returns
    No value is returned.

24.2.4 Events

Below is a list of the eLogbook's events.

DAQlogbookEV_updateListTriggerClasses

Description
    Executes the DAQlogbookSP_updateListTriggerClasses stored procedure every day at 01:00 h.

24.3 Application Programming Interface

24.3.1 Overview

Read/write access is available via the DAQlogbook C API. A version for Tcl is also available as a shared library.

24.3.2 Environment variables

The following environment variables are available to configure the behavior of the DAQlogbook API:

• DAQ_DB_LOGBOOK: sets the credentials to access the DB. The format is "USERNAME:PASSWORD@HOSTNAME/DBNAME".
• WITH_INFOLOGGER: sets the logging mode. If set, logging uses the infoLogger system. If not set, log messages are sent to stdout.
• DAQ_LOGBOOK_VERBOSE: sets the logging level.
Possible values are:
  • 0: no messages
  • 1: error messages
  • 2: same as 1 + debug messages
  • > 2: same as 2 + all SQL queries
If not set, the default value is 1.

24.3.3 Database connection functions

Below is a list of functions providing basic connection functionality to the eLogbook's database.

DAQlogbook_open

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_open(const char *cx_params)
Description
    Open a MySQL connection. Credentials should be given via the cx_params parameter in the "USERNAME:PASSWORD@HOSTNAME/DBNAME" format. If an empty string is passed, the credentials are taken from the DAQ_DB_LOGBOOK environment variable. If both are empty, eLogbook access via the API is disabled and further read/write function calls are ignored.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, one of the following values is returned:
    -1: error while connecting to the database.
    1: cx_params and DAQ_DB_LOGBOOK are empty, eLogbook access disabled.

DAQlogbook_close

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_close(void)
Description
    Close a MySQL connection and release previously used resources.
Returns
    This function always returns a value of zero.

24.3.4 Logging functions

Below is a list of functions allowing logging configuration.

DAQlogbook_verbose

Synopsis
    #include "DAQlogbook.h"
    void DAQlogbook_verbose(int v)
Description
    Set the logging level of the API. Possible values are the same as described for the DAQ_LOGBOOK_VERBOSE environment variable.
Returns
    No value is returned.

24.3.5 eLogbook READ access functions

Below is a list of functions providing READ access to the eLogbook database.

DAQlogbook_datafile_getIdFromName

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_datafile_getIdFromName(const char *name, unsigned int run, int *id)
Description
    Retrieve the ID of the entry in the logbook_stats_files table corresponding to the given filename and run number.
    The value is stored in the id parameter.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_get_AMORE_agent_summary_img

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_get_AMORE_agent_summary_img(const char *agentname, void **summary_img, unsigned long *n_bytes)
Description
    Retrieve the Quality Assurance summary image of a specific AMORE agent for the latest run where the agent was active. The image itself is stored in the summary_img parameter and its size in bytes in the n_bytes parameter.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_query_getTriggerClusters

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getTriggerClusters(unsigned int run, char *partition, unsigned int *clustermask)
Description
    Build a 6-bit bitmask (LSB = cluster #1) of the active trigger clusters for the given run. The value is then stored in the clustermask parameter.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, -1 will be returned.

DAQlogbook_query_getDetectorIdsFromTriggerClassMask

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getDetectorIdsFromTriggerClassMask(unsigned int run, unsigned long long triggerClassMask, unsigned int *detectorMask, unsigned int *clustermask)
Description
    Build a 24-bit detector IDs bitmask (LSB = detector ID zero, stored in the detectorMask parameter) corresponding to all the readout detectors of all the trigger clusters triggered by a given 50-bit trigger classes bitmask (triggerClassMask parameter). Optionally, if the clustermask parameter is not NULL, it will hold, when the function returns, a 6-bit bitmask of the triggered trigger clusters.
    NOTE: This function is optimized to cache the full trigger configuration for a given run, so that the database is not queried again if the last call to this function used the same run number. Calling it with run parameter equal to zero clears the cache.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, one of the following values is returned:
    -1: no trigger cluster is triggered by the given trigger classes.
    > 0: error code with a value equal to the line number where the error occurred.

DAQlogbook_query_getDetectorIdsFromTriggerCluster

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getDetectorIdsFromTriggerCluster(unsigned int run, unsigned int cluster, unsigned int *detectorMask)
Description
    Build a 24-bit detector IDs bitmask (LSB = detector ID zero, stored in the detectorMask parameter) corresponding to the readout detectors of the given trigger cluster.
    NOTE: This function is optimized to cache the full trigger configuration for a given run, so that the database is not queried again if the last call to this function used the same run number. Calling it with run parameter equal to zero clears the cache.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, one of the following values is returned:
    -1: given trigger cluster not found.
    > 0: error code with a value equal to the line number where the error occurred.

DAQlogbook_query_getDetectors

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getDetectors(unsigned int run, unsigned int *detectormask)
Description
    Build a 24-bit detector IDs bitmask (LSB = detector ID zero, stored in the detectormask parameter) corresponding to the readout detectors participating in the given run.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, -1 will be returned.
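
Several of the functions above return a 24-bit detector IDs bitmask with LSB = detector ID zero. As an illustration only, the sketch below shows one way a caller might expand such a mask into a list of detector IDs. Both helpers are hypothetical and are not part of the DAQlogbook API; they operate on a plain unsigned int, so they can be applied to the value stored in the detectorMask or detectormask parameter after a successful call.

```c
#include <assert.h>
#include <stddef.h>

#define N_DETECTOR_IDS 24u  /* width of the detector bitmask described above */

/* Hypothetical helper (not part of the DAQlogbook API): returns 1 if the bit
 * of detector_id is set in mask (LSB = detector ID zero), 0 otherwise. */
static int detector_in_mask(unsigned int mask, unsigned int detector_id)
{
    if (detector_id >= N_DETECTOR_IDS) {
        return 0;  /* IDs outside the 24-bit range are reported as not set */
    }
    return (int)((mask >> detector_id) & 1u);
}

/* Hypothetical helper: writes the IDs of all detectors set in mask into
 * ids[], in increasing order, and returns how many were found.
 * ids must have room for N_DETECTOR_IDS entries. */
static size_t detector_ids_from_mask(unsigned int mask, unsigned int *ids)
{
    size_t count = 0;
    assert(ids != NULL);
    for (unsigned int id = 0; id < N_DETECTOR_IDS; id++) {
        if (detector_in_mask(mask, id)) {
            ids[count++] = id;
        }
    }
    return count;
}
```

The detector names corresponding to the IDs are defined in the DETECTOR_CODES table (Table 24.27); the sketch deliberately stops at the numeric IDs.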

DAQlogbook_query_getPartition

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getPartition(unsigned int run, char **partition, int *standalone)
Description
    Retrieve the name of the ECS partition for the given run (stored in the partition parameter). In case of a standalone run, the value will be equal to the detector name. Additionally, when the function returns, the standalone parameter indicates whether the run is a standalone or a global run.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_query_getDetectorsInTriggerClasses

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getDetectorsInTriggerClasses(unsigned int run, char **table)
Description
    Build a 50x24 boolean table (table parameter) indicating, for each detector ID, which trigger classes will trigger it in the given run. The table indexes are of the form triggerClassId * 24 + detectorID. As an example, if the detector with ID 5 is triggered by the trigger class with ID 20:
        table[20 * 24 + 5] = 1
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_query_getTriggerClassNames

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getTriggerClassNames(unsigned int run, char ***table)
Description
    Build a 50-entry table (table parameter) indicating, for each possible trigger class ID, the corresponding trigger class name in the given run. The table indexes are the trigger class IDs (from 0 to 49) and the values are the trigger class names. A value of NULL means that the corresponding trigger class ID is undefined for the given run.
Returns
    Upon successful completion, this function returns a value of zero.
    Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_query_getTriggerClassIdFromName

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getTriggerClassIdFromName(unsigned int run, const char *className, unsigned char *classId)
Description
    Retrieve, for a given trigger class name, the corresponding trigger class ID (stored in the classId parameter) in the given run. If more than 1 match is found, the first one is retrieved.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, one of the following values is returned:
    -1: given trigger class name undefined for this run.
    -10: more than 1 match found.
    > 0: error code with a value equal to the line number where the error occurred.

DAQlogbook_query_getDetectorsInTriggerInput

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getDetectorsInTriggerInput(unsigned int run, char **table)
Description
    Build a 24-entry boolean table (table parameter) indicating, for each detector ID, if the corresponding detector is a trigger detector in the run.
Returns
    Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_query_getL2aPerTriggerClass

Synopsis
    #include "DAQlogbook.h"
    int DAQlogbook_query_getL2aPerTriggerClass(unsigned int run, int **L2a)
Description
    Build a 50-entry table (L2a parameter) indicating, for each trigger class ID, the number of L2 accept decisions in the given run. The table indexes are the trigger class IDs (from 0 to 49) and the values are the number of L2 accept decisions. A value of -1 means that the corresponding trigger class ID is undefined for the run.
Returns
    Upon successful completion, this function returns a value of zero.
Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_query_getHLTDecisionsPerTriggerClass

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_query_getHLTDecisionsPerTriggerClass(unsigned int run, int **hltAccepts, int **hltPartialAccepts, int **hltOnly, int **hltRejects)

Description
Build four 50-entry tables (hltAccepts, hltPartialAccepts, hltOnly and hltRejects parameters) indicating, for each trigger class ID, the number of HLT decisions of each kind in the given run. The table indexes are the trigger class IDs (from 0 to 49) and the values are the number of HLT decisions. A value of -1 means that the corresponding trigger class ID is undefined for the run. At the end of the execution of the function, the tables will store the following information:
• hltAccepts: number of full event accept decisions per trigger class ID.
• hltPartialAccepts: number of partial readout decisions per trigger class ID.
• hltOnly: number of "hltOnly" decisions per trigger class ID.
• hltRejects: number of full event reject decisions per trigger class ID.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_get_DAQ_active_components

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_get_DAQ_active_components(unsigned int run, const char *type, void **mask_of_ids, int *n_bytes)

Description
Retrieve the list of active DAQ components of a given type in the given run. Possible values for the type parameter are: DDL, LDC, GDC. At the end of the execution of the function, the mask_of_ids parameter will store a bitmask of the active components' IDs as defined in the DATE_CONFIG database.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
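The bitmask stored in mask_of_ids has to be decoded by the caller. The manual does not state the bit ordering, so the following is only a minimal sketch under the assumption that component ID i is stored in bit (i % 8) of byte (i / 8), least significant bit first. The helper name decode_active_ids is hypothetical and is not part of DAQlogbook.h.

```c
#include <stddef.h>

/* Hypothetical helper (not part of DAQlogbook.h): list the IDs whose bit
 * is set in a bitmask of n_bytes bytes.
 * ASSUMPTION: ID i lives in bit (i % 8) of byte (i / 8), LSB first.
 * Writes at most max_ids IDs into ids and returns the total number of
 * set bits, which may exceed max_ids if the output buffer is too small. */
size_t decode_active_ids(const unsigned char *mask, int n_bytes,
                         int *ids, size_t max_ids)
{
    size_t found = 0;
    for (int byte = 0; byte < n_bytes; byte++) {
        for (int bit = 0; bit < 8; bit++) {
            if (mask[byte] & (1u << bit)) {
                if (found < max_ids)
                    ids[found] = byte * 8 + bit;
                found++;
            }
        }
    }
    return found;
}
```

A caller would pass the buffer and byte count obtained from DAQlogbook_get_DAQ_active_components; if the real layout differs from the assumed one, only the index arithmetic needs to change.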
DAQlogbook_query_runType

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_query_runType(unsigned int run, char **runType)

Description
Retrieve, for a given run, the corresponding ECS run type (stored in the runType parameter).

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_getLastRun

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_getLastRun(const char *partition, unsigned int *run, unsigned char *active)

Description
Retrieve, for a given ECS partition or detector (partition parameter), the number of the last run started (stored in the run parameter). At the end of the execution of the function, the active parameter will indicate whether the retrieved run is still active.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_getActiveRuns

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_getActiveRuns(unsigned int **runs, unsigned int *size)

Description
Build an array with the run numbers of the currently active runs (stored in the runs parameter). The size parameter will store the number of runs found.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_getAmoreAgents

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_getAmoreAgents(unsigned int run, char ***agents, unsigned int *size)

Description
Build a list of active AMORE agents (stored in the agents parameter) for a given run. The size parameter will store the number of agents found.

Returns
Upon successful completion, this function returns a value of zero.
Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

24.3.6 eLogbook WRITE functions

Below is a list of functions providing WRITE access to the eLogbook database.

DAQlogbook_update_newRun

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_newRun(unsigned int run, const char *partition, unsigned int ndetectors, const char **detectors, const char *runtype, unsigned int calibration, int checkOldRuns)

Description
This function should be called when a new run is started. It creates a new entry in the logbook table and initializes several fields of this table: run, time_created, partition, detector, run_type, calibration, numberOfDetectors and detectorMask. Additionally, it creates one entry in the logbook_detectors table for each detector participating in the run and initializes one entry in the logbook_daq_active_components table with the run number. If the checkOldRuns flag is set, the function will first close all active runs in which any of the detectors participating in this run is also participating.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_detectorMask

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_detectorMask(unsigned int run)

Description
Update the detectorMask and numberOfDetectors fields of the logbook table, based on the content of the logbook_detectors table. It is called automatically by the DAQlogbook_update_newRun function.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
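The relationship between the detector list, the detectorMask and numberOfDetectors fields, and the checkOldRuns overlap test can be pictured with plain bit operations. This is a minimal sketch, assuming that detector ID n maps to bit n of the mask (the manual speaks of 24 detector IDs but does not spell out the mapping here). All function names are hypothetical, and the real library works on the database tables rather than on in-memory masks.

```c
#include <stdint.h>

#define N_DETECTORS 24u

/* Build a detector bitmask from a list of detector IDs.
 * ASSUMPTION: detector ID n corresponds to bit n; IDs >= 24 are ignored. */
uint32_t detector_mask_from_ids(const unsigned int *ids, unsigned int n)
{
    uint32_t mask = 0;
    for (unsigned int i = 0; i < n; i++)
        if (ids[i] < N_DETECTORS)
            mask |= (uint32_t)1 << ids[i];
    return mask;
}

/* Number of detectors in a mask (count of set bits). */
unsigned int count_detectors(uint32_t mask)
{
    unsigned int n = 0;
    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Two runs overlap when they share at least one detector; this is the
 * condition under which an old active run would have to be closed. */
int runs_share_detector(uint32_t mask_a, uint32_t mask_b)
{
    return (mask_a & mask_b) != 0;
}
```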
DAQlogbook_update_EndRun

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_EndRun(unsigned int run, int ecs_success, int daq_success, const char * const eor_reason)

Description
This function should be called when a run finishes. It populates the ecs_success, daq_success and eor_reason fields of the logbook table. It also:
• populates the logbook_shuttle table with the UNPROCESSED value for each detector participating in the run.
• changes the status of all entries of the logbook_stats_files table related to the run that are still in Writing to Closed.
• updates the dataMigration flag of the logbook table.
• updates the different counters by executing the update_logbook_counters stored procedure.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_startTime

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_startTime(unsigned int run)

Description
Update the DAQ_time_start field of the logbook table with the current timestamp. It also adds a Log Entry of class PROCESS, marking the start of data taking.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_stopTime

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_stopTime(unsigned int run)

Description
Update the DAQ_time_end and the runDuration fields of the logbook table. It also adds a Log Entry of class PROCESS, marking the end of data taking.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
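Most functions in this API return zero on success and, on failure, the source line number at which the error was detected. The sketch below shows one way such a convention can be written in C with the standard __LINE__ macro. It illustrates the idea only and is not the library's actual source; store_value and FAIL_HERE are hypothetical names.

```c
#include <stddef.h>

/* Return the line number of the failing check as the error code. */
#define FAIL_HERE() return __LINE__

/* Hypothetical function following the same convention as the API:
 * 0 on success, otherwise the source line where the check failed. */
int store_value(const char *name, int value, int *slot)
{
    if (name == NULL)
        FAIL_HERE();
    if (slot == NULL)
        FAIL_HERE();
    if (value < 0)
        FAIL_HERE();
    *slot = value;
    return 0;
}
```

With this convention a caller only needs to test for a non-zero result, while the value itself points to the check that failed in the library source.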
DAQlogbook_update_DAQrunning

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_DAQrunning(unsigned int run)

Description
Update the time_update field of the logbook table with the current timestamp. It should be called periodically during data taking, serving as the heartbeat of the run and allowing the detection of crashed or improperly terminated runs. It will also update the different counters by executing the update_logbook_counters stored procedure.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_DAQnode_config

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_DAQnode_config(unsigned int run, int LDCs, int GDCs, int LDCrecordingMode, int GDCrecordingMode)

Description
Update the logbook table with the DAQ nodes configuration, populating the following fields: numberOfLDCs, numberOfGDCs, LDClocalRecording, GDClocalRecording, GDCmStreamRecording and eventBuilding. It will also initialize the dataMigrated field based on the active recording configuration.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_DAQnode_statsGDC

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_DAQnode_statsGDC(unsigned int run, char *GDC, unsigned long long eventCount, unsigned long long eventCountPhysics, unsigned long long eventCountCalibration, unsigned long long bytesRecorded, unsigned long long bytesRecordedPhysics, unsigned long long bytesRecordedCalibration)

Description
Update the counters in the logbook_stats_GDC table. When called for the first time for the given run and GDC pair, it will create a new entry in the table.

Returns
Upon successful completion, this function returns a value of zero.
Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_insert_DAQnode_statsLDC

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_insert_DAQnode_statsLDC(unsigned int run, char *LDC, unsigned int detectorId, unsigned long long eventCount, unsigned long long eventCountPhysics, unsigned long long eventCountCalibration, unsigned long long bytesInjected, unsigned long long bytesInjectedPhysics, unsigned long long bytesInjectedCalibration)

Description
Create a new entry in the logbook_stats_LDC table for the given run and LDC pair, initializing the different counters to the specified values. Subsequent changes to this entry should be done via the DAQlogbook_update_DAQnode_statsLDC function.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_DAQnode_statsLDC

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_DAQnode_statsLDC(unsigned int run, char *LDC, unsigned long long eventCount, unsigned long long eventCountPhysics, unsigned long long eventCountCalibration, unsigned long long bytesInjected, unsigned long long bytesInjectedPhysics, unsigned long long bytesInjectedCalibration)

Description
Update the counters in the logbook_stats_LDC table for the given run and LDC pair previously created via the DAQlogbook_insert_DAQnode_statsLDC function.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
DAQlogbook_update_DAQnode_statsLDC_trgCluster

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_DAQnode_statsLDC_trgCluster(unsigned int run, char *LDC, unsigned char clusterId, unsigned long long eventCount, unsigned long long bytesInjected)

Description
Update the counters in the logbook_stats_LDC_trgCluster table. When called for the first time for a given run, LDC and trigger cluster ID tuple, it will create a new entry in the table.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_DAQ_active_components

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_DAQ_active_components(unsigned int run, const char *type, void *mask_of_ids, int n_bytes)

Description
Update one of the fields of the logbook_daq_active_components table. The type parameter defines which field is updated and can be one of the following: LDC, GDC, DDL.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_ECS_iteration

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_ECS_iteration(unsigned int run, unsigned int currentIteration, unsigned int totalIterations)

Description
Update the ecs_iteration_current and ecs_iteration_total fields of the logbook table.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
DAQlogbook_datafile_new

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_datafile_new(unsigned int run, const char *filePath, unsigned int local)

Description
Create a new entry in the logbook_stats_files table for the given run, populating the run, fileName, location, local, rolename, hostname, pid and time_write_begin fields. It should be called when a new data file is created. The returned value should be used as an ID for further function calls related to data file statistics or status.

Returns
Upon successful completion, this function returns the ID of the created table entry. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_datafile_update_size

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_datafile_update_size(int id, unsigned long long size, unsigned long long events)

Description
Update the size and eventCount fields of the logbook_stats_files table for a given data file ID (as returned by DAQlogbook_datafile_new).

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_datafile_update_location

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_datafile_update_location(int id, const char *new_location)

Description
Update the location field of the logbook_stats_files table for a given data file ID (as returned by DAQlogbook_datafile_new).

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_datafile_setStatus_closed

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_datafile_setStatus_closed(int id)

Description
Update the following fields of the logbook_stats_files table:
• status: Closed.
• time_write_end: current timestamp.
It should be called when a data file with the given ID (as returned by DAQlogbook_datafile_new) is closed.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_datafile_setStatus_waitingMigrationRequest

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_datafile_setStatus_waitingMigrationRequest(int id)

Description
Update the following fields of the logbook_stats_files table:
• status: Waiting migration request.
• time_write_end: current timestamp.
It should be called when a data file with the given ID (as returned by DAQlogbook_datafile_new) is ready for migration.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_datafile_setStatus_migrationRequested

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_datafile_setStatus_migrationRequested(int id)

Description
Update the following fields of the logbook_stats_files table:
• status: Migration request.
• time_migrate_request: current timestamp.
It should be called when a data file with the given ID (as returned by DAQlogbook_datafile_new) is marked for migration.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_datafile_setStatus_migrating

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_datafile_setStatus_migrating(int id)

Description
Update the following fields of the logbook_stats_files table:
• status: Migrating.
• time_migrate_begin: current timestamp.
It should be called when a data file with the given ID (as returned by DAQlogbook_datafile_new) starts to be migrated.

Returns
Upon successful completion, this function returns a value of zero.
Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_datafile_setStatus_migrated

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_datafile_setStatus_migrated(int id)

Description
Update the following fields of the logbook_stats_files table:
• status: Migrated.
• time_migrate_end: current timestamp.
It should be called when the migration of a data file with the given ID (as returned by DAQlogbook_datafile_new) has finished.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_datafile_updateRunStatus

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_datafile_updateRunStatus(unsigned int run)

Description
Update the dataMigrated field of the logbook table for the given run. The new value is based on the status of the run's data files stored in the logbook_stats_files table. Only non-local files are taken into account.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_runQuality

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_runQuality(unsigned int run, const char *runQuality)

Description
Update the value of the runQuality field of the logbook table for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
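The data file functions above move each file through a set of named statuses, and DAQlogbook_datafile_updateRunStatus derives a per-run value from them while ignoring local files. The sketch below models that rule in memory. The status names come from this section, but the enum values, the struct layout and the function pending_migrations are assumptions made for illustration; the manual does not list the values that the dataMigrated field can take.

```c
#include <stddef.h>

/* Status values named in this section.
 * ASSUMPTION: the numeric values and their order are illustrative only. */
enum file_status {
    FILE_WRITING,
    FILE_CLOSED,
    FILE_WAITING_MIGRATION_REQUEST,
    FILE_MIGRATION_REQUEST,
    FILE_MIGRATING,
    FILE_MIGRATED
};

/* Hypothetical in-memory view of one logbook_stats_files entry. */
struct data_file {
    enum file_status status;
    int local;   /* non-zero: local file, not taken into account */
};

/* Count the non-local files of a run that are not yet Migrated.
 * Zero means every file that has to be migrated has been migrated. */
size_t pending_migrations(const struct data_file *files, size_t n)
{
    size_t pending = 0;
    for (size_t i = 0; i < n; i++)
        if (!files[i].local && files[i].status != FILE_MIGRATED)
            pending++;
    return pending;
}
```

Returning a count rather than a flag leaves open how a run without any non-local file should be reported, which the text above does not specify.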
DAQlogbook_update_cluster

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_cluster(unsigned int run, unsigned int cluster, unsigned int detectorMask, const char *partition, unsigned int inputDetectorMask, unsigned long long triggerClassMask)

Description
Create a new entry in the logbook_trigger_clusters table, thus declaring a new trigger cluster for the given run. The detectorMask and inputDetectorMask parameters should be 24-bit detector ID bitmasks of the detectors participating in the given cluster as readout detectors and as trigger detectors, respectively. The triggerClassMask parameter should be a 50-bit trigger class ID bitmask of the trigger classes defined for the given cluster.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, -1 will be returned.

DAQlogbook_update_triggerConfig

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_triggerConfig(unsigned int run, const char * const configurationFile, const char * const alignmentFile)

Description
Create a new entry in the logbook_trigger_config table, thus registering the trigger configuration and the alignment settings for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_triggerClassName

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_triggerClassName(unsigned int run, unsigned char classId, const char *className, unsigned int classGroupId, float classGroupTime)

Description
Create a new entry in the logbook_trigger_classes table, thus registering a new trigger class for the given run. The classId parameter should be the corresponding bit number of the trigger class (as defined in the 50-bit trigger classes bitmask) and the className parameter should be the full trigger class name as defined by the Trigger Coordination.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, -1 will be returned.

DAQlogbook_update_triggerClassCounter

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_triggerClassCounter(unsigned int run, unsigned char classId, unsigned long long L0bCount, unsigned long long L0aCount, unsigned long L1bCount, unsigned long L1aCount, unsigned long L2bCount, unsigned long L2aCount, float ctpDuration)

Description
Update the L0b, L0a, L1b, L1a, L2b, L2a and ctpDuration fields of the logbook_trigger_classes table for the given run and trigger class ID pair. It should be called only after the corresponding trigger class ID has been registered using the DAQlogbook_update_triggerClassName function.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, -1 will be returned.

DAQlogbook_insert_triggerInput

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_insert_triggerInput(unsigned int run, unsigned int inputId, const char *inputName, unsigned int inputLevel)

Description
Create a new entry in the logbook_trigger_inputs table, thus registering a new trigger input for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, -1 will be returned.

DAQlogbook_update_triggerInputCounter

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_triggerInputCounter(unsigned int run, unsigned int inputId, unsigned int inputLevel, unsigned long long inputCount)

Description
Update the inputCount field of the logbook_trigger_inputs table for the given run and trigger input (represented by the inputId and inputLevel pair). It should be called only after the corresponding trigger input has been registered using the DAQlogbook_insert_triggerInput function.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, -1 will be returned.
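Because the trigger class bitmask is 50 bits wide, it does not fit in a 32-bit integer, and shifts must be done on a 64-bit operand. The following minimal sketch shows how a classId bit number could be set and tested in an unsigned long long such as the triggerClassMask parameter. The helper names class_mask_set and class_mask_test are hypothetical and not part of DAQlogbook.h.

```c
#define N_TRIGGER_CLASSES 50u

/* Set the bit of one trigger class in a 50-bit mask.
 * Out-of-range IDs leave the mask unchanged.
 * Note the 1ULL: shifting a plain int by 32 or more is undefined. */
unsigned long long class_mask_set(unsigned long long mask,
                                  unsigned char classId)
{
    if (classId >= N_TRIGGER_CLASSES)
        return mask;
    return mask | (1ULL << classId);
}

/* Return 1 if the trigger class bit is set, 0 otherwise. */
int class_mask_test(unsigned long long mask, unsigned char classId)
{
    if (classId >= N_TRIGGER_CLASSES)
        return 0;
    return (int)((mask >> classId) & 1ULL);
}
```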
DAQlogbook_update_triggerDetectorCounter

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_triggerDetectorCounter(unsigned int run, const char *detector, unsigned long L2aCount)

Description
Update the L2a field of the logbook_detectors table for the given run and detector pair.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, -1 will be returned.

DAQlogbook_update_triggerGlobalCounter

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_triggerGlobalCounter(unsigned int run, unsigned long L2aCount, float ctpDuration)

Description
Update the L2a and ctpDuration fields of the logbook table for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, -1 will be returned.

DAQlogbook_update_setCTPbitInDetectorMask

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_setCTPbitInDetectorMask(unsigned int run)

Description
Set to 1, for the given run, the bit in the detectorMask field of the logbook table corresponding to the CTP detector ID (represented by the 3-letter code TRI).

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_insert_AMORE_agent

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_insert_AMORE_agent(unsigned int run, const char *detector, const char *agentname, const char *agentversion, const char *params)

Description
Create a new entry in the logbook_AMORE_agents table, thus registering an AMORE agent as being active for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
DAQlogbook_update_AMORE_agent

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_AMORE_agent(unsigned int run, const char *agentname, unsigned int mo_published, unsigned int mo_v_published, unsigned long long bytes_published, float aver_cpu_time, float aver_real_time)

Description
Update the statistics of the given AMORE agent for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_AMORE_agent_summary_img

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_AMORE_agent_summary_img(unsigned int run, const char *agentname, char *summary_img, unsigned long n_bytes)

Description
Update the Quality Assurance summary image of the given AMORE agent for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_HLTmode

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_HLTmode(unsigned int run, const char *HLTmode)

Description
Update the HLTmode field of the logbook table for the given run. If the given HLT mode is B or C, it also sets to 1 the bit in the detectorMask field of the logbook table corresponding to the HLT detector ID (represented by the 3-letter code HLT).

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
DAQlogbook_update_local_HLT_stats

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_local_HLT_stats(unsigned int run, char *LDC, unsigned long hltAccepts, unsigned long hltRejects, unsigned long long hltBytesRejected, unsigned long long *hltBytesRejectedPerTriggerClass)

Description
Update the HLT statistics per detector LDC (stored in the logbook_stats_LDC table) and the number of bytes rejected, following an HLT decision, per trigger class (stored in the logbook_trigger_classes table) for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_insert_HLT_stats

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_insert_HLT_stats(unsigned int run, char *LDC)

Description
Create a new entry in the logbook_stats_HLT_LDC table, thus registering an HLT LDC for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_update_HLT_stats

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_update_HLT_stats(unsigned int run, char *LDC, unsigned long hltAccepts, unsigned long hltPartialAccepts, unsigned long hltOnly, unsigned long hltRejects, unsigned long *hltAcceptsPerTriggerClass, unsigned long *hltPartialAcceptsPerTriggerClass, unsigned long *hltOnlyPerTriggerClass, unsigned long *hltRejectsPerTriggerClass)

Description
Update the HLT decisions per HLT LDC (stored in the logbook_stats_HLT_LDC table) and per trigger class (stored in the logbook_trigger_classes table) for the given run.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.
DAQlogbook_add_comment

Synopsis
#include "DAQlogbook.h"
int DAQlogbook_add_comment(unsigned int run, const char *title, const char * const comment,...)

Description
Insert a new Log Entry of class PROCESS in the logbook_comments table. The only inserted fields are run, class, title and comment. The run parameter is optional.

Returns
Upon successful completion, this function returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

24.4 Logbook Daemon

The logbookDaemon is a daemon process that extracts data concerning the ALICE magnets and the LHC configuration published by the DCS via DIM and inserts it in the DB at start of run. The subscribed DIM services and the corresponding DB fields are listed in Table 24.30.

Table 24.30 logbookDaemon DIM services and DB fields relationship.

DIM Service                             DB Field
DCS_GRP_L3MAGNET_CURRENT                L3_magnetCurrent field of the logbook table
DCS_GRP_DIPOLE_CURRENT                  Dipole_magnetCurrent field of the logbook table
DCS_GRP_L3MAGNET_POLARITY               No field; this value is used to complete the value provided by the DCS_GRP_L3MAGNET_CURRENT service
DCS_GRP_DIPOLE_POLARITY                 No field; this value is used to complete the value provided by the DCS_GRP_DIPOLE_CURRENT service
ALICEDAQ_LHCMachineMode                 beamType field of the logbook table
ALICEDAQ_LHCBeamEnergy                  beamEnergy field of the logbook table
ALICEDAQ_LHCFillNumber                  LHCFillNumber field of the logbook table
ALICEDAQ_LHCTotBunchInteract            LHCTotalInteractingBunches field of the logbook table
ALICEDAQ_LHCTotBunchNotInteractBeam1    LHCTotalNonInteractingBunchesBeam1 field of the logbook table
ALICEDAQ_LHCTotBunchNotInteractBeam2    LHCTotalNonInteractingBunchesBeam2 field of the logbook table

Additionally, it provides a publish mechanism (via DIM) to notify Start of Run and End of Run events of partitions and detectors, thus avoiding the need for the processes of the different online subsystems to constantly poll the DB. The available DIM services are listed in Table 24.31.

Table 24.31 logbookDaemon run events DIM services

DIM Service                             Description
/LOGBOOK/SUBSCRIBE/ECS_SOR_${PART}      ECS Start of Run per partition (replacing ${PART} by the partition name)
/LOGBOOK/SUBSCRIBE/ECS_SOR_${DET}       ECS Start of Run per detector (replacing ${DET} by the 3-letter detector code)
/LOGBOOK/SUBSCRIBE/ECS_EOR_${PART}      ECS End of Run per partition (replacing ${PART} by the partition name)
/LOGBOOK/SUBSCRIBE/ECS_EOR_${DET}       ECS End of Run per detector (replacing ${DET} by the 3-letter detector code)
/LOGBOOK/SUBSCRIBE/DAQ_SOR_${PART}      DAQ Start of Run per partition (replacing ${PART} by the partition name)
/LOGBOOK/SUBSCRIBE/DAQ_SOR_${DET}       DAQ Start of Run per detector (replacing ${DET} by the 3-letter detector code)
/LOGBOOK/SUBSCRIBE/DAQ_EOR_${PART}      DAQ End of Run per partition (replacing ${PART} by the partition name)
/LOGBOOK/SUBSCRIBE/DAQ_EOR_${DET}       DAQ End of Run per detector (replacing ${DET} by the 3-letter detector code)

24.5 Tools

Below is a list of the available command-line tools providing access to the eLogbook's repository.

insert_file

Synopsis
insert_file HOSTNAME USERNAME PASSWORD DATABASE COMMENTID FILEID FILENAME SIZE TITLE CONTENT_TYPE

Description
Insert a file in the logbook_files table attached to an already existing Log Entry.
Parameters:
• HOSTNAME: MySQL server hostname.
• USERNAME: MySQL username.
• PASSWORD: MySQL password.
• DATABASE: MySQL database name.
• COMMENTID: ID of the Log Entry to which the file should be attached (corresponding to the commentid field of the logbook_files table).
• FILEID: ID of the file (corresponding to the fileid field of the logbook_files table).
• FILENAME: full filename (including path).
• SIZE: file size in bytes.
• TITLE: file title.
• CONTENT_TYPE: file Content Type.

Returns
Upon successful completion, this command returns a value of zero. Otherwise, 1 will be returned.

logbookCloseRun

Synopsis
logbookCloseRun RUNNUMBER

Description
Close a run not properly terminated ("zombie" run), calling the DAQlogbook_update_EndRun function of the C API.
Parameters:
• RUNNUMBER: Run number.

Returns
Upon successful completion, this command returns a value of zero. Otherwise, 1 will be returned.

logbookGetTriggerInfo

Synopsis
logbookGetTriggerInfo RUNNUMBER

Description
Fetch and print the trigger information related to the given run.
Parameters:
• RUNNUMBER: Run number.

Returns
Upon successful completion, this command returns a value of zero. Otherwise, 1 will be returned.

logbookShellAPI

Synopsis
logbookShellAPI -c COMMAND [-r RUN] [-v] [-h]

Description
Fetch and print information related to a specific run.
Parameters:
• -c COMMAND: defines which information should be printed. COMMAND can be one of:
  • getRuntype: prints the ECS run type.
  • getActiveGDCs: prints the IDs of the active GDCs.
  • getActiveLDCs: prints the IDs of the active LDCs.
  • getActiveDDLs: prints the IDs of the active DDLs.
• -r RUN: Run number. If not provided, the run number is taken from the DATE_RUN_NUMBER environment variable.
• -v: execute in verbose mode.
• -h: print help.

Returns
Upon successful completion, this command returns a value of zero. Otherwise, an error code with a value equal to the line number where the error occurred will be returned.

DAQlogbook_dim_gateway

Synopsis
DAQlogbook_dim_gateway -s DIM_SERVICE -m DIM_MODE [-o]

Description
Implement an interface between DIM and the eLogbook. Received messages are inserted as Log Entries.
Parameters:
• -s DIM_SERVICE: DIM service name.
• -m DIM_MODE: defines the operation mode. DIM_MODE can be one of:
  • subscribe: subscribe to an external server service and insert a Log Entry at each service update.
    • command: create a DIM command service and insert a Log Entry at each remote DIM client command execution.
• -o: don’t run as a daemon (by default, the command runs as a daemon).

Returns
Upon successful completion, this command returns a value of zero. Otherwise, -1 will be returned.

24.6 Graphical User Interface

24.6.1 Overview

The eLogbook’s Web-based GUI (available at https://cern.ch/alice-logbook) was developed using modern Web technologies, including PHP5, Javascript and Cascading Style Sheets (CSS). It is hosted on an Apache web server and can be accessed from the experimental area (inside the experiment's technical network), the CERN General Purpose Network (GPN) and the internet.

24.6.2 Authentication and Authorization

Authentication is implemented via the CERN Authentication central service, which provides Single Sign On (SSO) and relieves the eLogbook software of the task of authenticating users. When a user tries to access the GUI, he is redirected to the CERN Login page, where he has to provide his credentials. If successful, he is then redirected back to the GUI.

Authorization is implemented in the GUI with 5 different levels of privileges:
• NONE: no access to the GUI
• READ: read-only access
• WRITE: read/write access (e.g. can write Log Entries)
• ADMIN: same as previous + can grant WRITE privileges
• SUPER: same as previous + can grant ADMIN privileges

At the first login, the user’s details are stored in the logbook_users table. Additionally, READ privilege is given by default.

24.6.3 Features

Below is a brief description of the main features of the eLogbook’s GUI.

24.6.3.1 Run Statistics

The Run Statistics section provides users access to both data-taking statistics and conditions, ranging from event and data rates to trigger configurations and LHC parameters. It is presented in a tabular view, with different subsections grouped in individual tabs.
Given the number of available fields, some tabs allow users to select which fields should be displayed. Additionally, there’s an Overview tab which allows users to aggregate some of the data-taking statistics by different criteria (such as number of detectors or ECS partition) or as a function of time.

24.6.3.2 Run Details

The Run Details section provides users access to all the available information concerning a specific run, including infoLogger messages and AMORE histograms.

24.6.3.3 Log Entries

The Log Entries section allows users to read or create reports related to the ALICE operations. The inserted Log Entries can have attached files, with thumbnails being created for image files. Several view modes are available, ranging from “1 line per Log Entry” compressed views to expanded and full views. These reports can belong to zero, one or several logical groups called Subsystems. Users can also reply to existing Log Entries, thus creating a thread. To be able to insert new Log Entries, a user needs to have at least the WRITE privilege.

24.6.3.4 Announcements

Announcements are special Log Entries which should be used to broadcast short messages of general interest to the ALICE Collaboration. Although appearing as a normal Log Entry in the Log Entries section, an announcement is also displayed in the Big Screen View page and in the ALICE Live public website. When inserting a new announcement, users must set a validity timestamp, which defines until when the message should be displayed. As for normal Log Entries, a user needs to have at least the WRITE privilege to create new announcements.

24.6.3.5 Automatic Email Notification

The eLogbook allows an automatic email notification to be sent every time a new Log Entry is inserted. There are 2 possible configurations:
• Global: an email is sent for every inserted Log Entry.
• Per Subsystem: an email is sent for every inserted Log Entry belonging to a given Subsystem. The email address is defined in the email field of the logbook_subsystems table, where additional configuration parameters can also be defined.

24.6.3.6 Search Filters

One of the main goals of the eLogbook is to allow the members of the ALICE Collaboration to search for runs that match their criteria. To accomplish that, a filtering mechanism has been implemented in the Run Statistics section, allowing users to set a search filter for each available field. Filters corresponding to fields displayed on different tabs can be combined, although they can only be set or modified when in the corresponding tab. Some filters have predefined values available (defined in the logbook_filters table), thus allowing easy access to common queries.

The search filters are also available in the Log Entries section, although they cannot be combined with the ones from the Run Statistics section.

24.6.3.7 Export Run Statistics

The eLogbook allows users to export the information displayed in the Run Statistics section in 3 different formats:
• TXT: text format, with 1 line corresponding to 1 run and values separated by semicolons.
• XML: XML format, with the root element depending on the exported tab and each element corresponding to 1 run.
• EXCEL: spreadsheet format, with 1 row corresponding to 1 run.

Additionally, users can choose different export options, such as including or excluding headers (for easier parsing) or exporting only the run numbers.

25 LHC machine monitoring

This chapter describes the tool developed to read information about the beams delivered by the LHC machine and publish it on the ALICE DIM (DISTRIBUTED INFORMATION MANAGEMENT) server, so that it can be stored in the electronic logbook at the start of each run.
The LHC values are published by means of the DIP (DATA INTERCHANGE PROTOCOL) system, which allows relatively small amounts of soft real-time data to be exchanged between very loosely coupled heterogeneous systems. A Java application has been developed to perform an off-line cross-check between the values stored into the ALICE logbook by means of the DIP/DIM process and the ones stored into the LHC Logging Database.

25.1 DATA INTERCHANGE PROTOCOL (DIP)
25.2 LHC beam info: DIP client/DIM server
25.3 LHC beam info: off-line cross-check

25.1 DATA INTERCHANGE PROTOCOL (DIP)

25.1.1 The DIP architecture

DIP is an information distribution service, and as such it may be profitably compared to subscribing to a newspaper or magazine (Figure 25.1).

Figure 25.1 Information distribution service of a newspaper or magazine

If a person wants to receive a magazine on a regular basis, he may subscribe to it by giving its name to the supplier. Whenever a new edition of that magazine is published, the person will receive a copy of it. If the person did not receive an edition when he was expecting one, he will contact the supplier who will give him one (ideally). The person is of course not restricted to subscribing to one magazine only or just to magazines from a single publisher. The person needs only to provide the supplier the names of the publications he is interested in (he does not need to know who or where the publisher is) and he will receive new editions of the requested publication as they become available.

DIP is essentially playing the role of the Supplier in the above scenario. There is one notable difference between DIP and the magazine scenario: while a magazine is published on a periodic basis, this is not necessarily the case with DIP publications, which may contain event-based data that is updated as and when the event(s) occur.
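The magazine analogy can be illustrated with a toy publish/subscribe model. The sketch below is plain shell and does not use the DIP API; the function names (subscribe, publish, print_update) and the publication name example/Energy are hypothetical and chosen only for illustration. It shows the two points made above: a subscriber registers by publication name only, and updates are delivered when they are published rather than on a fixed schedule.

```shell
#!/bin/bash
# Toy publish/subscribe model; NOT the DIP API. All names are hypothetical.

SUBSCRIPTIONS=""   # registry: one "publication-name callback" pair per line

# subscribe NAME CALLBACK: register interest by publication name only.
subscribe() {
    SUBSCRIPTIONS="${SUBSCRIPTIONS}$1 $2
"
}

# publish NAME VALUE: call every callback registered for NAME.
publish() {
    while read -r _name _cb; do
        if [ "$_name" = "$1" ]; then
            "$_cb" "$1" "$2"
        fi
    done <<EOF
$SUBSCRIPTIONS
EOF
}

# Example subscriber callback: print each update as it arrives.
print_update() {
    echo "update: $1 = $2"
}

subscribe example/Energy print_update
publish example/Energy 3500
```

The subscriber never learns who the publisher is; it only hands a name to the registry, which plays the role of the supplier.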
Important components in the DIP architecture are publishers, subscribers and publications, as shown in the diagram of Figure 25.2.

Figure 25.2 DIP architecture.

In the above diagram we see that DIP servers and clients act as the Publishers and Subscribers. The arrows show the flow of information between the producer (Data source) and Subscriber. The dark (green) arrows indicate that some action is required by the writer of the Publisher or Subscriber in order to make the information flow. The light-coloured arrows are used where the flow of information between components is handled by DIP.

A description of the components identified in the above diagram follows:
• Publisher: a producer of DIP data. A Publisher is responsible for the definition of the structure, the content and the provision of the DIP data to its Subscribers.
• Subscriber: a client of DIP data.
• Data source: this represents the source of the data that is to be sent via DIP. It may be internal or external to the DIP server. The DIP server is responsible for accessing the data source and writing it out to DIP through the Publication when it is needed (e.g. when the value obtained from the data source has changed).
• Publication: a named object that represents an atomic piece of data published in DIP. The writer of the server must write the code that obtains the data from the Data Source and writes it into the publication object. A client subscribes to the publication by providing DIP with the publication’s name.
• DIP: provides the mechanism by which data is passed from the Publisher to the Subscriber (via the publication object), running on a DIP NAME SERVER (DNS). Moreover, the role of the DNS is to maintain the list of available Publications and to connect the Subscribers to the Publishers.
• Subscription: an object given to the client by DIP when that client subscribes to a publication.
With this object a client may request the most recently published value of the publication or unsubscribe from a publication the client had previously subscribed to.
• SubscriptionListener: the equivalent of a magazine reader in our analogy. The SubscriptionListener acts as a wrapper containing several callbacks, the most important of which handles the data arriving from the subscribed publications. The dark (green) arrow going out from the SubscriptionListener indicates that the implementer of the client must provide some code in the SubscriptionListener to do something with the data when it arrives at the Listener (e.g. display it on a console).

DIP communication takes place within the LHC TECHNICAL NETWORK (TN), as shown in Figure 25.3, which depicts the DIP organization deployed in 2008, with a central DNS in the TN and various Publishers and Subscribers communicating across domains. This DNS is maintained by the IT/CO group.

Figure 25.3 Cross-domain DIP communication on the TN.

There are some important characteristics of DIP one should be aware of:
• the Publisher and its Subscribers don’t know each other explicitly. The DIP protocol, actually the DNS, connects the Subscribers with their Publishers of interest. A Subscriber knows at any time the status of its connections with the Publishers;
• a Publication is a structured container of atomic data. It is a consistent item and the Subscribers cannot subscribe only to a subset of its content;
• there is no filtering mechanism provided by DIP itself, i.e.
once subscribed to a Publication, the Subscriber will receive all its updates;
• the Publisher has full control over data content, quality and timestamp;
• there is a notion of “data contract” between the Publishers and their Subscribers; a Publisher responsible for a set of Publications is not allowed to change online the structure of its Publications, once they have been made available for potential Subscribers;
• there are no particular security mechanisms implemented in the DIP protocol; for example, a Subscriber has no means to authenticate the source of the data it receives; similarly, the DIP protocol does not offer a Publisher the possibility to restrict access to a set of authenticated Subscribers; as another example, the Publications’ names are not nominative; in other words, when a Publisher stops, the Publications’ names it was using are from then on freely available to other Publishers;
• there is no feedback to a Publisher that the data published actually reached its Subscribers (also known as “one-way” communication); hence there is no retransmission in case of a data delivery issue.

25.1.2 Setting up development environment

The distributions and a tutorial of DIP can be obtained from IT/CO’s DIP support web page (http://wikis.web.cern.ch/wikis/display/EN/DIP+and+DIM). Two versions are available, DIP for Java and DIP for C++ (as zip files). Both distributions run on the Windows XP/2000 and Linux platforms.

For C++ users:
• Under Linux
    o Be sure to modify your makefile to include the directories /dip/include/dip and /dip/include/dim in your include search path.
    o Set the environment variable LD_LIBRARY_PATH to include the directory /dip

For Java users:
• Remember to include \dip\dim.jar and \dip\dip.jar in the class path when you compile and run your DIP applications. Additionally, ensure that the shared libraries (libdim.so and libjdim.so for Linux) are reachable.
Before starting a DIP-based application, be sure to set the environment variable DIM_DNS_NODE to vodip01.cern.ch and DIM_DNS_PORT to 2506. This provides the application with the location of the name server that is required for DIP to run correctly.

25.1.2.1 DIP installation for C++ user under Linux

Download the released tar file for Linux 32 bit (Linux tarball 32bits) from the DIP Web site and save it in a temporary directory. Create the dip directory in your account directory (e.g. /home/dip_user) and untar the file:

    mkdir dip
    cd dip
    cp /tmp/DIP_tar_file.tar .
    tar -xvf DIP_tar_file.tar

Create the bash script in Listing 25.1 (called dipEnv.sh) in your /dip directory, to set the environment variables needed to develop, compile and run the DIP applications.

Listing 25.1 dipEnv.sh: environmental setup

    #!/bin/bash
    export CLASSPATH=/dip/linux/lib/dip.jar
    export LD_LIBRARY_PATH=/dip/lib:/opt/dim/linux:$LD_LIBRARY_PATH
    export DIM_DNS_NODE=vodip01.cern.ch
    export DIM_DNS_PORT=2506
    export DIM_HOST_NODE=IPaddress_of_DIM_host_node

Check the connection with the DIP NAME SERVER (vodip01.cern.ch), which belongs to the TN. If it is not accessible, this means that your host does not belong to the TN; request CERN-IT to include your host in the TN. This check can be done by launching, in a terminal, the Java DIP browser from your directory with the bash script shown in Listing 25.2.

Listing 25.2 runBrowser.sh: DIP browser launcher

    #!/bin/bash
    export CLASSPATH=/dip/linux/lib/dip.jar
    export LD_LIBRARY_PATH=/dip/lib:/opt/dim/linux:$LD_LIBRARY_PATH
    java -jar /dip/linux/tools/dipBrowser.jar

If it does not start, check that a Java runtime environment with a version of at least 1.4.2-16 (j2re.1.4.2-16) is installed in the directory /usr/java.
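As a complement to the browser-based check, a quick command-line test of the environment can be sketched as follows. This is not part of the DIP distribution: check_dim_env and can_reach are hypothetical helper names, and the test relies on the /dev/tcp feature of bash and on the timeout command from GNU coreutils being available on the host. A successful TCP connection only shows that the name server port is reachable from the host, not that DIP itself works.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of DIP or DIM.

# Succeed only if both name-server variables are set and non-empty.
check_dim_env() {
    [ -n "${DIM_DNS_NODE:-}" ] && [ -n "${DIM_DNS_PORT:-}" ]
}

# Succeed if a TCP connection to host $1, port $2 opens within 3 seconds.
# Uses the bash-specific /dev/tcp pseudo-device in a child bash process.
can_reach() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example use, after sourcing dipEnv.sh:
#   check_dim_env || echo "DIM_DNS_NODE and DIM_DNS_PORT must be set"
#   can_reach "$DIM_DNS_NODE" "$DIM_DNS_PORT" || echo "name server not reachable"
```

If can_reach fails while the variables are set correctly, the most likely explanation given in the text is that the host is not part of the TN.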
25.2 LHC beam info: DIP client/DIM server

To retrieve the information on the beams, a DIP client has been developed (LHCClient.cpp) that subscribes to the following publications:
• dip/acc/LHC/Beam/Energy
• dip/acc/LHC/Beam/IntensityPerBunch/Beam1
• dip/acc/LHC/Beam/IntensityPerBunch/Beam2
• dip/acc/LHC/RunControl/BeamMode
• dip/acc/LHC/RunControl/MachineMode
• dip/acc/LHC/RunControl/FillNumber
• dip/acc/LHC/RunControl/RunConfiguration
• dip/acc/LHC/RunControl/CirculatingBunchConfig/Beam1
• dip/acc/LHC/RunControl/CirculatingBunchConfig/Beam2

This application has been developed in /dip/linux/test, in which you can find the makefile to compile the C++ code and produce the executable.

The information collected from the above DIP publications, after a processing step, is published on the ALICE DCS DIM server (alidcsdimdns.cern.ch), from which a logbook daemon takes it and stores it into the ALICE Logbook at start of each run. The DIM publications subscribed to by the logbook daemon are the following:
• ALICEDAQ_LHCBeamMode: char (‘C’), ‘LHC Beam Mode (STABLE BEAMS, INJECTION PROBE BEAM, ...)’
• ALICEDAQ_LHCBeamType: char (‘C’), ‘Type of collisions (‘p-p’, ‘Pb-Pb’)’
• ALICEDAQ_LHCBeamEnergy: float (‘F’), ‘Energy of the beam in GeV’
• ALICEDAQ_LHCFillNumber: int (‘I’), ‘LHC Fill Number’
• ALICEDAQ_LHCTotalInteractingBunches: int (‘I’), ‘Number of Interacting Bunches’
• ALICEDAQ_LHCTotalNonInteractingBunchesBeam1: int (‘I’), ‘Number of Non-Interacting Bunches in Beam 1’
• ALICEDAQ_LHCTotalNonInteractingBunchesBeam2: int (‘I’), ‘Number of Non-Interacting Bunches in Beam 2’
• ALICEDAQ_LHCBetaStar: char (‘C’), ‘LHC Beta* in meters’
• ALICEDAQ_LHCInstIntensityInteractingBeam1: float (‘F’), ‘Instantaneous Intensity for Interacting Bunches in Beam 1 in num. of charged particles’
• ALICEDAQ_LHCInstIntensityInteractingBeam2: float (‘F’), ‘Instantaneous Intensity for Interacting Bunches in Beam 2 in num.
of charged particles’
• ALICEDAQ_LHCInstIntensityNonInteractingBeam1: float (‘F’), ‘Instantaneous Intensity for Non-Interacting Bunches in Beam 1 in num. of charged particles’
• ALICEDAQ_LHCInstIntensityNonInteractingBeam2: float (‘F’), ‘Instantaneous Intensity for Non-Interacting Bunches in Beam 2 in num. of charged particles’
• ALICEDAQ_LHCFillingSchemeName: char (‘C’), ‘LHC Filling Scheme name (‘Single_12b_8_8_8’, ‘150ns_152b_140_16_140_8bpi’,...)’

Other DIM publications are available for future applications:
• ALICEDAQ_LHCTotIntensityInteractingBeam1: float (‘F’), ‘Total Intensity for Interacting Bunches in Beam 1 in num. of charged particles’
• ALICEDAQ_LHCTotIntensityInteractingBeam2: float (‘F’), ‘Total Intensity for Interacting Bunches in Beam 2 in num. of charged particles’
• ALICEDAQ_LHCTotIntensityNonInteractingBeam1: float (‘F’), ‘Total Intensity for Non-Interacting Bunches in Beam 1 in num. of charged particles’
• ALICEDAQ_LHCTotIntensityNonInteractingBeam2: float (‘F’), ‘Total Intensity for Non-Interacting Bunches in Beam 2 in num. of charged particles’

All these values are sent to the infoBrowser every minute to be stored during the run.

The DIP client/DIM server process runs continuously. If it stops for any reason, it is restarted automatically in less than one minute by the bash script in Listing 25.3 (called restartLHCClient.sh and stored in the /dip directory), which is executed through the crontab service of Linux. Every minute the script looks up the PID (PROCESS IDENTIFICATION) of the process; if no such process is found, it sets the environment and starts the process.
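The every-minute scheduling through crontab could look as sketched below. The helper name cron_entry is hypothetical, and discarding the script output with a redirection to /dev/null is an assumption made here so that cron does not mail the output on every run; the manual does not state how the actual entry is written.

```shell
#!/bin/bash
# Sketch: build the crontab line that runs a script every minute.
# cron_entry is a hypothetical helper; the script path follows the text.

cron_entry() {
    # Five "*" fields: every minute of every hour, day, month and weekday.
    printf '* * * * * %s >/dev/null 2>&1\n' "$1"
}

cron_entry /dip/restartLHCClient.sh

# To install the entry for the current user, keeping existing entries:
#   ( crontab -l 2>/dev/null; cron_entry /dip/restartLHCClient.sh ) | crontab -
```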
Listing 25.3 restartLHCClient.sh: automatic check and restart of DIP client

    #!/bin/bash
    #daemon name:
    DIPDIMPROCESS="DIP_client_process_name"
    #pgrep command path:
    PGREP="/usr/bin/pgrep"
    #find the PID of daemon
    $PGREP ${DIPDIMPROCESS}
    #check if the daemon is active or not; if not, restart the daemon
    if [ $? -ne 0 ]
    then
        # ================== Setup for DATE =========================
        [ -d /dateSite ] && export DATE_SITE=/dateSite
        if [ -f /date/setup.sh -a -d /dateSite ]; then
            export DATE_ROOT=/date
            . /date/setup.sh
        fi
        # ================== Setup for DIP/DIM ======================
        export DIM_HOST_NODE=IPaddress_of_DIM_host_node
        . /dip/dipEnv.sh
        cd /dip/linux/test
        ./$DIPDIMPROCESS &
    fi

25.3 LHC beam info: off-line cross-check

DIP publications have proven to be unreliable, so it is necessary to perform an off-line cross-check of the values published by the DIP/DIM process and stored online by the logbook daemon. To do this, a Java application has been developed to perform an off-line cross-check between the values stored in the ALICE logbook (a MySQL database) and the ones stored in the LHC Logging Database, the system developed to permanently store and manage the measured values of the most important parameters, configurations and working characteristics of all accelerator parts (PS, SPS, LINAC, LHC, etc.) and experiments.

The Java application uses the logging-data-extractor-client API, developed by the LHC Logging Database team. The application must be registered to have access to the data. To register a new application, the development team of the LHC Logging Database was contacted (be-dep-co-dm@cern.ch) and a meeting was organized. During this discussion, both high-level views on the analysis/application objectives and technical/implementation details were addressed.
As soon as the new application is registered and a sample method to access the data has been received, it is possible to start developing the application that manages the data extracted from the LHC Logging Database.

The following Java packages are needed to develop, compile and run the applications that extract the information from the LHC Logging Database: jdk1.6.0_20 and jre1.6.0_20 in /usr/java. The specific Java application developed for the off-line cross-check also needs the Java MySQL connector package mysql-connector-java-5.1.13, to retrieve and possibly update the information in the ALICE logbook.

With the Eclipse software (‘Eclipse IDE for Java Developers’ package from http://www.eclipse.org/downloads/) it is possible to create, modify and test the project of the application, with the Java code and classes organized in the specified user workspace (/workspace/projectName). In the /workspace directory of the same host on which the DIP/DIM process runs, the Java project LHCLoggingProject has been developed to perform the daily off-line cross-check of the previous 24 hours of data taking. It needs to have the appropriate permissions to access and update the values in the ALICE logbook.

The Java application, named ALICEDataExtractionACR.java, connects to the LHC Logging Database and extracts the information stored during the 24 hours of the previous day. It then connects to the ALICE logbook, extracts the beam information stored in the same time range, and compares the two values for each run and for each variable. If they differ, the ALICE logbook value is updated with the one from the LHC Logging Database. Moreover, the last value of each variable is stored in dedicated files, which provide the starting status for the next check.

Listing 25.4 ACRLHCLoggingCheck.sh: automatic start of Java application for the off-line cross-check.
    #!/bin/bash
    export JAVA_HOME=/usr/java/jdk1.6.0_20
    export PATH=$JAVA_HOME/bin:$PATH
    JAVA_HOME_JRE=/usr/java/jre1.6.0_20
    JAVA_PROJECT_DIR=/home/alicedaq/workspace/LHCLoggingProject
    JAVA_LOG_DIR=/tmp/LogbookCrossCheck

    # Build the classpath from the project libraries and the MySQL connector
    cd $JAVA_PROJECT_DIR
    CLASSPATH=""
    for i in lib/*.jar; do CLASSPATH=${CLASSPATH}:/home/alicedaq/workspace/LHCLoggingProject/$i; done;
    export CLASSPATH=/home/alicedaq/workspace/LHCLoggingProject/build/bin:/usr/java/jre1.6.0_20/lib/ext/mysql-connector-java-5.1.13-bin.jar${CLASSPATH}
    cp bin/* build/bin/

    # Compile the application
    cd src/java
    javac -Xlint:deprecation ALICEDataExtractionACR.java
    cp ALICEDataExtractionACR.class ../../bin/
    cp ALICEDataExtractionACR.class ../../build/bin/

    # The date to check is yesterday
    DATETOCHECK=`date --date="now -1 day" +%F`
    echo ALICE cross-check for the date: $DATETOCHECK
    JAVAEXEC="$JAVA_LOG_DIR/JavaExec/$DATETOCHECK.txt"

    # Find the last date already checked (most recent file in JavaExec)
    cd $JAVA_LOG_DIR/JavaExec/
    for i in $(ls -rt)
    do
        DATE=${i:0:10}
        echo "$i --> date: $DATE"
        LASTDATECHECKED=$DATE
    done

    MAKECHECK=1
    echo Last date checked: $LASTDATECHECKED

    if [ $DATETOCHECK = $LASTDATECHECKED ] && [ -s $JAVAEXEC ] ; then
        MAKECHECK=0;
    else
        NEXTDAY="$LASTDATECHECKED +1 day"
        NEXTDATETOCHECK=`date --date="$NEXTDAY" +%F`
    fi

    echo Make check flag: $MAKECHECK
    cd $JAVA_PROJECT_DIR/src/java

    if [ $MAKECHECK = 1 ] ; then
        echo "Make cross-check from $NEXTDATETOCHECK to $DATETOCHECK"
        ENDDATE=$DATETOCHECK
        STARTDATE=$LASTDATECHECKED
        DATE=$STARTDATE
        echo Start date: $NEXTDATETOCHECK
        echo End date: $ENDDATE
        DAY=1

        # Loop over every day between the last checked date and yesterday
        while [ $DATE != $ENDDATE ]; do
            CURDATE="$STARTDATE +$DAY day"
            echo date command: $CURDATE
            DATETOCHECK=`date --date="$CURDATE" +%F`
            DATE=$DATETOCHECK
            echo The date to check is $DATETOCHECK
            let DAY=$DAY+1

            JAVAEXEC="$JAVA_LOG_DIR/JavaExec/$DATETOCHECK.txt"
            echo Java executed file name: $JAVAEXEC

            if [ ! -e "$JAVAEXEC" ] ; then

                echo Make the check for day $DATETOCHECK
                # Back up the last known values before running the check
                cp /tmp/LogbookCrossCheck/lastHX:AMODE.txt /tmp/LogbookCrossCheck/NextLastBackup/nextlastHX:AMODE.txt
                cp /tmp/LogbookCrossCheck/lastHX:BMODE.txt /tmp/LogbookCrossCheck/NextLastBackup/nextlastHX:BMODE.txt
                cp /tmp/LogbookCrossCheck/lastHX:ENG.txt /tmp/LogbookCrossCheck/NextLastBackup/nextlastHX:ENG.txt
                cp /tmp/LogbookCrossCheck/lastHX:FILLN.txt /tmp/LogbookCrossCheck/NextLastBackup/nextlastHX:FILLN.txt
                cp /tmp/LogbookCrossCheck/lastLHC.BQM.B1:FILLED_BUCKETS.txt /tmp/LogbookCrossCheck/NextLastBackup/nextlastLHC.BQM.B1:FILLED_BUCKETS.txt
                cp /tmp/LogbookCrossCheck/lastLHC.BQM.B2:FILLED_BUCKETS.txt /tmp/LogbookCrossCheck/NextLastBackup/nextlastLHC.BQM.B2:FILLED_BUCKETS.txt

                java ALICEDataExtractionACR $DATETOCHECK

            else

                if [ ! -s $JAVAEXEC ] ; then

                    echo Make again the check
                    # Restore the backed-up values and repeat the check
                    cp /tmp/LogbookCrossCheck/NextLastBackup/nextlast*.txt /tmp/LogbookCrossCheck/
                    java ALICEDataExtractionACR $DATETOCHECK
                fi
            fi
        done
    else
        echo "Date $LASTDATECHECKED already cross-checked!"
    fi

    cd $JAVA_PROJECT_DIR

Part VII The Transient Data Storage

November 2010
ALICE DAQ Project

26 The Transient Data Storage

This chapter describes the Transient Data Storage as it has been deployed at ALICE and its associated packages.

26.1 Introduction
26.2 The Transient Data Storage architecture
26.3 The TDSM

26.1 Introduction

ALICE data have to be recorded on the Permanent Data Storage (PDS). ALICE chose the CERN Advanced STORage manager (CASTOR) as support for the PDS.
The decision was taken to implement in the ALICE counting room a Transient Data Storage (TDS) area where files would be stored at the output of the DAQ. The TDS would ensure low latencies and high reliability, enough to keep writing data whatever the status of the PDS is. A dedicated software package was developed to control the TDS: the Transient Data Storage Manager (TDSM). The role of the TDSM is to assign disks for writing to the GDCs, move the data files to the PDS and trigger their registration into AliEn. We will herein review the architecture and the features of both the TDS and the TDSM.

26.2 The Transient Data Storage architecture

The Transient Data Storage is organized in sets of hard disks, grouped in RAID6, realized as Disk Arrays (DAs). Several FibreChannel switches connect the DAs to the data writers (the GDCs) and the data readers (the TDSM Movers, hosts that handle the transfer of files from TDS to PDS). Transfers in and out of the TDS disks work better when crossing fewer FibreChannel switches. For this reason, switches and hosts are organized in groups, a concept that helps to optimize the data traffic by keeping it (if possible) within the same group. In this model, a “group” is equivalent to one FibreChannel switch.

26.3 The TDSM

The TDS needs careful handling in order to:
1. Write data in ROOT format on the TDS with the smallest possible impact on the data acquisition procedure.
2. Synchronize the use of the disks belonging to the TDS to make them work at the best of their capabilities.
3. Efficiently migrate the data from the TDS to the PDS and register it in AliEn.

An example of the architecture of the TDSM is shown in Figure 26.1. Here the data flows top to bottom, GDCs to TDS to CASTOR. Three groups of disks have been highlighted within the TDS. Three volumes are used for writing (exclusive mode), while three other volumes are selected for reading in shared mode.
Write volumes are critical as they must not slow down the DAQ: for this reason they are not shared. Read volumes, on the other hand, can experience latencies without affecting the migration procedure. It is mandatory that disks are not used for simultaneous write and read operations, which would make both operations extremely slow.

Outside the data flow we can see:
a. the TDSM Manager (left side): a node that coordinates all activities in the TDSM and keeps the liaison with the GDCs and with the CTP;
b. the TDSM configuration database, which contains shared parameters, status variables, historic records and shared procedures;
c. the DAQ logbook, used to record the status of each file and to provide the necessary assistance to the TDSM operator;
d. the AliEn spooler, which runs on a dedicated node, talking to the DAQ network and to the CERN General Purpose Network (GPN), where the AliEn file catalogue gateway is hosted.

The small scheme on the top-right of the figure shows the state transition scheme that applies to the data disks.

Figure 26.1 Architecture of the TDSM (example). [Diagram elements: GDCs with feedback to the CTP, TDS, TDSM Manager, TDSM configuration & control DB, TDSM File Movers, AliEn spooler, DAQ logbook, AliEn, CASTOR; networks: DAQ network, CERN GPN, MSS network; disk states: disabled, free, filling, full, emptying.]

The central component of the TDSM is the configuration and control database. All the communications between actors are made through this database (dashed lines). Physically this database can run on any machine. It uses a separate set of tables so that it can run independently from any other component (e.g. migrate data while the DAQ is under maintenance). The location of the TDSM database is defined in the DAQ/ECS configuration database.
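The disk life cycle can be pictured as a small state machine. The sketch below uses the state names that appear in Figure 26.1 (disabled, free, filling, full, emptying); the order of the transitions and the helper names next_state, can_write and can_read are assumptions made for illustration, since the text does not spell out the transition rules. It also encodes the rule stated above that a disk is never written and read at the same time.

```shell
#!/bin/bash
# Toy model of the data-disk life cycle. State names come from Figure 26.1;
# the transition order and all function names are assumptions.

# Print the state that follows $1 in the assumed normal cycle.
next_state() {
    case "$1" in
        free)     echo filling ;;   # volume assigned to a GDC for writing
        filling)  echo full ;;      # volume closed for writing
        full)     echo emptying ;;  # files being migrated to the PDS
        emptying) echo free ;;      # migration done, volume reusable
        disabled) echo disabled ;;  # stays out of the cycle until re-enabled
        *)        echo "unknown state: $1" >&2; return 1 ;;
    esac
}

# A volume is written only while filling and read only while emptying,
# so the two operations never overlap on the same disk.
can_write() { [ "$1" = filling ]; }
can_read()  { [ "$1" = emptying ]; }
```

In the real system the current state of each volume is kept in the TDSM configuration and control database rather than in a shell variable.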
The TDS and the TDSM can be monitored via dedicated TDSM statistics that report the throughputs and timings grouped by several criteria (machines, groups, volumes). This allows spotting and solving problems such as faulty disk volumes or misbehaving TDSM File Movers.

The TDSM package provides a set of utilities for the operation of the TDS. These provide functions such as disabling a faulty volume, excluding an unrecoverable volume from the setup, off-loading a volume being rebuilt, disabling faulty File Movers and so on. All these operations can be done at any time, regardless of the concurrent activities in ALICE, without stopping the TDSM and without changing the configuration database.

Other TDSM utilities allow more complex operations, such as re-organizing the hardware resources or assigning new nodes to the set. These operations usually require stopping and restarting the TDSM operation and therefore cannot be done when ALICE is in a running state.

In order to make good use of the hardware resources in ALICE, the TDSM monitors the occupancy of the TDS. Whenever this exceeds a pre-defined threshold, a dedicated script (named “TDS Full” script) is called. Actions taken by this script can be, for example, pausing the data acquisition process or changing the trigger profile in order to reduce the detectors’ data rates (e.g. allowing only rare triggers with lower throughput). A second script, named “TDS empty” script, is called whenever the TDS occupancy goes below a second pre-defined threshold, to revert whatever action was taken by the “TDS Full” script.

For detailed instructions on how to re-organize, re-configure, run and control the TDSM, refer to the ALICE DAQ WIKI.

26.3.1 The TDSM and the DAQ

The TDSM handles the attribution of resources to the DAQ as follows. At start of run, the mStreamRecorder process checks if the write volume declared for this particular run is available.
If not, it requests a volume from the TDSM and waits for the directory to be assigned. When the TDSM completes the transaction, it creates a symbolic link, which is detected by mStreamRecorder and used to create the output ROOT files. During the run, the mStreamRecorder process checks at regular intervals for the presence of the write directory. Whenever the write volume changes state (e.g. when it gets full), the TDSM removes the symbolic link and creates a new one pointing to a newly assigned volume. The mStreamRecorder process detects this change, closes the current output file and creates a new one, ensuring the transition to the new volume.

After a pre-defined period of inactivity, the TDSM changes the status of the write volumes and triggers their migration. This avoids stalling volumes for too long at the end of a run.

The operator can, at any time, trigger the migration to the PDS of the data files written by a partition. This is translated into a command for the TDSM, which takes care of changing the status of the write volumes. Please note that the reconstruction of a given run cannot be started as long as the run is not completed: therefore triggering the migration of a run before the run is closed will not speed up the reconstruction of the run itself (it will only make the data files available earlier to the reconstruction process).

26.3.2 Size of the output files

The size of the output files written on the TDS is a very important factor. The PDS needs to handle large files in order to optimize tape handling and packing. For this reason it is better not to request the migration of a run to the PDS as long as the run is not over. Smaller files increase the latency of their migration to the PDS and of their recall from tape, while making tape handling less efficient and slower.

At ALICE, the size of the ROOT data files is specified in the configuration of mStreamRecorder. Refer to the ALICE DAQ WIKI for more details on this subject.
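The symbolic-link handshake described in Section 26.3.1 can be sketched as a periodic check of the link target. This Python fragment is only an illustration under stated assumptions: the manual does not say how mStreamRecorder implements the check, and the class name `LinkWatcher` and any paths used with it are hypothetical. It shows how a recorder could notice that the link has been removed and re-created to point to another volume.

```python
import os


class LinkWatcher:
    """Tracks where a symbolic link points and reports changes between polls.

    Hypothetical helper for illustration; not part of DATE or the TDSM.
    """

    def __init__(self, link_path: str) -> None:
        self.link_path = link_path
        self.current_target = None  # no volume seen yet

    def poll(self):
        """Return the new target if the link changed since the last poll, else None."""
        try:
            target = os.readlink(self.link_path)
        except OSError:
            # Link missing (removed, or not yet created): keep waiting.
            return None
        if target != self.current_target:
            self.current_target = target
            return target
        return None
```

A recorder loop built on this sketch would call `poll()` at regular intervals and, whenever it returns a directory, close the current output file and open the next one under the returned directory.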
26.3.3 Links within the TDS and TDSM components

At Point 2, all the TDS and TDSM components are inside the ALICE DAQ network (to allow communication between hosts and databases). However, two channels must be opened towards the outside of this network:
1. Data links to the PDS.
2. Communication links to AliEn.

Data links are enabled via firewall commands on the Ethernet switches between the TDS and the PDS. Communication with AliEn goes through hosts equipped with two NICs, one connected to the DAQ network and one connected to whatever network hosts the AliEn server (currently the GPN).

26.3.4 The AliEn spooler

There are two separate implementations of the AliEn spooler:
1. one or more DAQ-hosted machines running dedicated Offline software for the registration;
2. one or more Offline-hosted machines handling the registration.

In the first architecture, the DAQ takes care of checking the health of the registration process, of the AliEn spooler daemon and of the forward progress of the procedure. In the second architecture, currently in use in ALICE, the DAQ provides the needed information to the gateway. This information is then handled by the Offline software (which takes care of error handling, error recovery and forward progress checking). Hot-swap between multiple nodes is performed in case of failures: the operator is notified of the event, but no immediate corrective action is required as long as at least one active gateway is available.
List of Figures

Figure 1.1 DAQ architecture overview
Figure 3.1 Streamlined unextended event format
Figure 3.2 Streamlined extended event format
Figure 3.3 Paged event logical format
Figure 3.4 Paged event first-level vector format
Figure 3.5 Collider mode event identification
Figure 3.6 Fixed target mode event identification
Figure 3.7 The full event format
Figure 3.8 Example of use of DATE event vectors
Figure 3.9 Example of complete paged event
Figure 4.1 DATE configuration database structure - main tables
Figure 4.2 The initial editDb view
Figure 4.3 GDC cloning window
Figure 4.4 LDC cloning window
Figure 4.5 New equipment creation display
Figure 4.6 Equipment configuration display
Figure 4.7 Detectors configuration display
Figure 4.8 Triggers configuration display
Figure 4.9 Membanks configuration display
Figure 4.10 Event building rules configuration display
Figure 4.11 Environment variables configuration display
Figure 4.12 Environment variables configuration display showing user defined variables
Figure 4.13 Files configuration display
Figure 4.14 Example of a DAQ system
Figure 5.1 The DATE online monitoring, local and remote configurations
Figure 5.2 The DATE offline monitoring
Figure 5.3 The DATE relayed monitoring
Figure 6.1 Main event loop executed by the readout process
Figure 6.2 The generic readList concept of the readout process
Figure 7.1 Event identification mechanism of the RorcData equipment
Figure 7.2 The software elements to handle the RORC device
Figure 7.3 Example of one sub-event in paged event mode
Figure 7.4 The data flow for an LDC with 3 RORC devices
Figure 7.5 The back-pressure algorithm
Figure 8.1 ALICE Trigger
Figure 10.1 A schematic block-diagram of the mStreamRecorder. The legend is shown at the top
Figure 11.1 The DATE infoLogger architecture
Figure 13.1 The EDM architecture
Figure 14.1 The runControl system architecture
Figure 15.1 Memory layout of physmem
Figure 17.1 The format of the CTP event data
Figure 17.2 The format of interaction records
Figure 17.3 TRIGGER-DAQ-HLT overall architecture
Figure 17.4 DAQ-HLT interface schematic view
Figure 17.5 DAQ-HLT Data Flow overview
Figure 17.6 Data flow in the LDC in the DAQ system with HLT active
Figure 17.7 Interconnections between hltAgents
Figure 18.1 ECS/DCS interface
Figure 18.2 ECS/TRG interface
Figure 19.1 The architecture of the ACT and its interfaces with the different online systems and detectors
Figure 19.2 ACT hierarchy
Figure 19.3 Items activation status state diagram
Figure 19.4 ACT workflow diagram
Figure 19.5 ACT database schema
Figure 22.1 DA framework architecture
Figure 23.1 Schema of the main dependencies of AMORE
Figure 23.2 The publisher-subscriber paradigm in AMORE
Figure 23.3 Description of a module
Figure 23.4 Schema of the database
Figure 23.5 Left: the publisher Finite State Machine. Right: the client Finite State Machine
Figure 23.6 Sequence of method calls on the agent and the client modules
Figure 23.7 The archiving system in AMORE
Figure 23.8 Class diagram (including some interaction information) of the package archiver
Figure 24.1 The architecture of the eLogbook and its interfaces with the other ALICE systems and the LHC
Figure 24.2 eLogbook’s database schema
Figure 25.1 Information distribution service of a newspaper or magazine
Figure 25.2 DIP architecture
Figure 25.3 Cross-domain DIP communication on the TN
Figure 26.1 Architecture of the TDSM (example)

List of Listings

Listing 3.1 Detecting swapping of the event data
Listing 4.1 Example DATE banks dump
Listing 4.2 Example of configuration files
Listing 4.3 Example of roles database
Listing 4.4 Example of trigger configuration
Listing 4.5 Example of detectors configuration
Listing 4.6 Example of event-building configuration
Listing 4.7 Example of banks configuration
Listing 4.8 Example of dumpDbs output
Listing 5.1 Example of event dump in C
Listing 5.2 Examples of use of the monitorClients utility
Listing 5.3 Examples of use of the monitorSpy utility
Listing 5.4 Creation of configuration files
Listing 6.1 Example of an equipment source code file
Listing 7.1 Pseudo code of equipment routine ArmRorcData()
Listing 7.2 Pseudo code of equipment routine AsynchReadRorcData()
Listing 7.3 Pseudo code of equipment routine ReadEventRorcData()
Listing 7.4 Pseudo code of equipment routine DisArmRorcData()
Listing 7.5 Pseudo code for handling a FIFO for a single process
Listing 9.1 Example of COLE configuration
Listing 9.2 Example of COLE equipment configuration
Listing 10.1 A simple configuration with 3 streams per GDC, recording to a local disk
Listing 10.2 A simple configuration with 3 streams per GDC, recording to CASTOR
Listing 10.3 The configuration for ROOT recording to CASTOR
Listing 10.4 The configuration with special properties for the GDC pcaldXXgdc
Listing 10.5 An API function used by the MSR to create an AliMDC object
Listing 10.6 Examples of starting MSR in the stand-alone mode
Listing 11.1 Setting the Facility name in C programs
Listing 15.1 Example of GRUB to trim the Linux memory region to 1 GB
Listing 15.2 Example of LILO to trim the Linux memory region to 1 GB
Listing 15.3 Example to list the physmem physical base addresses and sizes
Listing 15.4 Example of testing the physmem driver with utility physmemTest
Listing 16.1 Example of dateBufferManagerValidate run
Listing 16.2 Example of simpleFifoValidate run
Listing 17.1 Example of directory structure on CASTOR
Listing 19.1 ACT_handle type definition
Listing 19.2 ACT_system type definition
Listing 19.3 ACT_t_systemCategory type definition
Listing 19.4 ACT_t_systemParams type definition
Listing 19.5 ACT_item type definition
Listing 19.6 ACT_t_itemCategory type definition
Listing 19.7 ACT_t_itemActiveStatus type definition
Listing 19.8 ACT_instance type definition
Listing 20.1 Example of rorc_find program
Listing 20.2 Example of rorc_qfind program
Listing 20.3 Example of an FeC2 script
Listing 20.4 Example of a DDG configuration file
Listing 23.1 Example of a configuration file for the archiver
Listing 25.1 dipEnv.sh: environmental setup
Listing 25.2 runBrowser.sh: DIP browser launcher
Listing 25.3 restartLHCClient.sh: automatic check and restart of DIP client
Listing 25.4 ACRLHCLoggingCheck.sh: automatic start of Java application for the off-line cross-check

List of Tables

Table 0.1 Software versions corresponding to this guide
Table 3.1 Base event header structure
Table 3.2 The successive list of records in a data file generated by DATE
Table 3.3 Commonly used platforms and their endianness
Table 3.4 Common data header structure
Table 3.5 Common Data Header Status and Error bits
Table 3.6 Equipment header structure
Table 3.7 Event vector structure
Table 3.8 Payload descriptor structure
Table 5.1 Monitor source parameter syntax
Table 5.2 Event types
Table 5.3 Monitor types
Table 5.4 Bytes swapping control
Table 5.5 Monitoring configuration parameters
Table 6.1 Equipment suites in the readList package
Table 7.1 RorcData equipment parameters for all data sources
Table 7.2 RorcData equipment parameters (equipment software)
Table 7.3 RorcData equipment parameters (RORC internal data generator)
Table 7.4 RorcData equipment parameters (FEIC)
Table 7.5 RorcData equipment parameters (detector electronics)
Table 7.6 RorcSplitter equipment parameters
Table 10.1 Attributes in MSR configuration files
Table 10.2 Rules of precedence for MSR attribute values
Table 11.1 infoLogger configuration parameters - environment variables
Table 14.1 Common RunParameters
Table 14.2 LDC RunParameters
Table 14.3 GDC RunParameters
Table 14.4 EDM RunParameters
Table 14.5 LDC run-time variables
Table 14.6 GDC run-time variables
Table 14.7 EDM run-time variables
Table 17.1 File Exchange Server daqFES_files table
Table 19.1 ACTsystems table
Table 19.2 ACTitems table
Table 19.3 ACTinstances table
Table 19.4 ACTlockedItems table
Table 19.5 ACTconfigurations table
Table 19.6 ACTconfigurationsContent table
Table 19.7 ACTinfo table
Table 23.1 amoreconfig List of the agents
Table 23.2 amoreref Configuration files table
Table 23.3 Agents tables fields description
Table 23.4 latest_values table
Table 23.5 Archives tables
Table 23.6 globals table
Table 23.7 roles table
Table 23.8 users table
Table 23.9 agents_access table
Table 23.10 agents_details table
Table 23.11 DIM commands
Table 23.12 Data passed to the Logbook at SOR
Table 23.13 Members of the class MonitorObject
Table 24.1 logbook table (per run conditions and statistics)
Table 24.2 logbook_detectors table
Table 24.3 logbook_stats_LDC table
Table 24.4 logbook_stats_LDC_trgCluster table
Table 24.5 logbook_stats_GDC table
Table 24.6 logbook_stats_files table
Table 24.7 logbook_daq_active_components table
Table 24.8 logbook_shuttle table
Table 24.9 logbook_DA table
Table 24.10 logbook_AMORE_agents table
Table 24.11 logbook_trigger_clusters table
Table 24.12 logbook_trigger_classes table
Table 24.13 logbook_trigger_inputs table
Table 24.14 logbook_trigger_config table
Table 24.15 logbook_stats_HLT table
Table 24.16 logbook_stats_HLT_LDC table
Table 24.17 logbook_comments table
Table 24.18 logbook_comments_interventions table
Table 24.19 logbook_files table
Table 24.20 logbook_threads table
Table 24.21 logbook_subsystems table
Table 24.22 logbook_comments_subsystems table
Table 24.23 logbook_users table
Table 24.24 logbook_users_privileges table
Table 24.25 logbook_users_profiles table
Table 24.26 logbook_filters table
Table 24.27 DETECTOR_CODES table
Table 24.28 TRIGGER_CLASSES table
Table 24.29 logbook_config table
Table 24.30 logbookDaemon DIM services and DB fields relationship
Table 24.31 logbookDaemon run events DIM services

List of Acronyms

A
ACR: ALICE Control Room
AliEn: Alice Environment
AliMDC: A class to represent raw data as a ROOT object (originally designed for alimdc program)
AliROOT: ALICE Off-line framework for simulation, reconstruction and analysis based on ROOT
AMORE: Automatic MOnitoring Environment
Apache: HTTP server

B
BC: Bunch Crossing
BE: Big-Endian

C
CASTOR: CERN Advanced STORage Manager
CDH: Common Data Header
COLE: Configurable LDC Emulator
CTP: Central Trigger Processor
ClT: Calibration Trigger flag

D
D-RORC: DAQ Read-Out Receiver Card
DAQ: Data Acquisition System
DATEMON: DATE system monitoring set
DATE: Data Acquisition and Test Environment
DB: Database
DCA: Detector Control Agent
DCS: Detector Control System
DDG: DDL Data Generator program
DDL: Detector Data Link
DIM: Distributed Information Manager package
DIU: Destination Interface Unit in RORC
DMA: Direct Memory Access
DST: Detector Software Trigger event type
DTSTW: Data Transmission Status Word in RORC

E
ECS: Experiment Control System
EDM: Event Distribution Manager
EOB: End of Burst
EOR: End of Run
EPS: Encapsulated Postscript file format

F
FEE: Front-End Electronics
FEIC: Front-End Emulator Interface Card
FERO: Front-End Read-Out
FIFO: First In First Out buffer type
FSM: Finite State Machine
FeC2: Front-end Control and Configuration program

G
GB: GigaByte
GDC: Global Data Collector
GUI: Graphical User Interface

H
HLT: High-Level Trigger
HTTP: Hypertext Transfer Protocol

I
IPC: Inter-Process Communication

K
KB: KiloByte

L
LDC: Local Data Concentrator
LSB: Least Significant Bit
LTU: Local Trigger Unit
L2a: Level-2 accept (trigger)

M
MB: MegaByte
MOOD: Monitor Of Online Data and Detector Debugger
MSB: Most Significant Bit
MSR: Multiple-Stream Recorder
ms: millisecond

N
ns: nanosecond

P
PCA: Partition Control Agent
PDS: Permanent Data Storage
PNG: Portable Network Graphics file format
pRORC: 32-bit/33 MHz PCI bus RORC

R
RCS: Run-Control Server interface
ROI: Region Of Interest
ROOT: An object-oriented data analysis framework
RORC: Read-Out Receiver Card

S
SBC: Single Board Computer
SIU: Source Interface Unit (in DDL)
SMI: State Management Interface
SOB: Start of Burst
SOR: Start of Run
s: second

T
TPA: Trigger Partition Agent
TPC: Time Projection Chamber
TRG: Trigger
TTC: Timing, Trigger and Control system