Big Data and AI at the CERN LHC by Dr. Thea Klaeboe Aarrestad

The CERN Large Hadron Collider (LHC) generates an unprecedented O(10,000) exabytes of raw data annually from high-energy proton collisions. Managing this vast data volume while adhering to computational and storage constraints requires real-time event filtering systems capable of processing millions of collisions per second. These systems, leveraging a multi-tiered architecture of FPGAs, CPUs, and GPUs, must rapidly reconstruct and analyze collision events, discarding over 98% of the data within microseconds. As the LHC transitions to its high-luminosity era (HL-LHC), these data-processing systems, operating in radiation-shielded caverns 100 meters underground, must contend with data rates comparable to 5% of global internet traffic, alongside unprecedented event complexity. Ensuring data integrity for physics discovery demands efficient machine learning (ML) algorithms optimized for real-time inference, achieving extreme throughput and ultra-low latency.

https://thaarres.github.io/
https://ethz.ch/en.html

Talk from Systems Distributed '25: https://systemsdistributed.com
Join the chat at https://slack.tigerbeetle.com/invite