CERN's AI is now questioning the laws of physics at the LHC
CERN's ATLAS and CMS teams are deploying AI anomaly detection to scan Large Hadron Collider collision data for 'exotic-looking' events that challenge the Standard Model of physics.
TL;DR: Scientists at CERN are using AI anomaly detection models, including quantum machine learning approaches such as QNNs and QLSTMs, to scan Large Hadron Collider collision data for events that cannot be explained by the Standard Model of particle physics. The LHC generates roughly 40 petabytes of data per year, and AI is the only scalable tool capable of screening it for "exotic-looking" collisions in real time. A parallel initiative, the EPA project, is using AI to optimize the accelerator's own operations. If the AI finds something the Standard Model cannot explain, it would be the biggest physics discovery in decades.
The Standard Model of particle physics is, by most measures, the most precisely tested scientific theory ever constructed. It describes the fundamental particles that make up all matter, the forces through which they interact, and the mechanism by which they acquire mass. It predicted the existence of the Higgs boson — discovered at CERN in 2012 — nearly 50 years in advance. Some of its predictions have been confirmed by experiment to a precision of one part in a trillion.
And yet, physicists are completely certain it is wrong. Or at least, incomplete.
The problems are not subtle. The Standard Model has no explanation for dark matter, which appears to make up roughly 27% of the universe's mass and energy. It has no mechanism to account for the observed imbalance between matter and antimatter — without which neither you nor the planet you are standing on would exist. It cannot incorporate gravity in a mathematically consistent way. It provides no explanation for why there are exactly three generations of fundamental matter particles, or why the masses of those particles span a range of more than 11 orders of magnitude.
These are not edge cases. They are central, unresolved problems. Every physicist working in the field knows the Standard Model is a placeholder — an extraordinarily good one, but a placeholder nonetheless. The question that has dominated experimental physics for three decades is: where do you look for what lies beyond it?
The answer, since the early 1980s, has been: build a bigger collider. Smash particles together at higher energies. Look for deviations from Standard Model predictions in the debris. The Large Hadron Collider at CERN is the culmination of that strategy. And now, for the first time, CERN is deploying AI specifically designed to notice the things that human physicists and traditional analysis pipelines might miss.
The Large Hadron Collider is a circular particle accelerator buried roughly 100 meters underground along the border between France and Switzerland. Its circumference is 27 kilometers. Inside it, two beams of protons travel in opposite directions at 99.9999991% of the speed of light, guided by thousands of superconducting magnets cooled to temperatures colder than outer space.
When these beams collide, the energy of impact is transformed into matter. New particles burst into existence for fractions of a second before decaying into other particles, which decay further, cascading outward through layers of detectors in a shower of signals. Each collision is called an event. At full capacity, proton bunches cross 40 million times per second, and each crossing can produce dozens of overlapping collisions.
Most of these collisions are uninteresting — protons clipping each other at glancing angles, producing spray patterns that physicists have catalogued exhaustively. The rare collisions, the ones in which the full energy of both protons is concentrated into a single violent interaction, are where the interesting physics happens. The Higgs boson was found in those rare events, after analyzing trillions of collisions and isolating a small statistical excess above the expected background.
The detectors around the collision points — ATLAS, CMS, ALICE, and LHCb — are instruments of extraordinary complexity. ATLAS alone is 46 meters long, 25 meters in diameter, and weighs approximately 7,000 tonnes. It contains roughly 100 million individual sensor channels. CMS, its counterpart on the opposite side of the ring, is smaller but denser, and houses the world's largest silicon tracker.
Together, these detectors generate a continuous stream of data that, if stored in full, would accumulate at a rate impossible to manage with any current infrastructure. The solution has always been to filter aggressively in real time, discarding events that look unremarkable and saving only those that match predefined criteria for potentially interesting physics. The risk with that approach is obvious: you can only find what you were looking for.
CERN's LHC generates approximately 40 petabytes of data per year. To put that in context, it would take roughly 8 million DVDs to store one year of LHC data. If you stacked those DVDs, the pile would be taller than Mount Everest.
The filtering problem is severe. ATLAS and CMS each record on the order of 1,000 events per second after their trigger systems whittle down the 40-million-per-second bunch-crossing rate. That sounds manageable until you realize that each recorded event can contain hundreds of thousands of individual detector measurements, and that identifying whether any particular event contains evidence of new physics requires comparing its characteristics against theoretical predictions computed across hundreds of possible signal scenarios.
Traditional physics analysis proceeds like this: theorists propose a model predicting a new particle. Experimentalists design a search optimized for the signature that particle would leave. They apply selection cuts — criteria that filter events based on specific features — and look for a statistical excess above the Standard Model background in the surviving sample. This process takes years and relies on knowing, in advance, what you are looking for.
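To make that workflow concrete, here is a toy version of a cut-based search in Python. The event sample, feature names, cut values, and background estimate are all invented for illustration — this is not any experiment's actual analysis code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical event sample: each row is one recorded event with a few
# reconstructed features (all distributions and numbers are made up).
n_events = 1_000_000
events = {
    "lead_jet_pt": rng.exponential(80.0, n_events),  # leading jet pT, GeV
    "missing_et":  rng.exponential(40.0, n_events),  # missing energy, GeV
    "n_leptons":   rng.poisson(0.2, n_events),       # isolated lepton count
}

# Selection cuts targeting one specific hypothetical signature.
mask = (
    (events["lead_jet_pt"] > 200.0)   # require a hard leading jet
    & (events["missing_et"] > 150.0)  # require large missing energy
    & (events["n_leptons"] == 0)      # veto events with leptons
)

n_observed = int(mask.sum())
n_expected = 1500.0  # invented background prediction (simulation/sidebands)

# A naive excess estimate; a real analysis uses a full likelihood model.
z = (n_observed - n_expected) / np.sqrt(n_expected)
print(f"observed={n_observed}, expected={n_expected:.0f}, naive z={z:.1f}")
```

The essential limitation is visible in the code itself: every cut encodes an assumption about what the signal looks like.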
The problem is that new physics might not look like anything theorists predicted. The history of physics is littered with discoveries that no one anticipated. The muon was so unexpected that Nobel laureate I.I. Rabi famously asked, "Who ordered that?" If a genuinely new phenomenon produces events that do not match any existing theoretical prediction, it would pass through every conventional search undetected — because no search was designed to find it.
This is precisely the gap that AI anomaly detection is designed to fill. Rather than searching for specific signatures, the AI learns what "normal" LHC events look like and flags anything that deviates from that learned baseline. It does not need to know what it is looking for. It needs only to recognize what it has never seen before.
The core idea behind AI anomaly detection at the LHC is model-agnostic new physics searches — searches that do not assume any particular theory about what lies beyond the Standard Model.
The approach works by training a machine learning model on large samples of simulated or recorded collision events that represent the Standard Model expectation. The model learns the statistical distribution of event features: the energies, angles, and identities of detected particles; the patterns of energy deposition across detector layers; the correlations between different decay products. It builds, in essence, a learned model of what physics looks like when nothing new is happening.
Once trained, the model is exposed to real collision data. Events that fit the learned distribution generate low anomaly scores. Events that deviate significantly from the learned distribution — that contain combinations of features the model has not encountered — generate high anomaly scores. Those high-scoring events are the candidates for further investigation. They are the exotic-looking collisions that might signal something the Standard Model cannot explain.
The implementation details matter. Autoencoders are a common architecture for this task: the model learns to compress an event's features into a compact representation and then reconstruct them. Events that the model cannot reconstruct accurately are, by definition, unlike anything it was trained on. Variational autoencoders add a probabilistic layer, allowing the model to assign likelihood scores rather than binary pass/fail judgments. Graph neural networks can model the relational structure of particle showers, capturing not just individual particle properties but how they relate to each other within an event.
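A minimal sketch of the autoencoder approach in PyTorch follows. The layer sizes, feature count, training data, and the 99th-percentile threshold are arbitrary stand-ins, not anything from the ATLAS or CMS pipelines:

```python
import torch
import torch.nn as nn

class EventAutoencoder(nn.Module):
    """Compress an event's features to a small bottleneck, then reconstruct."""
    def __init__(self, n_features: int = 32, n_latent: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, n_latent))
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 16), nn.ReLU(), nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = EventAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in for Standard-Model-like training events (real inputs would be
# preprocessed detector features, not random numbers).
background = torch.randn(10_000, 32)

for _ in range(20):  # training: learn to reconstruct "normal" events
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(background), background)
    loss.backward()
    optimizer.step()

# Anomaly score = per-event reconstruction error: events the model cannot
# reconstruct well are, by construction, unlike its training sample.
with torch.no_grad():
    new_events = torch.randn(100, 32)
    scores = ((model(new_events) - new_events) ** 2).mean(dim=1)
    candidates = new_events[scores > scores.quantile(0.99)]
```

The key design point is that the model is trained only on background-like events, so high reconstruction error itself becomes the anomaly score.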
What ATLAS and CMS are deploying goes beyond these standard approaches. The experiments are testing AI that operates at multiple stages of the analysis pipeline — not just offline, after events are recorded, but as part of the real-time trigger system that decides which events to save in the first place.
ATLAS and CMS are the two general-purpose detectors at the LHC. Both were built to search for new particles and phenomena across the broadest possible range of energies and event topologies. Their science programs overlap substantially, which is intentional — independent confirmation by two separate experiments is required before any discovery claim can be made.
Both collaborations now have active machine learning programs specifically focused on anomaly detection and new physics searches.
The ATLAS experiment has deployed several unsupervised learning techniques across different physics channels. One active program uses autoencoders to search for anomalous di-jet events — collisions producing two sprays of particles in a back-to-back configuration, which are sensitive to new heavy particles decaying into quarks or gluons. The collaboration has also applied weakly supervised methods, which train on data itself rather than relying entirely on simulation, reducing dependence on potentially imperfect theoretical models.
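For intuition about the weak-supervision idea (often called classification without labels), here is a toy using scikit-learn and invented data — nothing here is ATLAS code. A classifier is trained to distinguish events in a signal-enriched region from events in a control sideband; if the two regions contain identical physics, it cannot beat chance, so performance above chance signals that the mixtures differ:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Two event mixtures with invented features: a sideband assumed to be pure
# background, and a signal region that may hide a small admixture of
# something new. Labels mark the region, not signal vs. background.
sideband = rng.normal(0.0, 1.0, size=(5000, 8))
signal_region = np.vstack([
    rng.normal(0.0, 1.0, size=(4800, 8)),  # background component
    rng.normal(1.5, 0.5, size=(200, 8)),   # hypothetical hidden signal
])

X = np.vstack([sideband, signal_region])
y = np.concatenate([np.zeros(len(sideband)), np.ones(len(signal_region))])

# AUC near 0.5 means the regions look identical; anything significantly
# above chance is evidence that the signal region contains extra structure.
auc = cross_val_score(HistGradientBoostingClassifier(), X, y,
                      cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC: {auc:.3f}")
```

Because the labels come from the data itself, the method never needs a simulated signal model — which is exactly why it reduces dependence on potentially imperfect theory.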
The CMS experiment has similarly expanded its anomaly detection portfolio. CMS researchers have published results from model-agnostic searches in the di-muon mass spectrum, where the high-resolution CMS muon system makes it particularly sensitive. The collaboration has also applied deep learning to the problem of distinguishing quark-initiated jets from gluon-initiated jets, and has begun testing AI triggers that can retain interesting events without relying on fixed selection criteria.
Both experiments are transparent about the challenge. Anomaly detection at the LHC is technically demanding because the "background" — the Standard Model physics that looks normal — is itself extraordinarily complex. Distinguishing a genuinely exotic event from a rare but expected Standard Model process, or from a detector malfunction, requires the AI to have a deep and accurate model of everything it should consider unremarkable.
Among the most technically forward-looking aspects of CERN's AI program is the application of quantum machine learning to particle physics data analysis.
Quantum Neural Networks (QNNs) and Quantum Long Short-Term Memory models (QLSTMs) exploit quantum mechanical properties — superposition and entanglement — to process information in ways that classical neural networks cannot directly replicate. The theoretical appeal for particle physics is clear: quantum systems are the natural language of particle physics. Encoding particle collision data into a quantum circuit may allow the model to capture correlations that a classical model would require exponentially more parameters to represent.
The practical reality is more nuanced. Current quantum hardware is noisy. Quantum circuits with many qubits accumulate errors faster than they can be corrected. Running large QNN or QLSTM models on real LHC data requires either near-term quantum hardware improvements or hybrid approaches that combine classical preprocessing with quantum inference on the most information-dense subcomponents of the event.
CERN researchers are pursuing both paths. Simulation studies have shown that QNNs can match or outperform classical neural networks on certain classification tasks with fewer trainable parameters — a significant advantage for problems where data is limited or computational resources are constrained. QLSTMs, which incorporate quantum circuits into the gating mechanism of recurrent neural networks, are being tested on time-series patterns in detector signals, where the sequential nature of particle shower development makes recurrent architectures naturally appropriate.
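For a flavor of what these models look like in practice, here is a minimal variational quantum circuit in PennyLane, a library widely used in this line of research. The four-qubit size, encoding choice, and toy training step are illustrative assumptions, not CERN's actual models:

```python
import pennylane as qml
from pennylane import numpy as np  # autograd-aware NumPy wrapper

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)  # classical simulator

@qml.qnode(dev)
def circuit(weights, features):
    # Encode classical event features as qubit rotation angles.
    qml.AngleEmbedding(features, wires=range(n_qubits))
    # Trainable entangling layers: entanglement is what lets the circuit
    # represent correlations between features compactly.
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))  # readout expectation in [-1, 1]

shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=n_qubits)
weights = np.random.random(shape)  # trainable by default in pennylane.numpy

# In a hybrid pipeline, classical preprocessing would reduce a full event
# to these few input features (values here are made up).
features = np.array([0.1, 0.7, -0.4, 0.3], requires_grad=False)

def loss(w):
    return (circuit(w, features) - 1.0) ** 2  # push output toward label +1

opt = qml.GradientDescentOptimizer(stepsize=0.2)
for _ in range(10):
    weights = opt.step(loss, weights)
```

Note how few trainable parameters the circuit uses — the entire model above has 24 — which is the parameter-efficiency argument in miniature.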
The current status of this work is best described as promising but not yet deployed at scale in production searches. The value of quantum approaches for LHC physics will ultimately be determined by whether the performance gains justify the additional implementation complexity, and whether quantum hardware improves fast enough to make them practical within the LHC's operational timeline.
Separate from the physics analysis programs, CERN is deploying AI to optimize the operation of the LHC itself through the EPA (Efficient Physics Analysis) project and related machine learning initiatives for accelerator control.
The LHC's performance as a physics instrument depends on more than the quality of its detectors. The accelerator must sustain high-intensity proton beams in stable orbits for hours at a time, during which the beams collide billions of times. Maintaining beam stability requires continuous adjustment of thousands of parameters across the accelerator chain: the strength and alignment of magnets, the radio-frequency systems that accelerate particles, the collimators that protect the machine from beam losses, and the feedback systems that correct for external perturbations.
Historically, these adjustments have been made by operators following established procedures, with some degree of automated feedback from classical control systems. The complexity of the interactions between different subsystems — changing one parameter affects others in ways that can propagate through the entire machine — makes manual optimization slow and suboptimal.
AI models trained on operational data can learn the correlations between different machine parameters and the resulting beam quality metrics. They can suggest parameter adjustments that a human operator might not intuit, predict when beam instabilities are likely to develop before they become problems, and potentially operate the accelerator closer to its theoretical performance limits for longer periods.
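As a cartoon of how such a system might work, here is a generic surrogate-model optimization loop with invented parameters and an invented objective — not the EPA project's actual code:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(2)

def beam_quality(settings):
    """Stand-in for the real machine: maps control settings to a measured
    luminosity-like figure of merit (a made-up smooth function plus noise)."""
    return -np.sum((settings - 0.6) ** 2) + rng.normal(0, 0.01)

n_params = 5                                    # e.g. magnet/RF knobs (hypothetical)
X = rng.uniform(0, 1, (10, n_params))           # past operating points
y = np.array([beam_quality(x) for x in X])      # their observed quality

for _ in range(20):
    gp = GaussianProcessRegressor().fit(X, y)   # surrogate of machine response
    candidates = rng.uniform(0, 1, (500, n_params))
    mean, std = gp.predict(candidates, return_std=True)
    best = candidates[np.argmax(mean + std)]    # explore + exploit (UCB rule)
    X = np.vstack([X, best])                    # try it, record the outcome
    y = np.append(y, beam_quality(best))

print("best settings:", X[np.argmax(y)].round(2), "quality:", y.max().round(3))
```

The real problem involves thousands of coupled parameters and hard safety constraints, but the core loop — model the machine's response, propose a promising adjustment, observe, refine — is the same.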
The practical stakes are significant. The LHC's physics output is measured in integrated luminosity — essentially, the total number of collisions delivered to the experiments. More collisions mean more data. More data means better statistics. Better statistics are the currency of particle physics: they determine how confidently you can claim a discovery or set a limit on a theory. An AI system that can increase the LHC's effective luminosity by even a few percent delivers a commensurate improvement in the physics output of every experiment on the machine.
When physicists talk about discovering new physics at the LHC, they mean finding evidence of a particle, force, or phenomenon not described by the Standard Model. The range of theoretical candidates is enormous, and most have colorful names that belie their serious implications.
Supersymmetric particles — superpartners of the known particles predicted by supersymmetry — would explain the Higgs boson's mass stability and provide a dark matter candidate. The LHC has been searching for them since high-energy collisions began in 2010 and has not found them at the mass ranges initially expected. This does not mean they do not exist, but it has shifted expectations.
Leptoquarks are hypothetical particles that would couple quarks to leptons — the two main families of matter particles that the Standard Model treats as completely separate. Evidence for leptoquarks would suggest a deeper unified structure beneath the Standard Model.
Dark photons are hypothetical mediators of a "dark sector" that interacts with ordinary matter only very weakly. They are one candidate explanation for the persistent anomalies seen in precision measurements of the muon's magnetic moment, which has repeatedly deviated from Standard Model predictions.
Extra dimensions predict that spacetime has more than four dimensions, with the extra ones compactified at scales accessible to LHC energies. Kaluza-Klein excitations — higher-dimensional echoes of known particles — would appear as heavy resonances decaying into Standard Model particles.
What would any of these discoveries actually mean? At the most immediate level, it would mark the first confirmed crack in a framework that has withstood 50 years of testing, and it would force physicists to revise the foundations of their field. At a practical level, the history of physics suggests that understanding the universe's fundamental structure eventually yields technologies that could not have been anticipated at the time of discovery. Quantum mechanics was not developed for its applications in semiconductors and lasers. It was developed because the photoelectric effect was confusing. The applications came later, after the understanding was established.
A discovery of new physics at the LHC would be the biggest scientific event of the 21st century so far.
CERN has been searching for new physics since the LHC's first data-taking run in 2010-2012, which produced the Higgs boson discovery. Runs 2 and 3 followed with more collisions at higher energies and no additional fundamental discoveries, despite extensive searches across hundreds of theoretical scenarios.
This period has been frustrating for the community. The "desert" — the theoretical possibility that the Standard Model might hold without modification all the way up to the energy scales at which quantum gravity becomes relevant — has gone from being a fringe concern to a genuine possibility that physicists take seriously.
What is different now is the nature of the search strategy. Previous searches were hypothesis-driven: propose a model, optimize a search for its signature, look for an excess. The exhaustive coverage of motivated theoretical scenarios using this approach is now largely complete. The models that were easiest to detect, those predicting large signals at LHC energies, have been ruled out. What remains either predicts small signals buried in large backgrounds, or predicts signatures that do not match any established search category.
AI anomaly detection is a direct response to this situation. It does not require a hypothesis. It requires only the assumption that new physics, whatever form it takes, will produce events that are statistically unusual relative to the Standard Model background. That assumption is essentially unavoidable — if new physics produced events indistinguishable from Standard Model processes, no experiment could ever detect it regardless of the analysis method used.
The combination of unprecedented data volumes from Run 3 (ongoing through 2026) and the High Luminosity LHC upgrade planned for the late 2020s — which will increase the collision rate by roughly a factor of five — with AI models capable of detecting unusual patterns without preconceptions represents a qualitatively different mode of discovery. The LHC is, for the first time, genuinely searching for the unknown rather than for specific candidates.
Enthusiasm for AI in particle physics needs to be tempered by an honest accounting of the risks.
The most serious risk is spurious anomalies. Machine learning models can flag events as anomalous for reasons that have nothing to do with new physics. Detector effects, software bugs, unusual environmental conditions, and statistical fluctuations in background processes can all produce event samples that look unusual to a model trained on typical conditions. The history of particle physics includes multiple high-profile "discoveries" that did not survive scrutiny: the 750 GeV diphoton excess that generated hundreds of theoretical papers before evaporating with more data, and the W boson mass measurement that deviated from the Standard Model prediction at high significance but whose interpretation remains contested.
The statistical look-elsewhere effect is particularly relevant for anomaly detection. If you search across thousands of different event features for any unusual pattern, you will find some by chance. Quantifying the significance of an anomaly found through a model-agnostic search requires careful statistical treatment that differs from traditional hypothesis-driven searches.
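The arithmetic behind the look-elsewhere effect is easy to demonstrate. A generic trials-factor calculation (not the experiments' actual statistical machinery) shows how scanning many places inflates the chance of a fake bump, and why the discovery bar sits at 5 sigma:

```python
from scipy.stats import norm

# Local significance of a single bump: a 3-sigma fluctuation looks rare.
p_local = norm.sf(3.0)  # one-sided p-value, about 0.00135

# But scanning, say, 1,000 independent features/regions multiplies the
# chances of seeing at least one such fluctuation somewhere.
n_trials = 1_000
p_global = 1 - (1 - p_local) ** n_trials
print(f"local p = {p_local:.2e}, global p after {n_trials} looks = {p_global:.2f}")

# This is why discovery claims demand 5 sigma: even before any trials
# correction, that corresponds to about one chance in 3.5 million.
print(f"5-sigma p-value = {norm.sf(5.0):.2e} (~1 in {1/norm.sf(5.0):,.0f})")
```

With those numbers, a locally impressive 3-sigma bump becomes roughly a 74% probability of occurring somewhere by chance alone.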
There is also the question of model dependence. AI anomaly detection is often described as model-agnostic, but this is not quite accurate. The model learns to identify deviations from its training data. If the training data — whether from simulation or from data itself — does not perfectly represent the Standard Model background, the model will identify deviations from its imperfect training distribution rather than from true Standard Model expectations. Systematic biases in detector simulation are a known and ongoing challenge.
Finally, there is a genuine risk that AI flags something interesting that cannot immediately be connected to any theoretical framework. The particle physics community has robust procedures for vetting discovery claims, but those procedures evolved around hypothesis-driven searches. How to correctly interpret and respond to an anomaly found without a hypothesis is an open methodological question.
The LHC is currently in Run 3, which began in 2022 and is scheduled to continue through 2026. Run 3 operates at a center-of-mass energy of 13.6 TeV, the highest ever achieved in a particle collider. The data sample accumulated during Run 3 will surpass all previous LHC datasets combined.
Following Run 3, the LHC will shut down for several years for the High Luminosity LHC (HL-LHC) upgrade. The HL-LHC is designed to increase the instantaneous luminosity by a factor of five to seven compared to the original design, delivering roughly ten times more collisions over its operational lifetime than all previous LHC runs combined. The data rates will be so high that AI triggers will shift from being supplementary to being essential — classical triggers will be physically unable to process the collision rate without AI-assisted preselection.
CERN is already developing the computing infrastructure to handle this. The upgrade includes new detector systems with faster readout electronics, upgraded data acquisition networks, and a trigger system that incorporates field-programmable gate arrays running neural network inference in real time — making physics-quality AI decisions about which events to save in microseconds, not the milliseconds available to current systems.
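The open-source hls4ml package, developed within the LHC community, illustrates the workflow of turning a small neural network into FPGA firmware. A rough sketch — the model architecture, output directory, and FPGA part number below are placeholders, not a production trigger configuration:

```python
import hls4ml
from tensorflow import keras

# A deliberately tiny classifier: trigger-level models must fit the FPGA's
# resource budget and microsecond latency window (architecture is illustrative).
model = keras.Sequential([
    keras.layers.Input(shape=(16,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Translate the network into fixed-point firmware via high-level synthesis.
config = hls4ml.utils.config_from_keras_model(model, granularity="model")
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="trigger_nn_prj",     # placeholder project directory
    part="xcu250-figd2104-2L-e",     # example Xilinx FPGA part number
)
hls_model.compile()  # builds a C++ emulation for bit-accurate validation
```

The design constraint is unusual by machine learning standards: the network is judged not only on accuracy but on whether its inference fits within a fixed slice of silicon and a fixed number of clock cycles.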
The machine learning program at CERN is expanding beyond anomaly detection into other areas: improving detector calibration and alignment, accelerating Monte Carlo simulations (which currently consume the majority of CERN's computing resources), automating quality control of detector operations, and building foundation models trained on LHC data that can be fine-tuned for specific physics tasks.
The convergence of the largest particle physics dataset ever assembled, the most capable AI tools in history, and a physics community increasingly willing to let AI search for things it was not told to look for represents a genuine inflection point. Physicists have been looking for what lies beyond the Standard Model for nearly 40 years. The tools have changed. The data is richer than ever. Whether the universe has hidden something in that data that AI can find, but 50 years of human analysis missed, is the central open question in experimental physics.
If the AI finds it, the discovery will not belong to the AI. It will belong to the scientists who had the insight to ask the right question.
The Standard Model is the theoretical framework that describes all known fundamental particles and the forces between them (except gravity). It is extraordinarily accurate but known to be incomplete — it cannot explain dark matter, the matter-antimatter asymmetry in the universe, or the mass of the Higgs boson without fine-tuned parameters. CERN's experiments are designed to find evidence of new particles or phenomena that extend or replace the Standard Model.
The LHC generates approximately 40 petabytes of data per year. Even after aggressive real-time filtering, the remaining dataset is too large and complex for traditional analysis methods to screen comprehensively for all possible types of new physics. AI models, particularly unsupervised anomaly detection systems, can scan events for unusual patterns without requiring scientists to specify in advance what they are looking for.
ATLAS and CMS are the two large general-purpose particle detectors at the LHC. They are independently designed and operated by separate international collaborations. Having two experiments with different detector technologies looking at the same collisions provides cross-checks: a genuine discovery of new physics must appear in both experiments, not just one.
Quantum machine learning uses quantum computing hardware to perform machine learning computations. Quantum Neural Networks (QNNs) and Quantum Long Short-Term Memory networks (QLSTMs) can, in principle, represent certain patterns with exponentially fewer parameters than classical equivalents. For particle physics, where the underlying processes are inherently quantum mechanical, this approach may offer accuracy advantages. CERN is testing these models, though they remain in the research phase rather than production deployment.
The EPA (Efficient Physics Analysis) project at CERN applies AI to optimize the operation of the LHC accelerator itself — not just to analyze the physics data it produces. The goal is to use AI to maintain higher beam quality for longer periods, which directly increases the number of collisions delivered to experiments and thus improves the statistical power of every physics search.
The discovery process would follow standard particle physics protocols: the anomaly would be scrutinized by both ATLAS and CMS independently, alternative explanations would be systematically ruled out, and the statistical significance would be assessed against established thresholds (typically 5-sigma, or one chance in 3.5 million of being a fluctuation). If confirmed, it would be the most significant physics discovery since the Higgs boson in 2012 and would trigger an era of theoretical and experimental work to characterize the new phenomenon.
The High Luminosity LHC upgrade is scheduled to begin after Run 3 concludes in 2026, following a multi-year shutdown for installation. The HL-LHC is expected to begin physics operations in the early 2030s, delivering roughly ten times more total collision data than all previous LHC runs combined. AI will be essential to handling the data rates the upgraded machine will produce.