Does anyone know why they are using language models instead of a more purpose-built statistical model? My intuition is that a language model would either overfit, or its training data would contain so much noise unrelated to the application that it would significantly drive up costs.
This might be some journalistic confusion. If you go to the CERN documentation at https://twiki.cern.ch/twiki/bin/view/CMSPublic/AXOL1TL2025, it states:
> The AXOL1TL V5 architecture comprises a VICReg-trained feature extractor stacked on top of a VAE.
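For anyone curious what that combination actually means, here's a rough PyTorch sketch of the idea: a small feature extractor trained with a VICReg-style self-supervised loss, feeding a tiny VAE whose reconstruction and KL terms give an anomaly score. Every layer size, name, and the scoring choice here are my own guesses for illustration; the real AXOL1TL model is a quantized design running in Level-1 trigger hardware, so this is only the conceptual shape, not their implementation.

```python
# Illustrative sketch only: a VICReg-trained feature extractor feeding a VAE
# used for anomaly scoring. Dimensions and architecture are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):
    """Maps raw trigger-level inputs to a compact embedding."""
    def __init__(self, in_dim=57, emb_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, emb_dim),
        )

    def forward(self, x):
        return self.net(x)

def vicreg_loss(z_a, z_b, inv_w=25.0, var_w=25.0, cov_w=1.0):
    """VICReg objective: invariance (MSE between two views) + a variance
    hinge that keeps each embedding dimension spread out + a covariance
    penalty that decorrelates dimensions."""
    n, d = z_a.shape
    inv = F.mse_loss(z_a, z_b)

    def var_term(z):
        std = torch.sqrt(z.var(dim=0) + 1e-4)
        return torch.mean(F.relu(1.0 - std))

    def cov_term(z):
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum() / d

    return (inv_w * inv
            + var_w * (var_term(z_a) + var_term(z_b))
            + cov_w * (cov_term(z_a) + cov_term(z_b)))

class TinyVAE(nn.Module):
    """Small VAE over the (frozen) embeddings."""
    def __init__(self, emb_dim=16, latent_dim=4):
        super().__init__()
        self.enc = nn.Linear(emb_dim, 2 * latent_dim)  # -> mu, logvar
        self.dec = nn.Linear(latent_dim, emb_dim)

    def forward(self, z):
        mu, logvar = self.enc(z).chunk(2, dim=-1)
        eps = torch.randn_like(mu)
        latent = mu + eps * torch.exp(0.5 * logvar)
        return self.dec(latent), mu, logvar

def anomaly_score(vae, z):
    """One plausible score: reconstruction error plus KL term; events with
    unusually high scores get flagged as anomalous."""
    recon, mu, logvar = vae(z)
    recon_err = F.mse_loss(recon, z, reduction="none").sum(dim=-1)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
    return recon_err + kl
```

In practice you'd pretrain the extractor with `vicreg_loss` on two "views" of each event (however those are defined for collision data), freeze it, train the VAE on its embeddings over ordinary collision data, and then cut on `anomaly_score` to decide which events to keep.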
… they’re not? Who said they are? The article even explicitly says they’re not?
It's not an LLM; it's a purpose-built model. https://arxiv.org/html/2411.19506v1
5 years ago we would've called it a Machine Learning algorithm. 5 years before that, a Big Data algorithm.