I agree. I also think it's about the hardware and, obviously, about recognizing automatic differentiation (AD) as the fundamental primitive.
Particular architectures don't matter so much yet. It's quite possible that state space models like S4/Mamba, or xLSTM, could be used in lieu of transformers and we would still have LLMs.
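
To make the "AD is the primitive, the architecture is swappable" point concrete, here is a rough toy sketch, assuming AD means automatic differentiation. It uses forward-mode dual numbers in plain Python, not the reverse-mode AD that real frameworks use at scale, and the names (`Dual`, `grad`, `linear_model`, `tanh_model`, `train`) are made up for illustration. The only thing to notice is that `train` never looks inside the model.

```python
import math

class Dual:
    """A value paired with its derivative w.r.t. one seeded parameter."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def tanh(x):
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.dot)  # chain rule through tanh

def grad(loss, params):
    """Gradient of loss(params), seeding one parameter at a time."""
    out = []
    for i in range(len(params)):
        duals = [Dual(p, 1.0 if j == i else 0.0) for j, p in enumerate(params)]
        out.append(loss(duals).dot)
    return out

# Two hypothetical toy "architectures" with the same interface.
def linear_model(p, x):
    return p[0] * x + p[1]

def tanh_model(p, x):
    return p[0] * tanh(p[1] * x + p[2])

def make_loss(model, data):
    def loss(p):
        total = Dual(0.0)
        for x, y in data:
            err = model(p, x) - y
            total = total + err * err  # sum of squared errors
        return total
    return loss

def train(model, params, data, lr=0.05, steps=200):
    """Plain gradient descent; nothing here depends on the model's internals."""
    loss = make_loss(model, data)
    for _ in range(steps):
        g = grad(loss, params)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params, loss([Dual(p) for p in params]).val

DATA = [(-1.0, -0.5), (0.0, 0.0), (1.0, 0.5)]  # toy data, y = 0.5 * x

lin_params, lin_loss = train(linear_model, [0.0, 0.0], DATA)
tanh_params, tanh_loss = train(tanh_model, [1.0, 0.5, 0.0], DATA)
```

Swapping `linear_model` for `tanh_model` changes nothing in `grad` or `train`, which is the sense in which the differentiation machinery, rather than any one architecture, does the heavy lifting.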