It makes me wonder why, despite the fast improvements in model capacity (and the claims), we're still using variations on a 9-year-old architecture. How is it that we haven't been able to use LLMs to actually improve on it?