No, they are clearly not just scaled-up versions of GPT-2; different LLM architectures, such as mixture of experts (MoE), have appeared relatively recently. I am not an expert though, far from it.
MoE and similar techniques are basically compute-efficiency enhancements; they don't make the model smarter.