They might be converging somewhat. The ultimate limiting factor is training data. Eventually I think they will converge, and then the competition will shift to memory and compute efficiency, with the winner being the smallest model that is still maximally capable.