So, if I understand correctly, this is about finding the optimal (or at least a better) GPT architecture?
Anyway, "1980 experiments, 6 improvements" makes me wonder if this is better than a random search or some simple heuristic.