It seems like they’ve been optimising their models for coding. At least, that’s what the benchmarks used in the article suggest.