Still worth it imho for important code, but it shows they're hitting a ceiling on improving the model, which they're trying to work around by making it less token-efficient.