
throwup238 (yesterday at 5:39 PM)

> Although it doesn't really matter much. All of the open weights models lately come with impressive benchmarks but then don't perform as well as expected in actual use. There's clearly some benchmaxxing going on.

Agreed. I think the problem is that while they can innovate at algorithms and training efficiency, the human part of RLHF just doesn't scale and they can't afford the massive amount of custom data created and purchased by the frontier labs.

IIRC it was the application of RLHF that fixed a lot of the broken syntax LLMs used to generate, like unbalanced braces, and I still see those little problems in every open-source model I try. I don't think I've seen broken syntax from a frontier model like Codex or Claude in over a year.


Replies

algorithm314 (yesterday at 5:43 PM)

Can't they just run the output through a compiler to get feedback? Syntax errors seem easier to get right.

ej88 (yesterday at 5:49 PM)

The new meta is purchasing RL environments where models can be self-corrected (e.g. a compiler will error) after SFT + RLHF ran into diminishing returns. Although there's still lots of demand for "real world" data on actually economically valuable tasks.
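An RL environment of the kind described can be sketched as a tiny gym-style class where the reward comes from the compiler rather than a human rater. Everything here is illustrative, assuming a simple binary compile/no-compile reward; `CompileEnv` and its `step` method are invented names, not any real product's API.

```python
# Minimal sketch of a "self-correcting" RL environment: the compiler,
# not a human, produces both the reward and the correction signal.
class CompileEnv:
    def __init__(self, task_prompt: str):
        self.prompt = task_prompt

    def step(self, candidate_code: str) -> tuple[float, str]:
        """Score one model attempt; return (reward, feedback)."""
        try:
            compile(candidate_code, "<attempt>", "exec")
            return 1.0, "compiled"
        except SyntaxError as e:
            # The error text doubles as feedback fed back to the model.
            return 0.0, f"SyntaxError: {e.msg} (line {e.lineno})"

env = CompileEnv("write a function that adds two numbers")
reward, feedback = env.step("def add(a, b) return a + b")  # missing colon
print(reward, feedback)  # 0.0 plus a SyntaxError message
```

A real environment would also run tests for semantic correctness; syntax is just the cheapest signal to automate, which is the point the parent comments are making.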