nomel • yesterday at 11:43 PM • 4 replies

I would claim that LLMs desperately need proprietary code in their training before we see any big gains in quality.

There's some incredible source-available code out there. Statistically, though, I think there's a LOT more not-so-great source-available code, because the majority of the output of seasoned/high-skill developers is proprietary.

To me, a surprising portion of Claude 4.5's output looks like student homework answers, and I think that's because homework is closer to the mean of the code population.

Replies

bearjaws • today at 1:59 AM

I will say many closed-source repos are probably just as poor as open-source ones.

Even worse, in many cases, because they are so over-engineered that nobody understands how they work.

bhadass • today at 12:16 AM

yeah, but isn't the whole point of Claude Code to get people to provide preference/telemetry data to Anthropic (unless you opt out)? same w/ other providers.

i'm guessing most of the gains we've seen recently come from post-training rather than pretraining.

typ • today at 1:50 AM

I'd bet that, on average, proprietary code is of worse quality than open-source code. There have been decades of accumulated slop generated by human agents with wildly varying skill levels, all vibe-coded by ruthless, incompetent corporate bosses.

andai • yesterday at 11:56 PM

Let's start with the source code for the Flash IDE :)