I really liked this part: In December 2024, during the frenzied adoption of LLM coding assistants,...

homarp • yesterday at 5:14 PM • 6 replies • view on HN

I really liked this part:

In December 2024, during the frenzied adoption of LLM coding assistants, we became aware that such tools tended—unsurprisingly—to produce Go code in a style similar to the mass of Go code used during training, even when there were newer, better ways to express the same idea. Less obviously, the same tools often refused to use the newer ways even when directed to do so in general terms such as “always use the latest idioms of Go 1.25.” In some cases, even when explicitly told to use a feature, the model would deny that it existed. [...] To ensure that future models are trained on the latest idioms, we need to ensure that these idioms are reflected in the training data, which is to say the global corpus of open-source Go code.

Replies

munk-a • yesterday at 6:10 PM

PHP went through a similar effort a while back to just clear out places like Stackoverflow of terrible out of date advice (e.g. posts advocating magic_quotes). LLMs make this a slightly different problem because, for the most part, once the bad advice is in the model it's never going away. In theory there's an easier to test surface around how good the advice it's giving is but trying to figure out how it got to that conclusion and correct it for any future models is arcane. It's unlikely that model trainers will submit their RC models to various communities to make sure it isn't lying about those specific topics so everything needs to happen in preparation of the next generation and relying on the hope that you've identified the bad source it originally trained on and that the model will actually prioritize training on that same, now corrected, source.

➕ show 2 replies

Groxx • yesterday at 7:25 PM

They're particularly bad about concurrent go code, in my experience - it's almost always tutorial-like stuff, over-simplified and missing error and edge case handling to the point that it's downright dangerous to use... but it routinely slips past review because it seems simple and simple is correct, right? Go concurrency is so easy!

And then you point out issues in a review, so the author feeds it back into an LLM, and code that looks like it handles that case gets added... while also introducing a subtle data race and a rare deadlock.

Very nearly every single time. On all models.

➕ show 3 replies

robviren • yesterday at 5:42 PM

I have run into that a lot which is annoying. Even though all the code compiles because go is backwards compatible it all looks so much different. Same issue for python but in that case the API changes lead to actual breakage. For this reason I find go to be fairly great for codegen as the stability of the language is hard to compete with and the standard lib a powerful enough tool to support many many use cases.

BiraIgnacio • yesterday at 6:32 PM

I definitely see that with C++ code Not so easy to "fix", though. Or so I think. But I do hope still, as more and more "modern" C++ code gets published

HumblyTossed • yesterday at 6:02 PM

The use of LLMs will lead to homogeneous, middling code.

➕ show 8 replies

dakolli • yesterday at 9:29 PM

I'd prefer we start nuking the idea of using LLMs to write code, not help it get better. Why don't you people listen to Rob Pike, this technology is not good for us. Its a stain on software and the world in general, but I get it most of ya'll yearn for slop. The masses yearn for slop.

➕ show 2 replies

alt Hacker News

Replies