ryandrake • yesterday at 10:07 PM • 17 replies

> A general pattern for LLMs is that they look really good at things you are bad at.

This is true for coding, too, which I think, to a large degree, might explain the polarized differences in opinions on HN about the quality of LLM-produced code. You have the 1. "AI produces code better than I could possibly write, one shots things it would take me days to do, and has made me 10X more productive!" camp, and you have the 2. "AI constantly produces poor code needing rework, makes mistakes, has to be babysat, and ultimately costs me time!" camp, with a spectrum in between those. How could the output of the same product be seen so differently? Well, I have bad news for camp 1...

Replies

OhSoHumble • yesterday at 10:57 PM

I've caught Claude Code generating some pretty egregious security vulnerabilities. I'm using it to build an AI RPG site and the goal is to use WebAssembly as a bridge between author-submitted code and LLMs in order to help shore up state management at the game level.

The language that I picked for the game runtime is Python. Claude really thought that the best way to validate user-submitted Python was to bypass the WASM sandbox and execute it within the application container using shell exec - essentially opening up an RCE vulnerability.

I also find that the quality of Claude Code's output degrades substantially over time. Claude really wants to implement every feature in as bespoke a way as possible. This is fine when you first generate the project, but over time you'll find that every web modal is implemented differently. Every button is different. Business logic is disconnected. It's why agentically produced codebases are MUCH larger than they should be; every feature is developed in a vacuum.

Then I'm trying to shove stuff in my AGENTS.md or CLAUDE.md files like "ALWAYS look for existing patterns within the codebase to keep it consistent." But the harness doesn't always work and it'll generate useless, verbose code anyways.

In some cases it's useful - like if I am shaky on the DSA knowledge needed for a specific operation or optimization, then Claude can replace Stack Overflow. But, man, I'm so frustrated with it.

kybernetikos • yesterday at 10:15 PM

I think there are some factors beyond just skill too - the kinds of tasks you're giving the AI, and how involved you are in ensuring the output is good (via either extensive planning guidance, extensive review/testing, or a combination).

kiba • yesterday at 10:38 PM

I used an LLM to teach me how to code and get through obstacles that would have had me spending a lot of time doing ???. Typically, I just write code that I know is, a lot of the time, absolutely wrong, but the LLM helpfully points out the mistakes.

I am slowly writing more of my own code and cutting the LLM out of the loop in the unfamiliar territory I am working in.

My main concern is not so much productivity but understanding the code I have written and feeling agency over it.

The LLM is a very good teacher.

onion2k • yesterday at 10:31 PM

> Well, I have bad news for camp 1...

It's bad if they work in a part of the industry where code quality or efficiency matters. That's maybe 10% of the total though.

YZF • today at 5:22 AM

I disagree this is the source of the polarization. Maybe it's part of it.

I have been coding since about 1983 or so. I shipped high quality products that have been used by millions of people. From embedded software to desktop applications to distributed systems.

I don't think I'm in the "don't understand what code should look like" camp (I mean you never know, but the evidence seems to show that I do know what I'm doing). I use AI as a tool and it helps me be more productive. I don't "one shot things that would take me days to do". I use it to help me automate things that I could do manually, where it is faster and more effective. I review every step and if I don't like something I adjust. There are some specific situations where it basically does as good a job as I would in running some experiments, doing some analysis or writing some small amount of code. I still know what the changes need to look like broadly, where to make them, and what patterns to follow. It just automates the work and sometimes does have some additional insight that can complement my views. Unlike me, it is all-knowing about everything in terms of access to "knowledge". It knows all the details of how a certain runtime manages memory, Linux internals and various open source software. I could go look it up myself (which is what I did before AI) but I don't hold it all in my head like AI basically does. It is also "all-knowing" in the code base I work in (more so than me; it's a huge code base, and I have an outline and a high-level picture in my head but not every single line of code), where again I can dive into the code but it helps me extract the relevant information faster.

I think the polarization is more about how you use the tool, what situations you use the tool for, and which domain you are operating in (languages, applications etc.). You can also one-shot simple tools and helpers that are not the production software, which is another way to accelerate your workflow.

overgard • today at 2:24 AM

Yup, pretty much.

The hard part too is it's not like you can just learn the basics and be able to tell good code apart from bad -- the more you learn to code, the more intricate your understanding of good code is. It's like becoming a good writer; just knowing grammar and spelling doesn't make your writing interesting. Not to mention that there's just a lot of bad advice out there that you can't recognize as bad advice if you're not a regular practitioner. Like, "Clean Code" is IMO a terrible book, but a ton of people follow it because it has the sheen of respectability.. until, hopefully, they learn some new patterns and realize those old ones aren't very good. But you pick these things up with experience and doing the work! Otherwise, if you're just reading other people's opinions, you'll see a bunch of people say "Clean Code is great" and a bunch of other people say it's rubbish, and you'll have no way to know who you should listen to. (If you disagree with me on Clean Code the book that's fine -- I'm just using it to make a point -- sub in a different book/ideology if it suits you)

I think looking at LLM code and thinking you're now a coder is like watching someone play guitar and thinking you can just pick up a guitar and play a song. The truth is, if you want to be good, you have to do the work.

One of the things I hate about AI is that we're going to have a generation of "programmers" that are absolutely shit at programming, create problems for everyone else, and will have absolutely no idea how bad they are. And they'll probably never get better, because you can't get better by just asking Claude to do shit for you. And then the LLMs themselves will probably start to degrade because they'll be trained on the slop, since it'll heavily outnumber handwritten code..

sebastianmestre • today at 10:22 AM

Is this really a split that exists?

In my case I see Claude produce code much worse than I would, but it's certainly much quicker and, even after reworking, it makes me finish tasks in less time.

wzdd • today at 5:11 AM

It really depends. If you're cranking out prototypes or testing ideas, it's genuinely great. But if you're familiar with the code it's very easy to spot its (many) mistakes. It's Gell-Mann amnesia.

Then again, I just caught Claude writing setTransparent(!opaque == false), opaque being a bool, on a purely vibecoded project. Which was pretty impressive. ("• You're right, that's nonsense.")
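
For anyone wondering why that expression is nonsense, here is a minimal sketch. It assumes a C-like language where `!` binds tighter than `==` (the comment doesn't say which language the project uses), so `!opaque == false` reads as `(!opaque) == false`. The helper name `double_negated` is made up for illustration. The truth table shows the whole expression is just `opaque` again, which means the call passes the opaque flag straight into a setter named for transparency.

```python
def double_negated(opaque: bool) -> bool:
    """Python spelling of the C-like expression `(!opaque) == false`."""
    return (not opaque) == False  # noqa: E712 (deliberately mirrors the original)


# Evaluate the expression for both possible values of the flag.
table = {opaque: double_negated(opaque) for opaque in (True, False)}

for opaque, argument in table.items():
    print(f"opaque={opaque} -> argument passed={argument}")
```

Both rows come out with the argument equal to `opaque`, so the expression could be replaced by the bare variable without changing behavior.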

theshrike79 • today at 9:11 AM

The difference between prose, art and code is that we can define "good code" with deterministic tools. Not perfectly, but to a large degree.
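
As one illustration of what a deterministic check can look like, here is a rough sketch (not any particular linter's implementation) that uses Python's standard `ast` module to flag `==`/`!=` comparisons against a boolean literal. The function name `find_bool_literal_comparisons` and the sample input are hypothetical.

```python
import ast


def find_bool_literal_comparisons(source: str) -> list[int]:
    """Return line numbers of ==/!= comparisons that involve True or False."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        operands = [node.left, *node.comparators]
        uses_equality = any(isinstance(op, (ast.Eq, ast.NotEq)) for op in node.ops)
        has_bool_literal = any(
            isinstance(o, ast.Constant) and isinstance(o.value, bool)
            for o in operands
        )
        if uses_equality and has_bool_literal:
            hits.append(node.lineno)
    return sorted(hits)


sample = "ok = flag\nbad = (not flag) == False\n"
print(find_bool_literal_comparisons(sample))  # prints [2]
```

The same input always yields the same findings, which is the property that makes this kind of rule usable as a gate on generated code. It only covers one narrow smell, in line with the "not perfectly" caveat.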

koonsolo • today at 5:15 AM

I'm in camp 3, where sometimes I don't really care how good or bad the code is. For internal tools, for example, you can let the LLM crank out code really fast; you can validate the output but don't even have to look at the code. These kinds of "weekend projects" can get finished in an hour or two, and so are really 10x.

For bigger production-ready code, you do indeed have to guard the architecture. But for the code itself, in some corners you can get away with sloppy code, as long as it kind of works.

What I'm saying is, code doesn't always have to be great. You just have to judge which places need high quality and which places can get away with sloppy code.

fluidcruft • yesterday at 11:05 PM

There's a third camp between these extremes who is like "goddamn it just type this shit out for me so I don't have to do it myself".

tempest_ • yesterday at 10:24 PM

It isn't, though.

The industry largely selected for camp 1 long ago.

If you don't get immediate negative feedback, camp 1 can go quite a ways before problems surface.

soulofmischief • today at 12:50 AM

LLMs can generate code, but the quality of the code at scale is just not there currently by all important metrics such as security, maintainability, separation of concerns, etc.

Today, it's a kind of chaos magic wherein you summon the beast and try your best to contain him, knowing that someone will probably die in the process. Sometimes literally. It's still a force multiplier in the right hands and domain, and agentic coding is a paradigm that won't retract, at least until something better supplants it.

The problem is that few engineers actually have the discipline available to constrain these models appropriately and instead rely on a hodgepodge network of "skills" aka prompt fragments which are passed around and glued together.

I consider myself as having such discipline, being strongly architecturally-minded, user-first, etc. in both design and implementation. And I still struggle to contain the beast many days. I just got through screaming at Claude for intentionally taking a shortcut that I'd forbidden, leading to a ton of wasted time and tokens.

Sometimes I feel like I saved weeks of R&D with a single ten-minute task handed off to an agent, other times I feel like I'd get better returns playing slots in Vegas at the alarming rate Claude burns through money.

observationist • yesterday at 10:57 PM

I, for one, welcome our new AI overlords. They provide me with only the finest Gell-Mann amnesia, straight from the tap.

petesergeant • today at 10:00 AM

I think this is a straw man.

> the polarized differences in opinions on HN about the quality of LLM-produced code

Are there strong differences of opinion about the quality? I've seen very few people claim that LLMs write better code than they do.

> one shots things it would take me days to do, and has made me 10X more productive

This is an entirely different claim from the former, and you're conflating them.

The boost from LLM-assisted code isn't _expertise_, it's the power of having an always-on team of reasonable junior developers from every discipline you can possibly imagine willing to do your whim.

Take for example Jesse Vincent / obra[0], who is an exceptional developer, with great taste, and a stack of well-received open-source software to his name. He posts a lot on how he's being made more productive by AI-assisted development. Do you have bad news for him about the quality of his work...?

0: https://en.wikipedia.org/wiki/Jesse_Vincent

bitwize • yesterday at 11:09 PM

Eric S. Raymond has basically stopped writing code by hand altogether. He consistently delivers high quality code without intervening to fix the LLM's output himself, much faster than he would have been able to alone. This is very bad news for camp 2 because it means one of three things:

1) he is extraordinarily lucky

2) he is extraordinarily brilliant at manipulating LLMs

3) you really are "holding it wrong" and you are hobbling yourself with your failure to properly learn the tools

The first two seem rather unlikely.
