I’ll try to steelman this comment. Anyone who uses coding tools knows that the output is heavily affected by details of the task you give it. The same model can give you garbage code or genius code for the same problem with slightly different framing. So it’s not necessarily a limitation in the model’s training that causes it to output security bugs. The model might be great at writing secure code, but you need a different harness to elicit that behavior.
Counterargument: just because the problem can be fixed without training, doesn’t mean training isn’t a possible solution.
I guess I thought this should be obvious to everyone but, looking at code and finding exploits is completely different from .. writing exploits.
For one thing exploits often require completely different parts of the code to chain together. Sometimes parts of code the LLM itself isn’t writing.
And, LLMs are ALREADY trained negatively against writing buggy or exploitable code.
In every prompt: "write me code without exploitable bugs".
I know it doesn't work so easily as someone who uses AI for coding, but I do find repetition of basics in almost every prompt keeps the AI focused.
Counter-counter-argument: for LLMs, tokens are units of thinking. And token use is, on the margin, directly proportional to costs of inference. So while the details of the harness, and how you prompt the model, and nature of the code and docs you put in context, etc. all matter to the quality of output you get from LLM coding tool, ultimately, there's always a ceiling to how much you're willing to spend on solving a problem - say, no more than 30 minutes, or $10, on refactoring a target module or implementing a small feature - and that puts a limit on how much thinking the model can put into it.
Thing is, writing secure and efficient and readable and simple code is in many cases fundamentally over that limit. It's possible, but you can't afford (or rationally just don't want) to spend as much on it as it's required for superhuman quality on all these aspects. Also most of the time, you don't want to operate at a limit - you probably expected that feature to take 30 seconds and less than $1 to implement. So you choose, both what the model optimizes for, and how much.
Because of that, no matter how good the model and the harness and the prompting are, $10 spent on coding is still bound to leave behind some security vulnerabilities that subsequent $10 spent on security review will find (especially with a model post-trained for that, at expense of general performance).