Hacker News

sarchertech, yesterday at 6:56 PM

Translating from a natural language spec to code involves a truly massive amount of decision making.

For a non-trivial program, two implementations of the same natural language spec will have thousands of observable differences.
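To make the "observable differences" point concrete, here is a minimal Python sketch built on an invented one-line spec. The spec sentence and the names `impl_a` and `impl_b` are hypothetical, not from the thread. Both functions are plausible readings of the same sentence, yet they disagree on sort direction, tie-breaking, and punctuation handling.

```python
# Hypothetical one-sentence spec: "Given a sentence, return its words sorted by length."
# Both functions below are defensible readings of that sentence.


def impl_a(sentence: str) -> list[str]:
    # Reading A: split on any whitespace, keep punctuation attached,
    # shortest first; sorted() is stable, so ties keep their input order.
    return sorted(sentence.split(), key=len)


def impl_b(sentence: str) -> list[str]:
    # Reading B: split on single spaces, strip surrounding punctuation,
    # longest first, ties broken alphabetically ignoring case.
    words = [w.strip(".,!?") for w in sentence.split(" ") if w]
    return sorted(words, key=lambda w: (-len(w), w.lower()))


if __name__ == "__main__":
    s = "The cat sat, then ran away."
    print(impl_a(s))  # ['The', 'cat', 'ran', 'sat,', 'then', 'away.']
    print(impl_b(s))  # ['away', 'then', 'cat', 'ran', 'sat', 'The']
```

Each disagreement is a decision the spec left open; a real program multiplies such decisions many times over.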

Where we are today, with agents that need guardrails to keep from spinning out, there is no way to let them work on code autonomously without all of those observable differences constantly shifting, and the result is unusable software.

Tests can’t prevent this, because for a test suite to cover all observable behavior, it would need to be more complex than the code, in which case it wouldn’t be any easier for a machine or a human to understand.

The only solution to this problem is for LLMs to get better. Personally, I think that by the point they can pull this off, they can do any white-collar job, and there’s no point in planning for that future because it ends in either Mad Max or Star Trek.


Replies

wtallis, yesterday at 11:36 PM

> Tests can’t prevent this because for a test suite to cover all observable behavior, it would need to be more complex than the code. In which case, it wouldn’t be any easier for machine or human to understand.

I don't think "complex" is the right word here. A test suite would generally be more verbose than the implementation, but a lot of the time it can simply be a long list of input->output pairs that are individually very comprehensible and easy for a human to review. The hard part is usually discovering what isn't covered by the test cases, rather than validating the correctness of the ones you do have.
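A minimal Python sketch of the kind of test suite described above, assuming a hypothetical `slugify` function (not from the thread). The expectations are a plain table of input->output pairs, each of which a reviewer can check at a glance.

```python
import re


def slugify(title: str) -> str:
    # Hypothetical function under test: lowercase, replace each run of
    # characters outside a-z/0-9 with "-", then trim leading/trailing "-".
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Each row is one observable behavior, reviewable on its own.
CASES = [
    ("Hello World", "hello-world"),
    ("  padded  ", "padded"),
    ("C++ & Rust!", "c-rust"),
    ("already-a-slug", "already-a-slug"),
    ("", ""),
]


def failing_rows(cases):
    # Return (input, expected, actual) for every row that does not match.
    return [
        (given, expected, slugify(given))
        for given, expected in cases
        if slugify(given) != expected
    ]


if __name__ == "__main__":
    for given, expected, actual in failing_rows(CASES):
        print(f"FAIL {given!r}: expected {expected!r}, got {actual!r}")
    print(f"{len(CASES)} rows checked")
```

Note what the table leaves out: no row covers non-ASCII input, and this `slugify` quietly turns `"Café"` into `"caf"`. Spotting gaps like that, rather than verifying the rows that exist, is the hard part. In pytest the same table would typically be fed to `pytest.mark.parametrize`.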

Herring, yesterday at 7:56 PM

Agreed, with one caveat: are tests supposed to cover all observable behavior? Usually people are happy with just eliminating large, easy classes of bad (unintended) behavior; otherwise they go for formal verification, which is an entirely different ballgame.
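A minimal Python sketch of eliminating broad classes of bad behavior without covering everything. It uses a hypothetical `dedupe` function and hand-rolled random inputs, a property-style check; libraries such as Hypothesis automate the input generation. Rather than listing each expected output, it asserts invariants that must hold for any input.

```python
import random


def dedupe(items: list[int]) -> list[int]:
    # Hypothetical function under test: drop repeats, keep first occurrences.
    seen: set[int] = set()
    out: list[int] = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def check_invariants(trials: int = 1000, seed: int = 0) -> int:
    # Feed many random inputs and assert properties that must hold for any
    # input, instead of writing down each exact expected output.
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        out = dedupe(items)
        assert len(out) == len(set(out)), f"duplicates remain: {items} -> {out}"
        assert set(out) == set(items), f"items lost or invented: {items} -> {out}"
        # Deliberately no assertion about order: that behavior stays unchecked.
    return trials


if __name__ == "__main__":
    print(f"{check_invariants()} random inputs passed")
```

The check rules out duplicates and lost or invented items across a thousand random inputs, but it says nothing about output order, so an implementation that reordered items would still pass. That is the trade described above: cheap coverage of broad failure classes, short of specifying every observable behavior.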

logicchains, yesterday at 7:15 PM

>For a non trivial program, 2 implementations of the same natural language spec will have thousands of observable differences.

If they're not defined in the spec, then these differences shouldn't matter; they're just implementation details. And if they do matter, then they should be included in the spec: a natural language spec that leaves unspecified things that should be specified is not a good spec.
