
heikkilevanto · yesterday at 10:33 AM

If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time. A traditional compiler will produce a program that behaves the same way, given the same source and options. Some even go out of their way to guarantee they produce the same binary output, which is a good thing for security and package management. That is why we don't need to store the compiled binaries in the version control system.

Until LLMs start to get there, we still need to save the source code they produce, and review and verify that it does what it says on the label, and not in a totally stupid way. I think we have a long way to go!


Replies

afavour · yesterday at 2:32 PM

> If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time.

There’s a related issue that gives me deep concern: if LLMs are the new programming languages, we don’t even own the compilers. They can be taken from us at any time.

New models come out constantly and over time companies will phase out older ones. These newer models will be better, sure, but their outputs will be different. And who knows what edge cases we’ll run into when being forced to upgrade models?

(and that’s putting aside what an enormous step back it would be to rent a compiler rather than own one for free)

energy123 · yesterday at 11:24 AM

Greedy decoding gives you that guarantee (determinism). But I think you'll find it unhelpful. The output will still be wrong the same percentage of the time (slightly more, in fact) in equally inexplicable ways. What you don't like is the black-box, unverifiable aspect, which is independent of determinism.
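
For reference, greedy decoding just means always taking the single most likely next token rather than sampling. A minimal sketch with the Hugging Face transformers library (the model name is only a placeholder; any causal LM works the same way):

```python
# Minimal sketch: greedy decoding with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; swap in whatever model you actually run
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Write a function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt")

# do_sample=False selects the most likely token at every step, so the same
# prompt and the same weights produce the same output on every run.
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Deterministic, but as the comment says: a wrong answer is now just a reproducibly wrong answer.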

properbrew · yesterday at 12:48 PM

> I want to see some assurance we get the same results every time

Genuine question, but why not set the temperature to 0? I do this for non-code related inference when I want the same response to a prompt each time.
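
A rough sketch of what that looks like through a hosted API (the OpenAI Python client and model name here are illustrative; hosted backends can still vary slightly between runs even at temperature 0):

```python
# Minimal sketch: near-deterministic output from a hosted API at temperature 0.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    temperature=0,        # always pick the most likely token
    messages=[{"role": "user", "content": "Summarise these release notes in one line."}],
)
print(response.choices[0].message.content)
```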

pjmlp · yesterday at 10:43 AM

Anyone doing benchmarks with managed runtimes, or serverless, knows that isn't quite true.

Which is exactly one of the arguments the AOT-only, no-GC crowd uses as an example of why their approach is better.

aurareturn · yesterday at 11:37 AM

> If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time.

Give a spec to a designer or developer. Do you get the same result every time?

I’m going to guess no. The results can vary wildly depending on the person.

The code generated by LLMs will still be deterministic. What is different is the tools the product team uses to create that product.

At a high level, does using LLMs to do all or most of the coding ultimately help the business?
