If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time. A traditional compiler will produce a program that behaves the same way, given the same source and options. Some even go out of their way to guarantee they produce the same binary output, which is a good thing for security and package management. That is why we don't need to store the compiled binaries in the version control system.
Until LLMs start to get there, we still need to save the source code they produce, and review and verify that it does what it says on the label, and not in a totally stupid way. I think we have a long way to go!
The intermediate product argument is the strongest point in this thread. When we went from assembly to C, the debugging experience changed fundamentally. When we went from C to Java, how we thought about memory changed. With LLMs, I'm still debugging the same TypeScript and Python I was before.
The generation step changed. The maintenance step didn't. And most codebases spend 90% of their life in maintenance mode.
The real test of whether prompts become a "language" is whether they become versioned, reviewed artifacts that teams commit to repos. Right now they're closer to Slack messages than source files. Until prompt-to-binary is reliable enough that nobody reads the intermediate code, the analogy doesn't hold.
Isn't this a little bit of a category error? LLMs are not a language. But prompts to LLMs are written in a language, more or less a natural language such as English. Unfortunately, natural languages are not very precise and are full of ambiguity. I suspect that different models would interpret wordings and phrases slightly differently, leading to behaviors in the resulting code that are difficult to predict.
I would like to hijack the "high level language" term to mean dopamine hits from using an LLM.
"Generate a Frontend End for me now please so I don't need to think"
LLM starts outputting tokens
Dopamine hit to the brain as I get my reward without having to run npm and figure out what packages to use
Then out of a shadowy alleyway a man in a trenchcoat approaches
"Pssssttt, all the suckers are using that tool, come try some Opus 4.6"
"How much?"
"Oh that'll be $200.... and your muscle memory for running maven commands"
"Shut up and take my money"
----- 5 months later, washed up and disconnected from cloud LLMs ------
"Anyone got any spare tokens I could use?"
The article starts with a philosophically bad analogy in my opinion. C -> Java != Java -> LLM, because the intermediate product (the code) changed its form with previous transitions. LLMs still produce the same intermediate product. I expanded on this in a post a couple of months back:
https://www.observationalhazard.com/2025/12/c-java-java-llm....
"The intermediate product is the source code itself. The intermediate goal of a software development project is to produce robust maintainable source code. The end product is to produce a binary. New programming languages changed the intermediate product. When a team changed from using assembly, to C, to Java, it drastically changed its intermediate product. That came with new tools built around different language ecosystems and different programming paradigms and philosophies. Which in turn came with new ways of refactoring, thinking about software architecture, and working together.
LLMs don’t do that in the same way. The intermediate product of LLMs is still the Java or C or Rust or Python that came before them. English is not the intermediate product, as much as some may say it is. You don’t go prompt->binary. You still go prompt->source code->changes to source code from hand editing or further prompts->binary. It’s a distinction that matters.
Until LLMs are fully autonomous with virtually no human guidance or oversight, source code in existing languages will continue to be the intermediate product. And that means many of the ways that we work together will continue to be the same (how we architect source code, store and review it, collaborate on it, refactor it, etc.) in a way that it wasn’t with prior transitions. These processes are just supercharged and easier because the LLM is supporting us or doing much of the work for us."
IDK how everyone else feels about it, but a non-deterministic “compiler” is the last thing I need.
The problem is you still have to type prompts. That might require a lower word count, but you still have to type it up, and it won't be short. For a small code base, your LLM code might be a couple of pages, but for a complex code base it might be the size of a medium-length novel.
In the end, you have lengthy text typed by humans, and it might contain errors in logic, contradictions, and unforeseen issues in the instructions. And the same processes and tooling used for syntactic code might need to apply to it. You will need to version control your prompts, for example.
LLMs solve the labor problem, not the management problem. You have to spend a lot of time and effort with pages and pages of LLM prompts, trying to figure out which part of the prompt is generating which part of your code base. LLMs can debug and troubleshoot, but they can't debug and troubleshoot your prompts for you. I doubt they can take their own output, generated by multiple agents across lots of sessions, and trace it all back to the text in your prompt that caused the mess, either.
On one hand, I want to see what this experimentation will yield, on the other hand, it had better not create a whole suite of other problems to solve just to use it.
My confusion really is when experienced programmers advocate for this stuff. Actually typing in the code isn't very hard. I like the LLM-assistance aspect of figuring out what to actually code and doing some research. But for actually figuring out what code to type in, sure, LLMs save time, but not that much time. Getting it to work, debugging, troubleshooting, maintaining: those tend to be the pain points.
Perhaps there are shops out there that just crank out lots of LoC, and even measure developer performance based on LoC? I can see where this might be useful.
I do think LLM-friendly high-level languages need to evolve for sure. But the ideal workflow is always going to be a co-pilot type of workflow. Humans researching and guiding the AI.
Psychologically, until AI can maintain its own code, this is a really bad idea. Actually typing out the code is extremely important for humans to be able to understand it. And if someone else wrote the code and you have to write something that fits into that codebase, you have to figure out how things fit together; AI can't do that for you if you're still maintaining the codebase in any capacity.
I have a source file of a few hundred lines implementing an algorithm that no LLM I've tried (and I've tried them all) is able to replicate, or even suggest, when prompted with the problem. Even with many follow up prompts and hints.
The implementations that come out are buggy or just plain broken.
The problem is a relatively simple one, and the algorithm uses a few clever tricks. The implementation is subtle...but nonetheless it exists in both open and closed source projects.
LLMs can replace a lot of CRUD apps and skeleton code, tooling, scripting, infra setup etc, but when it comes to the hard stuff they still suck.
Give me a whiteboard and a fellow engineer any day.
One thing I think the “LLM as new high-level language” framing misses is the role of structure and discipline. LLMs are great at filling in patterns, but they struggle with ambiguity, the exact thing we tolerate in human languages.
A practical way to get better results is to stop prompting with prose and start providing explicit models of what we want. In that sense, UML-like notations can act as a bridge between human intent and machine output. Instead of:
“Write a function to do X…”
we give:
“Here’s a class diagram + state machine; generate safe C/C++/Rust code that implements it.”
UML is already a formal, standardized DSL for software structure. LLMs have no trouble consuming textual forms (PlantUML, Mermaid, etc.) and generating disciplined code from them. The value isn’t diagrams for humans but constraining the model’s degrees of freedom.
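To make this concrete, here's a toy sketch of what I mean (assuming a hypothetical `call_llm` helper around whatever provider you use; the PlantUML and the system instruction are purely illustrative):

```python
# Toy sketch: hand the model a formal PlantUML state machine instead of prose.
# `call_llm` is a stand-in for whatever client/SDK you actually use.

PLANTUML_STATE_MACHINE = """
@startuml
[*] --> Closed
Closed --> Open : open()
Open --> Closed : close()
Closed --> Locked : lock(key)
Locked --> Closed : unlock(key)
@enduml
"""

SYSTEM = (
    "Implement exactly the states and transitions in the PlantUML spec below. "
    "Reject any transition that is not listed. Output only code, no commentary."
)

def build_messages(spec: str) -> list[dict]:
    """Package the formal model as the entire specification."""
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": spec},
    ]

# code = call_llm(build_messages(PLANTUML_STATE_MACHINE))  # plug in your provider
```

The particular spec doesn't matter; the point is that the model has far fewer degrees of freedom than with a prose request.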
One of the reasons we have programming languages is they allow us to express fluently the specificity required to instruct a machine.
For very large projects, are we sure that English (or other natural languages) are actually a better/faster/cheaper way to express what we want to build? Even if we could guarantee fully-deterministic "compilation", would the specificity required not balloon the (e.g.) English out to well beyond what (e.g.) Java might need?
Writing code will become writing books? Still thinking through this, but I can't help but feel natural languages are still poorly suited and slower, especially for novel creations that don't have a well-understood (or "linguistically-abstracted") prior.
After working with the latest models, I think these "it's just another tool" or "another layer of abstraction" or "I'm just building at a different level" kinds of arguments are wishful thinking. You're not going to be a designer writing blueprints for a series of workers to execute on; you're barely going to be a product manager translating business requirements into a technical specification before AI closes that gap as well. I'm very convinced non-technical people will be able to use these tools, because what I'm seeing is that all of the skills that my training and years of experience have helped me hone are now implemented by these tools to a level that most businesses would be satisfied with.
The irony is that I haven't seen AI have nearly as large an impact anywhere else. We truly have automated ourselves out of work; people are just catching up with that fact, and the people who just wanted to make money from software can now finally stop pretending that "passion" for "the craft" was ever really part of their motivating calculus.
>Following this hypothesis, what C did to assembler, what Java did to C, what Javascript/Python/Perl did to Java, now LLM agents are doing to all programming languages.
What did Javascript/Python do to Java? They are not interchangeable nor comparable. I don't think Federico's opinion is worth reading further.
A novice prefers declarative control, an expert prefers procedural control
Beginner programmers want: "make this feature"
Experienced devs want: control over memory, data flow, timing, failure modes
That is why abstractions feel magical at first and suffocating later, which sparks this whole debate.
At this point I'm just waiting for people claiming they managed a team of 20 people, where the "20 people" were LLMs being fed prompts.
The biggest and least controversial thing will be when Anthropic creates a OneDrive/Google Drive integration that lets white-collar employees create, edit, and export Word documents into PDFs, referring to other files in the folder. This alone will increase average white-collar employee productivity by 100x and lead to the most job displacement.
For instance: here is an email from my manager at 1pm today. Open the policy document he is referring to, create a new version, and add the changes he wants. Refer to the entire codebase (our company OneDrive/Google Drive/Dropbox, whatever) to make sure it is contextually correct.
>Sure, here is the document for your review
Great, reply back to manager with attachment linked to OneDrive
There's nothing novel in this article. This is what every other AI clickbait article is propagating.
I want to see an example of the application with well written documentation which produces well working application based on those docs.
I discovered that it is not trivial to conceptualize an app to the level of clarity required for deterministic LLM output. It's way easier to say than to actually implement yourself (that's why examples are so interesting to see).
The backwards dynamic, where you derive the spec/doc from the source code, does not work well enough.
The US military loves zillion-page requirements documents. Has anyone (besides maybe some Ph.Dork at DARPA) tried feeding a few to coder LLMs to generate applications - and then thrown them at test suites?
I’m not sure I buy this. GPT-5.2 codex still makes design errors that I as an engineer have to watch and correct. The only way I know how to catch it and then steer the model towards a correction is to be able to read the code and write some code into the prompt. So one can’t abstract programming language away through an agent…
There's a reason we distinguish between programmers and managers; if "LLMs are just the new high level language", then a manager is just another programmer operating on a level above code. I mean, sure, we can say that, but words become kind of meaningless at this point.
I wouldn't call it the new high level language. It's a new JIT. But that's not doing it justice. There's a translation of natural language to a tokenizer and processor that's akin to the earliest days of CPUs. It's a huge step change from punch cards. But there's also a lot to learn. I think we will eventually develop a new language that's more efficient for processing or multiple layers of transformers. Tbh Google is leapfrogging everyone in this arena and eventually we're going to more exotic forms of modelling we've never seen before except in nature. But from an engineering perspective all I can see right now is a JIT.
This is a good summary of any random week's worth of AI shilling from your LinkedIn feed, that you can't get rid of.
If we can treat the prompts as the versionable source code artefact, then sure. But as long as we need to fine-tune the output, that's not a high-level language, in the same way no one would edit the assembly that a compiler produces.
Can we stop repeating this canard, over and over?
Every "classic computing" language mentioned, and pretty much in history, is highly deterministic, and mind-bogglingly, huge-number-of-9s reliable (when was the last time your CPU did the wrong thing on one of the billions of machine instructions it executes every second, or your compiler gave two different outputs from the same code?)
LLMs are not even "one 9" reliable at the moment. Indeed, each token is a freaking RNG draw off a probability distribution. "Compiling" is a crap shoot, a slot machine pull. By design. And the errors compound/multiply over repeated pulls as others have shown.
I'll take the gloriously reliable classical compute world to compile my stuff any day.
Hi HN! OP here. Thanks for reading and commenting -- (and @swah for posting!). It's unsettling to hit the HN front page, and even more so with an article that I hastily wrote down. I guess you never know what's going to hit a nerve.
Some context: I'm basically trying to make sense of the tidal wave that's engulfing software development. Over the last 2-3 weeks I've realized that LLMs will start writing most code very soon (I could be wrong, though!). This article is just me making sense of it, not trying to convince anybody of anything (except of, perhaps, giving the whole thing a think). Most of the "discarded" objections I presented in the list were things I espoused myself over the past year. I should have clarified that in the article.
I (barely) understand that LLMs are not a programming language. My point was that we could still think of them as a "higher level programming language", despite them 1) not being programming languages; 2) being wildly nondeterministic; 3) also jumping levels by being able to help you direct them. This way of looking at the phenomenon of LLMs is to try to see if previous shifts in programming can explain at least partially the dynamics we are seeing unfold so quickly (to find, in Ray Dalio's words, "another kind of those").
I am stepping into this world of LLM code generation with complicated feelings. I'm not an AI enthusiast, at least not yet. I love writing code by hand and I am proud of my hand-written open source libraries. But I am also starting to experience the possibilities of working on a higher level of programming and being able to do much more in breadth and depth.
I fixed an important typo - here I meant: "Economically, only quality is undisputable as a goal".
Responding to a few interesting points:
@manuelabeledo: during 2025 I've been building a programming substrate called cell (think language + environment) that attempts to be both very compact and very expressive. Its goal is to massively reduce complexity to make general purpose code more understandable (I know this is laughably ambitious and I'm desperately limited in my ability to pull off something like that). But because of the LLM tsunami, I'm reconsidering the role of cell (or any other successful substrate): even if we achieve the goal, how will this interact with a world where people mostly write and validate code through natural language prompts? I never meant to say that natural language would itself be this substrate, or that the combination of LLMs and natural languages could do that: I still see that there will be a programming language behind all of this. Apologies for the confusion.
@heikkilevanto & @matheus-rr: Mario Zechner has a very interesting article where he deals with this problem (https://mariozechner.at/posts/2025-06-02-prompts-are-code/#t...). He's exploring how structured, sequential prompts can achieve repeatable results from LLMs, which you still have to verify. I'm experimenting with the same, though I'm just getting started. The idea I sense here is that perhaps a much tighter process of guiding the LLM, with current models, can get you repeatable and reliable results. I wonder if this is the way things are headed.
@woodenchair: I think that we can already experience a revolution with LLMs that are not fully autonomous. The potential is that an engineering-like approach to a prompt flow can allow you to design and review (not write) a lot more code than before. Though you're 100% correct that the analogy doesn't strictly hold until we can stop looking at the code in the same way that a js dev doesn't look at what the interpreter is emitting.
@nly: great point. The thing is that most code we write is not elegant implementations of algorithms, but mostly glue or CRUDs. So LLMs can still broadly be useful.
I hope I didn't rage bait anybody - if I did, it wasn't intentional. This was just me thinking out loud.
If the LLM is a high level language, then why aren't we saving the prompts in git?
Last I checked with every other high level language, you save the source and then rerun the compiler to generate the artifact.
With LLMs you throw away the 'source' and save the artifact.
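If the analogy held, the workflow would have to look something like this toy sketch (`call_llm` is a stand-in, and its non-determinism is exactly the missing piece):

```python
from pathlib import Path

def build(prompt_path: str, out_path: str) -> None:
    """Treat the prompt as source: commit it, regenerate the artifact on demand."""
    prompt = Path(prompt_path).read_text()
    code = call_llm(prompt)          # stand-in; not reproducible today
    Path(out_path).write_text(code)  # the artifact, like a .o file: not committed

# build("prompts/invoice_service.md", "build/invoice_service.py")
```

Nobody works like this, because nobody trusts the regeneration step.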
Programming with LLMs is fundamentally different than going from a lower-level to a higher-level language, even apart from the whole non-determinism thing. With a programming language, you're still writing a for-loop, whether that's in C, Java or Rust. There's language primitives that help you think better in certain languages, but they're still, at the end of the day, code and context that you have to hold in your head and be intimately familiar with.
That changes with LLMs. For now, you can use LLMs to help you code that way; a programming buddy whose code you review. That's soon going to become "quaint" (to quote the author) given the projected productivity gains of agents (and for many developers it already has).
This is an exaggeration: if you store the prompt that was "compiled" by today's LLMs, there is no guarantee that four months from now you will be able to replicate the same result.
I can take some C or Fortran code from 10 years ago, build it and get identical results.
Paradigm shift ahead, folks. What I observe in the comments—often more compelling than the article itself—is the natural tension within the scientific community surrounding the 'scientific method,' a debate that's been playing out for... what, a year now? Maybe less? True, this isn't perfect, nor does it come with functionality guarantees. Talking about 10x productivity? That's relative—it hinges on the tool, the cultural background of the 'orchestra conductor,' or the specific, hands-on knowledge accumulated by the conductor, their team, organization, and even the target industry.
In essence: we're witnessing a paradigm shift. And for moments like these—I invite you—it's invaluable to have studied Popper and Kuhn in those courses.
An even more provocative hypothesis: the 'Vienna Circle' has morphed into the 'Circle of Big Tech,' gatekeepers of the data. What's the role of academia here? What happened to professional researchers? The way we learn has been hijacked by these brilliant companies, which—at least this time—have a clear horizon: maximizing profits. What clear horizon did the stewards of the scientific method have before? Wasn't it tainted by the enunciator's position? The personal trajectory of the scientist, the institution (university) funding them? Ideology, politics?
This time, it seems, we know exactly where we're headed.
(This comment was translated from Spanish, please excuse the rough edges)
And sooner or later it will happen, imho. With probabilistic compiling, and several "prompts/agents" under the hood. The majority of the "replies" wins and gets compiled. Of course, good context will contribute to better refined probability.
Ask yourself: "Computer memory and disk are also not 100% reliable, but we live with it somehow without a man-in-the-middle manual check layer, yes?" The answer for LLMs will be the same, if a good enough level of similarity/sameness of answers is achieved.
> The code that LLMs make is much worse than what I can write: almost certainly; but the same could be said about your assembler
Has this been true since the 90s?
I pretty much only hear people saying modern compilers are unbeatable.
I'm trying to work with vibe-coded applications and it's a nightmare. I am trying to make one application multi-tenant by moving a bunch of code that's custom to a single customer into config. There are 200+ line methods, dead code everywhere, tons of unnecessary complexity (for instance, extra mapping layers that were introduced to resolve discrepancies between keys, instead of just using the same key everywhere). No unit tests, of course, so it's very difficult to tell if anything broke. When the system requirements change, the LLM isn't removing old code, it's just adding new branches and keeping the dead code around.
I ask the developer the simplest questions, like "which of the multiple entry points do you use to test this code locally?", or "you have a 'mode' parameter here that determines which branch of the code executes; which of these modes are actually used?", and I get a bunch of babble, because he has no idea how any of it works.
Of course, since everyone is expected to use Cursor for everything and move at warp speed, I have no time to actually untangle this crap.
The LLM is amazing at some things - I can get it to one-shot adding a page to a react app for instance. But if you don't know what good code looks like, you're not going to get a maintainable result.
Already there for anyone using iPaaS platforms, and despite their flaws, it is the new normal in many enterprise consulting scenarios.
When a computer programmer finally discovers English.
Does the product work? Is it maintainable?
Everything else is secondary.
For me, LLMs feel closer to an IDE on steroids. Unless LLMs produce the same output from the same input, I can't view them as compilers.
"Following this hypothesis, what C did to assembler, what Java did to C, what Javascript/Python/Perl did to Java, now LLM agents are doing to all programming languages."
This is not an appropriate analogy, at least not right now.
Code agents are generating code from prompts; in that sense the metaphor is correct. However, agents then read the code, it becomes input, and they generate more code. This was never the case for compilers: an LLM used in this sense is strictly not a compiler, because the process is cyclic and not one-directional.
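A rough sketch of the difference (every helper here is a stand-in, not a real API):

```python
def compile_once(source: str) -> bytes:
    """A compiler is one pass: source in, binary out, no feedback."""
    return compile_to_binary(source)             # stand-in

def agent_loop(prompt: str, codebase: dict[str, str], steps: int = 5) -> dict[str, str]:
    """An agent is cyclic: the code it just wrote is part of its next input."""
    for _ in range(steps):
        context = prompt + "\n\n" + "\n\n".join(codebase.values())
        patch = call_llm(context)                # stand-in
        codebase = apply_patch(codebase, patch)  # stand-in; the output feeds back in
    return codebase
```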
Please stop with these rage-click posts. There is so much wrong in this article I won't even start...
More garbage content on the front page. It's a constant stream of AI hype pieces with zero substance from people who just happen to work for AI companies. Hacker News is really going downhill.
As per Andrej Karpathy's viral tweet from three years ago [1]:
The hottest new programming language is English
The thing that's always missing from these critiques isn't code quality or LoC or slop.
The issue is that if you fire off 10 agents to work autonomously for an extended period of time at least 9 of them will build the WRONG THING.
The problem is context management and decision making based on that context. LLMs will always make assumptions about what you want, and the more assumptions they make the higher the likelihood that one or more of them is wrong.
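Back-of-the-envelope with made-up numbers, just to show how the compounding bites:

```python
# Purely illustrative: if each assumption matches your intent 90% of the
# time and a long autonomous run involves ~30 of them, one agent gets the
# whole thing right about 4% of the time.
p_per_assumption, n_assumptions, n_agents = 0.9, 30, 10
p_right = p_per_assumption ** n_assumptions
print(p_right)              # ≈ 0.042
print(n_agents * p_right)   # ≈ 0.42 -- expect less than one of ten agents to nail it
```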
If you see magazines, articles, ads and TV shows from the 1980s (there are lots on YouTube and a fun rabbit hole, like the BBC Archive), the general promise was "Computers can do anything, if you program them."
Well, nobody could figure out how to program them. Except the few outcasts like us who went on to suffer for the rest of our lives for it :')
With phones & LLMs, this is the closest we have come to that original promise of a computer in every home and everyone being able to do anything with it that isn't pre-dictated by corporations and their apps:
Ideally ChatGPT etc should be able to create interactive apps on the fly on iPhone etc. Imagine having a specific need and just being able to say it and get an app right away just for you on your device.
Was StackOverflow "the new high level language"? The proliferation of public git repos?
Because that's pretty much what "agentic" LLM coding systems are an automation of, skimming through forums or repos and cribbing the stuff that looks OK.
Except that the output depends on the stars' alignment.
Imagine a machine that does the job sometimes but fails at other times. Wonderful, isn't it?
Code written in a HLL is a sufficient[1] description of the resulting program/behavior. The code, in combination with the runtime, define constraints on the behavior of the resulting program. A finished piece of HLL code encodes all the constraints the programmer desired. Presuming a 'correct' compiler/runtime, any variation in the resulting program (equivalently the behavior of an interpreter running the HLL code) varies within the boundaries of those constraints.
Code in general is also local, in the sense that small perturbation to the code has effects limited to a small and corresponding portion of the program/behavior. A change to the body of a function changes the generated machine code for that function, and nothing else[2].
Prompts provided to an LLM are neither sufficient nor local in the same way.
The inherent opacity of the LLM means we can make only probabilistic guarantees that the constraints the prompt intends to encode are reflected by the output. No theory (that we know of) can even attempt to supply such a guarantee. A given (sequence of) prompts might result in a program that happens to encode the constraints the programmer intended, but that _must_ be verified by inspection and testing.
One might argue that of course an LLM can be made to produce precisely the same output for the same input; it is itself a program after all. However, that 'reproducibility' should not convince us that the prompts + weights totally define the code any more than random.Random(1).random() being constant should cause us to declare python's .random() broken. In both cases we're looking at a single sample from a pRNG. Any variation whatsoever would result in a different generated program, with no guarantee that program would satisfy the constraints the programmer intended to encode in the prompts.
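Concretely:

```python
import random

# Seeding makes the sampler reproducible, but the seed + generator don't
# "define" the value in any meaningful sense: perturb the input and you
# get an unrelated draw, with no guarantee about its properties.
print(random.Random(1).random())  # ≈ 0.1343..., the same every run
print(random.Random(1).random())  # same "program", same output
print(random.Random(2).random())  # nudge the seed, get an unrelated value
```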
While locality falls similarly, one might point out that an agentic LLM can easily make a local change to code if asked. I would argue that an agentic LLM's prompts are not just the inputs from the user, but the entire codebase in its repo (if sparsely attended to by RAG or retrieval tool calls or w/e). The prompts _alone_ cannot be changed locally in a way that guarantees a local effect.
The prompt -> LLM -> program abstraction presents leaks of such volume and variety that it cannot be ignored like the code -> compiler -> program abstraction can. Continuing to make forward progress on a project requires the robot (and likely the human) to attend to the generated code.
Does any of this matter? Compilers and interpreters themselves are imperfect, their formal verification is incomplete and underutilized. We have to verify properties of programs via testing anyway. And who cares if the prompts alone are insufficient? We can keep a few 100kb of code around and retrieve over it to keep the robot on track, and the human more-or-less in the loop. And if it ends up rewriting the whole thing every few iterations as it drifts, who cares?
For some projects where quality, correctness, interoperability, novelty, etc don't matter, it might be. Even in those, defining a program purely via prompts seems likely to devolve eventually into aggravation. For the rest, the end of software engineering seems to be greatly exaggerated.
[1]: loosely in the statistical sense of containing all the information the programmer was able to encode https://en.wikipedia.org/wiki/Sufficient_statistic
[2]: there're of course many tiny exceptions to this. we might be changing a function that's inlined all over the place; we might be changing something that's explicitly global state; we might vary timing of something that causes async tasks to schedule in a different order etc etc. I believe the point stands regardless.
so, prompt engineering it is. Happy new LLMium.
As long as the SOTA is dumb-looping until the LLM verifies some end goal, spending as many tokens as possible, it won't be a language. At best an inelegant dialect.
So we are certainly going to see more of these incidents [0] from those not understanding LLM-written code, as 'engineers' now let their skills decay because 'the LLMs know best'.
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
After re-reading the post once again, because I honestly thought I was missing something obvious that would make the whole thing make sense, I started to wonder if the author actually understands the scope of a computer language. When he says:
> LLMs are far more nondeterministic than previous higher level languages. They also can help you figure out things at the high level (descriptions) in a way that no previous layer could help you dealing with itself. […] What about quality and understandability? If instead of a big stack, we use a good substrate, the line count of the LLM output will be much less, and more understandable. If this is the case, we can vastly increase the quality and performance of the systems we build.
How does this even work? There is no universe I can imagine where a natural language can be universal, self-descriptive, and non-ambiguous, and have a smaller footprint than any purpose-specific language that came before it.