We should revisit literate programming in the agent era

239 points • by horseradish • yesterday at 7:58 PM • 143 comments

Comments

We were taught Literate Programming and xtUML at university. In both courses, the lecturers (independently) tried to convince us that these technologies were the future. I also did an AI/ML course. That lecturer lamented that the golden era was in the past.

palata • yesterday at 11:45 PM

I am not convinced.

- Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).

- LLMs are trained on tons of source code, which is arguably a smaller space than natural languages. My experience is that LLMs are really good at e.g. translating code between two programming languages. But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous.

- I wonder if it is a question of "natural languages vs programming languages" or "bad code vs good code". I could totally imagine that documenting bad code helps the LLMs (and the humans) understand the intent, while documenting good code actually adds ambiguity.

What I learned is that we write code for humans to read. Good code is code that clearly expresses the intent. If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-).

Of course there is an argument to make that the quality of code is generally getting worse every year, and therefore there is more and more a need for documentation around it because it's getting hard to understand what the hell the author wanted to do.

beernet • today at 9:06 AM

Literate programming sounds great in a blog post, but it falls apart the moment an agent starts hallucinating between the prose and the actual implementation. We’re already struggling with docstrings getting out of sync; adding a layer of philosophical "intent" just gives the agent more room to confidently output garbage. If you need a wall of text to make an agent understand your repo, your abstractions are probably just bad. It feels like we're trying to fix a lack of structural clarity with more tokens.

rednafi • yesterday at 11:25 PM

I think a lighter version of literate programming, coupled with languages that have a small API surface but are heavy on convention, is going to thrive in this age of agentic programming.

A lighter API footprint probably also means a higher amount of boilerplate code, but these models love cranking out boilerplate.

I’ve been doing a lot more Go instead of dynamic languages like Python or TypeScript these days. Mostly because if agents are writing the program, they might as well write it in a language that’s fast enough. Fast compilation means agents can quickly iterate on a design, execute it, and loop back.

The Go ecosystem is heavy on style guides, design patterns, and canonical ways of doing things. Mostly because the language doesn’t prevent obvious footguns like nil pointer errors, subtle race conditions in concurrent code, or context cancellation issues. So people rely heavily on patterns, and agents are quite good at picking those up.

My version of literate programming is ensuring that each package has enough top-level docs and that all public APIs have good docstrings. I also point agents to read the Google Go style guide [1] each time before working on my codebase. This yields surprisingly good results most of the time.

[1] https://google.github.io/styleguide/go/

rustybolt • yesterday at 9:29 PM

I have noticed a trend recently that some practices (writing a decent README or architecture, being precise and unambiguous with language, providing context, literate programming) that were meant to help humans were not broadly adopted with the argument that it's too much effort. But when done to help an LLM instead of a human a lot of people suddenly seem to be a lot more motivated to put in the effort.

perrygeo • yesterday at 9:28 PM

Considering LLMs are models of language, investing in the clarity of the written word pays off in spades.

I don't know whether "literate programming" per se is required. Good names, docstrings, type signatures, strategic comments re: "why", a good README, and thoughtfully-designed abstractions are enough to establish a solid pattern.

Going full "literate programming" may not be necessary. I'd maybe reframe it as a focus on communication. Notebooks, examples, scripts and such can go a long way to reinforcing the patterns.

Ultimately that's what it's about: establishing patterns for both your human readers and your LLMs to follow.

rorylaitila • today at 9:34 AM

Even on the latest models, LLMs are not deterministic between "don't do this thing" and "do this thing". They are both related to "this thing" and depending on other content in the context and seed, may randomly do the thing or not. So to get the best results, I want my context to be the smallest possible truthful input, not the most elaborated. More is not better. I think good names on executable source code and tightest possible documentation is best for LLMs, and probably for people too.

cfiggers • yesterday at 10:14 PM

Interesting and semi-related idea: use LLMs to flag when comments/docs have come out of sync with the code.

The big problem with documentation is that if it was accurate when it was written, it's just a matter of time before it goes stale compared to the code it's documenting. And while compilers can tell you if your types and your implementation have come out of sync, before now there's been nothing automated that can check whether your comments are still telling the truth.

Somebody could make a startup out of this.
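
Judging whether prose still tells the truth needs a model, but a narrow slice of the idea can be checked mechanically. A minimal sketch, assuming Sphinx-style `:param name:` docstrings: compare the documented parameter names with the real signature. This is a swapped-in heuristic, not an LLM-based check, and `undocumented_drift` and `resize` are hypothetical names used only for illustration.

```python
import inspect
import re


def undocumented_drift(func):
    """Return (missing, stale) parameter names for func's docstring.

    missing: parameters in the signature with no ":param name:" entry.
    stale:   names with a ":param name:" entry that are no longer in
             the signature.
    """
    doc = inspect.getdoc(func) or ""
    actual = set(inspect.signature(func).parameters)
    documented = set(re.findall(r":param (\w+):", doc))
    return sorted(actual - documented), sorted(documented - actual)


# Hypothetical function whose docstring was not updated when the
# single `size` argument was split into `width` and `height`.
def resize(image, width, height):
    """Resize an image.

    :param image: the image to resize
    :param size: target size as a (width, height) tuple
    """
    return (image, width, height)
```

Calling `undocumented_drift(resize)` reports `width` and `height` as missing and `size` as stale. It says nothing about whether the surrounding prose is still accurate, which is the part that would need an LLM.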

jph00 • yesterday at 10:07 PM

Nearly all my coding for the last decade or so has used literate programming. I built nbdev, which has let me write, document, and test my software using notebooks. Over the last couple of years we integrated LLMs with notebooks and nbdev to create Solveit, which everyone at our company uses for nearly all our work (even our lawyers, HR, etc).

It turns out literate programming is useful for a lot more than just programming!

cadamsdotcom • yesterday at 9:40 PM

Test code and production code in a symmetrical pair has lots of benefits. It’s a bit like double entry accounting - you can view the code’s behavior through a lens of the code itself, or the code that proves it does what it seems to do.

You can change the code by changing either tests or production code, and letting the other follow.

Code reviews are a breeze because if you’re confused by the production code, the test code often holds an explanation - and vice versa. So just switch from one to the other as needed.

Lots of benefits. The downside is how much extra code you end up with of course - up to you if the gains in readability make up for it.
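
A minimal sketch of such a pair in Python, using the standard `unittest` module. The `median` function and the test names are hypothetical; the point is only that each test name states a behaviour that the production code implements.

```python
import unittest


def median(values):
    """Production code: return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("no values supplied")
    ordered = sorted(values)
    mid = len(ordered) // 2
    # Odd length: the middle element is the median.
    if len(ordered) % 2:
        return ordered[mid]
    # Even length: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


class MedianTest(unittest.TestCase):
    """Test code: the same behaviour, stated as examples."""

    def test_odd_length_returns_middle_value(self):
        self.assertEqual(median([1, 5, 3]), 3)

    def test_even_length_averages_two_middle_values(self):
        self.assertEqual(median([1, 2, 3, 4]), 2.5)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            median([])
```

A reviewer puzzled by the `mid - 1` indexing can read `test_even_length_averages_two_middle_values` for the intent, and the reverse works too.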

s3anw3 • today at 10:09 AM

I think the tension between natural language and code is fundamentally about information compression. Code is maximally compressed intent — minimal redundancy, precise semantics. Prose is deliberately less compressed — redundant, contextual, forgiving — because human cognition benefits from that slack.

Literate programming asks you to maintain both compression levels in parallel, which has always been the problem: it's real work to keep a compressed and an uncompressed representation in sync, with no compiler to enforce consistency between them.

What's interesting about your observation is that LLMs are essentially compression/decompression engines. They're great at expanding code into prose (explaining) and condensing prose into code (implementing). The "fundamental extra labor" you describe — translating between these two levels — is exactly what they're best at.

So I agree with your conclusion: the economics have changed. The cost of maintaining both representations just dropped to near zero. Whether that makes literate programming practical at scale is still an open question, but the bottleneck was always cost, not value.

jasfi • today at 7:59 AM

I wrote something similar where you specify the intent in Markdown at the file level. That can also be done by an AI agent. Each intent file compiles to a source file.

It works, but needs improvement. Any feedback is welcome!

https://intentcode.dev

https://github.com/jfilby/intentcode

eisbaw • today at 7:45 AM

https://github.com/eisbaw/litterate_bitorrent 800 pages, noweb extracts rust. Made by claude in a ralph loop over 1-2 days.

yes, it downloads actual torrents.

teleforce • today at 3:20 AM

Not sure if the author knows about CUE; here's the HN post from early this year on literate programming with CUE [1].

CUE is based on value-lattice logic, an NLP cousin of LLMs that is deterministic rather than stochastic [2].

LLMs are notoriously prone to generating syntactically valid but semantically broken configurations, so they should be paired with CUE to improve literate programming for configs and to provide guardrails [3].

[1] CUE Does It All, But Can It Literate? (22 comments)

https://news.ycombinator.com/item?id=46588607

[2] The Logic of CUE:

https://cuelang.org/docs/concept/the-logic-of-cue/

[3] Guardrailing Intuition: Towards Reliable AI:

https://cue.dev/blog/guardrailing-intuition-towards-reliable...

jimbokun • today at 3:34 AM

This does seem exciting at first glance. Just write the narrative part of literate programming and an LLM generates the code, then keep the narrative and voila! Literate programming without the work of generating both.

However I see two major issues:

Narrative is meant to be consumed linearly. But code is consumed as a graph. We navigate from a symbol to its definition, or from definition to its uses, jumping from place to place in the code to understand it better. The narrative part of literate programming really only works for notebooks where the story being told is dominant and the code serves the story.

Second is that when I use an LLM to write code, the changes I describe usually require modifying several files at once. Where does this “narrative” go relative to the code?

And yes, these two issues are closely related.

yuppiemephisto • today at 5:39 AM

I do a form of literate programming for code review to help read AI code. I use [Lean 4](lean-lang.org) and its doc tool [Verso](https://github.com/leanprover/verso/) and have it explain the code through a literate essay. It is integrated with Lean and gets proper typechecking etc which I find helpful.

librasteve • yesterday at 9:38 PM

I don't know Org, but Rakudoc https://docs.raku.org/language/pod is useful for literate programming (put the docs in the code source) and for LLMs (the code is "self-documenting", so that in the LLM inversion of control, the LLM can determine how to call the code).

https://podlite.org does this in a language-neutral way (Perl, JS/TS and Raku for now).

Here's an example:

  #!/usr/bin/env raku
  =begin pod
  =head1 NAME
  Stats::Simple - Simple statistical utilities written in Raku

  =head1 SYNOPSIS
      use Stats::Simple;

      my @numbers = 10, 20, 30, 40;

      say mean(@numbers);     # 25
      say median(@numbers);   # 25

  =head1 DESCRIPTION
  This module provides a few simple statistical helper functions
  such as mean and median. It is meant as a small example showing
  how Rakudoc documentation can be embedded directly inside Raku
  source code.

  =end pod

  unit module Stats::Simple;

  =begin pod
  =head2 mean

      mean(@values --> Numeric)

  Returns the arithmetic mean (average) of a list of numeric values.

  =head3 Parameters
  =over 4
  =item @values
  A list of numeric values.

  =back

  =head3 Example
      say mean(1, 2, 3, 4);  # 2.5
  =end pod
  sub mean(*@values --> Numeric) is export {
      die "No values supplied" if @values.elems == 0;
      @values.sum / @values.elems;
  }

  =begin pod
  =head2 median

      median(@values --> Numeric)

  Returns the median value of a list of numbers.

  If the list length is even, the function returns the mean of
  the two middle values.

  =head3 Example
      say median(1, 5, 3);     # 3
      say median(1, 2, 3, 4);  # 2.5
  =end pod
  sub median(*@values --> Numeric) is export {
      die "No values supplied" if @values.elems == 0;

      my @sorted = @values.sort;
      my $n = @sorted.elems;

      return @sorted[$n div 2] if $n % 2;

      (@sorted[$n/2 - 1] + @sorted[$n/2]) / 2;
  }

  =begin pod
  =head1 AUTHOR
  Example written to demonstrate Rakudoc usage.

  =head1 LICENSE
  Public domain / example code.
  =end pod

cmontella • today at 3:07 AM

I agree with this. I've been a fan of literate programming for a long time, I just think it is a really nice mode of development, but since its inception it hasn't lived up to its promise because the tooling around the concept is lacking. Two of the biggest issues have been 1) having to learn a whole new toolchain outside of the compiler to generate the documents 2) the prose and code can "drift" meaning as the codebase evolves, what's described by the code isn't expressed by the prose and vice versa. Better languages and tooling design can solve the first problem, but I think AI potentially solves the second.

Here's the current version of my literate programming ideas, Mechdown: https://mech-lang.org/post/2025-11-12-mechdown/

It's a literate coding tool that is co-designed with the host language Mech, so the prose can co-exist in the program AST. The plan is to make the whole document queryable and available at runtime.

As a live coding environment, you would co-write the program with AI, and it would have access to your whole document tree, as well as live type information and values (even intermediate ones) for your whole program. This rich context should help it make better decisions about the code it writes, hopefully leading to better synthesized programs.

You could send the AI a prompt, then it could generate the code using live type information; execute it live within the context of your program in a safe environment to make sure it type checks, runs, and produces the expected values; and then you can integrate it into your codebase with a reference to the AI conversation that generated it, which itself is a valid Mechdown document.

That's the current work anyway -- the basis of this is the literate programming environment, which is already done.

The docs show off some more examples of the code, which I anticipate will be mostly written by AIs in the future: https://docs.mech-lang.org/getting-started/introduction.html

DennisL123 • today at 7:47 AM

If agents can already read and rewrite code, literate programming might actually be unnecessary. Instead of maintaining prose alongside code, you could let agents generate explanations on demand. The real requirement becomes writing code in a form that is easily interpretable and transformable by the next agent in the chain. In that model, code itself becomes the stable interface, while prose is just an ephemeral view generated whenever a human (or another agent) needs it.

trane_project • yesterday at 11:08 PM

I think full literate programming is overkill but I've been doing a lighter version of this:

- Module level comments with explanations of the purpose of the module and how it fits into the whole codebase.

- Document all methods, constants, and variables, public and private. A single terse sentence is enough, no need to go crazy.

- Document each block of code. Again, a single sentence is enough. The goal is to be able to know what that block does in plain English without having to "read" code. Reading code is a misnomer because it is a different ability from reading human language.
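
A minimal sketch of the three levels above in Python. All names here are hypothetical and are not taken from the linked project; the code only illustrates where the module comment, the per-item sentences and the per-block sentences go.

```python
"""Selects which exercises to show a student next.

This module sits between the storage layer (which records past scores) and
the UI (which presents exercises); it performs no I/O of its own.
"""

# Average scores below this value count as "not yet mastered".
MASTERY_THRESHOLD = 4.0


def average_score(scores):
    """Returns the mean of the given scores, or 0.0 when there are none."""
    return sum(scores) / len(scores) if scores else 0.0


def pick_next(history, batch_size):
    """Returns up to batch_size exercise IDs, weakest first."""
    # Compute the average score of every exercise seen so far.
    averages = {ex_id: average_score(scores) for ex_id, scores in history.items()}

    # Keep only the exercises that are not yet mastered.
    candidates = [ex_id for ex_id, avg in averages.items() if avg < MASTERY_THRESHOLD]

    # Order the candidates from lowest to highest average, breaking ties by ID.
    candidates.sort(key=lambda ex_id: (averages[ex_id], ex_id))

    # Truncate to the requested batch size.
    return candidates[:batch_size]
```

Reading only the comments inside `pick_next` gives the algorithm in four plain sentences, which is the stated goal of the per-block rule.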

Example from one of my open-source projects: https://github.com/trane-project/trane/blob/master/src/sched...

gervwyk • yesterday at 9:31 PM

For me this is where a config layer shines. Develop a decent framework and then let the agents spin out the configuration.

This gives you a trusted and tested abstraction layer that does not shift and makes maintenance easier, while making the code that the agents generate easier to review. It also uses far fewer tokens.

So as always, just build better abstractions.
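
A minimal sketch of that split in Python, assuming a toy form-validation domain. `VALIDATORS`, `validate` and `SIGNUP_CONFIG` are hypothetical names: the first two stand in for the hand-written framework, the last for the part an agent would generate.

```python
# The "framework": a small, hand-written and tested set of validators.
# Each validator takes the field value and the rule's argument.
VALIDATORS = {
    "required": lambda value, _arg: value not in (None, ""),
    "min_length": lambda value, arg: value is not None and len(value) >= arg,
    "max_length": lambda value, arg: value is None or len(value) <= arg,
}


def validate(record, config):
    """Return a list of (field, rule) pairs that failed for record."""
    failures = []
    for field, rules in config.items():
        for rule, arg in rules.items():
            if rule not in VALIDATORS:
                # Reject unknown rules loudly so a generated config cannot
                # silently skip a check.
                raise KeyError(f"unknown rule: {rule}")
            if not VALIDATORS[rule](record.get(field), arg):
                failures.append((field, rule))
    return failures


# The "configuration": the only part an agent would be asked to generate.
SIGNUP_CONFIG = {
    "username": {"required": True, "min_length": 3, "max_length": 20},
    "email": {"required": True},
}
```

Reviewing an agent's change then means reading a few lines of `SIGNUP_CONFIG` rather than new control flow, and an unknown rule name fails immediately instead of being ignored.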

ajkjk • today at 12:35 AM

I've had the same thought, maybe more grandiosely. The idea is that LLM prompts are code -- after all they are text that gets 'compiled' (by the LLM) into a lower-level language (the actual code). The compile process is more involved because it might involve some back-and-forth, but on the other hand it is much higher level. The goal is to have a web of prompts become the source of truth for the software: sort of like the flowchart that describes the codebase 'is' the codebase.

Arubis • yesterday at 11:28 PM

Anecdotally, Claude Opus is at least okay at literate emacs. Sometimes takes a few rounds to fix its own syntax errors, but it gets the idea. Requiring it to TDD its way in with Buttercup helps.

arikrahman • today at 2:14 AM

I have instructed my LLMs to provide at least one comment per function, and I additionally prompt them to comment when they remove things and to explain why they chose a particular solution. DistroTube also loves the declarative literate programming approach, often citing how his one-document configuration with Nix configures his whole system.

avatardeejay • yesterday at 11:21 PM

Something in this realm covers my practice. I just keep a master prompt for the whole program, and sparsely documented code. When it's time to use LLMs in the dev process, they always get a copy of both and it makes the whole process like 10x as coherent and continuous. Obvi when a change is made that deviates or greatly expands on the spec, I update the spec.

ljlolel • today at 1:56 AM

Everyone is circling getting rid of the code and just having Englishscript https://jperla.com/blog/claude-electron-not-claudevm

prpl • today at 3:04 AM

It has always been possible to program literately in programming languages - not to the extent that you can in WEB, but good code can read like a story and obviate comments.

stephbook • yesterday at 9:57 PM

Take it to the logical conclusion. Track the intended behavior in a proper issue tracking software like Jira. Reference the ticket in your version control system.

Boring and reliable, I know.

If you need guides to the code base beyond what the programming language provides, just write a directory level readme.md where necessary.

wewewedxfgdf • today at 1:47 AM

What we need is comments that LLMs simply do not delete.

We need metadata in source code that LLMs don't delete and interpreters/compilers/linters don't barf on.

pjmlp • yesterday at 11:34 PM

I'd rather go with formal specifications and proofs.

whatgoodisaroad • today at 1:05 AM

it could be fun to make a toy compiler that takes an arbitrary literate prompt as input and uses an LLM to output a machine code executable (no intermediate structured language). could call it llmllvm. perhaps it would be tremendously dangerous

rudhdb773b • today at 1:47 AM

I'd love to see what Tim Daly could do with LLMs on Axiom's code base.

koolala • today at 12:41 AM

Left to right APL style code seems like it could be words instead of symbols.

senderista • yesterday at 9:55 PM

The "test runbook" approach that TFA describes sounds like doctest comments in Python or Rust.
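
For readers who haven't used it, a minimal sketch of the doctest style in Python. The examples in the docstring are executed by the standard `doctest` module, so they cannot silently drift from the code. The `mean` function and the `run_examples` helper are hypothetical illustrations, not taken from TFA.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([])
    Traceback (most recent call last):
        ...
    ValueError: no values supplied
    """
    if not values:
        raise ValueError("no values supplied")
    return sum(values) / len(values)


def run_examples():
    """Run the docstring examples of mean and return (failed, attempted)."""
    # Parse the docstring into a DocTest, then execute every example in it.
    parser = doctest.DocTestParser()
    test = parser.get_doctest(mean.__doc__, {"mean": mean}, "mean", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

In a normal project you would not write `run_examples` yourself; running `python -m doctest yourfile.py` performs the same check on every docstring in the file.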

hsaliak • today at 2:42 AM

I explored this in std::slop (my clanker) https://github.com/hsaliak/std_slop. One of the differentiating features of this clanker is that it has only a single tool call, run_js. The LLM produces JS scripts to do its work. Naturally, I tried to teach it to add comments to these scripts and incorporate literate programming elements. This was interesting because every tool call now 'hydrated' some free-form thinking, but it comes at an output token cost.

Output tokens are expensive! In GPT-5.4 it's ~180 dollars per million tokens! I've settled for brief descriptions that communicate 'why' as a result. The code is documentation after all.

sublinear • yesterday at 8:04 PM

> This is especially important if the primary role of engineers is shifting from writing to reading.

This was always the primary role. The only people who ever said it was about writing just wanted an easy sales pitch aimed at everyone else.

Literate programming failed to take off because with that much prose it inevitably misrepresents the actual code. Most normal comments are bad enough.

It's hard to maintain any writing that doesn't actually change the result. You can't "test" comments. The author doesn't even need to know why the code works to write comments that are convincing at first glance. If we want to read lies influenced by office politics, we already have the rest of the docs.

amelius • yesterday at 10:14 PM

We need an append-only programming language.

anotheryou • yesterday at 9:31 PM

but doesn't "the code is documentation" work better for machines?

and don't we have doc-blocks?

nailer • today at 12:15 AM

> Literate programming is the idea that code should be intermingled with prose such that an uninformed reader could read a code base as a narrative

Have you tried naming things properly? A reader that knows your language could then read your code base as a narrative.

charcircuit • yesterday at 10:34 PM

>I don't have data to support this

Given that there is data showing that context files which explain code reduce agents' performance, it is not obvious that literate programming is better, so without data this article is useless.

jauntywundrkind • yesterday at 10:06 PM

One of the things I love most about WebMCP is the idea that it's an MCP session that exists on the page, which the user already knows.

Most of these LLM things are kind of separate systems, with their own UI. The idea of agency being inlaid into existing systems the user knows like this, with immediate bidirectional feedback as the user and LLM work the page, is incredibly compelling to me.

Series of submissions (descending in time): https://news.ycombinator.com/item?id=47211249 https://news.ycombinator.com/item?id=47037501 https://news.ycombinator.com/item?id=45622604

akater • yesterday at 11:32 PM

The question posed is, “With agents, does it become practical to have large codebases that can be read like a narrative, whose prose is kept in sync with changes to the code by tireless machines?”

It's not practical to have codebases that can be read like a narrative, because that's not how we want to read them when we deal with the source code. We jump to definitions, arriving at different pieces of code in different paths, for different reasons, and presuming there is one universal, linear, book-style way to read that code, is frankly just absurd from this perspective. A programming language should be expressive enough to make code read easily, and tools should make it easy to navigate.

I believe my opinion on this matters more than an opinion of an average admirer of LP. By their own admission, they still mostly write code in boring plain text files. I write programs in org-mode all the time. Literally (no pun intended) all my libraries, outside of those written for a day job, are written in Org. I think it's important to note that they are all Lisp libraries, as my workflow might not be as great for something like C. The documentation in my Org files is mostly reduced to examples. I do like docstrings, but I appreciate an exhaustive (or at least a rich enough) set of examples more, and writing them is much easier: I write them naturally as tests while I'm implementing a function. The examples are written in Org blocks, and when I install a library or push an important commit, I run all tests, of which examples are but special cases. The effect is, this part of the documentation is always in sync with the code (of course, some tests fail, and they are marked as such when tests run). I know how to sync this with docstrings, too, if necessary; I haven't: it takes time to implement and I'm not sure the benefit will be that great.

My (limited, so far) experience with LLMs in this setting is nice: a set of pre-written examples provides a good entry point, and an LLM is often capable of producing a very satisfactory solution, immediately testable, of course. The general structure of my Org files with code is also quite strict.

I don't call this “literate programming”, however (I think LP is a mess of mostly wrong ideas); my approach is just a “notebook interface” to a program, inspired by Mathematica Notebooks, popularly (but not in a representative way) imitated by the now-famous Jupyter notebooks. The terminology doesn't matter much: what I'm describing is what the silly.business blogpost is largely about. The author of nbdev is in the comments here; we're basically implementing the same idea.

silly.business mentions tangling which is a fundamental concept in LP and is a good example of what I dislike about LP: tangling, like several concepts behind LP, is only a thing due to limitations of the programming systems that Donald Knuth was using. When I write Common Lisp in Org, I do not need to tangle, because Common Lisp does not have many of the limitations that apparently influenced the concepts of LP. The “reading like a narrative” idea is similarly misguided, for the reasons I outlined at the beginning. Lisp is expressive enough to read like prose (or like anything else) to as large a degree as required, and, more generally, to have code organized as non-linearly as required. This argument, however, is irrelevant if we want LLMs, rather than us, to read codebases like a book; but that's a different topic.
