Hacker News

If AI writes code, should the session be part of the commit?

442 points | by mandel_x | today at 12:27 AM | 365 comments

Comments

jedberg | today at 6:44 AM

The way I write code with AI is that I start with a project.md file, where I describe what I want done. I then ask it to make a plan.md file from that project.md to describe the changes it will make (or what it will create, if the project is greenfield).

I then iterate on that plan.md with the AI until it's what I want. I then ask it to make a detailed todo list from the plan.md and attach it to the end of plan.md.

Once I'm fully satisfied, I tell it to execute the todo list at the end of the plan.md, and don't do anything else, don't ask me any questions, and work until it's complete.

I then commit the project.md and plan.md along with the code.

So my back and forth on getting the plan.md correct isn't in the logs, but that is much like intermediate commits before a merge/squash. The plan.md is basically the artifact an AI or another engineer can use to figure out what happened and repeat the process.

The main reason I do this is so that when the models get a lot better in a year, I can go back and ask them to modify plan.md based on project.md and the existing code, on the assumption it might find its own mistakes.

827a | today at 4:49 AM

IMO: This might be a contrarian opinion, but I don't think so. It's much the same problem as asking, for example, if every single line you write, or every function, becomes a commit. The answer to this granularity is, much like anything, you have to think of the audience: Who is served by persisting these sessions? I would suspect that there is little reason why future engineers, or future LLMs, would need access to them; they likely contain a significant amount of noise, incorrect implementations, and red herrings. The product of the session is what matters.

I do think there's more value in ensuring that the initial spec, or the "first prompt" (which IME is usually much bigger and tries to get 80% of the way there) is stored. And, maybe part of the product is an LLM summary of that spec, the changes we made to the spec within the session, and a summary of what is built. But... that could be the commit message? Or just in a markdown file. Or in Notion or whatever.

dang | today at 3:43 AM

I floated that idea a week ago: https://news.ycombinator.com/item?id=47096202, although I used the word "prompts" which users pointed out was obsolete. "Session" seems better for now.

The objections I heard, which seemed solid, are (1) there's no single input to the AI (i.e. no single session or prompt) from which such a project is generated,

(2) the back-and-forth between human and AI isn't exactly like working with a compiler (the loop of source code -> object code) - it's also like a conversation between two engineers [1]. In the former case, you can make the source code into an artifact and treat that as "the project", but you can't really do that in the latter case, and

(3) even if you could, the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't add much value.

At the same time, people have been submitting so many Show HNs of generated projects, often with nothing more than a generated repo with a generated readme. We need a better way of processing these because treating them like old-fashioned Show HNs is overwhelming the system with noise right now [2].

I don't want to exclude these projects, because (1) some of them are good, (2) there's nothing wrong with more people being able to create and share things, (3) it's foolish to fight the future, and (4) there's no obvious way to exclude them anyhow.

But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.

So, community: what should we do?

[1] this point came from seldrige at https://news.ycombinator.com/item?id=47096903 and https://news.ycombinator.com/item?id=47108653.

YoumuChan makes a similar point at https://news.ycombinator.com/item?id=47213296, comparing it to Google search history. The analogy is different but the issue (signal/noise ratio) is the same.

[2] Is Show HN dead? No, but it's drowning - https://news.ycombinator.com/item?id=47045804 - Feb 2026 (422 comments)

voxleone | today at 3:31 PM

I’ve found a workflow that feels both structured and respectful of professional craft, especially in the context of this thread. I don’t just "vibe code" and let an LLM fill in the blanks. I use a classic design discipline (UML and use cases) to document the process:

1. Start with requirements
2. Define use cases
3. Implement classes/objects (architecture first, not after-the-fact refactors)
4. Add constraints and invariants (contracts, boundaries, failure modes, etc.)
5. Let the agent work inside that frame, pausing at milestones for human oversight.

Those UML/use-case/constraint artifacts aren’t committed as session logs per se, but they are part of the author’s intent and reasoning that gets committed alongside the resulting code. That gives future reviewers the why as well as the what, which is far more useful than a raw AI session transcript.

Stepping back, this feels like a decent and dignified position for a programmer in 2026: humans retain architectural judgement --> AI accelerates boilerplate and edge implementation --> version history still reflects intent and accountability rather than chat transcripts. I can’t afford to let go of the productivity gains that flow from using AI as part of a disciplined engineering process, but I also don’t think commit logs should become a dumping ground for unfiltered conversation history.

rfw300 | today at 2:36 AM

Why should it be? The agent session is a messy intermediate output, not an artifact that should be part of the final product. If the "why" of a code change is important, have your agent write a commit message or a documentation file that is polished and intended for consumption.

yuvrajangads | today at 12:20 PM

The session itself is mostly noise. Half of it is the model going down wrong paths, backtracking, and trying again. Storing that alongside the commit is like saving your browser history next to your finished code.

What actually helps is a good commit message explaining the intent. If an AI wrote the code, the interesting part isn't the transcript, it's why you asked for it and what constraints you gave it. A one-paragraph description of the goal and approach is worth more than a 200-message session log.

I think the real question isn't about storing sessions, it's about whether we're writing worse commit messages because we assume the AI context is "somewhere."

onion2k | today at 5:11 AM

Conceptually this is very similar to the question of whether or not you should squash your commits. To the point that it's really the same question.

If you think you should squash commits, then you're only really interested in the final code change. The history of how the dev got there can go in the bin.

If you don't think you should squash commits then you're interested in being able to look back at the journey that got the dev to the final code change.

Both approaches are valid for different reasons, but they're a source of long and furious debate on every team I've been on. Keeping a history of your AI sessions alongside the code could be useful for debugging (less code debugging, more thought-process debugging), but the 'prefer squash' developers usually prefer to look at the existing code rather than the history of changes to steer it back on course, so why would they start looking at AI sessions if they don't look at commits?

All that said, your AI's memory could easily be stored and managed somewhere separately to the repo history, and in a way that makes it more easily accessible to the LLM you choose, so probably not.

D-Machine | today at 4:30 AM

Obviously yes: if not the full prompts from the session, then at least some simple / automated distillation of them. Code generated by AI is already clearly not going to be reviewed as carefully as code produced by humans, and intentions / assumptions will only be documented in AI-generated comments to some limited degree, completely contingent on the prompt(s).

Otherwise, when fixing a bug, you just risk starting from scratch and wasting time using the same prompts and/or assumptions that led to the issue in the first place.

Much of the reason code review was/is worth the time is because it can teach people to improve, and prevent future mistakes. Code review is not really about "correctness", beyond basic issues, because subtle logic errors are in general very hard to spot; that is covered by testing (or, unfortunately, deployment surprises).

With AI, at least as it is currently implemented, there is no learning, as such, so this removes much of the value of code review. But, if the goal is to prevent future mistakes, having some info about the prompts that led to the code at least brings some value back to the review process.

EDIT: Also, from a business standpoint, you still need to select for competent/incompetent prompters/AI users. It is hard to do so when you have no evidence of what the session looked like. Also, how can you teach juniors to improve their vibe-coding if you can't see anything about their sessions?

ZoomZoomZoom | today at 10:19 AM

If by AI you mean the LLM-based tools common now, then I don't want the commits in PRs I'm going to review to bring any more noise than they already do. The human operator is responsible for every line, like they always were.

If by AI you mean non-supervised, autonomous conscience (which is what I believe the term should be reserved for), then the answer is again no, as it's just as responsible for the quality of its PRs as humans are.

If the thing writing code is the former, but there's no human or responsible representative of the latter in the loop, then the code shouldn't be even suggested for consideration in a project where any people do participate. In such case there's no point in storing any additional information as the code itself doesn't have any value (besides electricity wasted to create it) and can be substituted on demand.

Commit comments are generally underused, though, as a result of how forges work, but that's another discussion.

YoumuChan | today at 2:49 AM

Should my google search history be part of the commit? To that question my answer is no.

jon_north | today at 5:20 PM

This seems like a very good idea, not just because of the desire to do human archaeology at times, but also to let further agentic exploration occur. It would be best if it became a separate section of the commit that could just be blank or contain other documentation in the case of human authorship. The commit message shouldn't get longer and longer. It should continue to tell the concise story that humans and LLMs alike consume quickly to gain some initial synthesis.

So I like the link's approach quite a bit.

CloakHQ | today at 4:31 PM

The plan.md approach solves something I've been struggling with on a browser automation project. When you're building something stateful (browser sessions, fingerprinting logic etc.) the "why" behind decisions gets lost fast. Not just for other devs, but for the AI itself in future sessions.

One thing I've added on top of the plan/project structure: a short `decisions.md` that logs only the non-obvious choices, like "tried X, it caused Y issue, went with Z instead". Basically the things that would make future-me or a future agent waste time rediscovering.

Do you find the plan.md files stay useful past the initial build, or do they mostly just serve as a commit artifact?

abustamam | today at 3:23 AM

I don't think it should be. I think a distilled summary of what the agent did should be committed. This requires some dev discipline. But for example:

Make a button that does X when clicked.

Agent makes the button.

I tell it to make the button red.

Agent makes it red.

I test it, it is missing an edge case. I tell it to fix it.

It fixes it.

I don't like where the button is. I tell it to put it in the sidebar.

It does that.

I can go on and on. But we don't need to know all those intermediaries. We just need to know: "Red button that does X by Y mechanism is in the sidebar. Tests that include edge cases here. All tests passing. 2026-03-01."

And that document is persisted.

If later the button gets deleted or moved again or something, we can instruct the agent to say why: "Button deleted because not used and was noisy. 2026-03-02."

This can be made trivial via skills, but I find it a good way to understand a bit more deeply than commit messages would allow me to do.

Of course, we can also just write (or instruct agents to write) better PRs, but AFAICT there's no easy way to know which PR the button came from or was deleted by, unless you spelunk in git blame.

raincole | today at 4:08 AM

I hope people start doing that. Not that it has any practical use for the repo itself, but if everyone does that, it'd probably make it much easier for open-weight models to catch up with the proprietary ones. It'd be like a huge crowdsourced project to collect proprietary models' output for future training.

resters | today at 5:07 PM

What would be most useful is some kind of context representation that could be upgraded as better models get developed. If you put it in the commit then you need to compare contexts when comparing code across time. But if you make the context include the changes in the code over time, then the future context will be better at debugging a bug in code written years earlier. The years-old context is likely going to be obsolete by that time anyway.

131hn | today at 5:42 PM

Vibecoded code is not C, Python, TS, JS or whatever.

It needs to be treated as the compiled output of vbc-c, vbc-python, vbc-ts, or vbc-js.

Keeping the source code (the prompt) is very natural, because the compiled "vibecoded" output, like a binary, lacks _context_ and _motivation_ (which the source code / prompt provides).

angry_octet | today at 4:10 PM

Since the code is literally the answer to "what comes next after this prompt?", the answer is yes. Unfortunately there is also a hidden random seed in the engine (which this doesn't seem to address). But if you capture the seed, the exact version of the software and the prompt, the system is completely deterministic.

However there is an unpleasant reality: the system could be incredibly brittle, with the slightest change in input or seed resulting in significantly different output. It would be nice if all small and seemingly inconsequential input perturbations resulted in a cluster of outputs that are more or less the same, but that seems very model dependent.

rDr4g0n | today at 1:52 PM

When I began reviewing my teammate’s PRs with AI generated code in them, something started to feel weird. It took a bit, but I realized the problem: I am not reviewing the work my teammate did.

What are they even supposed to do with feedback on the code? It has to be translated by my teammate into the language of the work they did, which is the conversation they had with the AI agent.

But the conversation isn't the "real work": the decisions made in the conversation are the real work. That is what needs capture and review.

So now that I know why code reviews feel kinda wrong, what can we do to have meaningful reviews of the work my teammates have done?

What I landed on is aiming to capture more and more “work” in the form of a spec, review the spec, and ignore the code. This isn't novel or interesting. HOWEVER...

For the large, messy, legacy codebases I work in today, I don’t like the giant spec driven development approach that is most popular today. It’s too risky to solely trust the spec because it touches so much messy code with so many gotchas. However, with the rate of AI generated code rolling in, I simply can’t switch context quickly enough to review it all efficiently. Also, it’s exhausting.

The approach I have been refining is defining very small modules (think a class or meaningful collection of utils) with a spec and a concise set of unit tests, generating code from the spec, then not reading or editing the generated code.

Any changes to the code must be made to the spec, and the code re-generated. This puts the PR conversation in the right place, against the work I have done: which is write the spec.

So far the approach has worked for replacing simple code (eg: a nestjs service that has a handful of public methods, a bit of business logic, and a few API client calls). PRs usually have a handful of lines of glue code to review, but the rest are specs (and a selection of “trust” unit tests) and the idea is that the code can be skipped.

AI review bots still review the PR and comment around code quality and potential security concerns, which I then translate into updates to the spec.

I find this to be a good step towards the codegen future without totally handing over my (very messy and not very agent friendly) codebases.

pipejosh | today at 4:26 PM

I settled on a similar workflow but across two agents instead of one session.

One agent writes task specs. The other implements them. Handoff files bridge the gap. The spec IS the session artifact because it captures intent, scope, and constraints before any code gets written.

The plan.md approach people are describing here is basically what happens naturally when you force yourself to write intent before execution.

gavinray | today at 4:39 PM

This is what the GitHub CEO recently announced as a product/company:

https://entire.io/

Original blogpost goes over motivations + workflow:

https://entire.io/blog/hello-entire-world/

tototrains | today at 6:20 AM

I considered this and even built a claude code extension to bring history/chats into the project folder.

Not once have I found it useful: if the intention isn't clear from the code and/or concise docs, the code is bad and needs to be polished.

Well written code written with intention is instantly interpretable with an LLM. Sending the developer or LLM down a rabbit hole of drafts is a waste of cognition and context.

robseed | today at 5:54 PM

Unedited AI generated code should have a different blame line than regular code, something like author_ai vs author.
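For illustration, one way to get something like a distinct blame identity with plain git is to commit unedited AI output under a separate author via git's author environment variables. A minimal sketch; the author_ai name/email convention below is made up for this example, not an established standard:

```python
import os
import subprocess

def commit_as_ai(message: str) -> None:
    """Commit staged changes under a distinct AI author so git blame attributes them separately."""
    env = dict(os.environ,
               GIT_AUTHOR_NAME="author_ai",
               GIT_AUTHOR_EMAIL="ai@example.invalid")
    subprocess.run(["git", "commit", "-m", message], env=env, check=True)

# git blame then shows author_ai on those lines, while the human committer
# remains visible via: git log --format='%an / %cn'
```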

mandel_x | today at 12:27 AM

I’ve been thinking about a simple problem: We’re increasingly merging AI-assisted code into production, but we rarely preserve the thing that actually produced it — the session. Six months later, when debugging or reviewing history, the only artifact left is the diff. So I built git-memento. It attaches AI session transcripts to commits using Git notes.
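For readers unfamiliar with git notes, here is a minimal sketch of the general mechanism (a hypothetical helper, not git-memento's actual code): notes live under their own ref, so the transcript travels with history without touching the tree.

```python
import subprocess

def attach_session_note(transcript_path: str, ref: str = "ai-sessions") -> None:
    """Attach a session transcript file to HEAD as a git note under a dedicated notes ref."""
    subprocess.run(
        ["git", "notes", "--ref", ref, "add", "-f", "-F", transcript_path, "HEAD"],
        check=True,
    )

# Read it back later:     git notes --ref ai-sessions show <commit>
# Share notes explicitly: git push origin refs/notes/ai-sessions
```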

brendanmc6 | today at 5:16 AM

A few things really leveled up both my software quality and my productivity in the last few months. It wasn’t session history, memory files, context management or any of that.

1. Writing a spec with clear acceptance criteria.

2. Assigning IDs to my acceptance criteria. Sounds tedious, but the idea actually wasn’t mine; at some point an agent went and did it without me asking. The references proved so useful for guiding my review that I formalized the process (and switched from .md to .yaml to make it easier).

3. Giving my agents a source of truth to share implementation progress so they can plan their own tasks and more effectively review.

Of course, I can’t help myself, I had to formalize it into a spec standard and a toolkit. Gonna open source it all soon, but I really want feedback before I go too far down the rabbit hole:

https://acai.sh

tokiory | today at 1:32 PM

Hell no. There are many companies that don't use any AI (or just use Copilot). I would hate to read a commit history where every commit had a "conversation" attached to it. Code should be human-first, always.

causal | today at 2:56 AM

If a car is used to get you somewhere, should you put the exhaust in bags to bring with you?

xhcuvuvyc | today at 4:23 AM

No? For the same reason I don't want to work 8 hours a day with the boss looking over my shoulder.

veunes | today at 11:46 AM

The idea of "saving prompts for reproducibility" is dead on arrival. LLMs are non-deterministic by nature. In a year, they'll deprecate this model's API, and the new version will spit out completely different code with entirely new bugs for the exact same prompt. A prompt isn't source code, it's just a temporary crutch for stochastic generation. And if I have to read 50 pages of schizophrenic dialogue with an LLM just to understand why a specific function exists, that PR gets an instant reject. The artifact is and always will be readable code plus a sane commit message. Dumping a log of hallucinations will only make debugging a nightmare when this Frankenstein inevitably falls apart in prod tbh

dogas | today at 3:12 PM

I created a tool that automatically sucks Claude sessions into a separate repo. It sanitizes any sensitive data like API keys. Our team finds this useful for sharing sessions + context.

https://github.com/gammons/ai-session

Garlef | today at 3:31 PM

I think this is the wrong mental model.

Instead, we need better (self-explaining) translation from spec to code. And better tools that help us navigate codebases we've not written ourselves.

For example, imagine a UI where you click on a feature spec file and it highlights you all the relevant tests and code.

lionkor | today at 7:39 AM

Some of the best engineers I've seen use commit messages to explain their intent, sometimes even in many sentences below the summary line.

I bet, without trying to be snarky, that most AI users don't even know you can commit with an editor instead of -m "message" and write more detail.

It's good that AI fans are finding out that commits are important, now don't reinvent the wheel and just spend a couple minutes writing each commit message. You'll thank yourself later.

kzahel | today at 7:39 AM

I would love to be able to share all my sessions automatically. But I would want to share a carefully PII/secrets redacted session. I added a "session sharing" feature to my agent wrapper that just grabs innerHTML and uploads to cloudflare. So I can share how I produced/vibe coded an entire project from start to finish.

For example: https://github.com/kzahel/PearSync/blob/main/sessions/sessio...

I think it's valuable to share that so people who are interested can see how you interact with agents. Sharing raw JSONL is probably a waste; it contains too many absolute paths and too much potential for sharing something unintentionally.

https://github.com/peteromallet/dataclaw?tab=readme-ov-file#... is one project I saw that makes an attempt to remove PII/secrets. But I certainly wouldn't share all my sessions right now, I just don't know what secrets accidentally got in them.
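A rough sketch of the kind of redaction pass described here, run over a transcript before it leaves the machine. The patterns below are illustrative only and nowhere near exhaustive; real secret scanning needs much more than this.

```python
import re

REDACTIONS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),          # OpenAI-style keys
    (re.compile(r"ghp_[A-Za-z0-9]{36}"), "[REDACTED_GITHUB_TOKEN]"),     # GitHub tokens
    (re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*\S+"), r"\1=[REDACTED]"),
    (re.compile(r"/(?:home|Users)/[^\s/]+"), "/home/[USER]"),            # absolute home paths
]

def redact(text: str) -> str:
    """Apply each redaction pattern in turn before a transcript is shared."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```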

claud_ia | today at 10:02 AM

The raw session noise — repeated clarifications, trial-and-error prompting, hallucinated APIs — probably isn't worth preserving. But AI sessions contain one category of signal that almost never makes it into code or commit messages: the counterfactual space — what approaches were tried and rejected, which constraints emerged mid-session, why the chosen implementation looks the way it does.

That's what architectural decision records (ADRs) are designed to capture, and it's where the workflow naturally lands. Not committing the full transcript, but having the agent synthesize a brief ADR at the close of each session: here's what was attempted, what was discarded and why, what the resulting code assumes. Future maintainers — human or AI — need exactly that, and it's compact enough that git handles it fine.
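A hedged sketch of what that end-of-session ADR synthesis could look like mechanically: the agent (or a wrapper script) fills in the three fields and the file gets committed next to the code. The directory name and template are assumptions, not a standard.

```python
from datetime import date
from pathlib import Path

def write_adr(title: str, attempted: str, rejected: str, assumptions: str,
              adr_dir: str = "docs/adr") -> Path:
    """Write a short decision record capturing the session's counterfactual space."""
    Path(adr_dir).mkdir(parents=True, exist_ok=True)
    slug = title.lower().replace(" ", "-")
    path = Path(adr_dir) / f"{date.today().isoformat()}-{slug}.md"
    path.write_text(
        f"# {title}\n\n"
        f"## What was attempted\n{attempted}\n\n"
        f"## What was rejected, and why\n{rejected}\n\n"
        f"## What the resulting code assumes\n{assumptions}\n",
        encoding="utf-8",
    )
    return path
```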

JustFinishedBSG | today at 11:23 AM

I understand the idea but the way I work, a commit isn't "a" session, it's potentially tens of sessions with branching in each session.

I honestly don't know if I'm doing something very wrong or if I have a very different working style than many people, but for me "just give the prompt/session" isn't a possibility because there isn't one.

I'm probably incredibly inefficient, because even when I don't use AI it is the same, a single commit is usually many different working states / ideas / branches of things I tried and explored that have been amended / squashed.

vtemian | today at 7:34 AM

Git was designed for humans.

Commits, branches, and the entire model works really well for human-to-human collaboration, but it starts to be too much for agent-to-human interactions.

Sharing the entire session in a human-readable way, offering a rich experience that other humans can use to understand, is way better than having git annotations.

That's why we built https://github.com/wunderlabs-dev/claudebin.com, a free and open-source Claude Code session sharing tool that allows other humans to better understand decisions.

Those sessions can be shared in a PR https://github.com/vtemian/blog.vtemian.com/pull/21, embedded https://blog.vtemian.com/post/vibe-infer/, or just shared with other humans.

semiinfinitely | today at 3:43 PM

Should your browser and search history be part of the commit too?

hakanderyal | today at 5:14 AM

I created a system which I call 'devlog'. The agent summarizes what it did and how it did it in a concise file, which gets committed along with the first prompt and the plan file, if any. Later, due to noise and volume, I started saving those in a database and now add only the devlog id to the commit.

Now whenever I need to reason about what the agent did and why, the info is linked and ready on demand. If needed, the session is also saved.

It helps a lot.
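A minimal sketch of that devlog idea: keep the summary, first prompt, and plan outside the repo and reference only an id from the commit. The SQLite schema and trailer name below are assumptions for illustration, not the commenter's actual tool.

```python
import sqlite3

def save_devlog(db_path: str, summary: str, first_prompt: str, plan: str = "") -> int:
    """Store a session summary outside the repo and return an id to cite in the commit."""
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS devlog (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               summary TEXT, first_prompt TEXT, plan TEXT,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP)"""
    )
    cur = con.execute(
        "INSERT INTO devlog (summary, first_prompt, plan) VALUES (?, ?, ?)",
        (summary, first_prompt, plan),
    )
    con.commit()
    devlog_id = cur.lastrowid
    con.close()
    return devlog_id

# The returned id can then go in a commit trailer, e.g.  Devlog-Id: 42
```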

Lerc | today at 7:31 AM

I would say not, because it would lead some to think that what was said to the model represented what output was desired. While there is quite a bit of correlation between describing what you want and the output you receive, the nature of models as they stand means you are not asking for what you want, you are crafting the text that elicits the response that you want. That distinction is important, and it is model specific. Without keeping an archive of the entire model used to generate the output, the conversation can be very misleading.

Conversations may also be very non-linear. You can take a path attempting something, roll back to a fork in the conversation and take a different path using what you have learned from the model's output. I think trying to interpret someone else's branching flow would be more likely to create an inaccurate impression than understanding.

D-Machine | today at 6:23 AM

An important consideration somewhat missing in discussion in this thread: if we don't carefully document AI-assisted coding sessions, how can we ever hope to improve our use of AI coding tools?

This applies both to future AI tools and also experts, and experts instructing novices.

To some degree, the lack of documentation of AI sessions is also at the core of much of the skepticism toward the value of AI coding in general: there are so many claims of successes / failures, but vanishingly few actual detailed receipts.

Automating the documentation of some aspects of the sessions (skills + prompts, at least) is something both AI skeptics and proponents ought to be able to agree on.

EDIT: Heck, if you also automate documenting the time spent prompting and waiting for answers and/or code-gen, this would also go a long way to providing really concrete evidence for / against the various claims of productivity gains.

kaycey2022 | today at 10:42 AM

This feels woefully inadequate. It should be saving everything. Not just the prompts and replies, but also the tool calls and skill invocations. If that is too much, then why even save anything in the session?

Right now this paradigm is so novel to us that we don’t know whether what is being saved is useful in any way or just hoarded garbage.

There are some who (rightly IMO) just neatly squash their commits and destroy the working branch after merging. There are others who would rather preserve everything.

nomilk | today at 9:50 AM

The way I've been storing prompts is a directory in the project called 'prompts' and an .md file for each topic/feature. Since I usually iterate a lot on the same prompt (to minimise context rot), I store many versions of the same prompt ordered chronologically (newest at top).

That way if I need to find a prompt from some feature from the past, I just find the relevant .md file and it's right at the top.
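A small sketch of that newest-at-the-top convention; paths and headings are assumptions rather than the commenter's actual layout:

```python
from datetime import datetime
from pathlib import Path

def log_prompt(feature: str, prompt: str, prompts_dir: str = "prompts") -> None:
    """Prepend the latest version of a prompt to the feature's markdown file."""
    path = Path(prompts_dir) / f"{feature}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    previous = path.read_text(encoding="utf-8") if path.exists() else ""
    entry = f"## {datetime.now():%Y-%m-%d %H:%M}\n\n{prompt}\n\n"
    path.write_text(entry + previous, encoding="utf-8")  # newest version first
```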

Interestingly, my projects are way better documented (via prompts) than they ever were in the pre-agentic era.

brainlounge | today at 6:36 AM

The more fundamental question is: is there information in the AI-coding session that should be preserved? Only if the answer is "yes" does the next question arise: where do we store that data?

git is only one possible location.

I think there is very valuable information in session logs, like the prompts, the usage statistics at the end of the session, which model was used, etc. But git history and commit messages should focus on the outcome of the work, not on the process itself. This is why the discussion that happens before work starts is also typically kept separately, in tickets: not in git itself, but close to it.

There are platforms like tulpal.com which move the whole local agent-supported process to the server and therefore have much better after-the-fact observability into what happened.

jollymonATX | today at 2:26 PM

How verbose a history is it even plausible to store and recall in modern git? This could put real pressure on those mechanisms, and the result, for humans at least, would be taxing to consume.

alainrk | today at 4:51 AM

My complete reasoning, notes, and errors have never been part of the commit. I don't see a valid reason why the raw conversation must be included. Rather, I have hooks (or just invoke them "manually") to process all of it and update the relevant documentation that I've been putting under docs/.

micw | today at 5:55 AM

IMO it depends a bit, but in most cases: No!

If you do proper software development (planning, spec, task breakdown, test case spec, implementation, unit test, acceptance test, ...), implementation is just a single step and the generated artifact is the source code. And that's what needs to be checked in. All the other artifacts are usually stored elsewhere.

If you do spec and planning with AI, you should also commit the outcome and maybe also the prompt and session (like a meeting note for a spec meeting). But it's a different artifact then.

But if you skip all the steps and put your idea directly to a coding agent in the hope that the result is final, tested, and production-ready software, you should absolutely commit the whole chat session (or at least make the AI create a summary of it).

gingersnap | today at 7:20 AM

My instinct is to say that I don't want the session as part of the commit. For me that is like a Slack thread discussing the new feature, and that is not something I would commit. I don't think the split should be "is this done with a machine? => commit"; the split for AI should be the same as before. Is it code or changes to code? Then it should be included. Is it discussion, going back and forth? That is not committed now either. On the other hand, if you do a plan that is then implemented, I actually do think it makes sense to save the plan, either as a commit or saved back to the issue.

umairnadeem123 | today at 4:45 AM

IMO this is solving the wrong problem. the session log is just noise - it's like attaching your google search history to a stackoverflow answer to "prove" you did the research. nobody wants to read 500 lines of an agent going back and forth debugging a race condition.

the actual problem is that AI produces MORE code not better code, and most people using it aren't reviewing what comes out. if you understood the code well enough to review it properly you wouldn't need the session log. and if you didn't understand it, the session log won't help you either because you'll just see the agent confidently explaining its own mistakes.

> have your agent write a commit message or a documentation file that is polished and intended for consumption

this is the right take. code review and commit messages matter more now than they ever did BECAUSE there's so much more code being generated. adding another artifact nobody reads doesn't fix the underlying issue which is that people skip the "understand what was built" step entirely.

eddyg | today at 1:21 PM

https://specstory.com/specstory-cli is another tool in this space (it writes clean Markdown session files into the project for future reference)

jumploops | today at 6:43 AM

I've been experimenting with a few ways to keep the "historical context" of the codebase relevant to future agent sessions.

First, I tried using simple inline comments, but the agents happily (and silently) removed them, even when prompted not to.

The next attempt was to have a parallel markdown file for every code file. This worked OK, but suffered from a few issues:

1. Understanding context beyond the current session

2. Tracking related files/invocations

3. Cold start problem on existing codebases

To solve 1 and 3, I built a simple "doc agent" that does a poor man's tree traversal of the codebase, noting any unknowns/TODOs, and running until "done."

To solve 2, I explored using the AST directly, but this made the human aspect of the codebase even less pronounced (not to mention a variety of complex edge-cases), and I found the "doc agent" approach good enough for outlining related files/uses.

To improve the "doc agent" cold start flow, I also added a folder level spec/markdown file, which in retrospect seems obvious.
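(For concreteness, a very rough sketch of what such a traversal-plus-folder-stub step could look like; `summarize` stands in for whatever model call is used, and everything here is an assumption about the general shape of the approach rather than the actual doc agent.)

```python
from pathlib import Path
from typing import Callable

def build_folder_docs(root: str, summarize: Callable[[str], str],
                      exts: tuple = (".py", ".ts", ".rs")) -> None:
    """Walk the repo and create a folder-level spec stub wherever one is missing."""
    for folder in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        sources = [f for f in folder.iterdir() if f.suffix in exts]
        if not sources:
            continue
        listing = "\n".join(f.name for f in sources)
        doc = folder / "FOLDER_SPEC.md"
        if not doc.exists():  # cold start: only fill in missing stubs
            doc.write_text(summarize(f"Files in {folder}:\n{listing}"), encoding="utf-8")
```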

The main benefit of this system is that when the agent is working, it not only has to change the source code, it also has to reckon with the explanation/rationale behind said source code. I haven't done any rigorous testing, but in my anecdotal experience, the models make fewer mistakes and cause fewer regressions overall.

I'm currently toying around with a more formal way to mark something as a human decision vs. an agent decision (i.e. this is very important vs. this was just the path of least resistance), however the current approach seems to work well enough.

If anyone is curious what this looks like, I ran the cold start on OpenAI's Codex repo[0].

[0]https://github.com/jumploops/codex/blob/file-specs/codex-rs/...

ramoz | today at 2:30 AM

We think so as well, with emphasis on the "why" for commits (i.e., intent provenance of all decisions).

https://github.com/eqtylab/y is just a prototype, built at a Codex hackathon.

The barrier to entry is just including the complete sessions. It gets a little nuanced because of the sheer size, workflows around squash merging and whatnot, and deciding where you actually want to store the sessions. For instance, git notes are intuitive; however, there are complexities around them. A less elegant approach is just to keep all sessions in separate branches.

Beyond this, you could have agents summarize an intuitive data structure as to why certain commits exist and how the code arrived there. I think this would be a general utility for human and AI code reviewers alike. That is what we built. Cost/utility needs to make sense. Research needs to determine whether this is all actually better than proper comments in code.

