logoalt Hacker News

dkerstentoday at 8:52 AM1 replyview on HN

This is something I realised late last year while using Claude Code. The LLM shouldn't be the one in control of the workflow, because the LLM can make mistakes, skip steps, hallucinate steps, etc. Its also wasteful of tokens.

I'm a firm believer that a "thin harness" is the wrong approach for this reason and that workflows should be enforced in code. Doing that allows you to make sure that the workflow is always followed and reduces tokens since the LLM no longer has to consider the workflow or read the workflow instructions. But it also allows more interesting things: you can split plans into steps and feed them through a workflow one by one (so the model no longer needs to have as strong multi-step following); you can give each workflow stage its own context or prompts; you can add workflow-stage-specific verification.

Based on my experience with Claude Code and Kilo Code, I've been building a workflow engine for this exact purpose: it lets you define sequences, branches, and loops in a configuration file that it then steps through. I've opted to passing JSON data between stages and using the `jq` language for logic and data extraction. The engine itself is written in (hand coded; the recent Claude Code bugs taught me that the core has to be solid) Rust, while the actual LLM calls are done in a subprocess (currently I have my own Typescript+Vercel AI SDK based harness, but the plan is to support third party ones like claude code cli, codex cli, etc too in order to be able to use their subscriptions).

I'm not quite ready to share it just yet, but I thought it was interesting to mention since it aims to solve the exact problem that OP is talking about.


Replies

user34283today at 10:35 AM

I‘ve recently started to use skills and so far it’s been working great.

Your agent can write a python script to loop and simply call „claude -p“ or „codex exec“.

For simple workflows this seems good enough and can be set up in 10 minutes without third party software.

What do you think?

show 1 reply