
krupan • today at 12:19 PM • 3 replies

I find it hard to believe that these LLM systems with their enormous training sets and built-in system prompts have their output meaningfully modified by a few paragraphs of extra prompting in the form of these skill files, BUT, it is cool to see people writing out concise, focused documents like this. These would have been great to have as a young developer, and great for several of the teams I've worked on in the past. I dabble with python for automating things here and there and I just learned some new things reading __mharison__'s skill in the comments here.

This kind of wisdom used to be found in blog posts, or in the heads of more senior developers, but it was never written out as concisely as these skill files. It's kinda funny that billions of dollars had to be spent creating a machine that's a rough human analog needing guidance to get us to produce these documents.

Replies

Nizoss • today at 5:28 PM

I use a different approach: I enforce TDD using hooks. Think of it this way: you interact with your agent and ask it to implement a feature. Now every change it wants to make has to be approved by a separate agent. This second agent is spawned using the SDK and can see the pending change, recent session history for context, instructions on how to interpret the information in relation to TDD, and any project custom instructions.

This setup works great, especially when you work with multiple agents or sessions in parallel and don’t want to be babysitting TDD. You just know that no TDD shortcuts or violations will be made and can focus on the solution instead. Agents are good at internally justifying shortcuts and lowering what’s good enough as the session goes on. You can notice this when you ask them to review their own work compared with asking a new session to review the changes. The difference is stark.

What’s interesting about the TDD instructions I dogfooded for this is that a lot is implicit in how operations should be interpreted as TDD violations. For example, earlier versions of the instructions had the validation agent block multi-step refactoring changes because it had no guarantee that further changes would follow. It would also block changes where a definition is removed while it is still being called. The reasoning was that the code would no longer build and so would not satisfy the “refactoring is allowed under green” rule. Improving the wording and clarifying the process helped prevent these unwanted false blocks.

If you want to give this approach a try, you’ll find it here. I’m the author and I’m happy to answer any further questions: https://github.com/nizos/probity

jasonswett • today at 12:29 PM

The reason it works is that there's a difference between the model knowing something and the agent doing something. Claude will happily write giant untested functions even though it "knows" that short functions are easier to understand and that testing enables safe refactoring, etc. The model also "knows" many conflicting "facts", such as the fact that testing is smart and that testing is a waste of time. It can't act on both beliefs at the same time. That's why nudging it toward your own preferred behaviors works.


vikramkr • today at 3:02 PM

Why is that hard to believe? It's literally the prompt telling it what to do - if you want a poem about watermelons you tell it to write a poem about watermelons, if you want tests you tell it to write tests. It's not like TDD is some universal pattern that every LLM will naturally optimize towards.
