You can't unit test for taste if you haven't written down what you mean by taste. If you c...

trjordan • today at 1:18 PM • 15 replies

You can't unit test for taste if you haven't written down what you mean by taste. If you can externalize it, then you can.

Follow this line of thinking, and the AI-friendly answer is easy: we just have to externalize everything we know, so Claude can implement what I want.

Except that I can't fully externalize myself. Debugging a system takes more resources than running the system. If I could write down everything I know and hand it to a machine, I'd do that, but it's impossible.

People aren't books or hashmaps. If you want to build something, you need to use the tools, not teach the tools to use you.

[edit: I'm trying to figure out if there's something to be done about this. Email me if you want to chat -- tr at tern dot sh]

Replies

ElevenLathe • today at 5:51 PM

The bigger problem I have as a worker is that, once I externalize it (by writing a skill or whatever), it becomes a work-for-hire whose copyright is owned by my employer. Technically this is true of a few other things I do for work, like my .emacs and .bashrc files, small scripts I keep in ~/bin on my workstation, etc., but no employer cares to assert this unless they're being assholes for some unrelated reason. Agent skill files, especially ones that seem to semi-reliably do what they say on the tin (the white whale!), are not like that at all, and I can see them pursuing you if you try to use them at a future employer.

bonzini • today at 1:21 PM

It can't be written down as code, that's the point.

I am more familiar with taste in coding and it can at best be described—that the resulting code is too subtly different from something else in the codebase, that you're masking a different bug, that you're not following what the code tells you. The good part is that while this cannot be unit tested, you can write documentation and code comments about it that tell people what they need to know.

But for taste of the kind described in the article there's not even a definition. The logic ended up being "trust a bunch of opaque weights the most."

fny • today at 4:17 PM

You absolutely cannot unit test for taste.

I had this experience doing a port from BigQuery to Postgres using Opus. I had unit tests to guarantee parity with the original code, and Opus insisted on building this bespoke query builder (e.g. `def _where(very_complicated_params)`) on top of sqlglot.

Even with the original code being straightforward and legible and repeated instructions to match, I had to fight with it to get close.

In the end, I ended up doing things the "old-fashioned way," where I copied chunks of code into Claude proper and gave explicit instructions for each piece.

I clearly had externalized the requirements, and yet that wasn't sufficient. The only way to unit test further would be to use an AST to evaluate the output against metrics I couldn't even encode.
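For a sense of what an AST-based check can and cannot do, here is a minimal sketch using Python's standard `ast` module. It encodes a single crude proxy (too many parameters on one function); the function name `functions_over_param_limit` and the threshold are assumptions made for illustration, not anything from the port described above. It says nothing about the harder qualities that resist encoding.

```python
import ast


def functions_over_param_limit(source: str, max_params: int = 4) -> list[str]:
    """Return names of functions whose parameter count exceeds max_params.

    Hypothetical helper: parameter count is only a rough stand-in for
    "this helper is more complicated than the code it replaces".
    """
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Count positional-only, regular, and keyword-only parameters.
            count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            if count > max_params:
                flagged.append(node.name)
    return flagged


sample = """
def _where(table, column, op, value, negate, alias):
    pass

def select_all(table):
    pass
"""
flagged = functions_over_param_limit(sample)
print(flagged)
```

A check like this can run alongside parity tests, but it only catches what someone thought to count.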

vinay_ys • today at 6:35 PM

You can externalize the things you consider taste by writing down generalized statements, but those statements also need their boundary conditions and exceptions specified. Except exceptions have exceptions, and when to apply the rule versus the exception is a contextual judgement. So whatever residual cannot be spelled out explicitly, unambiguously, and generally is what we call taste/judgement.

giancarlostoro • today at 2:16 PM

What's kind of funny is this is how I implemented "gates" for the ticketing system I built for Claude, because Beads would just close tickets without validation. I have tickets that are literally "Human validation" tier, so it will work on the next available thing until I personally tell the model to close it. So, in that spirit, yeah, you can unit test for taste, if you implement external validation.

Unit test runs, waits for human input before passing or failing, which might seem out of the norm, but we already have QA do manual testing.
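A minimal sketch of such a gate, assuming a ticket is a plain dict with a `tier` field. The names `human_gate`, `close_ticket`, and the `"human_validation"` tier value are hypothetical, chosen for illustration; this is not the Beads API or the actual ticketing system. The prompt function is injectable so the gate can be exercised without a person at the keyboard.

```python
from typing import Callable


def human_gate(description: str, ask: Callable[[str], str] = input) -> bool:
    """Block until a person answers; pass only on an explicit yes."""
    answer = ask(f"Approve '{description}'? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


def close_ticket(ticket: dict, ask: Callable[[str], str] = input) -> dict:
    """Close a ticket unless its tier requires human sign-off that was not given."""
    needs_human = ticket.get("tier") == "human_validation"
    if needs_human and not human_gate(ticket["title"], ask):
        # No approval: the ticket stays open and the agent moves on.
        return {**ticket, "status": "open"}
    return {**ticket, "status": "closed"}


ticket = {"title": "Redesign settings page", "tier": "human_validation", "status": "open"}
rejected = close_ticket(ticket, ask=lambda prompt: "n")
approved = close_ticket(ticket, ask=lambda prompt: "yes")
```

The default answer is "no", so an empty or ambiguous reply leaves the ticket open.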

Dumblydorr • today at 2:43 PM

Randomized trial. Half of them pledge to use AI freely and liberally, half of them to never use it, compare via surveys and off-AI tests after X months. Could even flip it so then the non-users used it for X months and vice versa, see if losses/gains are stable.

delichon • today at 2:02 PM

You may be able to effectively externalize taste by "hot or not" style pair testing. Enough comparisons and I'd expect ML to be able to mimic human taste by latching on to features we're not well aware of influencing us.
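As a rough illustration of turning pairwise choices into a ranking, here is a small sketch that uses Elo-style updates. Elo is an assumption swapped in for simplicity, not something the comment prescribes, and the function name `elo_scores` and its parameters are hypothetical. A learned model that picks up on hidden features would need far more than this.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def elo_scores(
    comparisons: Iterable[Tuple[str, str]], k: float = 32.0, base: float = 1000.0
) -> dict:
    """Map each item to a rating, given (winner, loser) pairs in order."""
    ratings = defaultdict(lambda: base)
    for winner, loser in comparisons:
        # Probability the winner was expected to win under current ratings.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Each tuple records one "this one over that one" judgement.
scores = elo_scores([("a", "b"), ("a", "c"), ("b", "c")])
ranking = sorted(scores, key=scores.get, reverse=True)
```

The ratings only rank items that were actually compared; generalizing to unseen items is the part that would need a trained model.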

tmoertel • today at 2:29 PM

> You can't unit test for taste if you haven't written down what you mean by taste. If you can externalize it, then you can.

I'm not so sure. For instance, you can write down what it means for a program to be free of XSS and other injection vulnerabilities. Now, how would you unit test for that property?

petra • today at 4:07 PM

Is there an issue of taste when generating images with AI? Or can we relatively rapidly train people to generate beautiful images with a decent amount of variety?

punnerud • today at 2:40 PM

If you have enough examples you can train an AI on your preferences, then use that distilled AI as a unit test. Don’t combine multiple into one AI. If they don’t agree you want it to fail so you can decide and retrain the tests.
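A minimal sketch of the "fail on disagreement" idea, assuming each distilled preference model is wrapped as a callable that returns accept or reject. The `Judge` type, `taste_check`, and the two stand-in judges below are hypothetical placeholders; real judges would be trained models.

```python
from typing import Callable, Sequence

Judge = Callable[[str], bool]


def taste_check(candidate: str, judges: Sequence[Judge]) -> bool:
    """Return the shared verdict; raise if the independent judges disagree."""
    if not judges:
        raise ValueError("at least one judge is required")
    verdicts = [judge(candidate) for judge in judges]
    if len(set(verdicts)) > 1:
        # Disagreement is surfaced as a failure so a human can decide and retrain.
        raise AssertionError(f"judges disagree: {verdicts}")
    return verdicts[0]


# Stand-in judges for illustration only.
judges = [
    lambda text: len(text) < 80,       # placeholder for one preference model
    lambda text: "TODO" not in text,   # placeholder for another
]

agreed = taste_check("short and finished", judges)

try:
    taste_check("TODO: finish this", judges)
    disagreement_raised = False
except AssertionError:
    disagreement_raised = True
```

Keeping the judges separate, rather than averaging them, is what makes the disagreement visible.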

eithed • today at 3:03 PM

I agree: do externalize everything you know *that matters*.

Want to follow a certain pattern or convention? Define it (e.g. active record vs. repository pattern) and stick it in an ADR! You don't know what you want? Look at what Claude produces, acquire taste from that, and mark it as a convention that future sessions will follow. But stick to *one* convention!

Treat your LLMs as junior developers willing to apply various patterns willy-nilly, caring only about fulfilling the ACs of a given task and not about the longevity or well-being of the system in general. They will not look at the bigger picture to check whether a given pattern applies globally, or even whether there are any other patterns.

pydry • today at 2:31 PM

I remember reading an interview with a fireman who described a time when his buddy evacuated a team because he "felt" that a floor would collapse imminently.

He couldn't articulate why but they trusted his gut and it did collapse.

A lot of software engineering relies on that kind of intuition and on a good team you can integrate it and benefit from it and avoid all manner of floor collapses.

sigbottle • today at 2:05 PM

Exactly. Every single philosophical statement in history runs up against the issue where you can just say, "yeah, it's pretty much this. You just need to do <arbitrarily hard unspecified thing that is basically unfalsifiability>". (Including this one)

And maybe that's just our limits with philosophy, modeling, assumptions, whatever. The danger is not realizing when we're in that zone.

(Fwiw I think unfalsifiability is a limit with any system - "you didn't compile in my syntax/semantics" is a gotcha that's actually valid and useful, but nobody can really determine the hard line)

deadbabe • today at 3:07 PM

You cannot externalize taste. You could perhaps mimic someone’s taste, but that’s not the taste. Knowing the taste requires actually tasting it. You can’t capture the taste, it’s already gone.

jimmypk • today at 3:13 PM

[flagged]
