Hacker News

kstenerud today at 2:26 PM

Wow... Our experiences have been very different, then. I've found each upgrade of Opus to be a noticeable improvement in its complex reasoning and delegation capabilities over its predecessor.

To me, this feels in many ways like a technical manager or team lead's job, where I guide the process along using my knowledge and experience, and then let the agent fill in the rest (to the best of its ability).

The agent can't really learn from its mistakes (at least, not without consuming precious context), so I apply a blameless postmortem process, updating the guardrails whenever it goes astray in the same way more than once.

And really, I'd rather be contemplating the more difficult and interesting questions of architecture, environment, ergonomics and market fit, so it suits me fine.


Replies

mwigdahl today at 2:57 PM

Same here. The power upgrade going to Fable in particular is quite impressive.

epolanski today at 3:23 PM

> Wow... Our experiences have been very different, then. I've found each upgrade of Opus to be a noticeable improvement in its complex reasoning and delegation capabilities over its predecessor.

I never said it isn't more capable or more "intelligent"; quite the opposite.

I will try to expand on what I mean.

LLMs' "character/persona/tendencies" are increasingly less about acting as an assistant and more about finding the solution themselves.

I use AI in a specific way: it assists, investigates and answers my questions. I do the coding. It is increasingly difficult to use it that way, because it quickly jumps to giving me solutions instead of answering my specific questions.

I'll give you a few examples.

I asked it to investigate DNS handling details in a Phoenix emailer module. It did very little investigation and jumped straight to why I should have used magic links instead. Instead of assisting me in my research, it was hard-wired to solve the problem (the wrong one, with a very wrong solution).

Today at work, I had a problem with batching. I wanted to understand whether batching was even needed for our use case, and it kept circling around how to fix the batching bug instead. That's not what I asked it to do, yet it jumped to the "solution".

I am increasingly frustrated by these models' "personality" and tendencies, which are geared less toward assisting me with the task at hand and more toward the model doing the task while I merely assist/supervise.

Sure, very detailed prompting on how it has to act helps, but wait a few turns and it drifts back to its default solution-vomiting state.

Which makes me think that these models are hard-wired into this mode of operation by consistent training and reinforcement that rewards jumping from prompt to code solution.
