What does this mean?
> It's a different kind of tool doing a different kind of work, and that makes a clean apples-to-apples comparison to earlier models difficult.
They claim it’s a different kind of tool and then describe using it the same way you’d use any other model. This really felt way worse than the average Cloudflare blog and really just rehashed the Mythos announcement which had already called out the key parts being chaining and crafting examples.
> way worse than the average Cloudflare blog
How long has it been since you took your average? Lately all Cloudflare output has been heavily AI'd.
It sounds different because it’s a hidden advertisement, not a regular blog post.
I think they're saying it has qualitatively different capabilities that make certain kinds of security work more worth pursuing with the model, not that the model of human-AI interaction has changed.
You're right that they're using a harness like everyone else. The general idea of giving the model a harness is not going to change. I mean, even humans need harnesses to accomplish some things.
The post says they wrote a custom harness that orchestrates work between multiple separate model invocations. That is different from running Claude Code (which is a specific existing harness around the Claude models).
The post takes a while to get around to saying that, and could have included more detail besides the workflow diagram and table (which they flag as only "an example of" such a harness), but it does answer the question. It's a different kind of tool because it's a model rather than a harness+model pair.
> the model has its own emergent guardrails that sometimes cause it to push back on legitimate security research requests. But as we found, these organic refusals aren’t consistent - the same task, framed differently or presented in a different context, could produce completely different outcomes as illustrated in the examples below.
This was new. I'm surprised that a model specifically designed for security research and gated to professionals is refusing legitimate requests.
'It's not X, it's Y' is also a common LLM trope.
My guess is that it's because the model is trained specifically for security/hacking. So comparing it to Opus, trained for chat/code/etc., is apples-to-oranges.
I think what they might mean is:
Because of its capabilities, a new kind of harness can be built for it, so the entire system (model + harness) is a different kind of tool than, say, Claude Code.
> They claim it’s a different kind of tool and then describe using it the same way you’d use any other model. This really felt way worse than the average Cloudflare blog and really just rehashed the Mythos announcement which had already called out the key parts being chaining and crafting examples.
Hah, I was trying to parse this too.
Charitably, perhaps they're being vague on exactly what's different because they're still under NDA.