Fugu Ultra [0] is not actually a model; it's a system (a harness in the cloud?) that routes to several models. It looks a bit like OpenRouter's Fusion [1].
"Rather than a single monolithic model, Fugu is a learned multi-agent orchestration system: a language model trained to route tasks across a swappable pool of underlying models and to recursively call instances of itself." - https://openrouter.ai/sakana/fugu-ultra
[0] https://sakana.ai/fugu/
[1] https://openrouter.ai/openrouter/fusion
The "Mythos-like" talk is getting kinda annoying. Us normal people have no way to compare it outside of looking at benchmarks.
Without reliable benchmarks, they are Mythos-like only in the sense that they accept text as input and produce text as output.
They have an impressive set of investors [1]. Also, HN Headline [2] from the other day with 100+ comments.
I'm expecting a ban of "foreign" LLMs due to "safety concerns" before the year is over.
It will have nothing to do with the actual performance. But Anthropic has set the bar for Mythos-like systems, and whatever meets that loosely defined bar will be deemed unsafe for the public.
My cynical take is that if the model is decent, it will be hard to disprove their claim of it being Mythos-like, since Mythos is now unavailable.
First impression: Third-party benchmarks or gtfo. Personally, I've never heard of either of these companies before. We're just supposed to take their word that they've matched the best models on the market?
Sakana describes their model as an "Orchestration Model." Does that mean that it's actually a bunch of different models glued together?
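For what it's worth, here is a rough sketch of what "route tasks across a swappable pool and recursively call itself" could look like in the abstract. This is not Sakana's implementation: the pool entries, the keyword-based `pick` rule, and the splitting on " and " are all made-up stand-ins (in Fugu the routing is reportedly learned by a language model, not hard-coded).

```python
from typing import Callable, Dict

Worker = Callable[[str], str]

def make_orchestrator(pool: Dict[str, Worker],
                      pick: Callable[[str], str],
                      max_depth: int = 2) -> Callable[..., str]:
    """Build a toy orchestrator over a swappable pool of workers."""
    def run(task: str, depth: int = 0) -> str:
        # Hypothetical decomposition rule: split compound tasks and
        # recursively hand each part back to the orchestrator itself.
        if depth < max_depth and " and " in task:
            parts = [p.strip() for p in task.split(" and ")]
            return " | ".join(run(p, depth + 1) for p in parts)
        # Leaf task: route it to whichever pool member the router picks.
        return pool[pick(task)](task)
    return run

# Hypothetical pool; real entries would be calls to underlying models.
pool: Dict[str, Worker] = {
    "coder": lambda t: f"coder:{t}",
    "writer": lambda t: f"writer:{t}",
}

# Stand-in for the learned router: a trivial keyword check.
pick = lambda t: "coder" if "code" in t else "writer"

run = make_orchestrator(pool, pick)
result = run("write code and draft docs")
print(result)
```

Swapping a model in or out only changes `pool`; the orchestrator itself stays the same, which is presumably the point of the "swappable pool" wording.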
I'm a simple man, I see no benchmarks at https://arena.ai/leaderboard - I can 100% tell this is a scam.
Like many comments here have said, I also tested Fugu and some others, and what I noticed is that they are quite expensive models: $20 is not enough to complete a full workflow, which is possible with Opus. Sure, you might need to improve your prompt from the get-go with Opus if you want the best results, but so far that's my experience.
My next test will be agentic systems, to see how they perform.
Feels like I need to repeat myself more than once a day now: https://news.ycombinator.com/item?id=48697258
> These companies providing tokens, whether SOTA or not, that want to IPO are so fucked as time goes on.
> Can't sell their SOTA models, only slightly better than the open source models for the models they can sell, cost 20x to 50x for good models, a TAM that consists almost solely of developers, with no customer of theirs actually boasting increased profits as a result of AI...
> I fear their time to IPO may have passed.
What on earth could Anthropic and OpenAI pivot to now?
GLM produces pretty decent websites.
If Fugu really is an orchestrator dispatching to Opus/GPT under the hood (as the OpenRouter page suggests), the $20-in-one-prompt complaints actually start making sense: you're paying API markup twice.
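To put rough numbers on the "markup twice" idea, here is a back-of-the-envelope sketch. Every figure is hypothetical (I don't know the actual margins of Sakana, OpenRouter, or the upstream providers); it only shows how stacked markups compound.

```python
from math import prod

def billed_cost(upstream_cost: float, markups: list[float]) -> float:
    """Cost after each layer in the chain applies its own fractional markup."""
    return upstream_cost * prod(1.0 + m for m in markups)

# All numbers below are hypothetical, chosen only to show the compounding.
upstream = 10.00                                   # raw token cost at the model provider
one_layer = billed_cost(upstream, [0.20])          # one reseller in the chain
two_layers = billed_cost(upstream, [0.20, 0.20])   # orchestrator stacked on top

print(f"one layer:  ${one_layer:.2f}")
print(f"two layers: ${two_layers:.2f}")
```

Markup is only part of the story. An orchestrator that fans a single prompt out to several models (and to itself) also multiplies the number of upstream calls, which would probably dominate the bill.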
I doubt it will rival Mythos or the upcoming Sol, and if it's not open weights it doesn't really matter in the grand scheme of things. Still, I applaud the Asian LLM efforts and hope they keep up the pressure on the Americans.
Unless they launched 10T-param models, or figured out some amazing new way to compress that many params into, say, 100B, I doubt it's anywhere near "Mythos level". And I have no idea how many params Mythos has, but that was just some hearsay.
Competition is accelerating, but the next breakthrough isn't just better models, it's better connectivity. AgentKey bridges AI agents with real-world tools, APIs, and data.
So now, as regular Americans, we are behind because gatekeepers are saying superintelligence is too scary.
It was bound to happen soon.
I think it is time that we had a UN-sponsored standards body dedicated to benchmarking the newest models from around the world, for everyone's benefit.
WTF even is “mythos-like” when smaller models can find all the same kinds of issues if you just prod them a bit more?
Excellent. I'm very thankful the Asian/Chinese companies don't give a fuck about the US government. It feels good to have a competitor.
Given the national security implications, it's no surprise that Japan and China are rushing to build sovereign models post-ban. But when these startups claim parity with "Mythos," could it be that they are just optimizing for very specific inference tasks? I wonder if we are seeing the real battleground shift from raw training scale toward specialized inference.
"Asian" is bad wording. This is a Japanese startup backed by Khosla Ventures. Japan is an ally of the West. The title makes it sound like a Chinese company did this.
YES! Now things become even more interesting. US, your move.
I tried the Fugu models with some real-world tasks in C# and Unity using MCP and OpenCode. I exhausted the $20 plan's 5-hour window in one prompt to review my theme system and plan some color changes. So I upgraded to the $100 plan to see the implementation and result. Well, the result was worse than Opus and incredibly slow. I ended up exhausting the new 5-hour window and have now used 35% of the weekly limit, and it barely produced what Opus was able to do at a fraction of the time and cost.
Do what you wish with this info, but it seems to be a complete waste of $$.