> Preliminary trials with Claude Mythos Preview showed that it would not provide an apples-to-apples comparison with other models because of how we had set up the experiment and how the model was served.
What does this mean? My guess is they couldn’t co-locate Mythos close enough to reduce latency?
(I’m assuming this experiment pre-dates the export controls)
This mostly reads as a comparison between Opus 4.7 and 4.1. It would be more interesting if they reran the experiment against a team of humans using 4.7 and saw how much the humans still improve the results today.
Fast? Sure. Good, maintainable code? Doubtful. I think they skip the metrics that matter there. So that's just their AI promo.
stop trying to make fetch happen
> However, once again, we are seeing a pattern whereby first, models are helpful to humans. Then, humans are helpful to models. Finally, models are largely able to do things themselves. We have seen this in cybersecurity and now the same dynamics are starting to take shape at the intersection of AI and the physical world.
It’s good they are the ones seeing those things, because otherwise no one else would have. Now if only seeing things would translate into getting any actual economic value out of them… instead of losing billions. But hey, who am I to do a reality check on this shameless piece of hype.
Do you want Terminators? Because this is how you get Terminators.
I'm getting a bit tired of these disguised adverts.
Here's how non-robotics engineers used AI to do a short robot integration task faster than other non-robotics engineers without AI.
Where "better" mostly means faster, and who knows what happens on longer horizons, with actual robotics experts, robustness requirements, or tasks where the hard part is control rather than API spelunking.