Absolutely. These models still need a lot of this sort of hand-holding, so they work best in experienced hands. I'm also skeptical of those very long runs: letting a model go that long without active oversight must surely produce at least some objectionable design or implementation details, right? So I guess the people claiming those sorts of results care less about these sorts of qualities.
Yes, even Claude Opus 4.6 still runs into accidents on longer chats that last 3-4 days. But it's getting better and better.