wtf even is “mythos-like” when smaller models can find all the same kinds of issues if you just prod them a bit more?