I've been experimenting with Hermes, and I'm convinced Hermes is also just bad. Like, as a harness it has got to be doing something to lobotomize these models. Even GPT-5.4 performs badly in Hermes vs. just using it in Codex.