Claude running in Claude Code has been shown to perform consistently worse on evals than Claude with a minimal harness.