This is the pain point that existed for years now and its still not solved at all.
"If A, do X. Do B,C,D. Do A" - and it just never uses X because "it forgot".
You just cant trust that the time you spend building rules will actually pay off, in fact you can trust that it will fail you sooner or later.
RAG, Harness, Skills... all was supposed to fix this, but in reality it never had.
Harnesses do fix it IMO - it’s why Claude code and Codex had a massive jump in alleged productivity on release and then seems to have flatlined. But a custom harness _would_ allow you to do things like “on every message, run lint validation and tests”. That in and of itself would be wildly useful.