The raw CDP approach makes sense for the reasons you described, but it trades one set of problems for another. When you let the LLM write its own CDP calls, you get flexibility but lose auditability — it becomes hard to reproduce exactly what the agent did in a session when debugging failures.
We ran into this when evaluating browser automation frameworks at AgDex. The ones that wrap CDP in deterministic helpers are slower to add features but much easier to debug in production. The "agent wrote its own helper" moment is magical in demos, but in prod you want a diff you can review.
Probably the right answer is what you're implicitly building: a minimal harness with good logging, so you can replay the CDP calls post-mortem. Is that something you're planning to add?