Then test Bob on what you actually want him to produce, i.e. novel problems, instead of trivial things that won't tell you how good he is.
Why is it a problem of the LLM if your test is unrelated to the performance you want?
What people forget about programming is that it is a notation for formal logic, one that can be executed by a machine. That formal logic is for solving a problem in the real world.
While we have a lot of abstractions that solve some subproblems, we still need to connect those solutions to solve the main problem. And there’s a point where this combination becomes its own technical challenge. And the skill needed for it is the same one used to solve simpler problems with common algorithms.
How can Bob produce novel things when he lacks the skills to do even trivial things?
I didn't get to be a senior engineer by immediately being able to solve novel problems. I can now solve novel problems because I spent untold hours solving trivial ones.