You are cooking them wrong. You absolutely must ask the model to do grounding work — search online, search in your files, cross-check with different agents. They are universal reasoning engines, not a fact recalling tool.
Check my 'deep research' skill, you will get the idea
https://github.com/dandaka/skills/blob/main/deep-research/SK...