As someone with a 3060, I can attest that there are really really good 7-9B models. I still use berkeley-nest/Starling-LM-7B-alpha and that model is a few years old.
If we are going for accuracy, the same question should be asked multiple times across multiple models to see whether the answers agree.
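The agreement idea above is basically majority voting (self-consistency). A minimal sketch of the tallying step — the answer list here is illustrative, and in practice you'd fill it by querying each model:

```python
from collections import Counter

def majority_answer(answers):
    """Return the most common answer and its vote share.

    `answers` is a list of answer strings collected by asking the
    same question several times, possibly across several models.
    Answers are normalized (stripped, lowercased) before counting.
    """
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Hypothetical example: three models queried twice each.
answers = ["Paris", "Paris", "paris ", "Lyon", "Paris", "Paris"]
best, share = majority_answer(answers)
# best is "paris", share is 5/6
```

If the vote share is low, that's a signal none of the models is confident and the answer probably shouldn't be trusted.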
But I do think once you hit ~80B parameters, it gets hard to tell the difference from SOTA models.
That said, GPT-4.5 was the GOAT. I can't imagine how expensive that one was to run.