I typically find myself using a context of 150k-500k tokens with GPT models, so local models simply aren't enough and I stopped using them.
large contexts degrade performance - attention doesn't work well for windows that large, and the cloud models are kind of hacking around it
local models do take some context engineering to get decent results, but it's not that rough
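As a rough illustration of the kind of context engineering meant here, a minimal sketch that keeps only the newest messages fitting a fixed token budget. The function name `trim_history` and the whitespace word count used as a token estimate are assumptions for illustration, not anything from the thread; a real setup would count tokens with the model's own tokenizer.

```python
def trim_history(messages, budget_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the newest messages whose combined token estimate fits the budget.

    Hypothetical helper: count_tokens defaults to a crude whitespace word count,
    standing in for the model's real tokenizer.
    """
    kept = []
    used = 0
    # Walk from newest to oldest so the most recent turns are preserved first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget_tokens:
            break  # stop at the first message that would overflow the budget
        kept.append(message)
        used += cost
    # Restore chronological order before sending to the model.
    return list(reversed(kept))


history = ["alpha beta", "gamma delta epsilon", "zeta"]
print(trim_history(history, 4))  # ['gamma delta epsilon', 'zeta']
```

Dropping whole messages from the oldest end keeps each remaining turn intact; summarizing the dropped turns instead is a common refinement.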
That's way higher than their optimal ceiling (and absolutely suboptimal from a token cost point of view). Why are you doing that?