I've worked extensively with its slightly less capable cousin, the 35B A3B model, and tuned my own harness to make it work well with local or non-SOTA models. The results are quite promising [0] if one sticks to a plan-execute approach. After some fiddling with llama.cpp, I got it to work through a small change on a real codebase from work on a 32GB M5 (a typical Python FastAPI backend, so nothing out of the ordinary). While that's somewhat encouraging, the local experience overall was still far from pleasant, with all the noise and heat.
[0] https://deepclause.substack.com/p/how-to-make-small-models-p...
What harness are you using?