I don't suppose you could point to any resources on where I could get started. I have a M2 with 64gb of unified memory and it'd be nice to make it work rather than burning Github credits.
You can then get Claude to create the MCP server to talk to either. Then a CLAUDE.md that tells it to read the models you have downloaded, determine their use and when to offload. Claude will make all that for you as well.
https://ollama.com
Although I'm starting to like LMStudio more, as it has more features that Ollama is missing.
https://lmstudio.ai
You can then get Claude to create the MCP server to talk to either. Then a CLAUDE.md that tells it to read the models you have downloaded, determine their use and when to offload. Claude will make all that for you as well.