Hacker News

zengid · yesterday at 7:45 PM

any tips for running it locally within an agent harness? maybe using pi or opencode?


Replies

stratos123 · yesterday at 9:07 PM

It pretty much just works. Run the unsloth quant in llama.cpp and hook it up to pi. There are a few minor annoyances, like no support for thinking effort. It also defaults to "interleaved thinking" (thinking blocks get stripped from context); set `"chat_template_kwargs": {"preserve_thinking": true}` if you interrupt the model often and don't want it to forget what it was thinking.
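If you're wiring this up yourself rather than through a harness, a minimal sketch of passing that setting per-request to llama.cpp's OpenAI-compatible server (`llama-server`) might look like the following. This assumes the server forwards `chat_template_kwargs` from the request body to the chat template, as the comment above describes; the model name and endpoint path are illustrative.

```python
import json

# Request body for llama-server's /v1/chat/completions endpoint.
# "chat_template_kwargs" is the per-request override mentioned above;
# "preserve_thinking": True keeps thinking blocks in context instead
# of letting them be stripped between turns (interleaved thinking).
payload = {
    "model": "local",  # llama-server typically ignores/accepts any name
    "messages": [
        {"role": "user", "content": "Refactor this function."},
    ],
    "chat_template_kwargs": {"preserve_thinking": True},
}

# Serialize to the JSON you would POST (e.g. with curl or requests);
# note Python's True becomes JSON's lowercase true here.
body = json.dumps(payload)
print(body)
```

You could then send `body` with `curl -d @- http://localhost:8080/v1/chat/completions` or any HTTP client; the same `chat_template_kwargs` object can usually also live in a harness's static config instead of each request.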