Hacker News

zengid · yesterday at 7:45 PM

any tips for running it locally within an agent harness? maybe using pi or opencode?


Replies

stratos123 · yesterday at 9:07 PM

It pretty much just works. Run the unsloth quant in llama.cpp and hook it up to pi. There are a few minor annoyances, like no support for thinking effort. It also defaults to "interleaved thinking" (thinking blocks get stripped from context); set `"chat_template_kwargs": {"preserve_thinking": true}` if you interrupt the model often and don't want it to forget what it was thinking.
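If you're wiring this up yourself rather than through a harness, a minimal sketch of passing that setting per-request to llama.cpp's OpenAI-compatible server (`llama-server`) might look like the following. This assumes the server forwards `chat_template_kwargs` from the request body to the chat template, as the comment above describes; the model name and endpoint path are illustrative.

```python
import json

# Request body for llama-server's /v1/chat/completions endpoint.
# "chat_template_kwargs" is the per-request override mentioned above;
# "preserve_thinking": True keeps thinking blocks in context instead
# of letting them be stripped between turns (interleaved thinking).
payload = {
    "model": "local",  # llama-server typically ignores/accepts any name
    "messages": [
        {"role": "user", "content": "Refactor this function."},
    ],
    "chat_template_kwargs": {"preserve_thinking": True},
}

# Serialize to the JSON you would POST (e.g. with curl or requests);
# note Python's True becomes JSON's lowercase true here.
body = json.dumps(payload)
print(body)
```

You could then send `body` with `curl -d @- http://localhost:8080/v1/chat/completions` or any HTTP client; the same `chat_template_kwargs` object can usually also live in a harness's static config instead of each request.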