> Currently, I have a very lightweight harness - the pi agent with everything stripped (`pi -nc --offline`) and a short system prompt [1] to align it a bit with my style
This really is the secret to getting the most out of these models IMO. Pi is so damned good. I have a strongly tuned Pi for running Step 3.7 Flash (IQ4_XS) and Qwen 3.6 27B (FP8)
Also, thank you for llama.cpp mate :)
I have never heard of step 3.7 flash. Why do you like it? What rough spots have you encountered?