I'm on the verge of cancelling my $20 Anthropic plan since it's come out. On an M5 Max 128GB, hooked up to the pi.dev harness, I get in the neighborhood of 400-450 tps prefill and 30-35 tps generation. It is eminently usable and at times feels more stable than my previous CC setup. Occasionally it struggles with something and I bounce back over to CC, but overall it holds up well. The future is bright for local models! As a tinkerer, it makes me really happy to have a local setup I can be just as productive in, without the token overlords ready to shut me down at any time.
That's DS4 Flash, right? How does it compare in intelligence and speed to DS4 Flash hosted by DeepSeek themselves or another API provider? I've been using DS4 Flash over the API for a lot of personal projects and have been quite impressed. I've spent $1 building ~10 toy projects and gotten them all to work within the bounds of what I wanted, without having to do much besides guide the model away from dumb loops.