Try it here — I hate LLMs, but this is crazy fast: https://chatjimmy.ai/
The full answer pops in in milliseconds; it's impressive, and just by forgoing the need to stream the output it feels like a completely different technology.
We need that for this Chinese 3B model that thinks for 45s on "hello world" but also solves math.
Because most models today generate tokens slowly, they give the impression of someone typing on the other end. This is just <enter> -> wall of text. Wild.