I think your demo needs more realistic thinking logs, because thinking usually burns at least 2x to 3x the tokens of the code itself, and much more for harder tasks.
You should check out https://tokey.ai. I made it a few months ago and it has all of these suggestions.
Yes, it should use actual output from some of the open models.
Indeed, at 30 tok/s, make it pause for 20 seconds while the "thinking" is streaming (and hidden); that's the real experience.
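For anyone who wants to sanity-check that number, here is a rough sketch of the arithmetic. The function name, the 200-token code size, and the default ratio are made up for illustration; only the 2x to 3x ratio and the 30 tok/s rate come from the comments above.

```python
def thinking_pause_seconds(code_tokens: int,
                           thinking_ratio: float = 3.0,
                           tokens_per_second: float = 30.0) -> float:
    """Estimate how long the UI looks stalled while hidden thinking tokens stream.

    thinking_ratio is the assumed number of thinking tokens per visible code token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    thinking_tokens = code_tokens * thinking_ratio
    return thinking_tokens / tokens_per_second


# Hypothetical example: a 200-token snippet with 3x thinking overhead at 30 tok/s.
pause = thinking_pause_seconds(200, thinking_ratio=3.0, tokens_per_second=30.0)
print(f"simulated pause: {pause:.0f} s")
```

So a 20 second pause corresponds to about 600 hidden tokens, which is what a fairly small snippet would cost at the 3x end of that range; harder tasks would stall longer.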