SXX • yesterday at 7:02 PM • 3 replies

I think your demo needs more realistic thinking logs, because thinking usually burns at least 2x to 3x as many tokens as the code itself, and much more for harder tasks.

Replies

unglaublich • yesterday at 7:24 PM

Indeed. At 30 tok/s, make it pause for 20 seconds while the "thinking" streams (and stays hidden); that's the real experience.
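
To put rough numbers on the pause: a minimal sketch, assuming the hidden thinking is simply a fixed multiple of the visible code tokens (the 2x to 3x figure mentioned above) and streams at a constant rate. The function name and parameters are hypothetical and not taken from the demo being discussed.

```python
def thinking_pause_seconds(code_tokens: int,
                           thinking_ratio: float,
                           tokens_per_second: float) -> float:
    """Seconds of hidden "thinking" before the visible code starts streaming.

    Assumption: thinking tokens = code_tokens * thinking_ratio, streamed
    at a constant tokens_per_second.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    hidden_tokens = code_tokens * thinking_ratio
    return hidden_tokens / tokens_per_second


# 300 tokens of code with 2x thinking overhead at 30 tok/s
print(thinking_pause_seconds(300, 2.0, 30.0))  # 20.0
```

Under these assumptions, a 20-second pause at 30 tok/s corresponds to 600 hidden tokens, which is about what a 300-token snippet would cost at the low end of the 2x to 3x range.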

sig_kill • yesterday at 10:23 PM

You should check out https://tokey.ai. I made it a few months ago, and it has all of these suggestions.

redox99 • yesterday at 8:30 PM

Yes, it should use actual output from some of the open models.
