With speculative decoding, however, you can use additional models to speed up generation.
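
For anyone who hasn't seen the idea, here is a rough toy sketch of the greedy variant: a cheap draft model proposes a few tokens, and the big target model checks them, keeping the longest prefix it agrees with. Everything below (`target_model`, `draft_model`, the arithmetic standing in for real networks, `k=4`) is made up for illustration and is not any particular library's API. In a real system the verification step is a single batched forward pass of the target model, which is where the speedup comes from.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]

def target_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the large, slow model.
    return sum(prefix) % 7

def draft_model(prefix: List[int]) -> int:
    # Hypothetical cheap model: agrees with the target except when the
    # prefix length is a multiple of 4.
    guess = sum(prefix) % 7
    return guess if len(prefix) % 4 else (guess + 1) % 7

def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0  # counts verification rounds, not individual target calls
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks each proposed position. This toy loops,
        #    but a real implementation scores all k positions in one pass.
        target_passes += 1
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] != expected:
                # Keep the agreed prefix, then the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every draft token was accepted; the same pass yields one more.
            tokens.extend(proposal)
            tokens.append(target(tokens))
    return tokens[:goal], target_passes

if __name__ == "__main__":
    out, passes = speculative_decode(target_model, draft_model, [1, 2, 3], 12)
    print(out, passes)
```

The output is the same sequence plain greedy decoding with the target model would give; the draft model only changes how many target rounds are needed, not what gets generated.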

grumpoholic • today at 1:01 PM
