If you get the inference engine to route the heavy matrix math to the GPU and the speculative drafti...

cafkafk • today at 8:20 AM • 0 replies • view on HN

If you get the inference engine to route the heavy matrix math to the GPU and the speculative drafting to the CPU without choking on latency, it's probably gonna be very fast.

Would love to see the benchmarks if someone actually pulls something like that off.
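
For anyone unfamiliar with the idea, here is a rough sketch of greedy speculative decoding in plain Python. Everything in it is an assumption for illustration: `draft_model` stands in for a small drafter that would run on the CPU, `target_model` stands in for the big model that would run on the GPU, and both are toy functions over integer tokens. No real device routing or latency is modeled here; the point is only to show why a good drafter cuts the number of expensive target passes.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # greedy next-token function


def draft_model(prefix: List[Token]) -> Token:
    # Hypothetical cheap drafter (stand-in for a small CPU-side model).
    return (prefix[-1] + 1) % 10


def target_model(prefix: List[Token]) -> Token:
    # Hypothetical expensive model (stand-in for the GPU-side model).
    # It agrees with the drafter except right after a 4.
    last = prefix[-1]
    return 0 if last == 4 else (last + 1) % 10


def greedy_decode(prompt: List[Token], n_new: int, target: Model) -> List[Token]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(
    prompt: List[Token], n_new: int, k: int, draft: Model, target: Model
) -> Tuple[List[Token], int]:
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n_new:
        # 1) Draft k tokens one at a time with the cheap model.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2) Verify the whole proposal. A real engine would score all
        #    positions in one batched forward pass; here it is counted
        #    as a single pass even though the toy target is called per token.
        target_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong guess
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target's next token is free.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + n_new], target_passes


if __name__ == "__main__":
    tokens, passes = speculative_decode([0], 12, 4, draft_model, target_model)
    print(tokens, "target passes:", passes)
```

With greedy verification like this, the output matches what the target model would produce on its own, so any speedup comes purely from needing fewer target passes. Whether that turns into wall-clock gains depends on the CPU/GPU hand-off cost, which is exactly the latency question above.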
