Hacker News

cafkafk, today at 8:20 AM

If you get the inference engine to route the heavy matrix math to the GPU and the speculative drafting to the CPU without choking on latency, it's probably gonna be very fast.

Would love to see the benchmarks if someone actually pulls something like that off.
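
For anyone unfamiliar with the draft-then-verify idea, here's a rough toy sketch of greedy speculative decoding. Everything in it is made up for illustration: `target_model` stands in for the big model you'd run on the GPU, `draft_model` for the cheap drafter you'd run on the CPU, and both are plain deterministic functions rather than real networks. It doesn't do any actual device placement or measure latency; it only shows why the number of expensive target passes drops while the output stays the same.

```python
from typing import List, Tuple

VOCAB = 50  # hypothetical toy vocabulary size


def target_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the large model (the part you'd put on the GPU)."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_model(prefix: List[int]) -> int:
    """Hypothetical cheap drafter (the part you'd put on the CPU).

    For the toy it copies the target's answer but is deliberately wrong
    whenever the prefix length is 3 mod 4, so some drafts get rejected.
    """
    guess = target_model(prefix)
    return guess if len(prefix) % 4 != 3 else (guess + 1) % VOCAB


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_model(tokens))
    return tokens


def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft k tokens cheaply, then verify them against the target.

    Returns the generated sequence and the number of verification passes.
    In a real engine each pass would be a single batched forward pass of
    the target; here it is simulated with a loop.
    """
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < n_new:
        # 1. Drafter proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))

        # 2. Target checks every drafted position (counted as one pass).
        target_passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target_model(tokens + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft matched, so the pass also yields one extra token.
            accepted.append(target_model(tokens + draft))

        tokens.extend(accepted)

    return tokens[: len(prompt) + n_new], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    out, passes = speculative_decode(prompt, n_new=20, k=4)
    print("same output as plain greedy:", out == greedy_decode(prompt, 20))
    print("target passes:", passes, "vs 20 for plain greedy")
```

Whether this turns into a real speedup depends on exactly the latency point above: the drafter has to be cheap enough, and the CPU-to-GPU handoff fast enough, that drafting plus verifying beats just running the target token by token.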