Hacker News

dan-robertson, yesterday at 3:26 PM

I think being faster is probably important, but it brings a bunch of challenges:

- the split pricing model makes it hard to tune the model architecture for faster inference, as you need to support both fast and cheap versions.

- the faster models get, the more it becomes a problem that they don't 'understand' time: they sit idle waiting for big compilations, or they issue tool calls sequentially when they ought to have issued them in parallel.
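
To make the sequential-versus-parallel point concrete, here is a rough sketch. Everything in it is made up for illustration: `fake_tool` is a hypothetical stand-in for a real tool call, and the tool names and latencies are arbitrary. It only shows why independent calls issued one after another cost the sum of their latencies, while calls issued together cost roughly the slowest one.

```python
import asyncio
import time


async def fake_tool(name: str, seconds: float) -> str:
    # Hypothetical stand-in for a tool call; the sleep simulates its latency.
    await asyncio.sleep(seconds)
    return name


async def run_sequential(calls):
    # Each call waits for the previous one: total time is the sum of latencies.
    return [await fake_tool(name, seconds) for name, seconds in calls]


async def run_parallel(calls):
    # All calls are in flight at once: total time is about the slowest call.
    results = await asyncio.gather(
        *(fake_tool(name, seconds) for name, seconds in calls)
    )
    return list(results)


def timed(coro_fn, calls):
    # Run one strategy and return (results, elapsed seconds).
    start = time.perf_counter()
    results = asyncio.run(coro_fn(calls))
    return results, time.perf_counter() - start


# Assumed, illustrative workload: three independent calls of 0.2 s each.
CALLS = [("grep", 0.2), ("read_file", 0.2), ("list_dir", 0.2)]

if __name__ == "__main__":
    _, seq_time = timed(run_sequential, CALLS)
    _, par_time = timed(run_parallel, CALLS)
    print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

As token generation gets faster, the fixed tool latency takes up a larger share of the wall-clock time, so the gap between the two strategies matters more.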