Hacker News

irthomasthomas · yesterday at 6:25 PM

I don't think that's plausible, because they also just launched a high-speed variant which presumably has the inference optimizations and smaller batching, and costs about 10x as much.

Also, if you have inference optimizations, why not apply them to all models?