Does this mean that if it were embedded on a Taalas chip, it could generate ~50,000+ tokens per second?

smusamashah • today at 5:58 AM • 1 reply

Replies

I think pretty much anything is going to get an enormous speed boost if the model isn't bottlenecked by memory latency and is instead baked directly into the circuits, ASIC-style.
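A rough back-of-envelope sketch of the intuition, assuming single-stream decoding has to read every weight from memory once per generated token (strictly a bandwidth limit rather than latency, but the same bottleneck in spirit). The function name and all the numbers are made up for illustration; they are not figures for any real chip or model.

```python
def decode_tokens_per_second(param_count, bytes_per_param, mem_bandwidth_bytes_per_s):
    """Rough ceiling on single-stream decode speed when every weight must be
    streamed from memory once per generated token."""
    bytes_per_token = param_count * bytes_per_param
    return mem_bandwidth_bytes_per_s / bytes_per_token


# Hypothetical numbers, chosen only for illustration:
# 8e9 parameters, 1 byte per weight, 1e12 bytes/s of memory bandwidth.
ceiling = decode_tokens_per_second(8e9, 1, 1e12)
print(f"memory-bound ceiling: {ceiling:.0f} tokens/s")  # prints 125 tokens/s
```

If the weights are wired into the circuits there is no per-token weight fetch, so this particular ceiling stops applying and the limit moves to the compute path itself.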
