Hacker News

zozbot234 · today at 1:07 AM

> A Opus 4.7/Gpt5.5 class model is 5 trillion parameters[1].

You could run it on a cluster of nodes that each do some mix of fetching parameters from disk and caching them in RAM. Use pipeline parallelism to minimize network bandwidth requirements given the huge size. Then time to first token may be a bit slow, but sustained inference should achieve enough throughput for a single user. That's a costly setup of course, but it doesn't cost $900k.
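The proposed setup can be sketched in miniature: each pipeline stage owns a contiguous slice of layers, keeps as many of them as fit in an LRU "RAM" cache, and fetches the rest from "disk" on demand; only the small activation crosses stage boundaries, which is what keeps the network bandwidth requirement low. This is a toy simulation under stated assumptions — `LayerCache`, `pipeline_forward`, and the integer stand-ins for weights and layer ops are all hypothetical illustration, not any real inference stack's API:

```python
from collections import OrderedDict

class LayerCache:
    """Toy LRU cache: keeps the most recently used layers in 'RAM',
    fetching the rest from 'disk' (simulated here by a plain dict)."""
    def __init__(self, disk, capacity):
        self.disk = disk            # all weights (in reality: mmap'd files)
        self.capacity = capacity    # how many layers fit in RAM
        self.ram = OrderedDict()
        self.disk_reads = 0         # count slow-path fetches

    def get(self, layer_id):
        if layer_id in self.ram:
            self.ram.move_to_end(layer_id)   # mark as recently used
            return self.ram[layer_id]
        self.disk_reads += 1
        weights = self.disk[layer_id]        # slow path: fetch from disk
        self.ram[layer_id] = weights
        if len(self.ram) > self.capacity:
            self.ram.popitem(last=False)     # evict least recently used
        return weights

def pipeline_forward(token, stages, caches):
    """Each 'node' owns a contiguous slice of layers (one pipeline stage);
    only the activation crosses node boundaries, never the weights."""
    activation = token
    for stage, cache in zip(stages, caches):
        for layer_id in stage:
            w = cache.get(layer_id)
            activation = activation + w      # stand-in for the real layer op
    return activation

# Two 'nodes', four layers each; each node's cache holds all four of its
# layers, so after the first (slow) token every fetch is a RAM hit.
disk = {i: i for i in range(8)}
stages = [[0, 1, 2, 3], [4, 5, 6, 7]]
caches = [LayerCache(disk, 4), LayerCache(disk, 4)]
pipeline_forward(0, stages, caches)          # first token: 8 disk reads
pipeline_forward(0, stages, caches)          # later tokens: 0 new reads
```

The usage at the bottom mirrors the comment's point: time to first token pays the disk-fetch cost, while sustained decoding runs from RAM at full speed.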


Replies

nl · today at 1:18 AM

> You could run it on a cluster of nodes

Not sure this is an MBP either.
