twobitshifter • today at 11:14 AM

Probably the data center where the model is running, more than anything. Another possibility: if Opus uses something like a Mixture of Experts approach, the portion of the model loaded in memory at any one time could be smaller than GLM's.
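
To make the Mixture of Experts point concrete, here's a rough sketch with made-up numbers. These are not real figures for Opus or GLM, and `moe_params` is just a hypothetical helper. It only counts the parameters touched per token; whether the inactive experts can actually stay out of memory depends on how the model is served, which this doesn't try to model.

```python
def moe_params(shared, per_expert, num_experts, top_k):
    """Return (total, active_per_token) parameter counts for a toy MoE model.

    shared      -- parameters every token uses (attention, embeddings, ...)
    per_expert  -- parameters in one expert
    num_experts -- experts in the model
    top_k       -- experts the router picks for each token
    """
    total = shared + per_expert * num_experts
    active = shared + per_expert * top_k
    return total, active


# Hypothetical sizes, chosen only for illustration.
total, active = moe_params(
    shared=10_000_000_000,
    per_expert=5_000_000_000,
    num_experts=16,
    top_k=2,
)
print(f"total: {total / 1e9:.0f}B, active per token: {active / 1e9:.0f}B")
```

With these toy values the model has 90B parameters in total but only 20B are used for any single token, which is the gap the comment is gesturing at.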