Hacker News

wongarsu today at 11:34 AM

Which conveniently fits on one 8xH100 machine, with 100-200 GB left over for overhead, KV cache, etc.
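Here is a rough sketch of that memory arithmetic. It assumes 80 GB of HBM per H100, which is the common SXM/PCIe variant. The weight size is a free parameter because the model being discussed isn't stated. The helper names are hypothetical.

```python
# Hypothetical back-of-envelope memory budget for one 8xH100 node.
# Assumption: 80 GB of HBM per H100 (the common SXM/PCIe variant).
GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 80


def node_hbm_gb(gpus: int = GPUS_PER_NODE, per_gpu_gb: int = HBM_PER_GPU_GB) -> int:
    """Total HBM across the node in GB."""
    return gpus * per_gpu_gb


def headroom_gb(weights_gb: float) -> float:
    """Memory left for KV cache, activations, and framework overhead."""
    return node_hbm_gb() - weights_gb


def implied_weight_range_gb(min_left: float, max_left: float) -> tuple:
    """Weight sizes consistent with a given range of leftover memory."""
    total = node_hbm_gb()
    return (total - max_left, total - min_left)


print(node_hbm_gb())                      # 640
print(implied_weight_range_gb(100, 200))  # (440, 540)
```

So "100-200 GB left over" implies weights somewhere around 440-540 GB on a 640 GB node. That figure is before any KV cache or runtime buffers are allocated.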


Replies

storystarling today at 7:39 PM

The unit economics seem pretty rough, though. You're locking up 8xH100s for the compute of ~32B active parameters. I guess memory is the bottleneck, but it's hard to see how the margins work on that.
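The sketch below makes the mismatch concrete. It uses the usual approximation of about 2 FLOPs per active parameter per token for a forward pass. Every other number is a hypothetical placeholder chosen only for illustration: the 500 GB checkpoint, the $20/hour node price, and the 2,000 tokens/s throughput are not measurements.

```python
import math


def flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs (one multiply-add) per active parameter."""
    return 2.0 * active_params


def gpus_for_weights(weights_gb: float, hbm_per_gpu_gb: float = 80.0) -> int:
    """Minimum GPUs needed just to hold the weights (ignores KV cache and overhead)."""
    return math.ceil(weights_gb / hbm_per_gpu_gb)


def cost_per_million_tokens(node_usd_per_hour: float, tokens_per_sec: float) -> float:
    """Serving cost per 1M generated tokens at a given aggregate node throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return node_usd_per_hour / tokens_per_hour * 1_000_000


# ~32B active parameters -> ~64 GFLOPs per token, about what a 32B dense model costs.
print(flops_per_token(32e9) / 1e9)  # 64.0

# Hypothetical sizes: ~32 GB (32B dense at FP8) vs. a ~500 GB MoE checkpoint.
print(gpus_for_weights(32), gpus_for_weights(500))  # 1 7

# Hypothetical node price and throughput, for illustration only.
print(round(cost_per_million_tokens(20.0, 2000.0), 2))  # 2.78
```

Per token, the compute is roughly that of a dense model that would fit on one GPU, but the weights alone pin down most of a node. Whether the margins work then comes down to how much aggregate throughput batching can squeeze out of those GPUs.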