bigyabai, yesterday at 9:34 PM

You won't be RAM-caching much of anything with experts that are 220B parameters' worth of layers.
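
For a sense of scale, here is a rough back-of-the-envelope sketch of how much memory 220B parameters of weights would take at a few common precisions. The function name and the format table are made up for illustration, and the 4-bit figure assumes exactly half a byte per parameter with no quantization overhead. The comment doesn't say whether 220B is per expert or in total, so this just takes the number at face value.

```python
# Rough weight-memory estimate for a given parameter count.
# Assumption: weights only (no KV cache, activations, or runtime overhead).
PARAMS = 220e9  # parameter count quoted in the comment

# Bytes per parameter for a few common storage formats.
# Assumption: "4-bit" ignores quantization scales/zero-points.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Return the weight footprint in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{fmt}: {weight_gib(PARAMS, nbytes):.0f} GiB")
```

At 16 bits per weight that works out to roughly 410 GiB, and still around 100 GiB at 4 bits, which is more than most desktop machines have in RAM.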