Our_Benefactors • today at 8:16 PM • 1 reply

It runs like shit in terms of tokens/second, though, and still has a reduced context window, whereas a single Claude prompt can easily get into 300k tokens without breaking a sweat.

I want local AI to be a thing, but the hardware isn’t here yet: the only options are a Mac Studio or DGX machines strapped together. RAM prices need to crash before local AI has a chance at actually competing.

Replies

zozbot234 • today at 8:38 PM

The more recent Chinese models are no longer heavily limited by context size: the context can easily fit in RAM on a prosumer laptop. (You can also use swap space to extend that, since context is only written once per inference, which makes wear and tear a relatively mild concern.)
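
For a rough sense of whether a context fits in RAM, here is a back-of-the-envelope sketch. It assumes a standard attention layout where each token stores one key and one value vector per layer per KV head; the model shape below is hypothetical and not taken from any specific model, and architectures that compress the cache would likely need less.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    Each token stores one key and one value vector (hence the factor 2)
    per layer and per KV head. bytes_per_elem=2 assumes 16-bit storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


# Hypothetical model shape, chosen only for illustration.
cfg = dict(n_layers=48, n_kv_heads=8, head_dim=128)

for n_tokens in (32_000, 128_000, 300_000):
    gib = kv_cache_bytes(n_tokens, **cfg) / 2**30
    print(f"{n_tokens:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumed numbers the cache grows linearly with context length, so whether a long context fits depends mostly on the per-token footprint of the particular model and on how much RAM is left after loading the weights.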
