
mgc8, yesterday at 8:53 PM

Is there any indication of what compute resources this will actually require (in its various incarnations)? Does it incorporate any of the optimisations pioneered by Google (such as TurboQuant, MTP) or some other original innovations to make the frontier quality realistically available to local users?



wgd, yesterday at 10:39 PM

The GLM-5 series is 744B-A40B. This is not a local model for any reasonable definition of local, but it's an open model which means (once they upload the weights in a week or so) there will be a dozen third-party inference providers competing on price per token.
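To put rough numbers on the "not local" point, here is a back-of-envelope sketch of weight storage alone. It assumes "744B-A40B" means 744B total parameters with 40B active per token (the usual reading of that notation), and the bit widths are illustrative choices, not anything announced for this model. KV cache, activations, and runtime overhead are ignored, so real requirements would be higher.

```python
# Back-of-envelope weight memory for a 744B-total / 40B-active MoE model.
# Weights only: KV cache, activations and runtime overhead are NOT counted.

TOTAL_PARAMS = 744e9   # assumed reading of "744B" (total parameters)
ACTIVE_PARAMS = 40e9   # assumed reading of "A40B" (active parameters per token)

# Bits per weight for a few common storage formats (illustrative assumptions).
FORMATS = {"bf16": 16, "int8": 8, "int4": 4}


def weight_gb(params: float, bits: int) -> float:
    """Storage needed for `params` weights at `bits` bits each, in decimal GB."""
    return params * bits / 8 / 1e9


for name, bits in FORMATS.items():
    total = weight_gb(TOTAL_PARAMS, bits)
    active = weight_gb(ACTIVE_PARAMS, bits)
    print(f"{name}: all weights ~{total:.0f} GB, active per token ~{active:.0f} GB")
```

Even at 4 bits per weight, every expert still has to be resident somewhere, so the full set of weights comes to roughly 372 GB under these assumptions. The 40B active figure affects per-token compute more than the memory footprint.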

dakolli, today at 1:59 AM

If you have 80k in hardware you can run it. There is no such thing as an effective local model that runs on consumer hardware; anybody telling you otherwise is lying or delusional. JuSt a FeW MoRe ReLeAsEs
