Yukonv • today at 5:06 PM • 1 reply

Unsloth quantizations are available on release as well. [0] The IQ4_XS quant is a massive 361 GB for the model's 754B parameters. This is definitely not a model your average local LLM enthusiast is going to be able to run, even with high-end hardware.

[0] https://huggingface.co/unsloth/GLM-5.1-GGUF
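For a sense of scale, the figures quoted above work out to a bit under 4 bits per weight. A minimal sketch of that arithmetic, assuming "361 GB" means decimal gigabytes and that the file is essentially all weights (both are assumptions, and the helper name is made up for illustration):

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average bits stored per parameter, treating the whole file as weights."""
    return file_size_bytes * 8 / n_params


# Figures quoted in the comment; decimal GB is an assumption.
FILE_SIZE_BYTES = 361e9
N_PARAMS = 754e9

bpw = bits_per_weight(FILE_SIZE_BYTES, N_PARAMS)
print(f"{bpw:.2f} bits per weight")

# Hypothetical memory budgets, only to show how far off typical machines are.
for budget_gb in (64, 128, 512):
    share = min(1.0, budget_gb * 1e9 / FILE_SIZE_BYTES)
    print(f"{budget_gb} GB holds about {share:.0%} of the file")
```

Even a 128 GB machine would hold only around a third of the file under these assumptions, before counting the KV cache or the OS.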

Replies

zozbot234 • today at 6:14 PM

SSD offload is always a possibility with good software support. Of course, you might easily object that the model would not be "running" then, more like crawling. Still, you'd be able to execute it locally and get it to respond after some time.

Meanwhile we're even seeing emerging 'engram' and 'inner-layer embedding parameters' techniques where the possibility of SSD offload is planned for in advance when developing the architecture.
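To put a rough number on "crawling": if weights have to be streamed from SSD for every token, read bandwidth sets a floor on latency. This is a back-of-envelope sketch, not a benchmark. The 7 GB/s read speed and the 5% touched fraction are hypothetical values picked for illustration, and the function is not from any real library; it also ignores compute time and assumes cache hits are spread evenly.

```python
def seconds_per_token(model_bytes: float,
                      read_bytes_per_s: float,
                      touched_fraction: float = 1.0,
                      cached_fraction: float = 0.0) -> float:
    """Crude lower bound on per-token latency when weights stream from SSD.

    touched_fraction: share of the weights actually read per token
                      (1.0 means every weight is read).
    cached_fraction:  share of those reads already served from RAM.
    """
    streamed = model_bytes * touched_fraction * (1.0 - cached_fraction)
    return streamed / read_bytes_per_s


MODEL_BYTES = 361e9  # file size quoted upthread, taken as decimal GB
SSD_READ = 7e9       # hypothetical NVMe sequential read speed, bytes/s

# Worst case: the whole file is read for every token.
full_read = seconds_per_token(MODEL_BYTES, SSD_READ)

# Hypothetical sparse case: only 5% of the weights are touched per token.
sparse_read = seconds_per_token(MODEL_BYTES, SSD_READ, touched_fraction=0.05)

print(f"full read:   {full_read:.1f} s/token")
print(f"sparse read: {sparse_read:.1f} s/token")
```

Reading the full file every token comes to roughly 52 seconds per token at that speed, which is why architectures that touch only a small, predictable slice of the weights per token make SSD offload far more plausible.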
