giancarlostoro • yesterday at 12:53 PM • 0 replies

It would be nice to run a model that fits in 12GB of VRAM without being quantized to death, and that still leaves me room for a reasonable context window. But this is also just ONE model in a set of models; the rest apparently need to run on a GPU cluster.

alt Hacker News