> The blog post implies that it currently requires 96GB of VRAM.
Has anyone tested what happens if you try to run this on lower-RAM Macs? It might work and just be a bit slower as it falls back on fetching model layers from storage.
It'd be way slower, since you'd be re-fetching those layers from storage for every token generated.