Hacker News

pianopatrick yesterday at 5:07 AM (3 replies)

I'm currently testing something like this just to see what happens. I have an old laptop with 4 GB of RAM, and I attached a USB drive holding the Gemma 4 31B model (which is 32.6 GB). The laptop is running llama.cpp and trying to respond to a prompt by streaming the model from disk.

The USB drive light is flickering, showing something is happening. It's been about 8 hours since I entered the prompt and I've gotten about 10 tokens back so far. I'm going to leave it running overnight and see what happens.
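For scale, here is a rough back-of-the-envelope sketch of what those numbers imply. It assumes the rate stays constant, which may not hold since prompt processing and generation can run at different speeds, and the 100-token reply length is a made-up figure for illustration only.

```python
# Figures from the comment above: about 10 tokens after about 8 hours.
HOURS_ELAPSED = 8
TOKENS_SO_FAR = 10


def seconds_per_token(hours, tokens):
    """Average wall-clock seconds spent per token so far."""
    return hours * 3600 / tokens


def hours_for_reply(n_tokens, hours=HOURS_ELAPSED, tokens=TOKENS_SO_FAR):
    """Projected hours for n_tokens, assuming the observed rate holds."""
    return n_tokens * seconds_per_token(hours, tokens) / 3600


rate = seconds_per_token(HOURS_ELAPSED, TOKENS_SO_FAR)
print(f"{rate:.0f} s/token (~{rate / 60:.0f} min/token)")

# Hypothetical reply length of 100 tokens, chosen only for illustration.
print(f"100-token reply: ~{hours_for_reply(100):.0f} hours")
```

At roughly 48 minutes per token, leaving it overnight would add only a handful of tokens.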


Replies

zozbot234 yesterday at 7:01 PM

Wow, that's a true worst-case scenario, especially if the USB is just plain old USB 2.0 (max 480 Mbps) and/or if the drive is a spinning disk. How's the CPU doing, though? Is there any headroom given the USB bottleneck?
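To put a number on the USB bottleneck, here is a minimal sketch of the best-case time to pull the whole model across a USB 2.0 link once. The assumptions are mine, not confirmed in the thread: the model is dense so roughly every weight is read once per token, almost nothing stays cached in 4 GB of RAM, "32.6 GB" means decimal gigabytes, and the link runs at its full 480 Mbps signaling rate (real drives usually get noticeably less).

```python
MODEL_BYTES = 32.6e9      # model size from the parent comment (assumed decimal GB)
USB2_BITS_PER_S = 480e6   # USB 2.0 signaling rate; real throughput is lower


def min_seconds_per_pass(model_bytes, link_bits_per_s):
    """Best-case seconds to stream every byte of the model across the link once."""
    return model_bytes * 8 / link_bits_per_s


floor = min_seconds_per_pass(MODEL_BYTES, USB2_BITS_PER_S)
print(f"{floor:.0f} s per full pass (~{floor / 60:.1f} min)")
```

Under those assumptions the floor is about 9 minutes per token before the CPU does any work, so the reported pace of roughly 10 tokens in 8 hours is several times slower than the link alone would explain.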

stuaxo yesterday at 8:25 AM

Nice.

What did you use to do this: something standard like llama.cpp, something else like vLLM, or your own contraption?
