Hacker News

arjie · yesterday at 6:20 PM

Wait, this is incredible. I have a spare 5090 lying around and run a claw-like on my M4 Mini. Just mounting it in some sort of 3D-printed frame for stability and plugging it into the Thunderbolt port might get me a pretty viable setup for local inference. I'd need something neat to make sure the power etc. is well fed.

The problem is that `max-num-seqs` and `max-model-len` fight each other for the same KV-cache memory, and unless you're in pure single-client mode you'll need multiple slots, so to speak.
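The tension can be made concrete with back-of-the-envelope arithmetic. This is a minimal sketch, assuming a hypothetical GQA model (32 layers, 8 KV heads, head_dim 128, fp16 KV cache) and a hypothetical ~12 GiB of VRAM left for KV cache after the weights are loaded; the helper names are made up for illustration. It counts how many sequences fit when every one of them sits at the full `max-model-len`, which is a pessimistic case: vLLM allocates KV cache in pages, so shorter requests share the pool more flexibly than this suggests.

```python
# Rough KV-cache budgeting sketch. All model/GPU numbers below are
# hypothetical placeholders; substitute your own model's config values.

def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # Factor of 2 covers both the K and the V tensors in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_full_length_seqs(kv_budget_bytes: int, per_token: int,
                         max_model_len: int) -> int:
    # Worst case: how many sequences can sit at max_model_len at the
    # same time before the KV-cache budget runs out.
    total_tokens = kv_budget_bytes // per_token
    return total_tokens // max_model_len


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical model: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
    per_tok = kv_bytes_per_token(32, 8, 128)
    # Hypothetical: ~12 GiB of VRAM left over for KV cache.
    budget = 12 * GIB
    for ctx in (8192, 32768, 131072):
        n = max_full_length_seqs(budget, per_tok, ctx)
        print(f"max-model-len={ctx:>6}: ~{n} concurrent full-length seqs")
```

With these placeholder numbers, raising `max-model-len` from 8192 to 32768 cuts the worst-case full-length slots from 12 to 3, and at 131072 not even one full-length sequence fits.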


Replies

pat_space · yesterday at 8:18 PM

If you get too busy to take advantage, I'll take that spare 5090 off your hands, free of charge :)

originalvichy · yesterday at 10:18 PM

While you can just print something, look into eGPU enclosures. Modern cards are xboxhueg, but maybe someone has one lying around, and it might help with noise and airflow.