PeterStuer • today at 5:24 AM

I use a 4090 and 96 GB of RAM to run local models slowly (currently Qwen-code-next at 7 tokens per second) with their full context window. I keep this setup going just for testing and to practice a fallback in case I lose access to Claude and GPT.