200 GB is an unfathomable amount of main memory for a CPU
With apologies for the snark, give gpt-oss-120b a try. It's not fast at all, but it can generate on CPU.
But it's far less capable than SOTA models. OP wants high-quality output but doesn't need it fast. Your suggestion would mean slow AND low-quality output.