Possibly the best deal there is
I really need to shut up, or bite the bullet and buy one.
If you graph the tokens per second on the 5090, your jaw will hit the floor at how cheap it is.
The 5090 is crap for inference, unless you like dummy models. Sure, those will run at light speed, but all the rage nowadays is MoE with 500B-1T parameters.
With only 32 GB of VRAM, you can only run small or heavily quantized models, in which case what's the point? At $4000, that gets you 20 months of 10x Claude or ChatGPT subscriptions, which provide far better models. You'd need some use case where you can tolerate worse models and have a steady stream of work to feed them. That doesn't match most people's usage patterns.
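To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It counts the weights only (no KV cache, no activations, no offloading to system RAM), so real requirements are higher. The function names are made up for illustration, and the $200 monthly fee is just what $4000 over 20 months implies, not a quoted price.

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # params_billions * 1e9 params * (bits / 8) bytes each, divided by 1e9
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float, vram_gb: float = 32.0) -> bool:
    """True if the weights alone fit in the given VRAM (default: 32 GB)."""
    return weight_gb(params_billions, bits_per_weight) <= vram_gb


def breakeven_months(card_price: float, monthly_fee: float) -> float:
    """Months of subscription that cost the same as the card."""
    return card_price / monthly_fee


if __name__ == "__main__":
    # Model sizes in billions of parameters; 500 and 1000 are the MoE range above.
    for params in (8, 32, 70, 500, 1000):
        for bits in (16, 8, 4):
            size = weight_gb(params, bits)
            verdict = "fits" if fits(params, bits) else "does not fit"
            print(f"{params:>5}B @ {bits:>2}-bit: {size:>7.1f} GB -> {verdict} in 32 GB")

    # Assumed fee: $200/month, implied by "$4000 gets you 20 months".
    print(f"Break-even: {breakeven_months(4000, 200):.0f} months")
```

Even at 4-bit, a 500B model needs about 250 GB for weights alone, so it is nowhere near fitting on one card. A 32B model at 4-bit (about 16 GB) is roughly the comfortable ceiling.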