Can you keep that GPU 100% saturated at least 16 hours per day every day of the week?
If not, you aren't breaking even.
Note this also assumes you:
(1) Rent your GPUs.
(2) Pay list price, with no volume discounts.
(3) Get only 85 tokens/sec. Realistically, frontier models would attain 200+ tokens/sec amortized.
Inference is extremely profitable at scale.
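
For anyone who wants to plug in their own numbers, here is a rough sketch of the break-even arithmetic. The rental rate and price per million tokens below are hypothetical placeholders, not figures from this thread; only the 85 and 200 tokens/sec throughputs come from the assumptions above. It assumes the rental is billed around the clock while revenue accrues only during saturated hours.

```python
def break_even_hours_per_day(gpu_cost_per_hour, tokens_per_second, price_per_million_tokens):
    """Hours per day the GPU must run fully saturated to cover a 24-hour rental."""
    # Revenue earned during one fully saturated hour.
    tokens_per_hour = tokens_per_second * 3600
    revenue_per_busy_hour = tokens_per_hour / 1_000_000 * price_per_million_tokens
    # The rental is paid for all 24 hours, busy or idle.
    daily_cost = gpu_cost_per_hour * 24
    return daily_cost / revenue_per_busy_hour


# Hypothetical inputs, chosen only for illustration.
GPU_COST_PER_HOUR = 2.00          # USD per hour, assumed list-price rental
PRICE_PER_MILLION_TOKENS = 10.00  # USD per million tokens, assumed

for tps in (85, 200):
    hours = break_even_hours_per_day(GPU_COST_PER_HOUR, tps, PRICE_PER_MILLION_TOKENS)
    print(f"{tps} tokens/sec: {hours:.1f} saturated hours/day to break even")
```

Whatever prices you use, the required hours scale inversely with throughput: if 85 tokens/sec needs 16 hours/day, then 200 tokens/sec needs 16 × 85 / 200 = 6.8 hours/day at the same cost and price.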