Are you taking into account the energy cost of running a 3090 at 350 W for a very long time?
I doubt it’s at full TDP if it’s running at 0.2 tokens per second.
You can use nvidia-smi to power-limit an RTX 3090 to 250 W and still keep a lot of its performance.
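To put rough numbers on the 350 W vs 250 W question, here is a back-of-the-envelope sketch. The 30-day runtime and the 0.15 per kWh electricity price are made-up assumptions, not figures from this thread, and it assumes the card sits at the stated wattage the whole time, which (as noted above) is probably pessimistic at 0.2 tokens per second.

```python
def energy_cost(watts: float, hours: float, price_per_kwh: float) -> float:
    """Electricity cost of drawing `watts` continuously for `hours`."""
    kwh = watts / 1000 * hours  # W -> kW, then multiply by hours to get kWh
    return kwh * price_per_kwh


HOURS = 30 * 24        # assumption: the job runs nonstop for 30 days
PRICE_PER_KWH = 0.15   # assumption: hypothetical electricity price

for watts in (350, 250):  # stock TDP vs. the power-limited setting
    cost = energy_cost(watts, HOURS, PRICE_PER_KWH)
    print(f"{watts} W for {HOURS} h: {cost:.2f}")
```

Swap in your own price and runtime. The cap itself is normally set with `sudo nvidia-smi -pl 250`, and the actual draw can be checked with `nvidia-smi -q -d POWER`.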