If you are not planning to batch requests, you can run it much more cheaply on a Ryzen AI Max SoC device.
Hell, if you are willing to go even slower, any GPU plus ~80GB of system RAM will do it.
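If you want to sanity-check a box before buying, here is a rough back-of-the-envelope sketch. It only counts weight size plus a flat allowance for KV cache and runtime buffers; the parameter count, bits per weight, and the 8 GiB overhead below are placeholder assumptions of mine, not figures for any particular model, so plug in your own.

```python
def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billions: float, bits_per_weight: float,
         ram_gib: float, vram_gib: float = 0.0,
         overhead_gib: float = 8.0) -> bool:
    """True if weights plus a flat overhead allowance fit in RAM + VRAM combined.

    overhead_gib is an assumed cushion for KV cache, buffers, and the OS;
    real usage depends on context length and the runtime.
    """
    needed = weights_gib(params_billions, bits_per_weight) + overhead_gib
    return needed <= ram_gib + vram_gib


if __name__ == "__main__":
    # Hypothetical example: a 120B-parameter model at ~4.5 bits per weight,
    # on a machine with 80 GiB of system RAM and an 8 GiB GPU.
    print(f"weights: {weights_gib(120, 4.5):.1f} GiB")
    print("fits:", fits(120, 4.5, ram_gib=80, vram_gib=8))
```

Fitting in memory says nothing about speed: anything that spills from VRAM into system RAM runs at system memory bandwidth, which is where the "even slower" part comes from.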