> Rumors are worth squat
You can make some educated guesses and put bounds on inference cost by looking at third-party providers on platforms like OpenRouter. That gives you a median cost per token for a given model size. Then make some educated guesses about SotA model sizes, and you get an estimate of the pure cost of serving a model. Error bars and all that, of course. But it's still a range, with some limits.
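
To make the arithmetic concrete, here's a rough sketch of that estimate. Everything in it is an assumption on my part: the listing numbers are made-up placeholders (not real provider prices), the function name is hypothetical, and it assumes price scales linearly with parameter count, which is crude (MoE models, batching and hardware all break it).

```python
from statistics import median


def estimate_cost_range(listings, size_low_b, size_high_b):
    """Return (low, mid, high) USD per million tokens for a guessed size range.

    listings: (params_in_billions, usd_per_million_tokens) pairs taken from
    third-party providers. Assumes price is proportional to parameter count.
    """
    # Normalize each listing to a price per billion parameters.
    per_b = sorted(price / size for size, price in listings)
    mid_size = (size_low_b + size_high_b) / 2
    return (
        size_low_b * per_b[0],       # cheapest rate at the smallest guessed size
        mid_size * median(per_b),    # median rate at the midpoint size
        size_high_b * per_b[-1],     # priciest rate at the largest guessed size
    )


# Placeholder listings, NOT real prices: (size in B params, USD per 1M tokens).
example_listings = [(10, 0.2), (50, 0.5), (100, 2.0)]

# Guess that the SotA model is somewhere between 200B and 1000B parameters.
low, mid, high = estimate_cost_range(example_listings, 200, 1000)
print(f"low={low:.2f} mid={mid:.2f} high={high:.2f} USD per 1M tokens")
```

The low and high values are deliberately pessimistic bounds (cheapest rate times smallest size, priciest rate times largest size), so the true serving cost should land inside them if the size guesses are sane.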