> Can't run inference on encrypted weights and get any kind of performance out of it.
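The whole pipeline is encrypted end to end; the weights are only decrypted inside the trusted execution environment, so nothing is computing on ciphertext.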
Otherwise, OpenAI and Anthropic would never run their models on external clouds, since the weights are among the most valuable assets in the world.
The performance overhead shrinks as models get larger, and overall it doesn't seem that bad:
https://arxiv.org/pdf/2409.03992v2
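A rough intuition for why bigger models hide the overhead: if the confidential-computing cost behaves roughly like a fixed extra cost per request (e.g. encrypting data crossing the CPU-GPU boundary) while compute time grows with model size, the relative slowdown falls. This is a toy model with made-up numbers, not the paper's methodology or measurements; `EXTRA_MS` and the per-model latencies are hypothetical.

```python
# Toy model, hypothetical numbers only: assume confidential computing adds a
# fixed extra cost per request, while baseline compute time grows with model size.

def overhead_fraction(compute_ms: float, extra_ms: float) -> float:
    """Relative slowdown: extra time divided by the baseline (non-CC) time."""
    return extra_ms / compute_ms

EXTRA_MS = 5.0  # hypothetical fixed per-request encryption/transfer cost

# Hypothetical baseline latencies that grow with model size.
MODELS = {"small": 50.0, "medium": 200.0, "large": 1000.0}

if __name__ == "__main__":
    for name, compute_ms in MODELS.items():
        pct = 100 * overhead_fraction(compute_ms, EXTRA_MS)
        print(f"{name:>6}: {pct:.1f}% overhead")
```

With these made-up inputs it prints 10.0%, 2.5% and 0.5%: the absolute cost is unchanged, but it becomes a smaller slice of the total as compute dominates.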