Personally I run an ollama server. Models load pretty quickly.
There's a distinction between tokens per second and time to first token.
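To make that distinction concrete, here's a tiny sketch (the function name is mine, not from any library) that splits a streamed response into the two numbers: time to first token, which absorbs model load and prompt processing, and tokens per second over the generation phase:

```python
def stream_stats(request_time: float, token_times: list[float]) -> tuple[float, float]:
    """request_time: when the request was sent; token_times: arrival
    time of each streamed token (seconds, same clock)."""
    # time to first token: dominated by model load + prompt (prefill) processing
    ttft = token_times[0] - request_time
    # decode speed: tokens per second once generation is underway
    gen_time = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / gen_time if gen_time > 0 else float("inf")
    return ttft, tps

# e.g. first token 2.5s in, then a steady stream
print(stream_stats(0.0, [2.5, 2.6, 2.7, 2.8]))
```

Both numbers matter, but they're hit by different things: swapping models or large contexts blows up the first, raw decode speed sets the second.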
Delays come for me when I have to load a new model, or when I'm swapping in a particularly large context.
Most of the time, since the model is already loaded and I'm starting with a small context that builds over time, tokens per second is the biggest factor.
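If reload latency does start to bite, one knob worth knowing: ollama's /api/generate accepts a keep_alive field (there's also an OLLAMA_KEEP_ALIVE env var), and -1 keeps the model resident instead of unloading it after the default few minutes of idle. A minimal sketch, assuming a default server on localhost:11434; the function names are mine:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local server

def preload_payload(model: str) -> bytes:
    # keep_alive=-1 asks ollama to keep the model loaded indefinitely;
    # the default is to unload a few minutes after the last request
    return json.dumps({"model": model, "keep_alive": -1}).encode()

def preload(model: str) -> None:
    # a generate request with no prompt just loads the weights,
    # so the first real request doesn't pay the load cost
    req = request.Request(
        OLLAMA_URL,
        data=preload_payload(model),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)
```

Calling something like preload("qwen2.5-coder:7b") at session start moves the load delay to a point where you aren't waiting on it.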
It's worth noting I don't do much fancy stuff, just a tiny bit of agent work; I mainly use qwen-coder 30a3b or qwen2.5-coder 7b instruct/base.
I'm finding that more complex agent setups, where multiple agents are in play, can really slow things down if they're swapping large contexts. ik_llama has prompt caching, which helps speed up switching between agent contexts, up to a point.
tldr: loading weights each time isn't much of a problem, unless you're having to switch between models and contexts a lot, which modern agent workflows are starting to do.