I just did something exactly like this. I have a self-hosted personal dashboard and one of the APIs I'm reading gives slightly too verbose of an output. So I added a feature to summarize the text using Qwen 3.5 2B which happily runs on a CPU. I've never clocked the tokens per second because I only generate like 100 tokens an hour in a very narrow domain of knowledge and speed isn't critical.