I am trying to run models that are at the edge of what my hardware can support. I imagine many people are.
So given that, as the author states, Ollama runs LLMs inefficiently, what tool runs them most efficiently on limited hardware?