Am I missing something, or are the Ollama versions of this (https://ollama.com/library/gemma4/tags) text-only for now?
Since Ollama has diverged from llama.cpp, it will take a bit of time for Ollama to support multimodality. If you're using plain llama.cpp, it looks like a PR adding vision and audio support for this model has already been merged:
https://github.com/ggml-org/llama.cpp/pull/24077