I tried my "Generate an SVG of a pelican riding a bicycle" prompt against Gemma 3n (the 7.5GB build from Ollama and the 15GB one for mlx-vlm) and got pleasingly different results from the two quantization sizes: https://simonwillison.net/2025/Jun/26/gemma-3n/
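For anyone who wants to try this locally, here's a rough sketch of the Ollama side of the experiment via its local HTTP API. It assumes Ollama is already running on its default port, and the "gemma3n" model tag is an assumption on my part; check `ollama list` for whatever tag you actually pulled.

```python
# Sketch: send the pelican prompt to a locally running Ollama server
# and save whatever comes back as an SVG file.
import json
import urllib.request

payload = {
    "model": "gemma3n",  # assumed tag; substitute the tag `ollama pull` gave you
    "prompt": "Generate an SVG of a pelican riding a bicycle",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The model's reply (ideally raw SVG markup) comes back in the "response" field.
with open("pelican.svg", "w") as f:
    f.write(body["response"])
```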
Is that actually a useful benchmark, or is it just for laughs? I've never really understood it.
Given how primitive that image is, what's the point of even having an image model at this size?