
simonw yesterday at 6:38 PM

I just ran one of these locally on a Mac like this:

  uvx litert-lm run \
    --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
    gemma-4-E2B-it.litertlm \
    --backend=gpu \
    --prompt="Generate an SVG of a pelican riding a bicycle"

The first time you run that it downloads 3.2GB to ~/.cache/huggingface/hub/models--litert-community--gemma-4-E2B-it-litert-lm

It can handle audio and image input too, which is pretty cool for a 3.2GB model. For images:

  uvx litert-lm run \
    --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
    gemma-4-E2B-it.litertlm \
    --backend=gpu --vision-backend gpu \
    --attachment image.jpg --prompt describe

And for audio:

  uvx litert-lm run \
    --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
    gemma-4-E2B-it.litertlm \
    --backend=gpu --audio-backend cpu \
    --attachment audio.wav --prompt transcribe

(The pelican is rubbish, but it's only a 3.2GB file so the fact it even outputs valid SVG is impressive to me: https://gist.github.com/simonw/94b318afde4b1ce5ff67d4b5d0362... )
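
The three invocations above differ only in their trailing flags, so a small shell wrapper can cut the repetition. This is a rough sketch that uses only the flags shown in those commands; the function name `gemma_run` and the `DRY_RUN` variable are made up for illustration and are not part of litert-lm.

```shell
# Hypothetical helper: gemma_run and DRY_RUN are illustrative names,
# not part of litert-lm. Only flags from the commands above are used.
gemma_run() {
  # Prepend the shared prefix; any extra arguments are passed through unchanged.
  set -- uvx litert-lm run \
    --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
    gemma-4-E2B-it.litertlm \
    --backend=gpu "$@"
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the assembled command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

With that defined, the image example would be `gemma_run --vision-backend gpu --attachment image.jpg --prompt describe`, and prefixing the call with `DRY_RUN=1` prints the assembled command without downloading or running anything.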

Replies

reactordev yesterday at 8:44 PM

Not to mention the text-only 0.8GB version. Just crazy. You can now have basic real-time conversations on-device that are video- and audio-aware.

rcarmo yesterday at 11:12 PM

Is that actually QAT? The MLX Community models have that in their names, but these don't, and the upload dates don't quite line up.

__mharrison__ yesterday at 11:19 PM

As an aside, uvx is so pleasant to use... I wish Nvidia supported it as first-class rather than making folks jump through Docker hoops.
