Did you try an MLX version of this model? In theory it should run a bit faster. I'm hesitant to download multiple versions though.
Is there a reliable way to run MLX models? On my M1 Max, LM Studio sometimes outputs garbage through its API server even when the LM Studio chat with the same model is perfectly fine. llama.cpp variants almost always just work.
Haven't tried. I'm too used to llama.cpp at this point to switch to something else. I like being able to just run a model and automatically get:
- OpenAI completions endpoint
- Anthropic messages endpoint
- OpenAI responses endpoint
- A slick-looking web UI
Without having to install anything else.
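For reference, here's what hitting that OpenAI-style completions endpoint looks like. This is a minimal sketch, assuming llama-server is running locally on its default port (8080); the `model` field is a placeholder, since llama-server serves whatever model it was launched with:

```python
import json
import urllib.request

# Assumes a local llama-server started with something like:
#   llama-server -m model.gguf
# 8080 is llama-server's default port; adjust if you changed it.
payload = {
    "model": "local",  # placeholder; the server uses the model it was launched with
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```

Any OpenAI client library works the same way if you point its base URL at `http://localhost:8080/v1`.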