egorfine • today at 3:56 PM

I have researched this for quite a while, and so far the fastest runtime is oMLX. But there's a caveat: TTFT (time to first token) with MLX on the M4 Pro is enormous. On the M5 Pro it has been greatly sped up.

Replies

regexorcist • today at 5:03 PM

Curious if you tested llama.cpp and still found oMLX faster? I haven't tried the latter myself, might give it a go.
