pbronez, today at 11:36 AM

The Rapid-MLX team has done some interesting benchmarking that suggests Qwopus 27B is pretty solid. Their tool includes benchmarking features, so you can evaluate your own setup.

They have a metric called Model-Harness Index:

MHI = 0.50 × ToolCalling + 0.30 × HumanEval + 0.20 × MMLU (scale 0-100)

https://github.com/raullenchai/Rapid-MLX
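
To make the weighting concrete, here is a minimal sketch of the formula in Python. It is not Rapid-MLX's code: the function name `model_harness_index` and the range check are my own assumptions, and I'm assuming each component score is already on a 0-100 scale.

```python
def model_harness_index(tool_calling: float, human_eval: float, mmlu: float) -> float:
    """Weighted sum from the formula above; the name and checks are hypothetical."""
    scores = {"tool_calling": tool_calling, "human_eval": human_eval, "mmlu": mmlu}
    for name, value in scores.items():
        # Assumption: every component is reported on a 0-100 scale.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    # Weights taken directly from the formula: 0.50, 0.30, 0.20.
    return 0.50 * tool_calling + 0.30 * human_eval + 0.20 * mmlu


if __name__ == "__main__":
    # Made-up component scores, purely for illustration.
    print(model_harness_index(80, 70, 60))
```

With those made-up inputs the result is about 73. Since the weights sum to 1.0, the index stays on the same 0-100 scale as its components, and tool calling counts for half of it.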


Replies

JumpCrisscross, today at 11:38 AM

Pardon the silly question, but why do I need this tool versus running the model directly (and SSH’ing in when I’m away from home)?