Hacker News

windex, yesterday at 5:58 AM

What I do is ask Claude or Codex to run models on Ollama, test them sequentially on a bunch of tasks, and rate the outputs. 30 minutes later I have a fit. It even tested the abliterated models.
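
The comment doesn't include the actual prompts or harness, so the following is only a rough sketch of what such a sequential test loop might look like. It assumes a local Ollama server on its default port and uses the `/api/generate` endpoint with streaming turned off. The `keyword_score` rater is a hypothetical stand-in: in the workflow described above the rating is done by the coding agent itself, not by keyword matching. The task format (`prompt` plus `keywords`) is likewise an assumption made for illustration.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumed; adjust if yours differs).
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(model, prompt):
    """Send one non-streaming prompt to Ollama and return the generated text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]


def keyword_score(output, expected_keywords):
    """Hypothetical rater: fraction of expected keywords present in the output."""
    if not expected_keywords:
        return 0.0
    text = output.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


def rank_models(models, tasks, generate=ollama_generate):
    """Run every task on each model in turn; return (model, mean score), best first."""
    results = []
    for model in models:  # sequential: one model at a time
        scores = [
            keyword_score(generate(model, task["prompt"]), task["keywords"])
            for task in tasks
        ]
        mean = sum(scores) / len(scores) if scores else 0.0
        results.append((model, mean))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Calling `rank_models(["model-a", "model-b"], tasks)` with model names you have already pulled would return the list ordered by mean score. The `generate` parameter is injectable so the loop can be tried without a running server.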


Replies

codazoda, yesterday at 2:33 PM

Can you share the prompts?