Good point. Processing the substance of the answer might be too labor-consuming (1,000 claims x 5 mo...

kostaj • today at 3:13 PM • 1 reply • view on HN

Good point. Processing the substance of the answer might be too labor-consuming (1,000 claims x 5 models), but "thinking out loud" might improve the quality of the answers indeed. And we can still force/ask them to respond with a clear verdict at the end of their reasoning, as per the chosen rubric.

Replies

airstrike • today at 3:24 PM

If you have the model use a tool you can define the schema as a free text rationale field followed by one in the set of possible answers, so everything is nicely formatted as a JSON.

➕ show 1 reply

alt Hacker News

Replies