Hacker News

scottmu · yesterday at 8:08 PM

Yes, there's a wide variety of use cases that require different accuracy/speed trade-offs. If you require all 3 responses to be accurate, you have to multiply the three individual accuracy probabilities, and as you've shown, this can reduce overall accuracy quite a bit. Of course, this assumes the 3 responses are independent of one another.
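A quick illustration of the multiplication above (my own numbers, not from the thread): three independent responses at 90% individual accuracy are jointly correct only about 73% of the time.

```python
def joint_accuracy(per_response_acc: float, k: int) -> float:
    """Probability that all k independent responses are correct:
    the individual accuracies simply multiply."""
    return per_response_acc ** k

# Three responses at 0.9 each: 0.9 * 0.9 * 0.9 = 0.729
print(round(joint_accuracy(0.9, 3), 4))  # 0.729
```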


Replies

all2 · yesterday at 9:14 PM

One thing I considered some months ago that was very similar to what you guys have done, but at a higher abstraction layer:

1. Consult many models (or a single model with higher temp) with the same prompt

2. Intelligently chunk the outputs (by entity, concept, subject, etc.)

3. Put each chunk into a semantic bucket (similar chunks live in the same bucket)

4. Select winning buckets for each chunk.

4a. Optionally push the undervoted chunks back into the model contexts for followup: is this a good idea, does it fit with what you recommended, etc.

4b. do the whole chunk/vote thing again

5. Fuse outputs. Mention outliers.
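The steps above can be sketched roughly like this. Real "semantic buckets" would use embeddings and a real chunker would split by entity/concept; here sentence splitting and Jaccard word overlap are placeholder stand-ins, so treat this as a shape of the pipeline rather than an implementation.

```python
def chunk(output: str) -> list[str]:
    # Step 2 (stand-in): naive sentence chunking; a real system would
    # chunk by entity, concept, subject, etc.
    return [s.strip() for s in output.split(".") if s.strip()]

def similar(a: str, b: str, threshold: float = 0.5) -> bool:
    # Stand-in for semantic similarity: Jaccard overlap of words.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) >= threshold

def bucket(chunks: list[str]) -> list[list[str]]:
    # Step 3: greedy clustering -- similar chunks land in the same bucket.
    buckets: list[list[str]] = []
    for c in chunks:
        for b in buckets:
            if similar(c, b[0]):
                b.append(c)
                break
        else:
            buckets.append([c])
    return buckets

def fuse(model_outputs: list[str], min_votes: int = 2):
    # Steps 1, 4, 5: pool chunks from all model outputs, treat bucket
    # size as a vote count, and report undervoted buckets as outliers.
    all_chunks = [c for out in model_outputs for c in chunk(out)]
    buckets = bucket(all_chunks)
    winners = [b[0] for b in buckets if len(b) >= min_votes]
    outliers = [b[0] for b in buckets if len(b) < min_votes]
    return winners, outliers
```

The 4a/4b refinement loop would wrap `fuse` and feed `outliers` back to the models before re-voting; that part is omitted here.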

Token spend is heavy here, since it relies on LLMs to make decisions instead of the underlying math you guys went with. IMO, the solution y'all reached is far more elegant than my idea.
