Hacker News

Koffiepoeder · today at 12:50 AM

We have an OCR job running with a lot of domain-specific knowledge. After testing different models, we have clear results showing that some prompts are more effective with certain models, along with some general observations (e.g., some prompts performed badly across all models).

Sample size was 1,000 jobs per prompt/model pair. We also rerun them once per month to detect regressions.
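A minimal sketch of what such a monthly regression check could look like. The pair names, accuracy numbers, and threshold here are hypothetical illustrations, not the commenter's actual pipeline:

```python
def regression_report(current, baseline, threshold=0.02):
    """Flag (prompt, model) pairs whose accuracy dropped by more than
    `threshold` compared with last month's baseline run."""
    flagged = []
    for pair, acc in current.items():
        base = baseline.get(pair)
        if base is not None and base - acc > threshold:
            flagged.append((pair, base, acc))
    return flagged

# Hypothetical monthly accuracies: fraction of the 1,000 OCR jobs
# that matched the expected output, per (prompt, model) pair.
baseline = {("prompt_a", "model_x"): 0.93, ("prompt_b", "model_x"): 0.88}
current  = {("prompt_a", "model_x"): 0.94, ("prompt_b", "model_x"): 0.81}

print(regression_report(current, baseline))
# → [(('prompt_b', 'model_x'), 0.88, 0.81)]
```

Keeping the baseline from the previous month (rather than a fixed target) makes the check sensitive to gradual drift in either the model or the input distribution.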


Replies

mistercheph · today at 2:56 AM

While I believe that performance varies with the prompt, I have a seriously hard time believing that a prompt which was effective with the previous model would perform worse with the next generation of the same model from the same lab.
