dr_dshiv • yesterday at 8:07 PM • 3 replies

All the models used are shown with each page of translation and each book has a whole data provenance treatment.

You can add it up!

Replies

I don't see raw token counts, just a list of steps and page counts. For example, what is the rough average token count per page in the OCR and translation steps for a Greek book?

I have seen Gemini costs change quite a bit lately when processing very similar books from the same series, mainly because thinking tokens have increased about 5x. Has that happened to you as well?

Edit: for OCR I am using about 15k-25k tokens per page, but I have a complex prompt.
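
For a rough sense of scale, here is a back-of-the-envelope sketch of what that per-page range adds up to over a whole book. The 300-page length and the function name are made up for illustration; only the 15k-25k range comes from the numbers above.

```python
def book_token_range(pages, low_per_page=15_000, high_per_page=25_000):
    """Return (low, high) total OCR tokens for a book of `pages` pages.

    The per-page defaults are the rough 15k-25k range quoted above;
    they depend heavily on the prompt and are not measured constants.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * low_per_page, pages * high_per_page


# 300 pages is a hypothetical book length, not a figure from the thread.
low, high = book_token_range(300)
print(f"{low:,} to {high:,} tokens")  # prints "4,500,000 to 7,500,000 tokens"
```

Multiplying either end by the current per-token price of whatever model is in use would give a cost range, which is the kind of figure raw token counts would make easy to check.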

mmargenot • yesterday at 8:44 PM

How do you handle the more densely written pages in script? I did a very similar exercise OCRing works from this exact collection, but I stuck with the English books for the first pass.

efilife • yesterday at 9:13 PM

Can't you just tell him?
