Hacker News

bazzmt today at 6:13 AM

Interesting approach! One question though: can the model do column detection?

The first OCR example returns output that does not detect the article columns - the bounding box is the entire first line.


Replies

yoeven today at 7:10 AM

It can. Try prompting the model to combine object detection with text extraction. We found that when we purely extract text, it does really well at word- and sentence-level bounds, since the text acts as the anchor. However, when you treat it as an object detection problem, it sees that chunk of text as a segment, allowing you to extract it as one column bound. Give that a try.
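
A rough sketch of the two framings in Python. The prompt wording and the `build_prompt` helper are made up for illustration, not exact prompts for the model, and the step that sends the prompt plus image to the model is left out since it depends on your client.

```python
# Hypothetical prompt templates: the wording is illustrative only.
PROMPTS = {
    # Text-first framing: the text anchors the boxes, so bounds come back
    # at word or sentence level.
    "text": (
        "Extract all text from the image and return a bounding box "
        "for each word or sentence."
    ),
    # Object-detection framing: each block of text is treated as a segment,
    # so a whole column can come back as a single bound.
    "layout": (
        "Treat this as an object detection task: detect each column of text "
        "as one region, return one bounding box per column, then extract "
        "the text inside each box."
    ),
}


def build_prompt(mode: str) -> str:
    """Return the prompt for the given framing ('text' or 'layout')."""
    if mode not in PROMPTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return PROMPTS[mode]


if __name__ == "__main__":
    # Print the column-oriented prompt; pass it to the model alongside the image.
    print(build_prompt("layout"))
```

Running the same page through both modes and comparing the returned boxes should show whether the layout framing gives you per-column bounds.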