Hacker News

sinandrei · yesterday at 5:16 PM

Has anyone experimented with using VLMs to detect "marks"? I'm thinking of pen or pencil markings like underlines, circles, and checkmarks. Can these models do that?


Replies

leetharris · yesterday at 5:31 PM

In our experience, none of them do it well. For our AI contract analysis work, we had to write our own custom pipeline built from a mixture of legacy CV approaches. We benchmark every new multimodal model and VLM as it comes out, and we are consistently disappointed.
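The custom pipeline above isn't described. As a rough illustration of the kind of "legacy CV" step that can find one type of mark, here is a toy sketch (not the poster's method) that flags underline or strike-through candidates. It assumes the page has already been binarized into a grid where 1 is ink. It keeps only horizontal ink runs at least `min_len` pixels long, which is equivalent to a morphological opening with a 1×`min_len` horizontal kernel. It then merges runs on adjacent rows into stroke boxes. The function names and the `min_len` threshold are hypothetical, and circles or checkmarks would need different techniques.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]       # (row, start_col, end_col); end_col is exclusive
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def horizontal_runs(image: List[List[int]], min_len: int) -> List[Run]:
    """Return every horizontal run of ink pixels (value 1) at least min_len long.

    At typical text sizes glyph strokes are short, so long runs are candidate
    underlines, strike-throughs, or printed rules (hypothetical heuristic).
    """
    runs: List[Run] = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1
                if c - start >= min_len:
                    runs.append((r, start, c))
            else:
                c += 1
    return runs


def group_strokes(runs: List[Run]) -> List[Box]:
    """Merge runs on consecutive rows whose column spans overlap, so a pen
    stroke a few pixels thick becomes a single bounding box."""
    boxes: List[List[int]] = []
    for r, start, end in runs:  # horizontal_runs yields runs in row order
        for box in boxes:
            top, left, bottom, right = box
            # The box's last row is r - 1 and the spans overlap horizontally.
            if bottom == r and start < right and end > left:
                box[1] = min(left, start)
                box[2] = r + 1
                box[3] = max(right, end)
                break
        else:
            boxes.append([r, start, r + 1, end])
    return [tuple(b) for b in boxes]
```

On real scans, OpenCV's `cv2.morphologyEx` with `cv2.MORPH_OPEN` and a wide rectangular kernel from `cv2.getStructuringElement` performs the same opening much faster. Separating pen ink from printed text by color is a common complementary step when the pen isn't black.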
