Has anyone experiment with using VLM to detect "marks"? Thinking of pen/pencil based markings like underlines, circles,checkmarks.. Can these models do it?
None of them do it well from our experience. We had to write our own custom pipeline with a mixture of legacy CV approaches to handle this (AI contract analysis). We constantly benchmark every new multimodal and VLM model that comes out and are consistently disappointed.
None of them do it well from our experience. We had to write our own custom pipeline with a mixture of legacy CV approaches to handle this (AI contract analysis). We constantly benchmark every new multimodal and VLM model that comes out and are consistently disappointed.