
nickserv, today at 11:32 AM

Gave it a try for structured data extraction. Tested returning a JSON object from images.

The output was correct, and seemed deterministic, although I ran it only 2-3 times on the same image.

Main problem is response time: it took about 20-25 seconds for a simple structure of 5 fields. That makes it unusable at scale, let alone for "real-time" processing.
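For anyone who wants to repeat this kind of check, here is a rough sketch of a harness that times repeated calls on the same image and checks whether the parsed JSON comes back identical each time. The `extract` callable is hypothetical: it stands in for whatever wraps the model call and is not any real API. `fake_extract` is only a placeholder so the snippet runs on its own.

```python
import json
import time
from typing import Callable


def benchmark(extract: Callable[[bytes], str], image: bytes, runs: int = 3) -> dict:
    """Call `extract` repeatedly on one image; report latency and output stability."""
    latencies = []
    outputs = set()
    for _ in range(runs):
        start = time.perf_counter()
        raw = extract(image)  # hypothetical: returns the model's JSON as a string
        latencies.append(time.perf_counter() - start)
        # Re-serialize with sorted keys so equal objects compare equal as strings.
        outputs.add(json.dumps(json.loads(raw), sort_keys=True))
    return {
        "runs": runs,
        "mean_seconds": sum(latencies) / runs,
        "max_seconds": max(latencies),
        "identical_outputs": len(outputs) == 1,
    }


def fake_extract(image: bytes) -> str:
    # Placeholder for the real model call (assumption, for illustration only).
    return json.dumps({"invoice_id": "A-1", "total": 10})


if __name__ == "__main__":
    print(benchmark(fake_extract, b"", runs=3))
```

Two or three runs only hint at determinism; a larger `runs` value and several different images would give a more reliable picture.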

The other problem is cost: for the same document, it is considerably more expensive than more established models like flash-light.

Shame, the architecture is very interesting.