joss82 • today at 4:07 PM • 1 reply

Hi, Parseur founder here :D

I understand what they are trying to do, but to me it feels like the moment when MongoDB entered the database space with its semi-structured, "flexible" storage format. It has its uses, mostly for prototyping.

But in high-volume production workloads, giving structure to the data you extract (which Parseur does by having you define the Fields in your Mailbox, basically giving your output data a schema) adds a ton of value, and the larger the dataset, the more that holds.

Usually, you start by defining where you want your data to go and what structure it should have, then work backwards from there to the extraction itself. This is the key to automating your document workflow.
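
To make the "work backwards from the destination" idea concrete, here is a minimal sketch in plain Python. It is not Parseur's API: `InvoiceRow`, its three fields, and `to_invoice_row` are hypothetical names chosen for illustration, and the raw input is assumed to be a dict of strings from whatever extractor you use.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class InvoiceRow:
    """Hypothetical destination schema, decided before any extraction runs."""
    invoice_number: str
    issued_on: date
    total: Decimal


def to_invoice_row(raw: dict[str, str]) -> InvoiceRow:
    """Coerce loosely typed extracted fields into the fixed schema.

    Raises KeyError for a missing field and ValueError for a malformed one,
    so a bad document fails here instead of somewhere downstream.
    """
    try:
        total = Decimal(raw["total"])
    except InvalidOperation as exc:
        # Decimal raises InvalidOperation, not ValueError, on unparsable text.
        raise ValueError(f"total is not a number: {raw['total']!r}") from exc
    return InvoiceRow(
        invoice_number=raw["invoice_number"].strip(),
        issued_on=date.fromisoformat(raw["issued_on"]),  # expects YYYY-MM-DD
        total=total,
    )
```

The point of the design is that the schema is the fixed part and the extractor is the replaceable part: any record that reaches the destination has already been forced through `InvoiceRow`.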

Replies

gergelycsegzi • today at 5:04 PM

Hey, good point about structure for integrated workflows :)

Fully agree. For enterprises, we need to guarantee types, flag discrepancies, and provide the underlying sources so they can integrate the output downstream (whether that's Databricks, n8n, etc.).

Here is our documentation for working with a fixed JSON schema: https://docs.parsewise.ai/api#schema-driven-extract-convenie...
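
As a rough illustration of those three guarantees (types, discrepancy flags, sources), here is a minimal sketch in plain Python. It is not the Parsewise API: `SourcedValue`, `resolve_field`, and the source strings are all hypothetical, and the real schema-driven endpoint is described in the documentation linked above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedValue:
    """An extracted value plus a pointer back to where it was found."""
    value: object
    source: str  # hypothetical locator, e.g. "contract.pdf p.3"


def resolve_field(name: str, expected_type: type, candidates: list[SourcedValue]):
    """Return (SourcedValue or None, flags) for one schema field.

    A value is returned only if every correctly typed candidate agrees;
    type mismatches and conflicts are reported as human-readable flags.
    """
    flags: list[str] = []
    typed: list[SourcedValue] = []
    for c in candidates:
        if isinstance(c.value, expected_type):
            typed.append(c)
        else:
            flags.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(c.value).__name__} ({c.source})"
            )
    if not typed:
        return None, flags
    first = typed[0]
    conflicts = [c for c in typed[1:] if c.value != first.value]
    if conflicts:
        sources = ", ".join(c.source for c in [first, *conflicts])
        flags.append(f"{name}: conflicting values across {sources}")
        return None, flags
    return first, flags
```

Returning `None` together with a flag, rather than silently picking one candidate, keeps the decision with whoever owns the downstream system.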
