logoalt Hacker News

xxslast Thursday at 9:10 AM1 replyview on HN

as parser: keep only indexes to the original file (input), dont copy strings or parse numbers at all (unless the strings fit in the index width, e.g. 32bit)

That would make parsing faster and there will be very little in terms on tree (json can't really contain full blow graphs) but it's rather complicated, and it will require hashing to allow navigation, though.


Replies

creationixlast Thursday at 5:45 PM

yep. I built custom JSON parsers as a first solution. The problem is you can't get away from scanning at least half the document bytes on average.

With RX and other truly random-access formats you could even optimize to the point of not even fetching the whole document. You could grab chunks from a remote server using HTTP range requests and cache locally in fixed-width blocks.

With JSON you must start at the front and read byte-by-byte till you find all the data you're looking for. Smart parsers can help a lot to reduce heap allocations, but you can't skip the state machine scan.