logoalt Hacker News

n_eyesterday at 8:02 AM5 repliesview on HN

I process TB-size ndjson files. I want to use jq to do some simple transformations between stages of the processing pipeline (e.g. rename a field), but it so slow that I write a single-use node or rust script instead.


Replies

loxiasyesterday at 5:34 PM

I would love, _love_ to know more about your data formats, your tools, what the JSON looks like, basically as much as you're willing to share. :)

For about a month now I've been working on a suite of tools for dealing with JSON specifically written for the imagined audience of "for people who like CLIs or TUIs and have to deal with PILES AND PILES of JSON and care deeply about performance".

For me, I've been writing them just because it's an "itch". I like writing high performance/efficient software, and there's a few gaps that it bugged me they existed, that I knew I could fill.

I'm having fun and will be happy when I finish, regardless, but it would be so cool if it happened to solve a problem for someone else.

show 1 reply
eruyesterday at 8:17 AM

This reminds me of someone who wrote a regex tool that matches by compiling regexes (at runtime of the tool) via LLVM to native code.

You could probably do something similar for a faster jq.

nchmyyesterday at 8:10 AM

This isn't for you then

> The query language is deliberately less expressive than jq's. jsongrep is a search tool, not a transformation tool-- it finds values but doesn't compute new ones. There are no filters, no arithmetic, no string interpolation.

Mind me asking what sorts of TB json files you work with? Seems excessively immense.

show 2 replies
messeyesterday at 8:11 AM

Now I'm really curious. What field are you in that ndjson files of that size are common?

I'm sure there are reasons against switching to something more efficient–we've all been there–I'm just surprised.

show 1 reply