Hacker News

eliasdejong last Friday at 7:59 PM

It is also possible to encode JSON documents directly as a serialized B-tree. Then you can construct iterators on it directly, and query internal fields at indexed speeds. It is still a serialized document (possible to send over a network), though now you don't need to do any parsing, since the document itself is already indexed. It is called the Lite³ format.

Disclaimer: I am working on this.

https://github.com/fastserial/lite3
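
To make the idea concrete, here is a minimal toy sketch (not the actual Lite³ layout or API; the buffer layout and names below are made up for illustration): a serialized buffer that carries its own sorted key index can be queried by binary search directly on the bytes, with no parse step.

```c
/* Toy illustration only: a serialized buffer that carries its own sorted
 * key index, so a field can be found by binary search on the bytes with
 * no parse step.  This is NOT the Lite3 layout; the real format is a
 * serialized B-tree. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Layout: [u32 count][count x (8-byte key, u32 value offset)][values...] */
#define KEYLEN 8
#define ENTLEN (KEYLEN + 4)

static const char *toy_get(const uint8_t *buf, const char *key) {
    uint32_t n;
    memcpy(&n, buf, 4);
    uint32_t lo = 0, hi = n;
    while (lo < hi) {                            /* binary search on the index */
        uint32_t mid = (lo + hi) / 2;
        const uint8_t *e = buf + 4 + mid * ENTLEN;
        int c = strncmp(key, (const char *)e, KEYLEN);
        if (c == 0) {
            uint32_t off;
            memcpy(&off, e + KEYLEN, 4);         /* value offset stored after the key */
            return (const char *)(buf + off);
        }
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return NULL;
}

int main(void) {
    /* Hand-build a buffer for {"age":"42","name":"bob"}; keys kept sorted. */
    uint8_t buf[64] = {0};
    uint32_t n = 2, off = 4 + 2 * ENTLEN;        /* values start after the index */
    memcpy(buf, &n, 4);
    memcpy(buf + 4, "age", 3);
    memcpy(buf + 4 + KEYLEN, &off, 4);
    memcpy(buf + off, "42", 3);
    memcpy(buf + 4 + ENTLEN, "name", 4);
    off += 3;
    memcpy(buf + 4 + ENTLEN + KEYLEN, &off, 4);
    memcpy(buf + off, "bob", 4);
    printf("name = %s\n", toy_get(buf, "name")); /* prints: name = bob */
    return 0;
}
```

Lite³ itself uses a serialized B-tree rather than a flat sorted index like this toy, which keeps lookups and updates cheap as documents grow.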


Replies

conradev last Friday at 9:21 PM

This is super cool! I've always liked Rkyv (https://rkyv.org), but it requires Rust, which can be a big lift for a small project. I see this supports binary data (`lite3_val_bytes`), which is great!

cryptonector last Saturday at 1:57 AM

This is pretty cool.

How does Lite^3 compare to PG's JSONB? PG's JSONB is also a serialized, indexed data structure. One of the key things about JSONB is that for arrays (and likewise objects) it encodes the elements' lengths first and then the values, but every so many elements (32 is the default, IIRC) it encodes an offset instead. The reason for this design is that when they encoded only offsets, the result did not compress well (and if you think about it, it will be obvious why). The price they pay is that finding the offset of the nth element's value requires first finding the last entry before n that carries an offset, then adding the lengths of all the entries in between. This way you get a tunable parameter for trading speed against compressibility.
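
To make that lookup cost concrete, here is a rough sketch of the scheme (not PG's actual on-disk layout; IIUC a real JEntry stores either a length or an offset, never both, and the struct, names, and small stride below are just for illustration):

```c
/* Sketch of "lengths everywhere, an absolute offset every STRIDE entries". */
#include <stdint.h>
#include <stdio.h>

#define STRIDE 4   /* PG's default stride is 32; a small value keeps the demo short */

typedef struct {
    uint32_t len;  /* length of this element's value */
    uint32_t off;  /* absolute offset; only stored (valid) when index % STRIDE == 0 */
} jentry;

/* Offset of element n = offset stored at the nearest preceding "anchor"
 * entry, plus the lengths of everything between that anchor and n. */
static uint32_t element_offset(const jentry *e, size_t n) {
    size_t anchor = n - (n % STRIDE);
    uint32_t off = e[anchor].off;
    for (size_t i = anchor; i < n; i++)
        off += e[i].len;
    return off;
}

int main(void) {
    /* Six elements of lengths 3,5,2,7,1,4; anchors at indices 0 and 4. */
    jentry e[6] = { {3, 0}, {5, 0}, {2, 0}, {7, 0}, {1, 17}, {4, 0} };
    printf("offset of element 3 = %u\n", element_offset(e, 3)); /* 0 + 3+5+2 = 10 */
    printf("offset of element 5 = %u\n", element_offset(e, 5)); /* 17 + 1    = 18 */
    return 0;
}
```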

EDIT: Ok, I've looked at the format. Some comments:

- Updating in place is cool, but you need to clear unused replaced data in case it's sensitive, and unless you re-encode you will use up more and more space, so once in a while you need a "vacuum". Vacuuming a Lite^3 document is quite simple, though: just traverse the data structure and write out a new version, which is naturally vacuumed.

- On the whole I like Lite^3 quite a bit. Very clever.

- JSONB is also indexed as encoded, but IIUC it's not updateable in place (unless the new items are the same length as the old) without re-encoding. Though I can imagine a way to tombstone old values and replace them with offsets into appended data; the result would then also need a "vacuum" once in a while (a toy sketch of that tombstone-and-vacuum pattern follows after these comments).

- I'm curious about compressibility. I suspect not having long runs of pointers (offsets) helps, but still I suspect JSONB is more compressible.
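
Here is a toy sketch of that tombstone-and-vacuum pattern (not how Lite^3 or JSONB actually work; the structures and names are made up for illustration): updates append a new record and tombstone the old one, and a vacuum pass rewrites only the live records while scrubbing the rest.

```c
/* Toy tombstone + vacuum: appends on update, compacts and scrubs on vacuum. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAXREC 16

typedef struct { int dead; char key[8]; char val[16]; } rec;
typedef struct { rec recs[MAXREC]; size_t n; } doc;

static void put(doc *d, const char *key, const char *val) {
    for (size_t i = 0; i < d->n; i++)             /* tombstone any old value */
        if (!d->recs[i].dead && strcmp(d->recs[i].key, key) == 0)
            d->recs[i].dead = 1;
    if (d->n >= MAXREC) return;                   /* toy: drop if full */
    rec *r = &d->recs[d->n++];                    /* append the new value */
    r->dead = 0;
    snprintf(r->key, sizeof r->key, "%s", key);
    snprintf(r->val, sizeof r->val, "%s", val);
}

static void vacuum(doc *d) {
    size_t out = 0;
    for (size_t i = 0; i < d->n; i++)             /* keep only live records */
        if (!d->recs[i].dead)
            d->recs[out++] = d->recs[i];
    memset(d->recs + out, 0, (d->n - out) * sizeof(rec));  /* scrub stale data */
    d->n = out;
}

int main(void) {
    doc d = {0};
    put(&d, "name", "alice");
    put(&d, "name", "bob");                       /* old record becomes a tombstone */
    printf("records before vacuum: %zu\n", d.n);  /* 2 */
    vacuum(&d);
    printf("records after vacuum:  %zu\n", d.n);  /* 1 */
    return 0;
}
```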

I love the topic of serialization formats, and I've been thinking for some time about ASN.1 compilers (since I maintain one). I've wanted to implement a flatbuffers/JSONB-style codec for ASN.1, borrowing ideas from OER. You've given me something to think about! When you have a schema (e.g., an ASN.1 module), you don't really need a B-tree: the encoded data, if it's encoded in a convenient way, already is the B-tree. But accessing the encoded data by traversal path, rather than decoding it into nice in-memory structures, sure would be a major improvement in codec performance!

the_duke last Friday at 9:35 PM

Would love a Rust implementation of this.

gritzko last Saturday at 5:46 AM

Sorry, but who are you? Your accounts have no history.