I love SQLite, and I'm in no way trying to devalue it; the author's method is an excellent approach to getting analytical speed out of SQLite. But I've been loving DuckDB for similar analytical workloads, since it's built for exactly such tasks. DuckDB also reads from a single file, like SQLite, and it processes large data sets at extreme speeds. I work on my MacBook M2 and have been dealing with about 20 million records; it's fast, very fast.
Loading data into DuckDB is super easy; I was surprised:
SELECT avg(sale_price), count(DISTINCT customer_id) FROM '/my-data-lake/sales/2024/*.json';
and you can also load into a JSON-type column and use Postgres-style syntax: col->>'$.key'
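For example, something like this (table and key names made up, but I believe this is the DuckDB syntax):

  CREATE TABLE sales (info JSON);
  INSERT INTO sales VALUES ('{"customer_id": 42, "amount": 9.99}');
  -- ->> extracts the value as text, much like in Postgres
  SELECT info->>'$.customer_id' AS customer_id FROM sales;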
I thought this was common practice: generated columns for JSON performance. I've even used this (although it was in Postgres) to maintain foreign key constraints where the key is buried in a JSON column; a sketch is below. What we were doing was slightly cursed, but it worked perfectly.
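Roughly like this, assuming the buried key references a customers table (all names hypothetical; cascading actions that would need to write to the generated column aren't an option here):

  -- extract the buried key into a stored generated column,
  -- then hang an ordinary foreign key off it
  CREATE TABLE events (
      id bigserial PRIMARY KEY,
      payload jsonb NOT NULL,
      customer_id bigint
          GENERATED ALWAYS AS ((payload->>'customer_id')::bigint) STORED
          REFERENCES customers (id)
  );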
I was inspired to write this blog post after reading bambax's comment on a HN post back in 2023: https://news.ycombinator.com/item?id=37082941
Hilariously, I discovered this very technique a couple of weeks ago when Claude Code presented it out of the blue, with a worked example, while I was trying to find optimizations for something I'm working on. It turned out to be a really smart and performant choice, one I simply wasn't aware of because I hadn't kept up with new SQLite features at all these last few years.
Lesson learned: even if you know your tools well, periodically check out the updated docs and see what's new; you might be surprised at what you find!
As others mention, you can create indexes directly against the JSON without projecting into a computed column... though the computed column has the added benefit of making certain queries easier.
That said, this is pretty much what you had to do with MS SQL Server's limited JSON support before SQL Server 2025 (v17). Glad I double-checked, since I wasn't even aware they had added the JSON type in 2025.
Opening an article on HN, seeing one of my comments quoted at the top, and then finding out the whole article is about that one comment: that's a first!
> So, thanks bambax!
You're most welcome! And yes, SQLite is awesome!!
Interesting, but can't you use "Index On Expression" <https://sqlite.org/expridx.html>?
I.e., something like this: CREATE INDEX idx_events_type ON events(json_extract(data, '$.type'))?
I guess the caveat here is that a slight variation in the JSON path syntax (can't think of one right now) could cause SQLite not to use this index, while with explicitly specified virtual generated columns you're guaranteed to use the index.
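For contrast, a sketch of the generated-column variant (same hypothetical table as above); here the index is keyed to the column name, so any query against event_type uses it no matter how the JSON path would have been spelled:

  ALTER TABLE events ADD COLUMN event_type TEXT
      GENERATED ALWAYS AS (json_extract(data, '$.type')) VIRTUAL;
  CREATE INDEX idx_events_event_type ON events(event_type);
  SELECT * FROM events WHERE event_type = 'login';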
I wish devs would normalize their data rather than shove everything into a JSON(B) column, especially when there is a consistent schema across records.
It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.
I was looking for a way to index a JSON column that contains a JSON array, like a list of tags. AFAIK this method won't work for that; you'll either need to use FTS or a separate "tag" table that you index.
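Something like this for the separate-table approach, using SQLite's json_each to explode the array (table and column names made up):

  CREATE TABLE event_tags (event_id INTEGER, tag TEXT);
  CREATE INDEX idx_event_tags ON event_tags(tag, event_id);

  -- populate by exploding the JSON array in each row
  INSERT INTO event_tags (event_id, tag)
  SELECT e.id, j.value
  FROM events AS e, json_each(e.data, '$.tags') AS j;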
If you replace JSON with XML in this model, it is exactly what the "document store" databases of the '90s and '00s were doing -- parsing at insert and update time, then touching only indexes at query time. It is indeed cool that SQLite does this out of the box.
Tiny bug report: I couldn't edit text in those SQL editor widgets from my iPhone, and I couldn't scroll them to see text that extended past the width of the page either.
It says "full speed," but no benchmarks were performed to verify whether performance was really equivalent.
I've been coding a lot of small apps recently, and going from local JSON file storage to SQLite has been a very natural progression as the data's order of magnitude ramps up. A fully performant database that still feels as simple as opening and reading a plain JSON file. The trick you describe in the article is an unexpected performance buffer that'll come in handy when I hit the next bottleneck :) Thank you!
The fact that the DB is portable is amazing; I use it for all my projects now, but I didn't know about this JSON feature.
Would this be a good fit for migrating from Mongo --> SQLite? A task I am dreading.
I love JSONB support in SQLite.
Particularly with Drizzle, it means I can use SQLite on-device with expo-sqlite and store our data format in a single field, with very little syntax, and the schema and queries all become fully type-safe.
Also, being able to use the same light ORM abstraction server-side with bun:sqlite is huge.
Very cool article. To really drive it home, I would have loved to see how the query plan changes. It _looks_ like it should Just Work(tm), but my brain refuses to believe it's able to use those new indexes so flawlessly.
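You can check it yourself with EXPLAIN QUERY PLAN (names here assume a generated column and index along the lines of the article's pattern):

  EXPLAIN QUERY PLAN
  SELECT * FROM events WHERE event_type = 'login';
  -- with the index in place, this should report something like
  --   SEARCH events USING INDEX idx_events_event_type (event_type=?)
  -- instead of SCAN events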
In the 2nd section you're using a CREATE TABLE plus three separate ALTER TABLE calls to add the virtual columns. In the 3rd section you're using a single CREATE TABLE with the virtual columns included from the get-go.
Why?
Great article with clear instructions - it could be quite useful if I need to store JSON in SQLite in the future.
Generated columns are pretty great, but what I would really love is a Postgres-style GIN index, which dramatically speeds up JSON queries for unanticipated keys.
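For anyone who hasn't seen it, that's roughly this in Postgres (table name made up, assuming a jsonb column): one index over the whole value accelerates containment queries on keys you never anticipated at schema time:

  CREATE INDEX idx_events_data ON events USING GIN (data);
  -- @> containment can use the GIN index, whatever the key
  SELECT * FROM events WHERE data @> '{"type": "login"}';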
I've been using this trick for a while, and it actually got me to do quite a bit without an ORM (just hacking together a sane models.py with a few stable wrappers and calling it a day).
For smaller datasets (100s of thousands of rows), I don't see why you wouldn't just use JSON columns with a generated column/index where needed.
What a neat trick; I love SQLite as well.
My understanding is that Snowflake works kinda like that behind the scenes, right?
Your website looks like supermemory.ai. BTW, it's pretty cool.
LOL, what are the odds: I posted Marmot in `Show HN` today https://github.com/maxpert/marmot/releases/tag/v2.2.0 and in my head I was thinking the exact same thing for supporting MySQL's JSON datatype. At some level I am starting to feel I might as well expose a full MongoDB-compatible protocol that lets you talk to tables as collections, solving this problem once and for all!
But I guess this technique is very common now.
Can I do this with PocketBase?
I love this feature. I've long used json_extract to create dynamic columns with txtai sql: https://neuml.github.io/txtai/embeddings/query/#dynamic-colu...
You can do the same with DuckDB and Postgres too.
Dude, what? This is incredible knowledge. I had been fearing this exact problem for so long, but there is an elegant out-of-the-box solution. Thank you!!
> We've got an embedded SQLite-in-the-browser component on our blog
What?
MongoDB is dead, long live MongoDB
It is also possible to encode JSON documents directly as a serialized B-tree. Then you can construct iterators over it directly and query internal fields at indexed speeds. It is still a serialized document (possible to send over a network), but now you don't need to do any parsing, since the document itself is already indexed. It is called the Lite³ format.
Disclaimer: I am working on this.
https://github.com/fastserial/lite3