I love SQLite, and I'm in no way trying to devalue it; the author's method is an excellent approach to getting analytical speed out of SQLite. But I've been loving DuckDB for similar analytical workloads, since it's built for exactly such tasks. DuckDB also reads from a single file, like SQLite, and it processes large data sets at extreme speeds. I work on my MacBook M2 and have been dealing with about 20 million records; it's fast, very fast.
Loading data into DuckDB is super easy; I was surprised:
SELECT avg(sale_price), count(DISTINCT customer_id) FROM '/my-data-lake/sales/2024/*.json';
and you can also load into a JSON-type column and use Postgres-style syntax: col->>'$.key'
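For example, something like this (table and key names made up, but I believe this is the DuckDB syntax):

  CREATE TABLE sales (info JSON);
  INSERT INTO sales VALUES ('{"customer_id": 42, "amount": 9.99}');
  -- ->> extracts the value as text, much like in Postgres
  SELECT info->>'$.customer_id' AS customer_id FROM sales;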
I thought this was common practice: generated columns for JSON performance. I've even used this (although it was in Postgres) to maintain foreign key constraints where the key is buried in a JSON column; a sketch is below. What we were doing was slightly cursed, but it worked perfectly.
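Roughly like this, assuming the buried key references a customers table (all names hypothetical; cascading actions that would need to write to the generated column aren't an option here):

  -- extract the buried key into a stored generated column,
  -- then hang an ordinary foreign key off it
  CREATE TABLE events (
      id bigserial PRIMARY KEY,
      payload jsonb NOT NULL,
      customer_id bigint
          GENERATED ALWAYS AS ((payload->>'customer_id')::bigint) STORED
          REFERENCES customers (id)
  );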
I was inspired to write this blog post after reading bambax's comment on a HN post back in 2023: https://news.ycombinator.com/item?id=37082941
Hilariously, I discovered this very technique a couple of weeks ago when Claude Code presented it out of the blue, with a worked example, while I was trying to find optimizations for something I'm working on. It turned out to be a really smart and performant choice, one I simply wasn't aware of because I hadn't kept up with new SQLite features at all these last few years.
Lesson learned: even if you know your tools well, periodically check out the updated docs and see what's new; you might be surprised at what you find!
As others mention, you can create indexes directly against the JSON without projecting into a computed column... though the computed column has the added benefit of making certain queries easier.
That said, this is pretty much what you had to do with MS SQL Server's limited JSON support before SQL Server 2025 (v17). Glad I double-checked, since I wasn't even aware they had added the JSON type in 2025.
Opening an article on HN, seeing one of my comments quoted at the top, and then finding out the whole article is about that one comment: that's a first!
> So, thanks bambax!
You're most welcome! And yes, SQLite is awesome!!
Interesting, but can't you use "Index On Expression" <https://sqlite.org/expridx.html>?
I.e., something like this: CREATE INDEX idx_events_type ON events(json_extract(data, '$.type'))?
I guess the caveat here is that a slight variation in the JSON path syntax (can't think of one right now) could cause SQLite not to use this index, while with explicitly specified virtual generated columns you're guaranteed to use the index.
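For contrast, a sketch of the generated-column variant (same hypothetical table as above); here the index is keyed to the column name, so any query against event_type uses it no matter how the JSON path would have been spelled:

  ALTER TABLE events ADD COLUMN event_type TEXT
      GENERATED ALWAYS AS (json_extract(data, '$.type')) VIRTUAL;
  CREATE INDEX idx_events_event_type ON events(event_type);
  SELECT * FROM events WHERE event_type = 'login';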
I wish devs would normalize their data rather than shove everything into a JSON(B) column, especially when there is a consistent schema across records.
It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.
I was looking for a way to index a JSON column that contains a JSON array, like a list of tags. AFAIK this method won't work for that; you'll either need to use FTS or a separate "tag" table that you index.
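Something like this for the separate-table approach, using SQLite's json_each to explode the array (table and column names made up):

  CREATE TABLE event_tags (event_id INTEGER, tag TEXT);
  CREATE INDEX idx_event_tags ON event_tags(tag, event_id);

  -- populate by exploding the JSON array in each row
  INSERT INTO event_tags (event_id, tag)
  SELECT e.id, j.value
  FROM events AS e, json_each(e.data, '$.tags') AS j;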
If you replace JSON with XML in this model, it is exactly what the "document store" databases of the '90s and '00s were doing -- parsing at insert and update time, then touching only indexes at query time. It is indeed cool that SQLite does this out of the box.
Tiny bug report: I couldn't edit text in those SQL editor widgets from my iPhone, and I couldn't scroll them to see text that extended past the width of the page either.
It says "full speed," but no benchmarks were performed to verify whether performance was really equivalent.
I've been coding a lot of small apps recently, and going from local JSON file storage to SQLite has been a very natural progression as the data's order of magnitude ramps up. A fully performant database that still feels as simple as opening and reading a plain JSON file. The trick you describe in the article is an unexpected performance buffer that'll come in handy when I hit the next bottleneck :) Thank you!
The fact that the DB is portable is amazing; I use it for all my projects now, but I didn't know about this JSON feature.
Would this be a good fit for migrating from Mongo --> SQLite? A task I am dreading.
I love JSONB support in SQLite.
Particularly with Drizzle, it means I can use SQLite on-device with expo-sqlite and store our data format in a single field, with very little syntax, and the schema and queries all become fully type-safe.
Also, being able to use the same light ORM abstraction server-side with bun:sqlite is huge.
Very cool article. To really drive it home, I would have loved to see how the query plan changes. It _looks_ like it should Just Work(tm), but my brain refuses to believe it's able to use those new indexes so flawlessly.
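You can check it yourself with EXPLAIN QUERY PLAN (names here assume a generated column and index along the lines of the article's pattern):

  EXPLAIN QUERY PLAN
  SELECT * FROM events WHERE event_type = 'login';
  -- with the index in place, this should report something like
  --   SEARCH events USING INDEX idx_events_event_type (event_type=?)
  -- instead of SCAN events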
In the 2nd section you're using a CREATE TABLE plus three separate ALTER TABLE calls to add the virtual columns. In the 3rd section you're using a single CREATE TABLE with the virtual columns included from the get-go.
Why?
Great article with clear instructions - it could be quite useful if I need to store JSON in SQLite in the future.
Generated columns are pretty great, but what I would really love is a Postgres-style GIN index, which dramatically speeds up JSON queries for unanticipated keys.
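For anyone who hasn't seen it, that's roughly this in Postgres (table name made up, assuming a jsonb column): one index over the whole value accelerates containment queries on keys you never anticipated at schema time:

  CREATE INDEX idx_events_data ON events USING GIN (data);
  -- @> containment can use the GIN index, whatever the key
  SELECT * FROM events WHERE data @> '{"type": "login"}';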
I've been using this trick for a while, and it actually got me to do quite a bit without an ORM (just hacking together a sane models.py with a few stable wrappers and calling it a day).
For smaller datasets (100s of thousands of rows), I don't see why you wouldn't just use JSON columns with a generated column/index where needed.
What a neat trick; I love SQLite as well.
My understanding is that Snowflake works kinda like that behind the scenes, right?
Your website looks like supermemory.ai. BTW, it's pretty cool.
LOL, what are the odds: I posted Marmot in `Show HN` today https://github.com/maxpert/marmot/releases/tag/v2.2.0 and in my head I was thinking the exact same thing for supporting MySQL's JSON datatype. At some level I am starting to feel I might as well expose a full MongoDB-compatible protocol that lets you talk to tables as collections, solving this problem once and for all!
But I guess this technique is very common now.
Can I do this with PocketBase?
I love this feature. I've long used json_extract to create dynamic columns with txtai sql: https://neuml.github.io/txtai/embeddings/query/#dynamic-colu...
You can do the same with DuckDB and Postgres too.
Dude, what? This is incredible knowledge. I had been fearing this exact problem for so long, but there is an elegant out-of-the-box solution. Thank you!!
> We've got an embedded SQLite-in-the-browser component on our blog
What?
MongoDB is dead, long live MongoDB
It is also possible to encode JSON documents directly as a serialized B-tree. Then you can construct iterators over it directly and query internal fields at indexed speeds. It is still a serialized document (possible to send over a network), but now you don't need to do any parsing, since the document itself is already indexed. It is called the Lite³ format.
Disclaimer: I am working on this.
https://github.com/fastserial/lite3