Hacker News

Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite

266 points by russellthehippo yesterday at 11:53 AM | 68 comments

Comments

russellthehippo yesterday at 12:14 PM

Hey HN, I built this. Honker adds cross-process NOTIFY/LISTEN to SQLite. You get push-style event delivery with single-digit millisecond latency without a daemon/broker, using your existing SQLite file. A lot of pretty high-traffic applications are just Framework+SQLite+Litestream on a VPS now, so I wanted to bring a sixer to the "just use SQLite" party.

SQLite doesn't run a server like Postgres, so the trick is moving the polling source from interval queries on a SQLite connection to a lightweight stat(2) on the WAL file. Many small queries are efficient in SQLite (https://www.sqlite.org/np1queryprob.html) so this isn't really a huge upgrade, but the cross-language result is pretty interesting to me - this is language agnostic as all you do is listen to the WAL file and call SQLite functions.
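To make the stat(2) idea concrete, here is a minimal Python sketch of the polling source described above. It is not honker's code: the helper name `wal_signature` and the `events` table are made up for illustration, and it assumes the database is in WAL mode so that a `-wal` file sits next to the database file.

```python
import os
import sqlite3
import tempfile


def wal_signature(db_path):
    """Return (size, mtime_ns) of the WAL file, or None if it does not exist."""
    try:
        st = os.stat(db_path + "-wal")
    except FileNotFoundError:
        return None
    return (st.st_size, st.st_mtime_ns)


def demo():
    db_path = os.path.join(tempfile.mkdtemp(), "app.db")
    writer = sqlite3.connect(db_path)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    writer.commit()

    before = wal_signature(db_path)
    # A committed write appends frames to the WAL, so its size changes.
    writer.execute("INSERT INTO events (payload) VALUES ('hello')")
    writer.commit()
    after = wal_signature(db_path)

    writer.close()
    return before, after
```

A listener would call `wal_signature` in a loop and only run SQLite queries when the value differs from the previous tick. Since it is just a stat call, the same loop can be written in any language.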

On top of the store/notify primitives, honker ships ephemeral pub/sub (like pg_notify), durable work queues with retries and dead-letter (like pg-boss/Oban), and event streams with per-consumer offsets. All three are rows in your app's existing .db file and can commit atomically with your business write. This is cool because a rollback drops both.
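A rough illustration of the "commit atomically with your business write" point, using only Python's standard `sqlite3` module. The `orders` and `notifications` tables and the `place_order` helper are hypothetical and are not honker's schema or API; the sketch only shows that rows written in the same transaction are committed or rolled back together.

```python
import sqlite3


def place_order(conn, item, fail=False):
    """Write a business row and its notification row in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
            conn.execute(
                "INSERT INTO notifications (channel, payload) VALUES (?, ?)",
                ("orders", item),
            )
            if fail:
                raise RuntimeError("simulated failure after both inserts")
    except RuntimeError:
        pass  # the rollback already dropped both rows


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute(
        "CREATE TABLE notifications "
        "(id INTEGER PRIMARY KEY, channel TEXT, payload TEXT)"
    )
    place_order(conn, "widget")             # both rows committed
    place_order(conn, "gadget", fail=True)  # both rows rolled back
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    notes = conn.execute("SELECT COUNT(*) FROM notifications").fetchone()[0]
    conn.close()
    return orders, notes
```

There is never a notification without its order or an order without its notification, which is the property that is hard to get with a separate broker.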

This used to be called litenotify/joblite but I bought honker.dev as a joke for my gf and I realized that every mq/task/worker has a silly name: Oban, pg-boss, Huey, RabbitMQ, Celery, Sidekiq, etc. Thus a silly goose got its name.

Honker waddles the same path as these giants and honks into the same void.

Hopefully it's either useful to you or is amusing. Standard alpha software warnings apply.

JoelJacobson yesterday at 4:26 PM

Shameless plug: In the upcoming release of PostgreSQL 19, LISTEN/NOTIFY has been optimized to scale much better with selective signaling, i.e. when lots of backends are listening on different channels, patch: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

russellthehippo today at 10:03 AM

[Response to feedback]

Thanks all for your feedback, responses, and discussion. I've done a PR here taking your suggestions into account:

https://github.com/russellromney/honker/pulls/1

The PR implements a three-layer polling architecture:

- PRAGMA data_version every 1ms
- stat every 100ms
- retry connection to handle blips

1. PRAGMA data_version every 1ms replaces stat-based (size, mtime) change detection. This is SQLite's own commit counter: monotonic, immune to clock skew, correctly handles WAL truncation and rolled-back transactions. ~3µs nonblocking query. Credit to ncruces for pointing to this. The change is for correctness, not performance; it is slightly slower than stat. tuo-lei also pointed out truncation risk, which turned out to be more real than I thought.

Interesting note: I found in testing that the C API's SQLITE_FCNTL_DATA_VERSION does not work cross-connection. So for now honker continues paying the cost of going through the VFS layer, which vlovich123 pointed out and which we now trade off explicitly.

2. Reconnect-on-error: if the data_version query fails (disk blip, NFS hiccup, corrupted connection), honker tries to reconnect and wakes subscribers as a precaution. zbentley pointed me in this direction.

3. stat identity check every 100ms: compares (dev, ino) against startup values to detect file replacement (atomic rename, litestream restore, volume remount). data_version can't catch this because it polls through the open fd, which follows the original inode even after replacement. Credit to zbentley for the file-replacement scenarios.
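The three layers above can be sketched roughly as follows. This is not the code from the PR: the `Poller` class and its method names are invented for illustration, the tick intervals are left to the caller, and the sketch only reports a file replacement rather than deciding what to do about it.

```python
import os
import sqlite3
import tempfile


class Poller:
    """Hypothetical three-layer change detector for one SQLite file."""

    def __init__(self, path):
        self.path = path
        self.identity = self._identity()  # (dev, ino) captured at startup
        self.conn = sqlite3.connect(path)
        self.version = self._data_version()

    def _identity(self):
        st = os.stat(self.path)
        return (st.st_dev, st.st_ino)

    def _data_version(self):
        return self.conn.execute("PRAGMA data_version").fetchone()[0]

    def fast_tick(self):
        """Layers 1 and 2 (about every 1 ms): True means wake subscribers."""
        try:
            current = self._data_version()
        except sqlite3.Error:
            # Layer 2: reconnect, then wake as a precaution because
            # commits may have been missed while the connection was bad.
            try:
                self.conn.close()
            except sqlite3.Error:
                pass
            self.conn = sqlite3.connect(self.path)
            self.version = self._data_version()
            return True
        changed = current != self.version
        self.version = current
        return changed

    def slow_tick(self):
        """Layer 3 (about every 100 ms): True means the file was replaced."""
        return self._identity() != self.identity


def demo():
    path = os.path.join(tempfile.mkdtemp(), "app.db")
    writer = sqlite3.connect(path)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE t (x INTEGER)")
    writer.commit()

    poller = Poller(path)
    results = {"idle": poller.fast_tick()}

    writer.execute("INSERT INTO t VALUES (1)")
    writer.rollback()  # nothing committed, so no wakeup expected
    results["after_rollback"] = poller.fast_tick()

    writer.execute("INSERT INTO t VALUES (2)")
    writer.commit()  # a commit from another connection
    results["after_commit"] = poller.fast_tick()

    poller.conn.close()  # simulate a broken connection
    results["after_error"] = poller.fast_tick()

    results["same_file"] = poller.slow_tick()
    replacement = path + ".restored"
    open(replacement, "wb").close()
    os.replace(replacement, path)  # atomic rename over the original
    results["replaced"] = poller.slow_tick()
    return results
```

In this sketch, `demo()` should report a wakeup after the commit and after the simulated connection error, no wakeup while idle or after the rollback, and a replacement only after the `os.replace` call.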

Again - thanks for the discussion, honker got better because of it and I learned some stuff. See you round

Retr0id yesterday at 12:59 PM

Couldn't you use inotify (and/or some cross-platform wrapper) to watch for WAL changes without polling?

Oxlamarr today at 10:00 AM

Very cool. Is the bottleneck under load mostly SQLite write throughput, or the WAL notification layer?

tuo-lei yesterday at 2:48 PM

atomic commit with the business data is the selling point over separate IPC. external message passing always has the 'notification sent but transaction rolled back' problem and that gets messy.

one thing i'm curious about: WAL checkpoint. when SQLite truncates WAL back to zero, does the stat() polling handle that correctly? feels like there's a window where events could get lost.

nzoschke yesterday at 3:02 PM

Thanks for this!

I have a proliferation of small apps backed by SQLite. And most of these need a queue and scheduler.

I home rolled some stuff for it but was always pining for the elegance of the Postgres solutions.

Will give this a spin very soon

ArielTM yesterday at 12:43 PM

kqueue/FSEvents is tempting here, but Darwin drops same-process notifications. If you've got a publisher and listener in the same process the listener just never fires. Nasty thing to chase. stat polling looks gross but it's the only thing that actually works everywhere.

What happens on WAL checkpoint? When the file shrinks back, does that trigger a wakeup, or does the poller filter size drops?

aldielshala today at 2:16 AM

Nice project. I'm also working on something that pushes SQLite well beyond its typical use case. It's encouraging to see more people exploring what SQLite can really do.

robertlagrant yesterday at 5:09 PM

If I'm using SQLAlchemy, can this integrate? It seems to want to make the db connection itself.

PunchyHamster yesterday at 1:52 PM

Wouldn't processes on the same machine be able to use other IPC mechanisms that don't even touch a file? It's neat, but I have a feeling that in the vast majority of cases just passing an address to one of the IPC methods would be faster, and then SQLite itself would only be needed for the durable parts.

agentbonnybb yesterday at 8:15 PM

This is the kind of skill-level tool I wish existed earlier — I hit the exact pain point running a daily-chronicle site off SQLite + a static deploy a week ago. Ended up with a crude polling loop because the alternatives all wanted me to install Postgres for a single notification semantic.

Question: any thoughts on what breaks first when a single process has 10k+ concurrent listeners? I'm curious whether the SQLite side can sustain what Postgres does cheaply.

zbentley today at 1:34 AM

Very neat! I like this a lot, nice work.

After peeking at the source, a few possible areas of improvement:

- You can use `fstat` and keep a file handle around, likely further improving performance (well, reducing the performance hit to other users of the filesystem by not resolving vfs nodes). If you do this, you'll have to check for file deletions.

- If you do stick with stat(2), it might be a good idea to track the inode number from the stat result in addition to the time,size tuple. That handles the "t,s = 1,2; honker gets SIGSTOPped/CRIU'd; database file replaced; honker started again", as well as renameat/symlink-swap fiddling. Changing inode probably should just trigger a crash.

- Also check the device number from the stat call. It sounds fringe, but the number of weird hellbugs I've dealt with in my career caused by code continually interacting with a file at the same time as something else mounted an equivalent path "over" the directory the file was originally in is nonzero.

- It's been a few years since I fought with this, but aren't there edge cases here if the system clock goes backwards? IIRC the inode timestamp isn't monotonic--right? There are various strategies for detecting clock adjustment, of various reliability, that you could use here, if so. Just checking if the mtime-vs-system-clock diff is negative is a start.
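For the `fstat` suggestion in the first bullet, a small Python sketch of what "keep a file handle around and check for deletions" could look like. The `snapshot` helper and its dictionary keys are made up for this example, and the deletion check relies on the usual POSIX behaviour that an unlinked but still open file reports a link count of zero, which may not hold on every platform or filesystem.

```python
import os
import tempfile


def snapshot(fd):
    """Describe the file behind an open descriptor without resolving its path."""
    st = os.fstat(fd)
    return {
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "identity": (st.st_dev, st.st_ino),
        # An open file whose last name was removed has no links left.
        "deleted": st.st_nlink == 0,
    }


def demo():
    path = os.path.join(tempfile.mkdtemp(), "app.db-wal")
    with open(path, "wb") as f:
        f.write(b"frame")

    fd = os.open(path, os.O_RDONLY)  # long-lived handle kept by the poller
    first = snapshot(fd)

    with open(path, "ab") as f:
        f.write(b"more")
    second = snapshot(fd)

    os.unlink(path)  # the file disappears from the directory
    third = snapshot(fd)

    os.close(fd)
    return first, second, third
```

The handle keeps following the original inode, so the identity stays the same across all three snapshots. That is why a path-based stat is still needed to notice that a different file now lives at the same name.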

That covers the more common of the "vanishingly uncommon but I've still seen 'em" cases related to file modification detection. Whether you choose to cope with people messing with the file via utime(2) is up to you (past a point, it feels like coping with malicious misuse rather than edge cases). But since your code runs in a loop, you're well-positioned to do that (and detect drift/manipulations of the system clock): track a monotonic clock and use it to approximate the elapsed wall time between honker poller ticks (say it fast with an accent, and you get https://www.bbc.com/news/world-latin-america-11465127); if the timestamp reported by (f)stat(2) ever doesn't advance at the same rate, fall back to checksumming the file, or crashing or something. But this is well into the realm of abject paranoia by now.
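One possible reading of the monotonic-clock idea, as a hedged Python sketch. The function names and the half-second tolerance are arbitrary choices for illustration: the first function compares how far the wall clock moved between two poller ticks with how far the monotonic clock moved, and the second flags an mtime that lies in the future relative to the wall clock.

```python
import time


def clock_jump(prev_wall, prev_mono, now_wall, now_mono, tolerance=0.5):
    """Return how many seconds the wall clock moved beyond the monotonic
    clock between two ticks, or 0.0 if the difference is within tolerance."""
    drift = (now_wall - prev_wall) - (now_mono - prev_mono)
    return drift if abs(drift) > tolerance else 0.0


def mtime_is_suspicious(mtime, now_wall, tolerance=0.5):
    """True if a file's mtime lies in the future relative to the wall clock."""
    return mtime - now_wall > tolerance


def sample():
    """Read both clocks together; call once per poller tick."""
    return time.time(), time.monotonic()
```

A poller could call `sample()` on every tick, feed consecutive pairs to `clock_jump`, and fall back to a more expensive check (or simply wake subscribers) whenever either function reports trouble.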

It's been a decade or so since I worked in this area, so some of that knowledge is likely stale; you probably know a lot more than I do after developing this library even before considering how out-of-date my knowledge might be. When I worked on this stuff, I remember that statx(2) was going to solve all the problems any day now, and then didn't. More relevant, I also remember that the lsyncd (https://github.com/lsyncd/lsyncd) and watchman (https://github.com/facebook/watchman) codebases were really good sources of "what didn't I think of" information in this area.

But seriously, again, nice work! Those are nitpicks; this is awesome as-is!

nodesocket yesterday at 4:07 PM

Awesome. I’m currently using AWS SQS which invokes lambda functions for asynchronous tasks like email sends, but Honker seems like a great local replacement.

Any conflicts or issues when running Litestream as well?
