This is the actual problem:
"Kamal runs blue-green deploys — it starts a new container, health-checks it, then stops the old one. During the switchover, both containers are running. Both mount ultrathink_storage. Both have the SQLite files open."
WAL mode requires shared access to System V IPC mapped memory. This is unlikely to work across containers.
In case anybody needs a refresher:
https://en.wikipedia.org/wiki/Shared_memory
https://en.wikipedia.org/wiki/CB_UNIX
https://www.ibm.com/docs/en/aix/7.1.0?topic=operations-syste...
The SQLite documentation says in strong terms not to do this. https://sqlite.org/howtocorrupt.html#_filesystems_with_broke...
See more: https://sqlite.org/wal.html#concurrency
This thread in the SQLite forum should be instructive: https://sqlite.org/forum/forumpost/90d6805c7cec827f
> WAL mode requires shared access to System V IPC mapped memory.
Incorrect. It requires mmap() on an ordinary file, not System V IPC. From the SQLite WAL documentation:
"The wal-index is implemented using an ordinary file that is mmapped for robustness. Early (pre-release) implementations of WAL mode stored the wal-index in volatile shared-memory, such as files created in /dev/shm on Linux or /tmp on other unix systems. The problem with that approach is that processes with a different root directory (changed via chroot) will see different files and hence use different shared memory areas, leading to database corruption."
> This is unlikely to work across containers.
I'd imagine the SQLite code would fail outright if that were the case. With k8s at least, mounting the same storage into two containers causes, in most configurations, both pods to be co-located on the same node, so it should be fine.
It is far more likely they just fucked up the code and lost data that way...
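On the wal-index being an ordinary mmapped file: here is a minimal Python sketch to check it yourself. This is my own illustration, not something from the SQLite docs; the function name `wal_sidecar_files` and the table `t` are made up for the example, and it assumes a local filesystem where WAL mode is available.

```python
import os
import sqlite3
import tempfile


def wal_sidecar_files(db_path):
    """Open db_path in WAL mode, write once, report which sidecar files exist."""
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns the journal mode actually in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")  # hypothetical table
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # Check while the connection is still open: SQLite normally removes
        # both files when the last connection closes cleanly.
        present = {s: os.path.exists(db_path + s) for s in ("-wal", "-shm")}
        return mode, present
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(wal_sidecar_files(os.path.join(d, "demo.db")))
```

The `-shm` file sits right next to the database, so any process that can see the same directory can map it. That only shows where the wal-index lives; it does not by itself prove that a particular container or volume setup is safe.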
> This is unlikely to work across containers.
Why not?
Ooh, a new historical Unix variant I had never heard of... neat!
Thanks for this, the anecdote with the lost data was very concerning to me.
I think you're exactly right about the WAL shared memory not crossing the container boundary. EDIT: It looks like WAL works fine across Docker boundaries, see https://news.ycombinator.com/item?id=47637353#47677163
I don't know much about Kamal but I'd look into ways of "pausing" traffic during a deploy - the trick where a proxy pretends that a request is taking another second to finish when it's actually held in the proxy while the two containers switch over.
From https://kamal-deploy.org/docs/upgrading/proxy-changes/ it looks like Kamal 2's new proxy doesn't have this yet; they list "Pausing requests" as "coming soon".
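To make the "pausing" trick concrete, here is a rough asyncio sketch of a gate that holds requests during a switchover. Everything in it (`PauseGate`, `handle`, the `backend` callable, the timeout) is hypothetical and for illustration only; it is not Kamal's proxy API, just the general idea.

```python
import asyncio


class PauseGate:
    """Holds incoming requests while paused, then forwards them on resume."""

    def __init__(self):
        self._open = asyncio.Event()
        self._open.set()  # start in the "traffic flows" state

    def pause(self):
        self._open.clear()

    def resume(self):
        self._open.set()

    async def handle(self, request, backend, timeout=30.0):
        # Wait (up to timeout seconds) for the gate to open; to the client
        # this just looks like a slightly slower request.
        # Raises asyncio.TimeoutError if the pause lasts too long.
        await asyncio.wait_for(self._open.wait(), timeout)
        return await backend(request)


async def demo():
    gate = PauseGate()
    current = {"name": "old"}  # stands in for "which container is live"

    async def backend(request):
        return f"{current['name']} handled {request}"

    first = await gate.handle("r1", backend)

    gate.pause()  # deploy starts: stop forwarding
    held = asyncio.create_task(gate.handle("r2", backend))
    await asyncio.sleep(0.01)
    assert not held.done()  # r2 is parked in the proxy

    current["name"] = "new"  # old container stopped, new one live
    gate.resume()
    second = await held
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the design is that no request reaches either container during the window where both could have the database open; the cost is extra latency for whatever arrives while the gate is closed.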