Only by a weird definition of "scalable". The first sentence says: > Counterintuit...

crazygringo • today at 3:44 PM • 4 replies • view on HN

Only by a weird definition of "scalable". The first sentence says:

> Counterintuitively, large DELETEs add work to the database.

There is nothing counterintuitive about this. It takes just as much work to delete a row as it takes to insert a row. Why wouldn't it? Obviously you have to do almost all the same operations: write a log, write the deletion, update indices, replicate it, etc.

And yes, it's a well-known trick for all major relational databases (not just Postgres) that if you want to delete 90% of rows from a large table, it's much faster to just copy the rows you want to keep to a new table, run DROP TABLE on the old table, and rename the new table to the old table. Since DROP TABLE is ~instantaneous, mainly involving table-level metadata.

DELETE scales just fine, in the sense that if you are constantly inserting and deleting individual rows, DELETE scales the same as INSERT.

Basic database functionality is designed around the assumption of lots of small transactions. Whenever you have to do something involving millions of rows at once, you generally need to investigate solutions that work well in "bulk". E.g. loading rows directly from a file rather than with SQL, adding indices only after the data has been loaded rather than before, disabling foreign key checks on large operations (if you know by design that the keys are valid)... and yes, taking advantage of DROP TABLE instead of DELETE. This doesn't mean small transactions aren't scalable, it just means bulk operations are qualitatively different and benefit from their own solutions. And DELETE is no different from INSERT in this regard.

Replies

setr • today at 3:56 PM

> It takes just as much work to delete a row as it takes to insert a row. Why wouldn't it? Obviously you have to do almost all the same operations: write a log, write the deletion, update indices, replicate it, etc.

It takes far more work to delete/update than insert. My recent example is updating ~2TB of text data was about 40x slower than inserting 12TB (was trying to correct some large text truncation that occurred during migration into PG, ended up being faster to redo).

➕ show 1 reply

echoangle • today at 3:46 PM

> And yes, it's a well-known trick for all major relational databases (not just Postgres) that if you want to delete 90% of rows from a large a table, it's much faster to just copy the rows you want to keep to a new table, run DROP TABLE on the old table, and rename the new table to the old table.

Dumb question but why does the optimizer not just do that in secret then? Seems like something that should be detectable with some heuristics.

➕ show 6 replies

Retr0id • today at 4:30 PM

> if you are constantly inserting and deleting individual rows, DELETE scales the same as INSERT

Technically correct, but for a small table with a high churn rate, the performance characteristics may be surprising in that the "n" in most big-O calculations includes all inserts since the last VACUUM, not the actual number of resident rows.

winterbloom • today at 4:24 PM

how does that solution work if the table that is dropped has foreign key constraints?

➕ show 1 reply

alt Hacker News

Replies