logoalt Hacker News

Diggsey05/04/20251 replyview on HN

I think your logic is flawed (even if PostgreSQL may behave this way in practice).

First, let's make sure we're talking about the same two examples.

A:

    DELETE FROM task_queue
    WHERE id = (
      SELECT id FROM task_queue
      WHERE queue_group_id = 15
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    )
B:

    DELETE FROM task_queue
    WHERE id IN (
      SELECT id FROM task_queue
      WHERE queue_group_id = 15
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    )
You seem to be saying that B may exhibit the problem whilst A does not. (ie. there is a different between using `=` vs `IN`). I would like to see some documentation justifying that.

Here's my logic:

- In both cases, the outer query is run once.

- In both cases the outer WHERE clause is evaluated for each row in `task_queue`.

- In both cases, it is up to the optimizer whether the result of the subquery is materialized or not.

- In both cases, if the subquery is not materialized, multiple rows in the outer query may match the condition.

In practice, it may be that the optimizer always materializes uncorrelated subqueries on the RHS of an `=` expression. My contention is whether that is a formal guarantee.


Replies

swid05/04/2025

Most of what you say I agree with. But if this outer query is run only once with version A there is a caveat with where it says “where id = X”.

This cannot match more than one X at a time. So that forces the inner query to be run once, as we can only have one id, and running it twice may produce 2.

I am not sure though to be honest.

We are too deep for me to reply now, but to your next comment I didn't mean only one row, but only one id. It is easy for a small difference in word choice to get things wrong.

I think if it did return two rows; ie limit 2- the query with = will fail. Hmmm maybe that will happen even with limit 1 under certain plans. I wouldn’t trust it.

show 1 reply