logoalt Hacker News

porridgeraisintoday at 5:27 AM4 repliesview on HN

It doesn't matter. The output of a query cannot depend on the plan. All plans should generate semantically equivalent output necessarily. Notice I say semantically equivalent because obviously Select random() can return different numbers each time. But it should still be semantically, one single random number.

In this case, the number of rows changing given the same input dataset, is a bug.


Replies

Sesse__today at 9:37 AM

The output of a query absolutely will depend on the plan. In ANSI SQL you're right it cannot (up to ordering), which is why ANSI SQL doesn't have anything like random() or nondeterministic user functions. But nearly all databases support something that is outside of that standard because it's genuinely useful, and then you get into poorly-defined situations that you need to deal with ad-hoc as an implementer.

E.g., for a very simple case, in SELECT * FROM a,b WHERE random() < 0.5, would you push random() down through the join or not? To one table? Both tables? Evaluate it once at the start of a query because it depends on neither a nor b? What if it's random() < a.x? Different databases have different and often poorly-defined semantics here.

swidtoday at 5:57 AM

It is allowed to be a planner choice though! Very surprising, but if you understand the docs, it will follow that it is not actually a bug. It could change in the future and has changed in the past apparently - but that change was to give the planner more opportunity to optimize queries by not materializing parts of the query and inlining that part into the parent.

Regarding selects: [0]: “A key property of WITH queries is that they are normally evaluated only once per execution of the primary query… However, a WITH query can be marked NOT MATERIALIZED to remove this guarantee. … By default, a side-effect-free WITH query is folded into the primary query if it is used exactly once in the primary query’s FROM clause.

Regarding CTEs: [1]: “A useful property of WITH queries is that they are normally evaluated only once per execution of the parent query… However, the other side of this coin is that the optimizer is not able to push restrictions from the parent query down into a multiply-referenced WITH query"

Now, in either case - if you don't want the planner to inline the query - you might have to be explicit about it (I think since postgres 10?), or otherwise - yes, the output of the query will depend on the plan and this is allowed based on the docs.

[0]: https://www.postgresql.org/docs/current/sql-select.html

[1]: https://www.postgresql.org/docs/current/queries-with.html

swidtoday at 6:13 AM

I wrote this to a deleted comment, but even if the CTE was materialized, the subquery of the CTE would still not be...

For instance, with the stand alone query:

  DELETE … WHERE id IN (
    SELECT id … LIMIT 1 FOR UPDATE SKIP LOCKED
  )
  
the planner is free to turn that IN ( subquery ) into a nested‐loop semi‐join, re-executing the subquery as many times as it deems optimal. Therefore it can delete more than 1 row.
show 1 reply