It's not undefined behavior, it's the fact that the uncorrelated subquery within the CTE doesn't specify an ordering, therefore it cannot be implicitly materialized / evaluated once. Postgres documentation is clear here [1]:
> If sorting is not chosen, the rows will be returned in an unspecified order.
The original query from TFA could've instead just had the uncorrelated subquery moved into a materialized CTE.
[1] https://www.postgresql.org/docs/current/queries-order.html
It doesn't matter. The output of a query cannot depend on the plan. All plans should generate semantically equivalent output necessarily. Notice I say semantically equivalent because obviously Select random() can return different numbers each time. But it should still be semantically, one single random number.
In this case, the number of rows changing given the same input dataset, is a bug.
I see what you saying, but it is very subtle, wouldn’t you agree?
Under a different plan, the inner query would only be evaluated once, it is hard to mentally parse how it will first find the rows with the group id and then pass into the sub query.
And still I am not sure how using a CTE or not in the manner in the post is supposed to avoid this issue, so now I’m a bit skeptical it does. I see how a sort would.
I hope if the sub query was its own CTE, the limit would be applied correctly, but am no longer sure… before this post I wouldn’t have questioned it.
Edit to say - now I see you need to explicitly use AS MATERIALIZED if you bump the subquery to a CTE. Someone should really write a better blog post on this… it raises an interesting case but fails to explain it… they probably have not even solved it for themselves.