Its worth reading this follow-up LKML post by Andres Freund (who works on Postgres): https://lore.kernel.org/lkml/yr3inlzesdb45n6i6lpbimwr7b25kqk...
Note that it's just not a single post, and there's additional further information in following the full thread. :)
> Maybe we should, but requiring the use of a new low level facility that was introduced in the 7.0 kernel, to address a regression that exists only in 7.0+, seems not great.
Completely right. This sounds like a communication failure. Maybe Linux maintainers should pick a few applications that have "priority support" and problems with these applications are also problems with Linux itself. Breaking Postgres is a serious regression.
Reminds me of a situation where Fedora couldn't be updated if you had Wine installed and one side of the argument was "user applications are user problem" while the other was "it's Wine, like come on".
Funny how "use hugepages" is right there on the table and 99% of users ignore it.
AIUI in that thread they're saying "0.51x" the perf on a 96-core arm64 machine and they're also saying they cannot reproduce it on a 96-core amd64 machine.
So it's not going to affect everybody both running PostgreSQL and upgrading to the latest kernel. Conditions seems to be: arm64, shitloads of core, kernel 7.0, current version of PostgreSQL.
That is not going to be 100% of the installed PostgreSQL DBs out there in the wild when 7.0 lands in a few weeks.
.. which confirms all of my stereotypes. Looks like the AWS engineer who reported it used a m8g.24xlarge instance with 384 GB of RAM, but somehow didn't know or care to enable huge pages. And once enabling them, the performance regression disappears.
>If this somehow does end up being a reproducible performance issue (I still suspect something more complicated is going on), I don't see how userspace could be expected to mitigate a substantial perf regression in 7.0 that can only be mitigated by a default-off non-trivial functionality also introduced in 7.0.