> My hypothesis is that some of this a perceived quality drop due to "luck of the draw" where it comes to the non-deterministic nature of [LLM] output.
I think you must have learned that they’re more nondeterministic than you had thought, but then wrongly connected your new understanding to the recent model degradation. Note: they’ve been nondeterministic the whole time, while the widely-reported degradation is recent.
I wonder how well the "good" versions worked if you threw awkward edge cases at it.
Er, no, I am fully aware that LLMs have always been non-deterministic.