You buy a wooden dinner table, it is fully functional and looks perfect. It’s sturdy. You have dinner on it and it survives a few spills.
A few months later you find out it is made of PU foam and printed waxed paper. A misplaced knee could bring it down. It’s likely to completely fall apart in a year. Is that irrelevant?
Yes it is relevant and testable. It's exactly what I meant by "a measurable increase in quality of the final product". In fact a proper test harness would reveal that problem. You are forgetting that with LLMs, testing software does not have to end at the usual unit/integration/e2e level.