I want to measure performance drift over time.
Access to both the reasoning text and the output would help with that measurement.
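
A minimal sketch of one way to do this, assuming each run is logged with its date, reasoning text, output, and a correctness label. The `Record` fields, the `drift` helper, and the two metrics are hypothetical names chosen for illustration, not part of any existing tool. Drift here is simply the mean of a metric after a chosen split date minus its mean before that date.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Callable, Sequence


@dataclass
class Record:
    """One logged run (hypothetical schema)."""
    day: date
    reasoning: str   # reasoning text captured for the run
    output: str      # final output captured for the run
    correct: bool    # whether the output was judged correct


def accuracy_metric(r: Record) -> float:
    """1.0 for a correct output, 0.0 otherwise."""
    return 1.0 if r.correct else 0.0


def reasoning_words_metric(r: Record) -> float:
    """Whitespace-separated word count of the reasoning text."""
    return float(len(r.reasoning.split()))


def drift(records: Sequence[Record], split_day: date,
          metric: Callable[[Record], float]) -> float:
    """Mean metric on/after split_day minus mean metric before it."""
    baseline = [metric(r) for r in records if r.day < split_day]
    recent = [metric(r) for r in records if r.day >= split_day]
    if not baseline or not recent:
        raise ValueError("need records on both sides of split_day")
    return mean(recent) - mean(baseline)
```

A negative value from `drift(records, split_day, accuracy_metric)` would indicate that accuracy fell after the split date, while `reasoning_words_metric` gives a rough signal of whether the reasoning text is getting longer or shorter over the same period.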