We should clarify 'Scaling up' here. Does higher token consumption actually correlate with better accuracy, or are we just increasing overhead?