What are you basing how good they are on? Personal experience or some benchmarks?
Benchmarks. We have internal ones comparing reasoning fine-tuned models vs. frontier models + prompts.
For some use cases it ranges from parity performance at 1/20th the cost to better performance at 1/10th the cost. The trade-off is, of course, narrow applicability.