"LLMs are good at SQL" is quite the assertion. My experience with LLM generated SQL in OLTP and OLAP platforms has been a mixed bag. IMO analytics/SQL will always be a space that needs a significant weight of human input and judgement in generating. Probably always will be due to the critical business decisions that can be made from the insights.
"LLMs are good at [task I'm not good enough at to tell the LLM is bad at]" is becoming common
> IMO analytics/SQL will always be a space that needs a significant weight of human input and judgement in generating.
Isn't that precisely what is done when prompting?
> My experience with LLM generated SQL in OLTP and OLAP platforms has been a mixed bag
Models are evolving fast. If your experience is older than a few months, I encourage you to try again.
I mean this with the best intentions: it's seriously mind-boggling. We started doing this with Sonnet 4.0 and the relevance was okay at best. Then in September we shifted to Sonnet 4.5 and it's been night and day.
Every single model released since then (Opus 4.5, 4.6) has meaningfully improved the quality of results.
What we learned while building this is that every token in the context matters. We spent a lot of time watching logs of agent sessions and tuning the tool params, the errors returned by tools, the agent prompts, etc.
We noticed, for example, the importance of letting the model pull from the context instead of pushing lots of data into the prompt. Our error reporting is "complex" because we have to differentiate between real non-retryable errors and errors that teach the model to retry differently. It changes the model's behavior completely.
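That split can be sketched roughly like this. This is a minimal illustration, not the commenter's actual implementation: the error codes, hint text, and function names are all assumptions for the sake of the example.

```python
# Hypothetical sketch: tool errors are split into non-retryable failures
# (surface and stop) and "instructive" errors whose message teaches the
# model how to retry differently. All codes and hints here are invented.

FATAL = {"permission_denied", "connection_lost"}

INSTRUCTIVE_HINTS = {
    "unknown_column": "Column not found. Inspect the table schema first, then retry.",
    "query_timeout": "Query timed out. Add a LIMIT or narrow the time range, then retry.",
}

def report_error(code: str, detail: str) -> dict:
    """Shape a tool error so the agent either gives up or learns how to retry."""
    if code in FATAL:
        # Non-retryable: no hint, the agent should stop and report failure.
        return {"retryable": False, "error": f"{code}: {detail}"}
    # Retryable: attach a hint that nudges the model toward a better next attempt.
    hint = INSTRUCTIVE_HINTS.get(code, "Unexpected error; rephrase the query and retry.")
    return {"retryable": True, "error": f"{code}: {detail}", "hint": hint}
```

The point is that the *shape* of the error result, not just its text, steers the agent: a bare failure string and a failure-plus-hint produce very different retry behavior.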
Also, I agree with "significant weight of human input and judgement": we spent lots of time optimizing indexes and thinking about how to organize data so queries perform at scale. Claude wasn't very helpful there.