Hacker News

wenc · yesterday at 6:24 AM

I don't quite agree.

SQL queries are one area where correctness matters a lot, because downstream applications rely on them to be correct. If you get a query wrong, especially in an ETL process, you can generate a lot of garbage data for a very long time (I'm speaking from lived experience). It might take a long time to correct (via backfill), and sometimes the original data might no longer be available.
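A minimal sketch of the kind of silent query bug being described (the tables here are hypothetical): a filter on a LEFT JOINed table placed in the WHERE clause quietly turns the join into an INNER JOIN, dropping rows and corrupting any downstream aggregates built on the result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT);
    CREATE TABLE refunds (order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO refunds VALUES (1, 5.0);
""")

# Buggy: r.amount IS NULL for unmatched orders, so the WHERE clause
# silently discards every order without a refund.
buggy = con.execute("""
    SELECT COUNT(*) FROM orders o
    LEFT JOIN refunds r ON r.order_id = o.id
    WHERE r.amount < 100
""").fetchone()[0]

# Correct: moving the condition into the ON clause preserves
# unmatched rows, as a LEFT JOIN is meant to.
correct = con.execute("""
    SELECT COUNT(*) FROM orders o
    LEFT JOIN refunds r ON r.order_id = o.id AND r.amount < 100
""").fetchone()[0]

print(buggy, correct)  # 1 3
```

Both queries parse and run fine; only inspecting the results (or knowing the data) reveals that the first one is wrong.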

I use LLMs to generate ideas for writing complex SQL, but I always need to understand the query 100% before I deploy it. Mistakes are costly, and sometimes irreversible. I need to trust but verify, which requires deep SQL knowledge.

To write correct SQL, you not only have to know the syntax, but also the schema (easily provided to an LLM) and the expected data values (you can provide a sampling to an LLM, which helps, but domain knowledge helps even more). There are lots of surprising things in data that only reveal themselves via visual inspection. (Though I think with MCP, an LLM will one day be able to inspect the data itself by sampling the database.)
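A hedged sketch of the kind of inspection described above (table and column names are hypothetical): sampling a column's value distribution and NULL rate often surfaces surprises that the schema alone won't show, like mixed casing or sentinel strings standing in for missing data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE events (country TEXT);
    INSERT INTO events VALUES ('US'), ('us'), ('N/A'), (NULL), ('US');
""")

# Value distribution: reveals 'US' vs 'us' casing and an 'N/A' sentinel.
rows = con.execute("""
    SELECT country, COUNT(*) AS n
    FROM events
    GROUP BY country
    ORDER BY n DESC
""").fetchall()

# NULL rate: `country IS NULL` evaluates to 0/1, so AVG gives the fraction.
null_rate = con.execute(
    "SELECT AVG(country IS NULL) FROM events"
).fetchone()[0]

print(rows)
print(null_rate)  # 0.2
```

The schema just says `country TEXT`; only this kind of sampling shows that the column holds four different spellings of "unknown or inconsistent".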


Replies

8note · today at 12:09 AM

this suggests that i should expect that data coming out of SQL is likely to be wrong, since lots of people writing it don't have the deep knowledge, and it's very hard to verify correctness of the results since there is limited to no ground truth.

if i already expect it to be making at least some of a mess, why not have AI as part of the setup?

vasco · yesterday at 10:05 AM

Testing SQL for correctness is beside the point of how the SQL was generated. It's important, of course, to review and test anything. My point is that the current state-of-the-art assistants are already good enough at writing SQL that it's not worth spending much time hand-crafting complex queries.
