Hacker News

cheevly · yesterday at 8:31 PM · 1 reply

GPT literally generates perfect code for me in languages that do not exist anywhere in its training set, so I’m not sure how you’ve achieved this level of failure.


Replies

jofer · yesterday at 9:07 PM

Try working in anything domain-specific outside of common CRUD patterns, e.g., scientific software development, where you describe a problem and provide data. I have yet to see a single case where feeding in a natural-language problem from a specific scientific domain produced something that wasn't pretty catastrophically incorrect.

But yeah, if you want to feed it math and get code, it's reasonably okay with that. All the LLMs I've used seem bad at understanding things that don't look like broad human knowledge, and I've seen the same general issue across many different models. (And to be fair, geology, geophysics, and remote sensing are what I'm testing, and they're semi-rare niches.)

It's also quite dangerous, because unless you actually are a domain expert, it's not obvious that what it's producing is a complete hallucination. Things _sound_ reasonable. E.g., "this is likely feature X", where X _does_ exist but is absolutely _not_ relevant to the problem or present in the input dataset.

But my current employer is pushing this exact thing (human language + scientific data + LLM -> advanced analysis of scientific data by LLM -> business decisions), and it _really_ worries me. It often gives the rough equivalent of "Start the procedure by severing the patient's aorta. Once they stop moving, you can deal with the hangnail", just in very reasonable-sounding language. And a lot of people don't know any better, because most users aren't domain experts.
