I've used general-purpose LLMs (e.g. run-of-the-mill Claude, GPT, etc.) heavily to draft legal documents. The biggest trap is the hallucinated citation. It will easily insert an absolutely authentic sounding quotation from another case that perfectly proves the point you are trying to make, then it'll make up an authentic name for it, e.g. United States v. Shenzhou Electronics Inc or whatever. You can get really comfortable after checking its output a few times and getting no false citations, and then BAM, it'll put three in the next motion it writes.
Any lawyer who isn't using LLMs for research is behind the curve, though. They are unbelievable at finding niche cases you would never have found on your own. Previously it was a lot of exact search-term matching, which is inherently useless for a lot of legal research. I need something that can search on vaguer terms, which AI can do incredibly well. Just check the results. I'd guess the LLMs from LexisNexis/Westlaw are better than the general-purpose ones.
LLMs make fantastic paralegals. If you're doing any legal work, you should be using them, even if it's just to bounce ideas off. Have one play devil's advocate. My friend always has it play the other party's lawyer to see what all the counter-arguments are going to be.
Just like you would with software development: if you care about what you are creating, CHECK THE OUTPUT.
>The biggest trap is the hallucinated citation
The "biggest problem" being the one thing that is trivial to verify against concrete databases is a bit convenient, don't you think?
I think it's more likely that it makes mistakes evenly across everything it writes, but the one thing you can check with certainty is the only place you discover the errors.
Even when the citation exists, what the LLM says it stands for and what it actually stands for are not necessarily the same.
For testing, I've asked (admittedly last-gen) LLMs to generate legal opinions regarding issues in commercial English civil litigation, and I received back cases where the citation is real, but the area of law (family law) is not relevant as family courts apply a very different set of procedural rules.
(If you squint a bit, they sometimes might be relevant... and could be useful for a particularly creative litigator to make a novel argument on behalf of a very risk tolerant client. But you would very much want to go read those cases and think quite hard about them.)
Seems like companies such as Thomson Reuters and other legal service providers have an incentive to build an LLM with RAG over the text of legal cases and robust hallucination detection on references.
I think the paralegal analogy is right, but with one important difference: a human paralegal usually knows when they are unsure, or at least can be trained to flag uncertainty.
ChatGPT regularly hallucinates entire cases out of whole cloth or fabricates an entirely different fact pattern for a given case. Perplexity does much better at citing its sources and providing accurate quotes, at least in my experience.
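For what it's worth, the "accurate quotes" part is something you can partly check mechanically once you have the full text of the cited opinion in hand. A minimal sketch, assuming you have already fetched that text yourself; the function names and the sample opinion text are made up for illustration:

```python
import re


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace for comparison."""
    text = text.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in_opinion(quote: str, opinion_text: str) -> bool:
    """True only if the quoted passage occurs verbatim (after normalization) in the opinion."""
    return normalize(quote) in normalize(opinion_text)


# Invented opinion text, standing in for the full text of a real judgment.
opinion = """The court holds that the notice
period began on delivery, not on dispatch."""

print(quote_appears_in_opinion("the notice period began on delivery", opinion))
print(quote_appears_in_opinion("the notice period began on dispatch", opinion))
```

This only catches invented or altered quotations. A quote that really is in the opinion can still be lifted out of context, so a pass here is a reason to go read the case, not a substitute for reading it.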
A legal professional can be personally liable for not finding the most recent case law.
The knowledge cutoff means the models sometimes don't know about the most recent case law for a given situation.
I've seen this happen multiple times now. Accountants and legal professionals advising clients based on outdated information assembled through ChatGPT, Claude and Copilot.
Professionals drafting letters and missing recent case law that handles their exact case. It's unreliable. So it can save you some work, but it can't save you all of the work. And in some cases its mistakes really force you to redo all the work, and more, to be thorough and have confidence in the result.
> The biggest trap is the hallucinated citation. It will easily insert an absolutely authentic sounding quotation from another case that perfectly proves the point you are trying to make, then it'll make up an authentic name for it, e.g. United States v. Shenzhou Electronics Inc or whatever.
Naive question from an outsider: aren't there searchable databases of cases (with complete text) so that citations could be checked automatically, either by the same or an independent agent?
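Roughly what that automatic check could look like, as a sketch only: pull the reporter citations out of the draft and look each one up. Everything here is an assumption for illustration. `KNOWN_CITATIONS` is a toy dictionary standing in for a real case-law database, the regex covers only a handful of US reporters, and the second citation in the sample draft is invented.

```python
import re

# Toy stand-in for a real case-law database, keyed by reporter citation.
# A real checker would query an actual legal database instead.
KNOWN_CITATIONS = {
    "384 U.S. 436": "Miranda v. Arizona",
    "5 U.S. 137": "Marbury v. Madison",
}

# Matches "<volume> <reporter> <page>" for a few US reporters only.
# Real citation formats are far more varied than this.
CITATION_RE = re.compile(
    r"\b\d{1,3} (?:U\.S\.|F\.2d|F\.3d|F\. Supp\. 2d|F\. Supp\.) \d{1,4}\b"
)


def find_unverified_citations(draft: str) -> list[str]:
    """Return every reporter citation in the draft that is absent from the database."""
    return [c for c in CITATION_RE.findall(draft) if c not in KNOWN_CITATIONS]


# The second citation is invented for this example and is not in the toy database.
draft = (
    "See Miranda v. Arizona, 384 U.S. 436 (1966); "
    "United States v. Shenzhou Electronics Inc, 812 F.3d 1044 (2016)."
)

print(find_unverified_citations(draft))
```

A lookup like this only tells you whether a citation exists. It says nothing about whether the case stands for what the draft claims, which is the harder problem raised upthread.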