IME so far (as both a lawyer and a software engineer), LLM error rates when drafting code and legal documents are reasonably comparable, but it's more problematic in the legal context because legal documents do not benefit from many of the structural safeguards available for code. For legal documents, there are no automated tests, no static typing, no test environments, no logging/observability instrumentation, no sandboxing.
The time lag between drafting and "deployment" also makes for much less effective, much more expensive debugging loops. You can deploy your code to prod in seconds, see an error pop up in the logs, and immediately start debugging. But it will take at a minimum days and frequently as long as several years before an error in a contract or a court filing will be detected, and often the error is beyond correction at that point. Thus, the errors are both more difficult to detect and to resolve.
And the consequences of error are often much greater, both because they are not correctable and because a legal error may risk someone's life, liberty, or substantial property. Although that's not categorically the case, obviously bugs in certain safety critical systems can be as bad or even worse than legal mistakes. But in general, most software is lower stakes than most legal writing.
On the flip side, LLMs do seem to do a better job with basic style and structure for legal documents compared to code. Things like following IRAC format, citing assertions of law (although hallucination remains an issue), and writing comprehensible sentences. These would be the equivalents in code to best practices like good comments, cohesion, consistent use of design patterns, test coverage, clear variable names, DRY, etc. Although the better performance on those more qualitative metrics may just be because even the longest legal documents are typically simpler in structure and have fewer lines of text than a large, complex codebase. Or maybe it's because LLMs are trained on natural language text more than on code. Or because natural language is more forgiving than code, in that minor variation in diction or grammar is unlikely to have any significant effect on how the document is interpreted, whereas even single character errors in code can have enormous effects.
This is a very good comment. But notice how even in software engineering there is still disagreement about these structural safeguards.
So yes, we can say the LLM created bad code when it does not compile or fails prewritten tests.
But experts might disagree what good comments, good cohesion, appropriate use of design patterns, appropriate test coverage or clear variable names are.
So what are we suppossed to train the LLMs towards? Somebody still has to decide what "good" is.
Hidden gem of a comment, thanks for writing
Well this is largely the fault of law itself. especially english style law. A legal, parseable code, in which not every single tiny municipality (some less than 1 square mile) has their own set of rules and laws, not all published or available - but which citizens are expected to abide by of course - how could we expect AI to do well and not some typical TV southern lawyer who knows the judge?
There is also one thing I would like to add, and you can correct me if you disagree: coding benefits much more from thorough planning. Now, I exclusively work by first writing a plan that has well-defined steps and goals, which can of course change over time.
It seems to me like it would be more difficult to achieve with legal documents and, in my experience at least, writing a concrete plan has been the decisive factor that make my AI coding robust (plus all that you mentionned).