> The LLM generated writing obviously felt significantly better than my own writing.
A general pattern for LLMs is that they look really good at things you are bad at. What that means is that if you find yourself thinking of its output as significantly better than yours in a particular domain, there's a high chance that you are not equipped to judge that quality effectively.
I partly write words for a living. Claude is really really bad at writing prose that doesn’t make me want to vomit.
I rarely write code, and only once for a living. But I feel like I’m a superhuman and one step away from being a zillionaire when Claude gives me a bunch of code it has written in seconds. I WILL CHANGE THE WORLD!!
And then I remember that Claude can’t write words that don’t make me want to break things and I’m good at writing words but bad at writing code.
So then I delete the code and go back to doing more profitable things than being the next zuckerfuck.
I don't disagree about the probability, but the current frontier models are not completely useless for writing even in areas where I have significant knowledge. I would not have said that a year ago. You have to watch them like a hawk -- they are good at spitting out plausible sounding nonsense that is hard even for an expert to discern. But the dice roll going on behind the scenes is continually more biased towards being correct/useful than not.
Honestly, I can't fathom thinking that LLM writing is even remotely passable. People that think this should honestly read more. One book a month is hardly an aspirational goal. You don't even have to read Melville or Hemingway or Chaucer or Shakespeare, just pick up any popular NYT best seller, and it'll be significantly better than anything an LLM can generate.
> A general pattern for LLMs is that they look really good at things you are bad at.
Naah I disagree with this. I think LLMs are good at gas-lighting you into thinking that good writing only comes in one flavor. And LLMs prefer a very "textbook/technical-manual" coded flavor of writing because maybe that way they are more useful to us humans. But human writing is not just about crafting the most elegant sentences. Sometimes great writing is just this doggo-drawing meme:
https://knowyourmeme.com/photos/2160304-the-winner-of-this-c...
You can triangulate. Ask it the same thing in different ways and with different LLMs. Operate in domains where the output is verifiable, such as numerical computing in the sciences. Study the output, graph it, learn it, reason with it, rinse, and repeat until your mental model makes practical sense.
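To make that concrete, here is a rough sketch of what triangulating could look like for a numerical question. Everything in it is made up for illustration: the model names, their answers, and the helper names `midpoint_integral` and `triangulate` are hypothetical, and the "independent check" is a plain midpoint-rule integration that I chose because you can verify it by hand.

```python
import math


def midpoint_integral(f, a, b, n=10_000):
    """Independent check: midpoint-rule approximation of the integral of f on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def triangulate(candidates, reference, rel_tol=1e-6):
    """Split candidate answers into those that match the reference and those that do not."""
    agree = {
        name: value
        for name, value in candidates.items()
        if math.isclose(value, reference, rel_tol=rel_tol)
    }
    disagree = {
        name: value for name, value in candidates.items() if name not in agree
    }
    return agree, disagree


# Hypothetical answers to "what is the integral of x^2 from 0 to 1?",
# collected from different models and from a rephrased prompt.
candidates = {
    "model_a": 1 / 3,
    "model_b": 0.3333333,
    "model_a_rephrased": 0.5,
}

# Compute the answer without any LLM involved, then compare.
reference = midpoint_integral(lambda x: x * x, 0.0, 1.0)
agree, disagree = triangulate(candidates, reference)

print("agree with independent check:", sorted(agree))
print("disagree:", sorted(disagree))
```

The point is only that agreement between models is weaker evidence than agreement with something you computed yourself; the tolerance and the check are whatever fits your domain.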
This is true, but what is also true is that with each new generation of models (and not just for code generation) it becomes less and less true.
That's because LLM output is "average"; so if you're below, it will obviously look better than what you can do, and vice-versa. It will be interesting to see what happens when current LLM output becomes the bottom, as everyone worse has pulled themselves up to that level.
The other day there was a Hacker News comment about AI-generated music, and the poster claimed that a friend generated AI music and got as much enjoyment from it as from actual music composed by musicians. I suppose this falls into the same camp.
So what does this mean in practice, though?
Let's say you are correct.
I ask an LLM to write something for me, and to me it looks really, really good. So based on your conjecture, that means I am not a very good writer.
Ok, but how does that change what I should do? If I am not a very good writer, that means an LLM IS actually better than me, even if it might not be objectively good to an expert writer.
My two choices are to keep producing my own crappy writing, or use an LLM to create better (but not great) writing.
Wouldn't it make sense to use an LLM?
It seems to me your premise leads you to the same conclusion you would reach even if your premise were false: if me thinking an LLM is good at a task means I am very bad at that task, I am probably better off having an LLM do it. On the other hand, if you are wrong, and I think an LLM is good at something because it actually IS good at that thing, then I should also use the LLM to do the task.
Either way, the LLM is better than me at the thing.
I dabble in drawing and I find LLM images (and maybe some non-LLM ones) abhorrent. As for why, the reasons I can think of are: no consistency (perspective, small details, and color theory) and too much detail, which turns the image into visual noise. In most paintings, the artist will have a subject that is the most detailed (to draw the eye), and from there the loss of detail follows some kind of logic. This is how you pinpoint what the artist is most interested in. LLM output looks like a filter applied to a montage of pictures.
> What that means is that if you find yourself thinking of its output as significantly better than yours in a particular domain, there's a high chance that you are not equipped to judge that quality effectively.
This is why code generation is a disaster waiting to happen. Hundreds of thousands of "programmers" with no idea of what they are pushing to production.
Mnemonic: geLL-Mann amnesia effect
Cuts very close to the Dunning-Kruger effect.
It's basically just another instance of Gell-Mann Amnesia. Ask an LLM to discuss a topic you are an expert on, and you will realise it is full of errors, but ask it to discuss a topic you know nothing about and you will, mysteriously, assume it is very intelligent and correct.
> A general pattern for LLMs is that they look really good at things you are bad at.
This is true for coding, too, which I think, to a large degree, might explain the polarized differences in opinions on HN about the quality of LLM-produced code. You have the 1. "AI produces code better than I could possibly write, one-shots things it would take me days to do, and has made me 10X more productive!" camp, and you have the 2. "AI constantly produces poor code needing rework, makes mistakes, has to be babysat, and ultimately costs me time!" camp, with a spectrum in between those. How could the output of the same product be seen so differently? Well, I have bad news for camp 1...