Honest question: Does anyone know of any quantitative study or analysis of productivity gains from code assistants? I'm asking for numbers comparing the "pre-AI era" with now.
Also, I have the impression that LLMs bring some gains for individuals, but that these are not significant enough to matter at the organization level.
I believe it is very hard to quantify "productivity". I’m sure that for suitable definitions you can find gains from coding assistants. Personally, I get more code written and more features implemented. Yet I’m very wary of coding assistants because I believe they deal a fatal blow to my ability to understand the system. All LLM-generated code is (at best!) code written by an intern whom I merely helped with the design and whose work I reviewed (unless productivity expectations cut down my review time and I get LLM assistance for reviews too). My grasp of the inner workings of that code is much more tenuous than if I had written it myself. I will never become an expert just by reviewing code and prompting.
For a while this is not a problem: I can work with my current mental model. But every generated PR erodes my expertise a little bit. Eventually my mental model won’t fit anymore.
So how much of that model maintenance should I count toward my productivity metric? Does that even matter, or will the next model be able to reason well enough that my mental model doesn’t matter?
Follow-up question: Does anyone know of any quantitative study or analysis of productivity without code assistants? (as a baseline)
Here's the big one that was being passed around months ago, which nowadays usually gets dismissed out of hand because of when they did it (while ignoring the relative finding): https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
Here's a slightly more recent one focused more on comprehension/learning than productivity: https://www.anthropic.com/research/AI-assistance-coding-skil...
METR attempted to redo that first study to get trends over time, but couldn't recruit enough developers to get reliable results.