> Schwartz's experiment is the most revealing, and not for the reason he thinks. What he demonstrated is that Claude can, with detailed supervision, produce a technically rigorous physics paper. What he actually demonstrated, if you read carefully, is that the supervision is the physics. Claude produced a complete first draft in three days. It looked professional. The equations seemed right. The plots matched expectations. Then Schwartz read it, and it was wrong. Claude had been adjusting parameters to make plots match instead of finding actual errors. It faked results. It invented coefficients. [...] Schwartz caught all of this because he's been doing theoretical physics for decades. He knew what the answer should look like. He knew which cross-checks to demand. [...] If Schwartz had been Bob instead of Schwartz, the paper would have been wrong, and neither of them would have known.
And so the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.
Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob, even though Bob may seem to be faster.
The article gestures at this but I don't think it comes down hard enough. It doesn't seem practical. But we have to find a way, or we're all going to be in deep trouble when the next generation doesn't know how to evaluate what the LLMs produce!
---
† "Useful" in this context means "helps you produce good science that benefits humanity".
I think we already know what we need to do: encourage people to do the work themselves, discourage beginners from immediately asking an LLM for help, and re-introduce some kind of oral exam. As the article mentions, banning LLMs is impractical, and what we really need are people who can tell when the LLM is confidently wrong, not people who don't know how to work with an LLM.
I hope it will encourage people to think more about what they get out of the work, what doing the work does for them; I think that's a good thing.
> Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob
The solution is relatively simple though - not sure the article suggests this as I only skimmed through:
Being good in your field doesn't only mean pushing articles but also being able to talk about them. I think academia should drift away from written form toward more spoken form, i.e. conferences.
What if, say, you could only publish something after presenting your work in person, answering questions, etc.? The audience can be big or small, doesn't matter.
It would make publishing anything at all more expensive but maybe that's exactly what academia needs even irrespective of this AI craze?
I've been using ChatGPT to re-bootstrap my coding hobby. After the initial honeymoon wore off, I realized I was staring down the barrel of a dilemma. If I use AI to "just handle" the parts of the system I don't want to understand, I invariably end up in a situation where I gotta throw a whole bunch of work out. But I can't supervise without an understanding of what it's supposed to be doing, and if I knew what it was supposed to be doing, I could just do it myself.
So I settled on very incremental work. It's annoying cutting and pasting code blocks into the web interface while I'm working on my interface to Neovim. I spent a whole day realizing I can't trust it to instrument Neovim and don't want to learn enough Lua to manage it. (I moved on to Neovim from Emacs because I don't like elisp, and GPT is even worse at working on my Emacs setup than Neovim; the end goal is my own editor in Ruby, but GPT damn sure can't understand that atm.) But at least I'm pushing a real flywheel and not the brooms from Fantasia.
AI is an accelerant, not a replacement for skill. At least, not yet.
I built a full stack app in Python+TypeScript where AI agents process 10k+ near-real-time decisions and executions per day.
I have never done full stack development and I would not have been able to do it without GitHub Copilot, but I have worked in IT (data) for 15 years including 6 in leadership. I have built many systems and teams from scratch, set up processes to ensure accuracy and minimize mistakes, and so on.
I have learned a ton about full stack development by asking the coding agent questions about the app, bouncing ideas off of it, planning together, and so on.
So yes, you need to have an idea of what you're doing if you want to build anything bigger than a cheap one shot throwaway project that sort of works, but brings no value and nobody is actually gonna use.
This is how it is right now, but at the same time AI coding agents have come an incredibly long way since 2022! I do think they will improve, but an agent can't know exactly what you want to build. It's making an educated guess, an approximation of what you're asking it to do. Ask the same thing twice and you will get two slightly different results (assuming it's a big one shot).
This is the fundamental reality of LLMs, sort of like having a human walking (where we were before AI), a human using a car to get to places (where we are now) and FSD (this is the future, look how long this took compared to the first cars).
> the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.
That you can't "become Schwartz" by using LLMs is an unproven assumption. Actually, it's a contradiction in the logic of the essay: if Bob managed to produce a valid output by using an LLM at all, then it means that he must have acquired precisely that supervision ability that the essay claims to be necessary.
Btw, note that in the thought experiment Bob isn't just delegating all the work to the LLM. He makes it summarise articles, extract important knowledge and clarify concepts. This is part of a process of learning, not being a passive consumer.
The article is a thought experiment. The author hypothesizes that Bob isn't getting the same benefit that Alice is getting. That hypothesis could be wrong. I don't know and the author doesn't know. It could be that Bob is going to have a very successful career and will deeply know the field because he is able to traverse a wider set of problems more quickly. At this point, it's just a hypothesis. I don't think that we can say we need more Alices any more than we can say we need more Bobs. Unfortunately we will have to wait and see. It will be up to the academic community to do the work to enforce quality controls. That is probably the weakness to worry about.
> And so the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.
I have gained a lot of benefit using LLMs in conjunction with textbooks for studying. So, I think LLMs could help you become Schwartz.
Profession (1957) by Isaac Asimov is relevant: https://news.ycombinator.com/item?id=46664195
I totally agree - the article misses this point in a very conspicuous way. It suggests that Alice and Bob will both graduate at the same level.
What may well happen instead is that Bob publishes two papers. He then outcompetes Alice based on the insistence that others have on "publish or perish". Alice becomes unemployed and struggles, having been pushed out.
The person who puts the time and effort in doesn't just sit at the same level, and they don't both just find decent employment. Competition happens and the authentic learning is considered a waste of time, which leads to real and often life-threatening consequences (like being homeless after being unable to find employment).
>And so the paradox is, the LLMs are only useful† if you're Schwartz
For so many workers, their companies just want them to produce bullshit. Their managers wouldn't frame it this way, but if their subordinates start producing work with strict intellectual rigor it's going to be an issue and the subordinates will hear about it.
So, you're not wrong. But the majority of LLM customers don't care and they just want to report success internally, and the product needs to be "just good enough." An LLM might produce a shitty webpage. So long as the page loads no one will ever notice or care that it's wrong in the way that a physics paper could be wrong.
> And so the paradox is, the LLMs are only useful† if you're Schwartz
Was the LLM even useful for Schwartz, if it produced false output?
Sadly I don’t see how our current social paradigm works for this. There is no history of any sort of long planning like this or long term loyalty (either direction) with employees and employers for this sort of journeyman guild style training. AI execs are basically racing, hoping we won’t need a Schwartz before they are all gone. But what incentives are in place to hire a college grad, have them work without LLMs for a decade and then give them the tools to accelerate their work?