Honestly I've not found a huge amount of value from the "science".
There are plenty of papers out there that look at LLM productivity, and every one of them seems to have glaring methodological limitations and/or reports on models that are 12+ months out of date.
Have you seen any papers that really elevated your understanding of LLM productivity with real-world engineering teams?
The writing on this website is giving strong web3 vibes to me / doesn't smell right.
The only reason I'm not dismissing it out of hand is basically because you said this team was worth taking a look at.
I'm not looking for a huge amount of statistical ceremony, but some detail would go a long way here.
What exactly was achieved for what effort and how?
But the absence of papers is precisely the problem and why all this LLM stuff has become a new religion in the tech sphere.
Either you have faith and every post like this fills you with fervor and pious excitement for the latest miracles performed by machine gods.
Or you are a nonbeliever and each of these posts is yet another false miracle you can chalk up to baseless enthusiasm.
Without proper empirical method, we simply do not know.
What's even funnier about it is that large-scale empirical testing is actually necessary in the first place to verify that a stochastic process is even doing what you want (at least on average). But the tech community has become such a brainless atmosphere, totally absorbed by anecdata and marketing hype, that no one seems to care anymore. It has quite literally devolved into the religious ceremony of performing the rain dance (use AI) because we said so.
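To put a number on the "at least on average" point, here is a minimal sketch, assuming each run of the model on a task can be scored as an independent pass/fail trial (a simplification that real evaluations may not satisfy). It uses the standard Wilson score interval; the function name and the example counts are purely illustrative, not results from any study.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p = successes / trials                      # observed pass rate
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: the same 80% observed pass rate at two sample sizes.
    for passed, total in [(8, 10), (800, 1000)]:
        low, high = wilson_interval(passed, total)
        print(f"{passed}/{total}: {low:.3f} to {high:.3f}")
```

With 8 passes out of 10, the interval spans roughly 0.49 to 0.94, which is compatible with anything from a coin flip to near-perfect. With 800 out of 1000 it narrows to roughly 0.77 to 0.82. A handful of anecdotes sits in the first regime.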
One thing the papers help provide is basic understanding and consistent terminology, even when the models change. You may not find value in them, but I assure you that the actual building of models and product improvements around them is highly dependent on the continual production of scientific research in machine learning, including experiments around applications of LLMs. The literature covers many prompting techniques well, and in a scientific fashion, and many of these have been adopted directly in products. Chain of thought is one big example: people integrate it not because of some "fingers crossed guys, worked on my query" but because researchers have produced actual statistically significant results on benchmarks using the technique. To be a bit harsh, I find your very dismissal of the literature here in favor of hype-drenched blog posts soaked in ridiculous language and fantastical incantations to be precisely symptomatic of the brain rot the LLM craze has produced in the technical community.
No, I agree! But I don’t think that observation gives us license to avoid the problem.
Further, I’m not sure this elevates my understanding: I’ve read many posts in this space which could be viewed as analogous to this one (this one is more tempered, of course). Each one has the same flaw: someone is telling me I need to make an “organization” out of agents and positive things will follow.
Without a serious evaluation, how am I supposed to validate the author’s ontology?
Do you disagree with my assessment? Do you view the claims in this content as solid and reproducible?
My own view is that these are “soft ideas” (GasTown and Ralph fall into a similar category) without rigorous justification.
What this amounts to is “synthetic biology” with billion-dollar probability distributions, where the incentives are set up so that companies are motivated to convey that they have the “secret sauce” … for massive amounts of money.
As a result, it’s difficult to trust a word out of anyone’s mouth, even if my empirical experiences match (along some projection).