Scaling laws vs combinatorial explosion, who wins? In personal experience claude does exceedingly well on mundane code (do a migration, add a field, wire up this UI) and quite poorly on code that has likely never been written (even if it is logically simple for a human). The question is whether this is a quantitative or qualitative barrier.
Of course it's still valuable. A real app has plenty of mundane code despite our field's best efforts.
Combinatorial explosion? What do you mean? Again, your experiences are true, but they are improving with each release. The error rate on tasks continues to go down, even novel tasks (as far as we can measure them). Again this is where verifiable domains come in -- whatever problems you can specify the model will improve on them, and this improvement will result in better generalization, and improvements on unseen tasks. This is what I mean by taking your observations of today, ignoring the rate of progress that got us here and the known scaling laws, and then just asserting there will be some fundamental limitation. My point is while this idea may be common, it is not at all supported by literature and the mathematics.