and run an outdated model for 3 years while progress is exponential? what is the point of that
Bake in a Genius Bar employee, trained on your model's hardware, whose entire reason for existence is to fix your computer when it breaks. If it takes an extra 50 cents of die space but saves Apple a dollar of support costs over the lifetime of the device, it's worth it.
Is progress still exponential? Feels like its flattening to me, it is hard to quantify but if you could get Opus 4.2 to work at the speed of the Taalas demo and run locally I feel like I'd get an awful lot done.
Yeah, the space moves so quickly that I would not want to couple the hardware with a model that might be outdated in a month. There are some interesting talking points but a general purpose programmable asic makes more sense to me.
It won’t stay exponential forever.
> what is the point of that
Planned obsolescence? /s
Jokes aside, they can make the "LLM chip" removable. I know almost nothing is replaceable in MacBooks, but this could be an exception.
When output is good enough, other considerations become more important. Most people on this planet cannot afford even an AI subscription, and cost of tokens is prohibitive to many low margin businesses. Privacy and personalization matter too, data sovereignty is a hot topic. Besides, we already see how focus has shifted to orchestration, which can be done on CPU and is cheap - software optimizations may compensate hardware deficiencies, so it’s not going to be frozen. I think the market for local hardware inference is bigger than for clouds, and it’s going to repeat Android vs iOS story.