Hacker News

rfv6723 | today at 3:07 AM | 5 replies

Apple's AI team keeps going against the bitter lesson, focusing on small on-device models.

Let's see how this turns out in the long term.


Replies

peepeepoopoo137 | today at 4:07 AM

"""The bitter lesson""" is how you get the current swath of massively unprofitable AI companies that are competing with each other over who can lose money faster.

yorwba | today at 7:58 AM

They took a simple technique (normalizing flows), instantiated its basic building blocks with the most general neural network architecture known to work well (transformer blocks), and trained models of different sizes on various datasets to see whether it scales. Looks very bitter-lesson-pilled to me.

That they didn't scale beyond AFHQ (high-quality animal faces: cats, dogs and big cats) at 256×256 is probably not due to an explicit preference for small models at the expense of output resolution, but because this is basic research to test the viability of the approach. If this ever makes it into a product, it'll be a much bigger model trained on more data.

EDIT: I missed the second paper https://arxiv.org/abs/2506.06276 where they scale up to 1024×1024 with a 3.8-billion-parameter model. It seems to do about as well as diffusion models of similar size.
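To make the recipe concrete, here is a minimal PyTorch sketch of an affine coupling flow whose conditioner is a stack of off-the-shelf transformer blocks. It is not the architecture from either paper (those use autoregressive transformer blocks over image patches); it just illustrates the general idea of invertible transforms parameterized by transformers and trained on exact log-likelihood. All class names, shapes, and hyperparameters below are made up for illustration.

    # Sketch only: a coupling-style normalizing flow whose conditioner is a transformer.
    import torch
    import torch.nn as nn

    class TransformerCoupling(nn.Module):
        """Affine coupling: half the tokens condition an invertible affine map of the other half."""
        def __init__(self, dim: int, n_heads: int = 4, n_layers: int = 2):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                               dim_feedforward=4 * dim,
                                               dropout=0.0,  # so forward and inverse match exactly
                                               batch_first=True)
            self.conditioner = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.to_scale_shift = nn.Linear(dim, 2 * dim)

        def forward(self, x):
            # x: (batch, tokens, dim); split tokens into a conditioning half and a transformed half
            x_a, x_b = x.chunk(2, dim=1)
            h = self.conditioner(x_a)
            scale, shift = self.to_scale_shift(h).chunk(2, dim=-1)
            scale = torch.tanh(scale)               # keep the Jacobian well-behaved
            y_b = x_b * torch.exp(scale) + shift    # invertible given x_a
            log_det = scale.sum(dim=(1, 2))         # log |det J| of the affine map
            return torch.cat([x_a, y_b], dim=1), log_det

        def inverse(self, y):
            y_a, y_b = y.chunk(2, dim=1)
            h = self.conditioner(y_a)
            scale, shift = self.to_scale_shift(h).chunk(2, dim=-1)
            scale = torch.tanh(scale)
            x_b = (y_b - shift) * torch.exp(-scale)
            return torch.cat([y_a, x_b], dim=1)

    # Training maximizes exact log-likelihood: log p(x) = log N(z; 0, I) + log |det J|.
    x = torch.randn(8, 16, 64)                      # 8 "images" as 16 patch tokens of dim 64
    flow = TransformerCoupling(dim=64)
    z, log_det = flow(x)
    log_prob = -0.5 * (z ** 2).sum(dim=(1, 2)) + log_det   # Gaussian prior, up to a constant
    loss = -log_prob.mean()

A real model would stack many such layers, alternate which half of the tokens gets transformed, and map image patches into and out of the token space, but the exact-likelihood objective stays the same.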

janalsncm | today at 7:38 AM

The bitter-er lesson is that distillation from bigger models works pretty damn well. It’s great news for the GPU poor, not great for the guys training the models we distill from.
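For anyone who hasn't seen it, the standard recipe is to train the small model to match the large model's temperature-softened output distribution. A minimal PyTorch sketch, with made-up shapes and a made-up temperature, not tied to any of the specific models discussed here:

    # Sketch only: logit distillation from a large "teacher" to a small "student".
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """KL divergence between temperature-softened teacher and student distributions."""
        s = F.log_softmax(student_logits / temperature, dim=-1)
        t = F.softmax(teacher_logits / temperature, dim=-1)
        # Scale by T^2 so gradient magnitudes stay comparable to the hard-label loss.
        return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

    # The teacher runs without gradients; only the student trains.
    student_logits = torch.randn(32, 50_000, requires_grad=True)  # batch of 32, vocab of 50k
    with torch.no_grad():
        teacher_logits = torch.randn(32, 50_000)                  # stand-in for a big model's output
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()

In practice this term is usually mixed with the ordinary cross-entropy loss on ground-truth labels, weighted by a hyperparameter.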

sipjca | today at 3:41 AM

Somewhat hard to say how the cards will fall when the cost of 'intelligence' is coming down 1000x year over year while compute continues to scale. The bet should probably be made on both sides.

echelon | today at 3:28 AM

Edge compute would be clutch, but Apple feels a decade too early.
