Hacker News

macleginn, yesterday at 3:16 PM

They are poor at generalising from a small number of examples; this is why the real generalisation power is achieved in pre-training.