Wish they gave some numbers for total GPU hours to train this model. It seems comparatively tiny compared to LLMs, so I'm interested to know how close this is to something trainable by your average hobbyist, university, or small lab.
Edit: it looks like the paper does.
TPUv5e with 16 tensor cores for 2 days for the 200M param model.
Claude reckons this is about 60 hours on an 8xA100 rig, so very accessible compared to LLMs for smaller labs.
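For anyone wanting to sanity-check that conversion, here's a rough back-of-the-envelope in Python. The peak bf16 throughput figures (~197 TFLOPS per TPU v5e chip, ~312 TFLOPS per A100) are my assumptions from public spec sheets, and it naively assumes equal utilization on both platforms:

    # Rough conversion of TPU v5e training time to A100-hours.
    # Throughput numbers are assumed from public spec sheets, and
    # identical utilization on both platforms is a big simplification.
    TPU_V5E_TFLOPS = 197   # per chip, bf16 peak (assumed)
    A100_TFLOPS = 312      # per GPU, bf16 dense peak (assumed)

    tpu_chips = 16
    tpu_hours = 2 * 24     # 2 days of training

    # Total compute budget in TFLOPS-hours, then divide by the rig's throughput.
    total_compute = tpu_chips * TPU_V5E_TFLOPS * tpu_hours
    a100_rig_tflops = 8 * A100_TFLOPS

    hours_on_8xa100 = total_compute / a100_rig_tflops
    print(f"~{hours_on_8xa100:.0f} hours on an 8xA100 rig")  # prints ~61 hours

Which lands right around the 60-hour figure, so the estimate seems plausible.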