If you understood the article, please correct my understanding:
They created a new training dataset that also contains the computation solved step by step (e.g. multiplying two numbers or playing Sudoku), and then trained a transformer on it. As a result, the model performs the computation (multiplying two numbers) "inside" itself instead of calling a calculator (or Python)?
++ And they also figured out how to make attention faster?
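For concreteness, here is a rough sketch of what I imagine a "step-by-step" training example for multiplication might look like. The trace format is purely my guess, not something from the article:

```python
# Hypothetical sketch of a "computation solved step by step" training
# example for two-digit multiplication. The actual serialization used
# in the paper (if any) is an assumption here.
def make_trace(a: int, b: int) -> str:
    """Serialize a*b as an explicit digit-by-digit computation trace."""
    steps = []
    partials = []
    # One partial product per digit of b, scaled by its place value.
    for i, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * (10 ** i)
        partials.append(partial)
        steps.append(f"{a} * {digit}e{i} = {partial}")
    # Final step: sum the partial products to get the answer.
    steps.append(f"sum = {' + '.join(map(str, partials))} = {a * b}")
    return " ; ".join(steps)

print(make_trace(12, 34))
# → 12 * 4e0 = 48 ; 12 * 3e1 = 360 ; sum = 48 + 360 = 408
```

A model trained on strings like this would see every intermediate result, rather than just "12 * 34 = 408".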
I can't see anything about "training a transformer". I'm trying to understand whether, e.g., the Sudoku solver was learned from examples (in which case, what examples?) or whether it was manually coded and then "compiled" into weights.