Hacker News

InfiniteLoup · today at 12:59 PM

I was always curious about how Tay worked technically, since it was built before the Transformer era.

Was it based on a specific scientific paper or research?

The controversy surrounding it seems to have drowned out any technical breakdown or discussion, or any write-up of the insights gained from it.


Replies

mapmeld · today at 3:56 PM

People have tried to suss this out on the ML subreddit, and it is confusing. Most of the worst messages from Tay came from people discovering a "repeat after me: __" function, so it's hard to figure out which Tay messages were actually responses generated by the model.

There seems to have been interest in a model that would pick up the language and style of its conversations (not actually learning information or looking up facts). If you haven't trained an LSTM model before: you could train one on Shakespeare's plays and get out ye olde English in screenplay format, but with no line-to-line consistency in plot, characters, entrances and exits, etc., the kind of coherence you'd only come to expect after GPT-2. Twitter would be a good fit for keeping to short-form conversation. So I believe Tay and the Watson that appeared on Jeopardy! come more from this 'classical NLP' thinking and are not proto-LLMs, if that makes sense.
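To get a feel for the "style without substance" behavior described above, here's a toy character-level sampler. It uses a simple n-gram lookup table as a stand-in for a char-level LSTM (the actual Shakespeare experiments were RNNs, e.g. char-rnn), and it has nothing to do with Tay's real implementation; it's purely illustrative of how a model can imitate surface form while carrying no plot or facts:

```python
import random
from collections import defaultdict

def train_char_model(text, order=3):
    """Build a character-level n-gram table: each `order`-char context
    maps to the list of characters that followed it in the training text."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model

def generate(model, seed, length=100, rng=None):
    """Sample one character at a time, conditioned only on the last
    few characters. Output looks locally like the training text, but
    there is no memory of anything further back, hence no 'plot'."""
    rng = rng or random.Random(0)
    order = len(seed)
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:  # unseen context: stop generating
            break
        out += rng.choice(choices)
    return out
```

Swap the table for an LSTM and the context window grows from a few characters to a few hundred, which gets you consistent spelling, formatting, and tone, but still not the long-range consistency GPT-2 later made feel normal.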