I was always curious about how Tay worked technically, since it was built before the Transformer era.
Was it based on a specific scientific paper or research?
The controversy surrounding it seems to have drowned out any technical breakdown, any discussion, and any insights gained from it.
People have tried to suss this out on the ML subreddit, and it is confusing. Most of the worst messages from Tay came from people discovering a "repeat after me: __" function, so it's hard even to figure out which of Tay's messages count as responses generated by the model.
There seems to have been interest in a model that would pick up the language and style of its conversations (not actually learning information or looking up facts). If you haven't trained an LSTM model before: you could train one on Shakespeare's plays and get out ye olde English in a screenplay format, but from line to line there was no consistency in plot, characters, entrances and exits, etc., of the kind you'd expect after GPT-2. Twitter's short-form conversations would suit that kind of model. So I believe Tay and the version of Watson that appeared on Jeopardy come more from this 'classical NLP' thinking and are not proto-LLMs, if that makes sense.
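To give a feel for the "right style, no plot" effect described above without training an LSTM, here is a minimal sketch using a character-level Markov chain instead. This is a much simpler stand-in I chose for illustration, not an LSTM and not anything Tay is known to have used. It only ever looks at the previous few characters (the `order` parameter, an assumption here), so it can copy surface style but has no memory of who is speaking or what happened earlier. The tiny `corpus` is just a made-up sample.

```python
import random
from collections import defaultdict


def build_model(text: str, order: int = 4) -> dict[str, list[str]]:
    """Map each `order`-character context to every character that followed it in `text`."""
    model: dict[str, list[str]] = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return dict(model)


def generate(model: dict[str, list[str]], seed: str, length: int, rng: random.Random) -> str:
    """Extend `seed` one character at a time, looking only at the last len(seed) characters."""
    order = len(seed)
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:
            break  # context never seen in training text: stop early
        out += rng.choice(choices)
    return out


# Hypothetical toy corpus; a real experiment would use whole plays.
corpus = (
    "ROMEO: But soft, what light through yonder window breaks?\n"
    "JULIET: O Romeo, Romeo, wherefore art thou Romeo?\n"
    "ROMEO: Shall I hear more, or shall I speak at this?\n"
)

model = build_model(corpus, order=4)
print(generate(model, "ROME", 200, random.Random(0)))
```

With a larger corpus the output looks locally Shakespearean, since every 5-character window is copied from the training text, but nothing ties one line to the next. An LSTM carries a longer learned state than this fixed window, yet in practice it showed a similar lack of long-range consistency.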