I suspect it was considered many times, but the sheer scale of computation would have made it feel like obscene brute force. It felt like the right shape, but too wild to think about implementing.
I think that as time went on and hardware got better, it seemed more reasonable to actually attempt a viable implementation of what I think was a widespread intuition among people in ML: that everything's context is everything.
It just seemed like a theoretical thing until hardware caught up. Maybe. Perhaps I'm applying a retrospective excuse for why it took so long.