it's not going to happen with LLMs unless RAM + storage get several orders of magnitude cheaper, like, yesterday
information theory isn't magic, you'll never be able to compress """knowledge""" into a small model in a way equivalent to the 1.5 TB model
This will happen, but reconfiguring the infrastructure of the entire planet to train LLMs and run them over networks might be the "bubble", the megalomania.
I agree. But I also think the future is some kind of hybrid approach where agents run what they can locally and then call out to the cloud for what they can't.
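The hybrid routing idea is basically "try local first, fall back to cloud on failure." A toy sketch of that pattern (every function name here is made up, stand-ins for a real on-device model and a real API client):

```python
from typing import Optional

def local_model(prompt: str) -> Optional[str]:
    """Stand-in for a small on-device model: answers only what it knows."""
    known = {"2+2": "4", "capital of France": "Paris"}
    for key, answer in known.items():
        if key in prompt:
            return answer
    return None  # signal: can't handle this locally

def cloud_model(prompt: str) -> str:
    """Stand-in for a remote call to a large hosted model."""
    return f"[cloud answer for: {prompt}]"

def hybrid_answer(prompt: str) -> str:
    """Try the local model first; hit the cloud only when it punts."""
    answer = local_model(prompt)
    return answer if answer is not None else cloud_model(prompt)

print(hybrid_answer("what is 2+2?"))         # resolved locally
print(hybrid_answer("summarize this paper"))  # falls back to the cloud
```

The interesting engineering problem is the routing decision itself: knowing *when* the local model can't handle something, before you've already burned the latency finding out.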