Correct, you can think of this like NUMA or a distributed system where compute is colocated with storage. It’s a special-purpose accelerator for very specific problems that have been optimized to take advantage of such an architecture.
It’s also not my proposal. The industry is exploring ways to cut down the energy requirements of AI [1] - something like 80-90% of the energy consumption is just moving data back and forth across the memory controller. The DRAM has to read a row from a bank into a row buffer, access the specific cell being requested, shuttle it over the bus to the compute, and then write the data back to the cells. The current idea is to do the processing on the entire row buffer, but you could imagine scaling that up to do it at the bank level. The challenges are manufacturing complexity (DRAM is fabbed on a different process than logic), heat from the ALUs, etc.
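To make the data-movement argument concrete, here's a toy back-of-the-envelope model of the two access paths. All the per-bit energy numbers are illustrative placeholders I made up for the sketch, not measured values - only the relative ordering (bus transfer dominating activate/writeback) reflects the point above:

```python
# Toy model: conventional DRAM read-out vs processing-in-memory (PIM).
# Energy numbers below are hypothetical, chosen only to illustrate the shape
# of the argument (bus transfers dominate).

ROW_SIZE_BITS = 8 * 1024 * 8  # one 8 KB DRAM row

# Hypothetical per-bit energy costs in picojoules (assumptions, not data):
E_ACTIVATE  = 0.1  # read the row from the bank cells into the row buffer
E_BUS       = 1.0  # shuttle one bit across the memory bus to the compute
E_WRITEBACK = 0.1  # restore the row back into the cells
E_PIM_ALU   = 0.2  # operate on a bit in place, next to the row buffer

def conventional_energy(bits):
    # Activate row, move data over the bus both ways, write back.
    return bits * (E_ACTIVATE + 2 * E_BUS + E_WRITEBACK)

def pim_energy(bits):
    # Activate row, compute on the row buffer in place, write back.
    return bits * (E_ACTIVATE + E_PIM_ALU + E_WRITEBACK)

conv = conventional_energy(ROW_SIZE_BITS)
pim = pim_energy(ROW_SIZE_BITS)
share = 2 * E_BUS / (E_ACTIVATE + 2 * E_BUS + E_WRITEBACK)
print(f"conventional: {conv/1e6:.2f} uJ, PIM: {pim/1e6:.2f} uJ, "
      f"bus share of conventional energy: {share:.0%}")
```

With these placeholder numbers the bus transfers account for roughly 90% of the conventional path's energy, which is the kind of ratio driving the PIM work.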
[1] https://semiconductor.samsung.com/news-events/tech-blog/hbm-...