Tangential. I'm a newb, can you name the concept of partitioning weights so we dont need to load whole thing?
Do you mean model sharding?
Do you mean model sharding?