Does this sort of thing scale? Would a 30B or larger model see similar performance and memory gains under this scheme?