M4 Max with 128GB of memory.
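The M4 Max should get roughly 120 GB/s of memory bandwidth on the ANE and 500+ GB/s on the GPU. Token generation (decode) is bandwidth-bound, so the GPU should be 3-4x faster for any model larger than about 1-3B parameters. For prefill, the ANE is likely about as fast, since prefill is compute-bound and the ANE has higher FLOPs.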
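A rough back-of-envelope sketch of the decode argument. It assumes decode is purely memory-bound (each generated token reads every weight once) and ignores KV-cache traffic, compute limits, and caching. The 120 and 500 GB/s figures are the ones quoted above. The model sizes, the 4-bit quantization (0.5 bytes per parameter), and the function name are hypothetical, chosen only for illustration.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, params_billion: float,
                             bytes_per_param: float) -> float:
    """Idealized upper bound on decode throughput for a memory-bound LLM.

    Assumes every generated token streams all weights from memory once;
    ignores KV-cache reads, compute limits, and on-chip caching.
    """
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes each = GB
    return bandwidth_gb_s / weights_gb


ANE_BW_GB_S = 120.0  # figure from the post
GPU_BW_GB_S = 500.0  # lower end of the post's "500+"
BYTES_PER_PARAM = 0.5  # hypothetical 4-bit quantization

for params in (1.0, 3.0, 8.0, 70.0):  # hypothetical model sizes, in billions
    ane = decode_tokens_per_second(ANE_BW_GB_S, params, BYTES_PER_PARAM)
    gpu = decode_tokens_per_second(GPU_BW_GB_S, params, BYTES_PER_PARAM)
    print(f"{params:>5.0f}B  ANE ~{ane:7.1f} tok/s  GPU ~{gpu:7.1f} tok/s  "
          f"ratio {gpu / ane:.2f}x")
```

Under these assumptions the ideal ratio is just 500/120 ≈ 4.2x at every model size. Real throughput lands below the bound, which is in line with the 3-4x estimate, and very small models are less likely to saturate bandwidth at all, so the bound says less about them.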