And that still performs worse than entry-level Nvidia gaming cards.
Apple isn't serious about AI and needs to figure out its AI story. Every other big tech company is doing something about it.
Not for inference. The M3 Ultra runs big LLMs twice as fast as an RTX 5090.
https://creativestrategies.com/mac-studio-m3-ultra-ai-workst...
The RTX 5090 only has 32 GB of VRAM. The M3 Ultra has up to 512 GB of unified memory with 819 GB/s of bandwidth. It can run models that will not fit on an RTX card at all.
EDIT: The benchmark may not be properly utilizing the 5090. But the M3 Ultra is far more capable than an entry-level RTX card at LLM inference.
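Why memory size and bandwidth dominate here: single-stream LLM decoding is usually memory-bandwidth-bound, since roughly the whole weight footprint is read once per generated token. A rough sketch of that ceiling, using the 819 GB/s figure from above (the 70B model size and 4-bit quantization are my assumptions for illustration):

```python
# Back-of-envelope ceiling on decode speed: if every generated token
# requires reading all model weights once, then
#   tokens/sec <= memory bandwidth / model size in bytes.

def max_tokens_per_sec(bandwidth_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    """Upper-bound decode tokens/sec from memory bandwidth (GB/s) and model footprint (GB)."""
    model_gb = params_b * bytes_per_param  # e.g. 70B params at 4-bit (~0.5 bytes) ~ 35 GB
    return bandwidth_gb_s / model_gb

# M3 Ultra: 819 GB/s, hypothetical 70B model quantized to ~4 bits/param
print(round(max_tokens_per_sec(819, 70, 0.5), 1))  # ~23.4 tok/s ceiling
# A discrete GPU with higher bandwidth can't beat that if the 35 GB of
# weights don't fit in its 32 GB of VRAM in the first place.
```

This ignores KV-cache reads and compute, so real numbers come in lower, but it shows why capacity is the gating factor before bandwidth even matters.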
Not true. It performs 20-30% better than an RTX A6000 (I have both), and it has more than 10 times the VRAM. Against newer Nvidia cards, benchmarks show it does substantially better than a 5070 Ti, a bit better than a 4080, and a bit worse than a 5080. But once again, it has roughly 30 times the VRAM of those cards, which for AI workloads are just expensive toys precisely because of that lack of VRAM.
They're basically second place behind NVIDIA for model inference performance, and often the only game in town for the average person trying to run larger models that won't fit in the 16 or 24 GB of memory available in top-shelf NVIDIA offerings.
I wouldn't say Apple isn't serious about AI; they had the forethought to build a unified memory architecture with the huge memory bandwidth these workloads need, while also designing neural cores specifically for the small on-device models future apps will rely on.
I'd say Apple is currently ahead of NVIDIA in sheer memory available, which is kinda crucial for training and inference on large models, at least right now. NVIDIA seems to be purposefully limiting the memory in their consumer cards, which I think is pretty short-sighted.