To really take advantage of those GPU cores you need memory bandwidth, and modern transformer-based LLMs are extremely bandwidth-hungry. I'm happy to see this first push. NVIDIA's discrete GPU/VRAM approach is an option, but not a great one for several reasons (VRAM capacity limits, cost, and shuffling data over PCIe). Unified memory architectures like what AMD and Apple have are the way forward. Put 256GB of RAM on the main board and let the GPU access it at speed for LLM use, please.
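To see why bandwidth (not compute) is usually the ceiling: during single-stream decoding, each generated token requires reading roughly all of the model's weights from memory, so tokens/sec is capped at bandwidth divided by model size. A rough sketch of that back-of-envelope math (the bandwidth figures below are illustrative assumptions, not benchmarks):

```python
def max_tokens_per_sec(params_billions: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed: every token reads ~all weights once.

    Ignores KV-cache and activation traffic, so real numbers are lower.
    """
    model_size_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_size_gb

# A 70B-parameter model quantized to 8 bits (~70 GB of weights):
slow = max_tokens_per_sec(70, 1.0, 250)  # ~250 GB/s, typical unified-memory APU
fast = max_tokens_per_sec(70, 1.0, 800)  # ~800 GB/s, high-end unified-memory SoC
print(round(slow, 1))  # 3.6
print(round(fast, 1))  # 11.4
```

So even with 256GB of capacity, the bandwidth to that memory decides whether a big model is usable or a slideshow.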