Something that is also cool with these cards is proper SR-IOV without hassle. Arc pro cards make for nice graphical acceleration devices for vms. I know ai gets all the hype but I also appreciate being able to accelerate multiple workstations with a single gpu and still get decent frametimes.
Hi Intel, I'm itching to buy an Xe3P! Or, Nova Lake? Crescent Island? Celestial? Jaguar Shores?
Whatever the hell you name it doesn't matter to me, I just want a workstation with one of them bad boys attached to 160GB of RAM for legit inference power!
I've been saving my money not paying for Claude Code so I can run my own agentic coding setup at home on yours. Please don't charge too much for the workstation class card if you can at all manage it. Maybe give us a discount to preorder? Please don't price a regular consumer like me out of the market!
Also, I'm speculating that integer-based models will become hot due to lower memory and power requirements. Will the Xe3P be able to do integer-based math inference to use all that RAM to even greater effect?
Time to first token is a very important performance metric, as I figured out using a Mac Studio M3 Ultra (that is quite slow on this aspect).
But 32GB for a TDP of 230W is perhaps not super interesting. Especially because you probably want to have more than one card. It's a lot of heat. You could use the cards for heating up a building, but heatpumps exist.
At release, the Intel Arc B70 can only produce about 1/3 the tokens per second of an RTX PRO 4500. Then again, it also costs 1/3 as much.
It lacks software support for its primary target application: running LLMs. The officially supported vLLM fork is six versions behind mainline, it doesn't run the hot new open models on Hugging Face, and running two B70s in parallel reduces the token rate rather than improving it. The software behind the B70 is simply far behind.
Here are some llama.cpp benchmarks for it: https://www.phoronix.com/review/intel-arc-pro-b70-linux/3
I was looking into this for LLMs but it's clearly a graphics-processing focused card. The memory bandwidth is too low for that much RAM to be useful in an LLM context. The 5090 I have has the same amount of RAM but far more bandwidth and that makes it much more useful.
For those who use Blender, from the review's section on it:
> We hope that, in the future, there will be real options other than NVIDIA for GPU-based rendering, as it is an area where competition is nearly non-existent.
Checking opendata.blender.org, an NVIDIA GeForce RTX 4080 Laptop GPU scores 5301.8, while the Intel Arc Pro B70 is still at 3824.64.
So there is still a bit more to go before Intel GPUs perform close to NVIDIA's.
Is Intel still making GPUs? I have heard so many conflicting things about will they/won't they stay in the market.
I would like one for the VRAM, but I'm sure they'll be unobtainable after the initial stock sells out, as I assume they were produced before RAM prices went up.
It should be possible to use the VRAM as extra swap space, when you're not using it for AI or gaming or anything else. 32GB is already more than a lot of computers have as just regular RAM, even sufficient to hold an OS installation:
https://www.tomshardware.com/news/lightweight-windows-11-run...
It's strange that the reviewer doesn't mention the RTX PRO 6000 96GB, but does mention the RTX PRO 5000 72GB. The 72GB RTX PRO 5000 is a special-order part that far fewer people are aware of, while the RTX PRO 6000 is known to almost everyone in the LLM world.
I can't understand why a tech reviewer would do that.
How should I update my simplistic understanding that decode is bandwidth-bound, given these results showing the B70 decoding faster than a 4090 (which has about 50% more bandwidth)?
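The bandwidth-bound mental model can be sketched in a few lines: each decoded token has to stream the active weights from VRAM once, so bandwidth divided by model size gives an upper bound on tokens/s. The bandwidth and model-size figures below are rough illustrative assumptions, not measured specs:

```python
# Naive roofline for LLM decode: tokens/s <= memory bandwidth / weight bytes,
# because every generated token reads all (active) weights once.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode throughput for a dense model."""
    return bandwidth_gb_s / model_gb

# Illustrative numbers (assumptions, not official specs):
model_gb = 18.0       # e.g. a ~30B-param model quantized to ~4-5 bits/weight
b70_bw = 456.0        # GB/s, rough Arc Pro B70 figure
rtx4090_bw = 1008.0   # GB/s, rough RTX 4090 figure

print(f"B70 bound:  {decode_tokens_per_sec(b70_bw, model_gb):.1f} tok/s")
print(f"4090 bound: {decode_tokens_per_sec(rtx4090_bw, model_gb):.1f} tok/s")
```

By this model the higher-bandwidth card should always win on decode, so when a benchmark shows the opposite, the explanation is usually outside the roofline: partial CPU offload, an immature software stack, or compute-bound prefill being measured instead of decode.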
Can you use those AI cards for gaming too?
Or do the makers intentionally nerf them to better segment the markets/product lines?
$950 for 23 TFLOPS fp32? Has GPU performance grown at all in the past 5-10 years?
Could we not have a PCIe card that's an ASIC (not a GPU) with even DDR4 or DDR5 memory onboard (say, 128 GB), shove four of them into a consumer-grade motherboard, and use them in parallel?
Noob question.
These seem amazing for hobbyists, but that TDP given the perf might be an issue when deploying a lot of them.
It looks like, if one can afford it, the R9700 is worth the extra money.
I read that Intel is getting out of the dGPU space, but then again, their iGPUs are really getting good. I can't understand why they'd give up the space when the AI market is so insane.
Why are they still using their old Xe2/Battlemage architecture rather than their new Xe3/Celestial? They already used it in their Panther Lake chips.
From what I've read the Intel drivers are terrible and holding back using them for LLMs.
this review was essentially pointless, they reviewed the card for a ton of workloads nobody in their right mind would pick it for, and left out the only use case where it makes sense. great job?
There's a tradeoff between dense models and MoEs on memory usage vs. compute for the same quality.
For example, Qwen3.5 27B and Qwen3.5 122B A10B have similar average performance across benchmarks. The 122B is much faster to run than the 27B (generates more tokens at the same compute). The 27B, on the other hand, uses ~4x less VRAM at low context lengths (less difference at high context lengths).
Right now, different hardware seems to be suited to different points in the dense vs. MoE balance. On one extreme is hardware like the DGX Spark and Strix Halo which have a lot of memory compared to compute performance and memory bandwidth, and are best-suited for MoE workflows. On the other extreme you have cards like RTX 5090 which have very high performance for the price but rather little memory, and is best suited for dense models.
The Arc Pro B70 seems to be the awkward middle. With 1-2 of these, you can run a ~30B dense model slowly, probably not fast enough to be useful interactively (you'd probably need a 5090 or 2x 3090 for that). Or, you can run a MoE model at high throughput, but probably not enough quality to support agentic workflows that actually use your throughput.