A little feedback to AMD executives about the current status of ROCm here:
(1) - Supporting only server-grade hardware and ignoring laptop/consumer-grade GPUs/APUs for ROCm was a terrible strategic mistake.
A lot of developers experiment on their personal laptop first and scale to expensive, professional-grade hardware later. In addition, some developers simply do not have the money to buy server-grade hardware.
By locking ROCm to server-grade GPUs only, you restrict the potential list of contributors to your OSS ROCm ecosystem to a few large AI users and a few HPC centers... meaning virtually nobody.
A much more sensible strategy would be to provide ROCm with degraded performance on consumer GPUs, and this is exactly what Nvidia does with CUDA.
This is changing, but you need to send a clear message there: EVERY newly released device should be properly supported by ROCm.
(2) - Supporting only the last two generations of architectures is not what customers want to see.
https://rocm.docs.amd.com/projects/install-on-linux/en/docs-...
People with existing GPU codebases invest a significant amount of effort to support ROCm.
Telling them two years later "Sorry, you are out of updates now!" while the ecosystem is still unstable is unacceptable.
CUDA excels at backward compatibility. The fact that you ignore it entirely plays against you.
(3) - Focusing exclusively on Triton and making HIP a second-class citizen is nonsensical.
AI might get all the buzz and the money right now, we get it.
It might look sensible on the surface to focus on Python-based, AI-focused tools like Triton, and supporting them is definitely necessary.
But there is a tremendous amount of code relying on C++ and C to run on GPUs (HPC, simulation, scientific computing, imaging, ...), and it will remain there for decades to come.
Ignoring that means losing, again, customers to CUDA.
It is pretty ironic to see a move like this, considering that AMD GPUs currently tend to be highly competitive on FP64, meaning good for exactly these kinds of applications. You are throwing away one of your own competitive advantages...
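To make the point concrete, here is roughly what the C++ code in question looks like: a minimal FP64 axpy kernel in HIP. This is a sketch (names are mine, error checking omitted); it needs hipcc and a working ROCm install to build, and the whole point is that it is nearly line-for-line identical to the CUDA version, which is why HIP should not be a second-class citizen.

```cpp
#include <hip/hip_runtime.h>
#include <vector>
#include <cstdio>

// Classic double-precision y = a*x + y -- the kind of FP64 workload
// HPC and simulation codebases are full of.
__global__ void daxpy(int n, double a, const double* x, double* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<double> hx(n, 1.0), hy(n, 2.0);

    double *dx, *dy;
    hipMalloc(&dx, n * sizeof(double));
    hipMalloc(&dy, n * sizeof(double));
    hipMemcpy(dx, hx.data(), n * sizeof(double), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(double), hipMemcpyHostToDevice);

    // Same triple-chevron launch syntax as CUDA.
    daxpy<<<(n + 255) / 256, 256>>>(n, 3.0, dx, dy);

    hipMemcpy(hy.data(), dy, n * sizeof(double), hipMemcpyDeviceToHost);
    std::printf("y[0] = %f\n", hy[0]);  // 3*1 + 2 = 5
    hipFree(dx);
    hipFree(dy);
}
```

Porting existing CUDA code to this is often little more than s/cuda/hip/ on the API calls, which is exactly the audience AMD risks neglecting.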
(4) - Last but not least: Please focus a bit on the packaging of your software solution.
There have been complaints about this for the last 5 years and not much has changed.
Working with distribution packagers and integrating with them does not cost much... This would currently give you a competitive advantage over Nvidia.
Additional points: CUDA is polyglot, and some people do care about writing their kernels in something other than C++, C, or Fortran, without going through code generation.
NVidia is acknowledging Python adoption, with cuTile and MLIR support for Python, allowing the same flexibility as C++, using Python directly even for kernels.
They seem to be supportive of having similar capabilities for Julia as well.
There is also the IDE and graphical debugger integration, and the library ecosystem, which now has Python variants as well.
As someone who only follows GPGPU on the side, due to my interest in graphics programming, it is hard to understand how AMD and Intel keep failing to understand what CUDA, the whole ecosystem, is actually about.
Like, just take the schedule of a random GTC conference: how much of it can I reproduce on oneAPI or ROCm as of today?
> Supporting only server-grade hardware and ignoring laptop/consumer-grade GPUs/APUs for ROCm was a terrible strategic mistake. A lot of developers experiment on their personal laptop first and scale to expensive, professional-grade hardware later.
NVIDIA is making the same mistake today by deprioritizing the release of consumer-grade GPUs with high VRAM in favour of focusing on server markets.
They already have a huge moat, so it's not as crippling for them to do so, but I think it presents an interesting opportunity for AMD to pick up the slack.
There actually isn't any locking involved. I can take a new, officially unsupported version of ROCm and just use it with my 7900 XT despite my card not being officially supported and it works. It's just that AMD doesn't feel that they need to invest the resources to run their test suite against my card and bless it as officially supported. And maybe if I was doing something other than running PyTorch I'd run into bugs. But it's just laziness, not malice.
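For anyone wanting to try the same thing, the usual trick is a config-level tweak rather than a code change. This is a sketch and explicitly unsupported: the exact gfx version to spoof depends on your card (11.0.0 below is my assumption for an RDNA3 card like the 7900 XT), and a mismatched value can crash or miscompute.

```
# Tell the ROCm runtime to treat the GPU as an officially tested gfx target.
# Value is an assumption for RDNA3 (gfx1100-class) cards -- adjust for yours.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

With that exported, frameworks like PyTorch built for ROCm will generally pick up the card even though the support matrix does not list it.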
> Working with distribution packagers and integrating with them does not cost much... This would currently give you a competitive advantage over Nvidia.
Packaging is actually a huge amount of effort if you try to package for all distros.
So the common long-standing convention is to use a "vendored software" approach. You design everything to install into /opt/foo/, and you provide a simple install script to install everything, from one (or several) giant zips/tarballs. It's very old and dumb but it works quite well. Easy to support from company perspective, just run your dumb installer on a couple distros once in a while. Don't depend on distro-specific paths, use basic autodetection to locate and load libraries/dependencies.
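A minimal sketch of that vendored layout, assuming a hypothetical product "foo" (the `/opt/foo` prefix, file names, and launcher are all illustrative; a temp directory stands in for `/opt` so the sketch runs unprivileged):

```shell
set -eu

PREFIX="${PREFIX:-$(mktemp -d)/foo}"   # stand-in for /opt/foo

# Everything lands under one self-contained prefix, not scattered
# across distro-specific paths.
mkdir -p "$PREFIX/bin" "$PREFIX/lib"
printf 'fake library\n' > "$PREFIX/lib/libfoo.so"

# The launcher locates its own lib/ relative to itself -- basic
# autodetection instead of hardcoding where a given distro puts things.
cat > "$PREFIX/bin/foo" <<'EOF'
#!/bin/sh
HERE=$(dirname "$(readlink -f "$0")")
export LD_LIBRARY_PATH="$HERE/../lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "foo: loading libraries from $HERE/../lib"
EOF
chmod +x "$PREFIX/bin/foo"

"$PREFIX/bin/foo"
```

The dumb installer is then just "unpack the tarball and run a script like this once", and it behaves identically on every distro.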
Once you do that, it is actually easier for distros to package your software for you. They make one basic package that runs the installer, then they carve up the resulting files into sub-packages based on path. Then they just iterate on that over time as bugs come in (as users try to install just package X.a, which really needs files from X.b).
But you need to hire people with expertise in the open source world to know all this, and most companies don't. Maybe there's just not a lot of us left out there. Or, more likely, they just don't understand that wider support + easier use = more adoption.