Can you elaborate on "resistance against cuda"? What were people clinging to instead?
In the beginning, valid claims of 100x to 1,000x speedups on genuine workloads, due to hardware-level advances enabled by CUDA, were dismissed on the grounds that they ignored CPU and memory copy overhead, or were only being measured relative to single-core code, etc. No amount of evidence to the contrary was sufficient for a lot of people who should have known better. And even when they believed the speedups, they were the same ones saying Intel would destroy NVIDIA with its roadmap. I was there. I rolled my eyes every single time, but then AI happened and most of them (but not all of them) denied ever spouting such gibberish.
Won't name names anymore, it really doesn't matter. But I feel the same way about people still characterizing LLMs as stochastic parrots and glorified autocomplete as I feel about certain CPU luminaries continuing to state that GPUs are bad because they were designed for gaming. Neither group is keeping up with how fast things change.
IMO it was mostly that people didn't want to rewrite (and maintain) their code for a new proprietary programming model they were unfamiliar with. People also didn't want to invest in hardware that could only run code written in CUDA.
Lots of people wanted (and Intel tried to sell, somewhat successfully) something they could just plug in and use to run the parallel implementations they'd already written for x86 supercomputers. It seemed easier. Why invest all of this effort into CUDA when Intel are going to come and make your current code work just as fast as this strange CUDA stuff in a year or two?
Deep learning is quite different from the earlier uses of CUDA. Those use cases were often massive, often old, FORTRAN programs where, to get things running well, you had to write many separate kernels targeting each bit. And it all had to be on the GPU to avoid expensive copies between GPU and CPU, and early CUDA was a lot less programmable than it is now, with huge performance penalties for relatively small "mistakes". Also, many of your key contributors are scientists rather than professional programmers, and they see programming as getting in the way of doing what they actually want to do. They don't want to spend time completely rewriting their applications and optimizing CUDA kernels; they want to keep on with their incremental modifications to existing codebases.
Then deep learning came along and researchers were already using frameworks (Lua Torch, Caffe, Theano). The framework authors only had to support the few operations required to get convnets working very fast on GPUs, and it was minimal effort for researchers to run. It grew a lot from there, but going from "nothing" to "most people can run their convnet research" on GPUs was much easier for these frameworks than it was for any large traditional HPC scientific application.