I have been trying since February to get someone at AMD to ship tuned Tensile kernels in rocm-libs for the gfx1201. They are used by Ollama, but no one on the developer Discord knows who is responsible for that. It has been pretty frustrating, and it shows that AMD has an organizational problem to overcome in addition to all the technical things they want ROCm to do.
Have you filed anything on GitHub? https://github.com/zichguan-amd seems to be one of the main people for that...
or https://github.com/harkgill-amd