> directly to PTX
Weird. There's a recent NVIDIA MLIR that is quite good and fast. Or they could target the even easier and more recent/fashionable tile IR [1] used by CuTile [2] (a little bit higher level but significantly easier to target, only loses on epilogue fusion and similar).