Here is a PyTorch optimizer that keeps a matrix orthogonal throughout optimization:
https://github.com/adrianjav/pogo — POGO: A Proximal One-step Geometric Orthoptimizer
https://arxiv.org/abs/2602.14656 — An Embarrassingly Simple Way to Optimize Orthogonal Matrices at Scale; Adrián Javaloy, Antonio Vergari
That's useful, but it wouldn't help with this particular experiment, because they orthogonalize activations, not weights.
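To make the weights-vs-activations distinction concrete, here is a rough sketch in plain Python (no PyTorch needed). It is neither POGO's update rule nor the experiment's actual procedure: it uses Gram-Schmidt as a generic stand-in, and it assumes "orthogonalizing activations" means making the columns of a batch of activations orthonormal. All names and numbers (`W`, `V`, `X`, `step`) are made up for illustration.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def orthonormalize_columns(m):
    """Gram-Schmidt on the columns of m (assumes they are linearly independent)."""
    basis = []
    for col in transpose(m):
        v = list(col)
        for q in basis:
            # Remove the component of v along each earlier basis vector.
            proj = sum(x * y for x, y in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return transpose(basis)

def is_orthonormal(m, tol=1e-9):
    """True if the columns of m are orthonormal, i.e. m^T m = I."""
    gram = matmul(transpose(m), m)
    n = len(gram)
    return all(abs(gram[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))

# Case 1: orthogonal *weights*. The parameter matrix itself is constrained.
W = [[1.0, 0.0], [0.0, 1.0]]
step = [[0.1, -0.2], [0.3, 0.05]]  # hypothetical gradient step
W_moved = [[w - s for w, s in zip(wr, sr)] for wr, sr in zip(W, step)]
W_new = orthonormalize_columns(W_moved)  # pull the weights back to orthogonality

# Case 2: orthogonalized *activations*. The weights V stay unconstrained;
# only the layer's output for this batch is orthogonalized.
X = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]  # batch of 3 inputs, 2 features
V = [[2.0, 1.0], [0.0, 3.0]]              # ordinary, non-orthogonal weights
A = matmul(X, V)
A_orth = orthonormalize_columns(A)

print(is_orthonormal(W_new), is_orthonormal(V), is_orthonormal(A_orth))
```

In the first case the constraint lives on the parameter, which is the setting an orthogonal-matrix optimizer addresses. In the second the constraint lives on a data-dependent tensor, so there is no orthogonal parameter for such an optimizer to maintain.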