I ran Ms-01s with 100GBE, copper DACs in my kubernetes cluster. Killed the NVME drives in that tiny box. I'd bet the same issue doing this with FW. And I wasn't even pushing 100GBE very hard at all, it was mostly for fun.
AI + 100GBE (under load) + tiny box = unreliable and eead very quickly.
How many MS-01s did you have clustered?
And could you not use something like an N5 + iSCSI for storage?