In review, I think it might have been the workflow versioning behaving strangely, plus the lack of any heartbeating/crash detection for longer-running activities.