They are definitely good for local inference, as evidenced by the pretty amazing performance increase on Apple silicon when used with whisper.cpp, and maybe with other frameworks that use Core ML? I think they're sort of purpose-built for doing matrix math.