It wasn't designed to do well on MMMLU, it's a general model designed for deterministic task like OCR, object detection, STT and more and a by product of that is great language abilities. It still has a transformer backbone giving great language skills while being good at other stuff.
See the full benchmark: https://interfaze.ai/leaderboards