We use Gemini for some specific tasks. It is often unavailable due to capacity limits or other downtime.
It's probably the best multimodal model I've worked with. (If somebody knows a better one for audio analysis, please let me know!)