I'm also very impressed at the output given the lack of image support.
They picked a task that heavily favors a model that can do multi-modal with images, and GLM still came within striking distance.
What I'm hearing from this article is that the next generation of open models that includes better multi-modal support are basically no-brainers for adoption.
Seems like a HUGE win for Z.ai and open models in general here.