None of the DeepSeek models are multimodal. How are you guys able to use them in daily work without image input?
For example, it's just so natural to share screenshots in a chat.
...like how we were using LLMs just a little while ago?
It seems just as easy to select text and paste it into the chat as to take a screenshot and paste that. At least when you're not on a phone, e.g. when coding.
But YMMV if you're doing visual design. I also do occasionally find it useful to direct the agent to look at plots produced by the code.
I just never do that.