Couple of thoughts:
1. I’d wager that given their previous release history, this will be open‑weight within 3-4 weeks.
2. It looks like they’re following suit with other models like Z-Image Turbo (6B parameters) and Flux.2 Klein (9B parameters), aiming to release models that can run on much more modest GPUs. For reference, the original Qwen-Image is a 20B-parameter model.
3. This is a unified model (both image generation and editing), so there’s no need to keep separate Qwen-Image and Qwen-Edit models around.
4. The original Qwen-Image scored the highest among local models for image editing in my GenAI Showdown (6 out of 12 points), and it also ranked very highly for image generation (4 out of 12 points).
Generative Comparisons of Local Models:
https://genai-showdown.specr.net/?models=fd,hd,kd,qi,f2d,zt
Editing Comparison of Local Models:
https://genai-showdown.specr.net/image-editing?models=kxd,og...
I'll probably wait until the local version drops before adding Qwen-Image-2 to the site.
It's crazy to think there was a fleeting sliver of time during which Midjourney felt like the pinnacle of image generation.
I recently tried out LMStudio on Linux for local models. So easy to use!
What Linux tools are you all using for image-generation models like Qwen's diffusion models? LMStudio only supports text generation.
The Chinese vertical typography is sadly a bit off. If punctuation marks are used at all, they should be the characters specifically designed for vertical text, like ︒(U+FE12 PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP).
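To illustrate the point about vertical punctuation, here is a minimal Python sketch that swaps common horizontal CJK punctuation for the dedicated vertical presentation forms in Unicode's Vertical Forms block (U+FE10–FE16). The `to_vertical` helper is hypothetical, just for demonstration; real vertical layout engines handle this via font features rather than codepoint substitution.

```python
# Map common horizontal CJK punctuation to its vertical presentation form.
# Code points are from Unicode's "Vertical Forms" block (U+FE10-FE1F).
VERTICAL_FORMS = {
    "\uFF0C": "\uFE10",  # fullwidth comma        -> VERTICAL COMMA
    "\u3001": "\uFE11",  # ideographic comma      -> VERTICAL IDEOGRAPHIC COMMA
    "\u3002": "\uFE12",  # ideographic full stop  -> VERTICAL IDEOGRAPHIC FULL STOP
    "\uFF1A": "\uFE13",  # fullwidth colon        -> VERTICAL COLON
    "\uFF1B": "\uFE14",  # fullwidth semicolon    -> VERTICAL SEMICOLON
    "\uFF01": "\uFE15",  # fullwidth exclamation  -> VERTICAL EXCLAMATION MARK
    "\uFF1F": "\uFE16",  # fullwidth question     -> VERTICAL QUESTION MARK
}

def to_vertical(text: str) -> str:
    """Replace horizontal punctuation with vertical presentation forms,
    leaving all other characters untouched."""
    return "".join(VERTICAL_FORMS.get(ch, ch) for ch in text)

print(to_vertical("天地玄黄。"))  # full stop becomes U+FE12 (︒)
```

In practice, text shaping engines apply this via the OpenType `vert` feature, so the underlying text keeps the ordinary codepoints; the table above is only useful when you need the presentation forms baked into the string itself (as an image-generation prompt would).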
I use gen-AI to produce images daily, but honestly the infographics are 99% terrible.
LinkedIn is filled with them now.
I liked their comic panels example and tried it using their chat at: https://chat.qwen.ai/
When I used the exact prompt from the post, the chat works: it gives me the exact output from the blog post.
Then I used Google Translate to understand the prompt format. The prompt is: A 4x6 panel comic, four lines, six panels per line. Each panel is separated by a white dividing line.
The first row, from left to right: Panel 1: Panel 2: .....
But when I try to change the inputs, the comic example fails miserably. It keeps creating random grids (sometimes 4x5, other times 4x6), and by the third row the model gets confused and the output has only 3 panels. Other times the English dialogue is replaced with Chinese dialogue. So, not very reliable in my book.
The "horse riding man" prompt is wild:
"""A desolate grassland stretches into the distance, its ground dry and cracked. Fine dust is kicked up by vigorous activity, forming a faint grayish-brown mist in the low sky. Mid-ground, eye-level composition: A muscular, robust adult brown horse stands proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man. Its hind legs are taut, its neck held high, its mane flying against the wind, its nostrils flared, and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male, 30-40 years old, his face covered in dust and sweat, his short, messy dark brown hair plastered to his forehead, his thick beard slightly damp; he wears a badly worn, grey-green medieval-style robe, the fabric torn and stained with mud in several places, a thick hemp rope tied around his waist, and scratched ankle-high leather boots; his body is in a push-up position—his palms are pressed hard against the cracked, dry earth, his knuckles white, the veins in his arms bulging, his legs stretched straight back and taut, his toes digging into the ground, his entire torso trembling slightly from the weight. The background is a range of undulating grey-blue mountains, their outlines stark, their peaks hidden beneath a low-hanging, leaden-grey, cloudy sky. The thick clouds diffuse a soft, diffused light, which pours down naturally from the left front at a 45-degree angle, casting clear and voluminous shadows on the horse's belly, the back of the man's hands, and the cracked ground. The overall color scheme is strictly controlled within the earth tones: the horsehair is warm brown, the robe is a gradient of gray-green-brown, the soil is a mixture of ochre, dry yellow earth, and charcoal gray, the dust is light brownish-gray, and the sky is a transition from matte lead gray to cool gray with a faint glow at the bottom of the clouds. 
The image has a realistic, high-definition photographic quality, with extremely fine textures—you can see the sweat on the horse's neck, the wear and tear on the robe's warp and weft threads, the skin pores and stubble, the edges of the cracked soil, and the dust particles. The atmosphere is tense, primitive, and full of suffocating tension from a struggle of biological forces."""
I found the horse revenge-porn image at the end quite disturbing.
> Qwen-Image-2.0 not only accurately models the “riding” action but also meticulously renders the horse’s musculature and hair > https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwe...
What the actual fuck
The text rendering is quite impressive, but is it just me, or do all these generated 'realistic' images have a distinctly uncanny feel to them? I can't quite put my finger on what it is, but they just feel off to me.
When I tried Qwen-Image-2512, I could not even get it to spell correctly, and often the letters would be garbled anyway.
The complex prompt-following ability and editing are seriously impressive here. They don't seem to be far behind OpenAI and Google, which is backed up by the AI Arena ranking.
Why is the only image featuring non-Asian men the one under the horse?
image generation kind of reminds me of video games or any cgi in general.. the progress is undeniable, and yet with every milestone it seems the last gap to "photorealism" is infinitely wide
My response to the horse image: https://i.postimg.cc/hG8nJ4cv/IMG-5289-copy.jpg
So, I just gave it this prompt:
"Analyze this webpage: https://en.wikipedia.org/wiki/1989_Tiananmen_Square_protests...
Generate an infographic with all the data about the main event timeline and estimated number of victims.
The background image should be this one: https://en.wikipedia.org/wiki/Tank_Man#/media/File:Tank_Man_(Tiananmen_Square_protester).jpg
Improve the background image clarity and resolution."
I've received an error:
"Oops! There was an issue connecting to Qwen3-Max. Content Security Warning: The input file data may contain inappropriate content."
I wonder whether the model they published in December, when run locally, has the same censorship in place (i.e. whether it's already trained like this), or whether the Chinese regime's censorship is implemented only in the web service.
when the horsey tranq hits
Another closed model dressed up as "coming soon" open source. The pattern is obvious: generate hype with a polished demo, lock the weights, then quietly move on. Real open source doesn't need a press release countdown.
I've seen many comments describing the "horse riding man" example as extremely bizarre (which it actually is), so I'd like to provide some background context here. The "horse riding man" is a Chinese internet meme originating from an entertainment awards ceremony, where the renowned host Tsai Kang-yong wore an elaborate outfit featuring a horse riding on his back[1]. At the time, he was embroiled in a rumor about his unpublicized homosexual partner, whose name sounded like "Ma Qi Ren", which coincidentally translates to "horse riding man" in Mandarin. The incident spread widely across the Chinese internet and turned into a meme. So their use of "horse riding man" as an example isn't entirely nonsensical, though the image itself is undeniably bizarre and carries an unsettling vibe.
[1] The photo of the outfit: https://share.google/mHJbchlsTNJ771yBa