>Real work
This part should have featured something about real work. But instead it features a paragraph about one-shot bs that creates "something".
Unless your work is to create thousands of WordPress templates to sell, this is not "real work".
Give it a repository (any kind of OSS project will do as an example) and a GitHub issue requesting a new feature or describing a confirmed bug. (You can, and probably should, write a prompt for the LLM though; don't just provide the issue itself.)
And then watch it go.
And then judge the result and its quality.
Sorry, but from my experience 27B is just useless. You do get a result and sometimes it does work, but most of the time it is not even at junior dev level. And it takes a lot of time to do the thing, unless you have an extremely expensive machine.
If your expectation is to treat it as a coworker, then you're right.
If your expectation is to treat it as a tool, then you're wrong.
I guess that's where the disconnect lies.