kennywinker • yesterday at 6:23 PM • 3 replies

Can anyone explain what the story is here? Is this just a re-skinned Qwen? Who is deepreinforce-ai, and why isn’t this model listed on their website?

How does it self-improve? Does the model change on disk, or does it only get better within a single context run?

Replies

simonw • yesterday at 6:29 PM

It doesn't self-improve, that's a misleading headline.

As far as I can tell, they trained it by running their own reinforcement learning on top of Qwen and Gemma 4 (not sure how they combined weights from both, or whether they used Qwen as the basis and Gemma 4 to help train). So the "self-improving" part is about their training process, not how you use the weights.

v3ss0n • yesterday at 9:15 PM

Clickbait title.
