The previous version of this model was pretty bad, but it claimed to adhere to copyright laws. However, based on my testing, that's not true either. So in my view this is completely useless.
As long as the following remains true, this release ends up a bigger contribution to science at large than most other models trained "behind closed doors":
> Fully open model: open weights + open data + full training details including all data and training recipes
It uses FineWeb, which is derived from Common Crawl, an unlicensed scrape of web pages.
I'm curious how you test; could you explain? Do you have a set of factoids that should be subject to copyright, but are somehow reproduced verbatim (as a whole work) by the model in question?
So far, the smallest model I have actually seen behave in a way that feels consistent with the contemporary LLM chat experience is Gemma 4 12B (the QAT build in particular). The E4B model is not bad: it has a good conversational flow and responds well if nudged. But the 12B model feels capable.
Nothing below that really seems to be good for anything other than training for specific tasks. I have not been impressed by the earlier Apertus 8B model, which doesn't feel like it really responds to nudges.
I am a strong believer in smaller models, so I might try one of these out of curiosity to see if it might do useful things in limited contexts.