Hacker News

pdyc today at 2:52 AM

Thanks, I tested it; it failed the strawberry test. Qwen 3.5 0.8B, which is a similar size, passes it and is far more usable.


Replies

cztomsik today at 7:59 PM

I hope you are kidding. How is that a test of any capabilities? It's a miracle that any model can learn strawberry at all, because it cannot see the actual characters and, ALSO, the word is likely misspelled a lot in the corpus. I've been playing with this model and I'm pleasantly surprised: it certainly knows a lot, quite a lot for 1.1G.

algoth1today at 9:09 AM

Does asking it to think step by step, or character by character, improve the answer? It might be a combination of tokenization and the model's unawareness of its own tokenization shortcomings.
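
For anyone who wants to try this, here is a rough Python sketch of the idea, not a tested recipe. The subword split in `TOY_TOKENS` is made up for illustration (a real tokenizer may split the word differently), and the prompt wording in `build_prompt` is just one guess at what "character by character" could look like. It computes the ground-truth count and builds a prompt with the letters spaced out, so each character is more likely to reach the model as its own token.

```python
# Hypothetical subword split, for illustration only; real tokenizers
# may split "strawberry" differently.
TOY_TOKENS = ["str", "aw", "berry"]


def count_letter(word: str, letter: str) -> int:
    """Ground truth for the strawberry test: occurrences of a letter."""
    return word.lower().count(letter.lower())


def spell_out(word: str) -> str:
    """Put a space between characters so they are less likely to be merged."""
    return " ".join(word)


def build_prompt(word: str, letter: str) -> str:
    """Assemble a character-by-character prompt (wording is an assumption)."""
    return (
        f"Here is a word spelled letter by letter: {spell_out(word)}\n"
        f"Go through the letters one at a time and count each '{letter}'."
    )


if __name__ == "__main__":
    word = "".join(TOY_TOKENS)
    print("tokens the model might see:", TOY_TOKENS)
    print("expected count:", count_letter(word, "r"))
    print(build_prompt(word, "r"))
```

Comparing the model's reply against `count_letter` for the plain question and for the spelled-out prompt would show whether the extra structure helps on a given model.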

selcuka today at 4:12 AM

Interesting. Qwen 3.5 0.8B failed the test for me.