Hacker News

ninjagoo · yesterday at 8:48 AM

I wonder if the providers are doing everyone, themselves included, a huge disservice by offering free versions of their models that are so incompetent compared to the SOTA models. These kinds of Q&A failures go viral because the AI hype doesn't match the reality for unpaid users.

And it's not just the viral questions that are an issue. Over the last year or so I've seen people get sub-optimal results for $1000+ PC comparisons from the free reasoning version while the paid version gets it right; a senior scientist at a national lab concluding AI isn't really useful because the free reasoning version couldn't generate working code from a scientific paper, then being surprised when the paid version one-shotted it; and other similar examples.

How many policy and other quality-of-life decisions will go wrong because people used the free versions of these models, got subtly wrong answers, and couldn't tell the difference? What will be the collective damage to the world because of this?

Which department or person within the provider orgs decided to put "thinking/reasoning" in the name when the paid versions clearly perform far better? Thinking about the scope of the damage they are doing makes me shudder.


Replies

yipbub · yesterday at 8:52 AM

I used a paid model to try this. Same deal.

polarbearballs · yesterday at 1:59 PM

I have paid versions of ChatGPT and Anthropic's Claude, set them both to the best model, and they both told me to walk.

Claude told me: "Walk! At 25 meters, you'd barely get the car started before you arrived. It's faster and easier on foot — plus you avoid the awkwardness of driving a dirty car just a few seconds down the road."

janlukacs · yesterday at 10:28 AM

What is the real (non-subsidized) cost of the "paid" plans? Does anyone in the world have an answer to this?

kakacik · yesterday at 11:55 AM

At work, paid GitLab Duo (which is supposed to be a blend of various top models) gets our more complex codebase hilariously wrong every time. Maybe our codebase is obscure to it (though it shouldn't be: standard Java stuff with the usual open-source libs), but it just can't add value for anything beyond small snippets here and there.

For me, the litmus test for any LLM is flawless creation of complex regexes from a well-formed prompt. I don't mean trivial stuff like email validation, but rather expressions at the limits of the regex spec. Not almost-there; just-there.
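A hypothetical illustration of that "almost-there vs. just-there" distinction (my own example, not from the comment): ask for a regex matching a standalone IPv4 octet. A sloppy answer like `\d{1,3}` accepts 256 or 999; a spec-exact one enumerates the valid ranges and survives the edge cases.

```python
import re

# Just-there regex for a standalone IPv4 octet (0-255, no leading zeros):
# 250-255 | 200-249 | 100-199 | 0-99
octet = re.compile(r'^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$')

# The edge cases that separate a correct regex from an almost-there one:
assert octet.match('0')
assert octet.match('255')
assert not octet.match('256')   # \d{1,3} would wrongly accept this
assert not octet.match('999')
assert not octet.match('007')   # leading zeros rejected
```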

TZubiri · yesterday at 8:53 AM

I don't think 100% adoption is necessarily the ideal strategy anyway. Maybe 50% of the population seeing AI as all-powerful and buying the subscription, versus 50% remaining skeptics, is a reasonably stable configuration. The first 50% get the advantage of the AI, whereas if everybody is super-intelligent, no one is.

Their loss

hxbdg · yesterday at 1:10 PM

[dead]

dist-epoch · yesterday at 11:31 AM

> a senior scientist at a national lab thinking ai isn't really useful because the free reasoning version couldn't generate working code

I would question whether such a scientist should be doing science; it seems they have serious cognitive biases.
