Hacker News

joshstrange · yesterday at 6:32 PM · 47 replies

LLMs and LLM providers are massive black boxes. I get a lot of value from them and so I can put up with that to a certain extent, but these new "products"/features that Anthropic are shipping are very unappealing to me. Not because I can't see a use-case for them, but because I have 0 trust in them:

- No trust that they won't nerf the tool/model behind the feature

- No trust they won't sunset the feature (the graveyard of LLM-features is vast and growing quickly while they throw stuff at the wall to see what sticks)

- No trust in the company long-term. Both in them being around at all and them not rug-pulling. I don't want to build on their "platform". I'll use their harness and their models but I don't want more lock-in than that.

If Anthropic goes "bad" I want to pick up and move to another harness and/or model with minimal fuss. Buying in to things like this would make that much harder.

I'm not going to build my business or my development flows on things I can't replicate myself. Also, I imagine debugging any of this would be maddening. The value add is just not there IMHO.

EDIT: Put another way: LLM companies are trying to climb the ladder to become platforms, and I have zero interest in that. I want a "dumb pipe", I want a commodity, I want a provider, not a platform. Claude Code is as far into the dragon's lair as I want to venture, and I'm only okay with that because I know I can jump to OpenCode/Codex/etc. if/when Anthropic "goes bad".


Replies

ElFitz · yesterday at 11:12 PM

> Not because I can't see a use-case for them, but because I have 0 trust in them

> […]

> Put another way: LLM companies are trying to climb the ladder to become platforms, and I have zero interest in that. I want a "dumb pipe", I want a commodity, I want a provider, not a platform.

That is my sentiment precisely, and a big reason why I’ve started moving away from Claude Code in the past few weeks when I realised how much of my workflow was becoming tied to their specific tools.

Claude Code’s "Memory" feature was the tipping point for me, with the model committing feedback and learnings to a local, provider-specific path that won’t persist in the git repo itself.

That’s fine for user preferences, not for workflows, rules, etc.

And the latest ToS changes about not being allowed to even use another CLI made up my mind. At work we were experimenting with an autonomous debug agent driving the Claude Code CLI programmatically in ephemeral VMs. Now it just returns an error saying we can’t use subscriptions with third-party software… when there is no third-party software involved?

Anyway, so long Claude.

freedomben · yesterday at 9:59 PM

This echoes my thoughts exactly. I've tried to stay model-agnostic, but the nudges and shoves from Anthropic keep making that a challenge. No way I'm going that deep into their "cloud" services unless it's a portable standard. I did adopt MCP and skills because those were transferable.

I also clearly see the lock-in/moat strategy playing out here, and I don't like it. It's classic SV tactics. I've been burned too many times to let it happen again if I can help it.

theshrike79 · today at 9:33 AM

Every company is trying to become THE platform that all other tools connect to. Notion is integrating everything under the sun, as is Slack, and the big LLM providers have one-click MCP installation for all major services.

But... these are the "retail" tools that they sell to people and organisations without the skills or know-how to build a basic agentic loop themselves. Complaining about these being bad and untrustworthy is like comparing a microwave dinner to something you cook yourself. Both will fill your belly equally; one requires zero skill from the user, and the other is 90% skill and 10% getting the right ingredients.

Creating a simple MVP *Claw with tool calling using a local model like gemma4 is literally a 15-minute thing. In 2-3 hours you can make it real pretty. If you base it on something like pi.dev, you can easily make it self-modifying and it can build its own safeguards.

That's all this "Routines" thing is: an agentic loop they launch in their cloud on a timer, just like the scheduled tasks in Claude Cowork.
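To make that concrete, here's a toy version of such a loop, with the model call stubbed out so the control flow is visible. In real use `call_model` would POST to a local OpenAI-compatible endpoint (e.g. llama.cpp's server); the tool and the stub's behaviour here are purely illustrative:

```python
# Toy agentic loop: the model repeatedly picks a tool (or answers),
# we execute the tool and feed the result back. `call_model` is a stub
# standing in for an LLM behind an OpenAI-compatible endpoint.

TOOLS = {
    "add": lambda args: str(args["a"] + args["b"]),
}

def call_model(messages):
    # Stub model: request a tool call first, then answer from its result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The result is {messages[-1]['content']}"}

def run_agent(user_prompt, max_steps=5):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish")

print(run_agent("What is 2 + 3?"))
```

Swap the stub for a real HTTP call and you have the whole MVP; everything else is prompt text and plumbing.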

pc86 · yesterday at 9:12 PM

> - No trust that they won't nerf the tool/model behind the feature

To the contrary, they've proven again and again and again they'll absolutely do that the first chance they get.

JohnMakin · yesterday at 8:26 PM

This is similar to a sentiment I heard early on in the cloud-adoption fever: many companies hedged by going "multi-cloud", which was mostly abandoned due to hostile patterns from cloud providers and a lot of cost. Ultimately it didn’t really end up mattering, and the most dire predictions of vendor lock-in abuse didn’t materialize as feared (I know people will disagree with this, but speaking specifically about AWS, the gap between the predictions and what actually happened is massive; note I have never and will never use Azure, so I could be wrong on that particular one).

I see people reaching similar conclusions about various LLM providers. I suspect in the end it’ll shake out about the same way: the providers will become practically non-interoperable with each other, whether due to inconvenience, cost, or whatever. So I’ve not wasted much of my time thinking about it.

crystal_revenge · yesterday at 9:03 PM

This sounds like someone complaining about how Windows is a black box while ignoring the existence of Linux/BSD.

I'm currently hosting, on very reasonable consumer-grade hardware, an LLM that is on par, performance-wise, with what anyone was paying for about a year ago, including all the layers in between the model and the user.

Llama.cpp serves up Gemma-4-26B-A4B, and Open WebUI handles the client details: system prompt, web search, image gen, file uploading, etc. Conduit and Tailscale provide the last layer, so I have a mobile experience as robust as anything I get from Anthropic, plus I know how all the pieces work and can upgrade and enhance to my heart's delight. All this runs from a pretty standard MBP at > 70 tokens/sec.

If you want to better understand the agent side of things, look into the Hermes agent and you can start understanding the internals of how all this stuff is done. You can run a very competitive coding agent using modest hardware and open models. On a similar note, image/video gen on local hardware has come a long way.

Just like Linux, you're going to be exchanging time for this level of control, but it's something anyone who takes LLMs seriously and shares these concerns can easily get started with.

Yet I still see comments like this that seem to completely ignore the incredible work in the open-model community, which has been perpetually improving and is starting to be really competitive. If you relax the "local" requirement and just want more performance from an LLM backend, you can replace the llama.cpp part with a call to Kimi 2.5 or Minimax 2.7 (the latter you could feasibly run at home, not Kimi though). You still control all the additional parts of the experience but run models that are very competitive with current proprietary SoTA offerings, 100% under your control and at a fraction of the price.
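The interchangeability is easy to see at the code level: llama.cpp's server, OpenRouter, and most hosted providers all speak the same OpenAI-style chat-completions shape, so swapping backends is just a base-URL change. A stdlib-only sketch (endpoint path is the OpenAI-compatible one; model names are illustrative):

```python
import json
import urllib.request

# One client, many backends: the backend is just a base URL plus an
# optional API key. Model names below are illustrative placeholders.

def build_request(base_url, model, prompt, api_key=None):
    url = f"{base_url}/v1/chat/completions"
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return url, payload, headers

def chat(base_url, model, prompt, api_key=None):
    url, payload, headers = build_request(base_url, model, prompt, api_key)
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Same function, different backends:
# chat("http://localhost:8080", "gemma", "hello")          # local llama.cpp
# chat("https://openrouter.ai/api", "some/model", "hello", api_key="...")
```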

mikepurvis · yesterday at 7:31 PM

> I want to pick up and move to another harness and/or model with minimal fuss. Buying in to things like this would make that much harder.

Yes, I expect that is very much the point here. A bunch of product guys got around a whiteboard and said: okay, the thing is in wide use, but our main moat is that our competitors are even more distrusted in the market than we are; other than that it's completely undifferentiated and can be swapped out in a heartbeat for multiple other offerings. How do we persuade our investors that we have a locked-in customer base that won't just up stakes in favour of other options, or just run open-source models themselves?

jordanarseno · today at 12:00 AM

In my view, lock-in anxiety is a holdover from a previous era of tech platforms, and it doesn't really apply in an era where frontier agents can migrate you between vendors in hours. So I personally don't see any good in worrying about this. On top of that, every major LLM provider is rapidly converging on the same feature set: they watch each other and clone what works. So the "platform" you're building on isn't really Anthropic's platform so much as the emerging shared surface area of what LLMs can do. By the time this Routines feature becomes a problem for you, other solutions will have emerged, and I'd be very surprised if you couldn't lift-and-shift very quickly.

jeppester · yesterday at 10:00 PM

I always hated SEO because it was not an exact science - like programming was.

Too bad we've now managed to turn programming into the same annoying guesswork.

palata · yesterday at 7:08 PM

> - No trust that they won't nerf the tool/model behind the feature

I actually trust that they will.

EZ-E · today at 8:03 AM

> I want a commodity, I want a provider, not a platform

That is exactly what the big LLM providers are trying to prevent. Being only commodity providers would make them easily replaceable and would likely mean lower margins compared to "full feature" enterprise solutions. Switching LLM API providers is next to no work the moment a competitor is slightly cheaper/better.

Full solutions are more "sticky", harder to replace, and can be sold at higher prices.

hdjrudni · today at 4:51 AM

I also don't see the value add here... "schedule" is just a cron. "GitHub Event" is probably a 20-minute integration, which Claude itself can write for you.

Maybe there's something I'm not seeing here, but I never want to outsource something so simple to a live service.
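For what it's worth, the home-rolled version really is tiny: stdlib scheduling math plus a subprocess call. The `claude -p` invocation below is just a placeholder for whatever CLI harness you use; cron or a systemd timer does the same job:

```python
import datetime
import subprocess
import time

# A home-rolled "routine": run an agent prompt every day at a set time.
# The CLI command is illustrative — substitute any harness you like.

def seconds_until(hour, minute, now=None):
    """Seconds from `now` until the next occurrence of hour:minute."""
    now = now or datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += datetime.timedelta(days=1)  # already passed today
    return (target - now).total_seconds()

def run_routine(prompt):
    subprocess.run(["claude", "-p", prompt], check=True)

# while True:
#     time.sleep(seconds_until(9, 0))   # every day at 09:00
#     run_routine("Triage new GitHub issues and summarize.")
```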

joelthelion · today at 6:30 AM

The good news is that, apart from the models themselves, we don't need much from these companies:

- Use Opencode and other similar open-source solutions in place of their proprietary harnesses. This isn't very practical right now because of the heavily subsidized subscriptions that are hard to compete with. But subsidies will end soon, and with progress in inference, it should be very doable to work with open-source clients in the near future.

- Use OpenRouter and similar services to abstract the LLM itself. That makes AI companies interchangeable and removes much of whatever moat they might have.

spprashant · yesterday at 10:19 PM

I think it behooves us to be selective right now. Frontier labs may be great at developing models, but we shouldn't assume they know what they are doing from a product perspective. The current phase is throwing ideas at the wall and seeing what sticks (see Sora). They don't know how these things will play out long-term. There is no reason to believe Co-work/Routines/Skills will survive 5 years from now. So it might just be better not to invest too much in the ecosystem upfront.

alexhans · today at 6:07 AM

This is why AI evals [1] and local LLMs should be a focus of your investment.

If you can define "good enough" for your use case, and local LLMs can meet it, you'll get:

- no vendor lock-in (control)

- price

- stability (you decide when to hot-swap newer models)

- speed (control)

- full observability and predictability

- privacy / data locality (depending on infrastructure implementation)

- [1] https://alexhans.github.io/posts/series/evals/measure-first-...
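A minimal sketch of what such an eval harness can look like — `model_fn` is any callable (local or API-backed), and the cases and threshold below are toy stand-ins you'd replace with your own definition of "good enough":

```python
# Minimal "measure first" eval harness: run every case through a model,
# score it with a per-case check, and compare against a pass threshold.

def run_eval(model_fn, cases, threshold=0.9):
    passed = sum(1 for prompt, check in cases if check(model_fn(prompt)))
    score = passed / len(cases)
    return score, score >= threshold

# Example with a trivial stand-in model:
cases = [
    ("capital of France?", lambda out: "paris" in out.lower()),
    ("2+2?", lambda out: "4" in out),
]
toy_model = lambda p: {"capital of France?": "Paris", "2+2?": "4"}[p]
score, ok = run_eval(toy_model, cases, threshold=1.0)
print(score, ok)
```

Once this passes for a local model, you can hot-swap backends with confidence instead of trusting a provider's changelog.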

bob1029 · yesterday at 11:11 PM

I am still using the chat completion APIs exclusively. I tried the agent APIs and they're way too opinionated for me. I can see 100% of the tokens I am paying for with my current setup.

ahmadyan · yesterday at 8:15 PM

> I'm not going to build my business or my development flows on things I can't replicate myself.

But you can replicate these yourself! I'm happy that Anthropic/OpenAI are experimenting to find PMF for "LLM for dev-tools". After they figure out the proper stickiness (or if they go away, nerf things, raise prices, etc.) you can always take the off-ramp and implement your own LLM/agent using the existing open-source models. The cost of building dev-tools is near zero; it is not like codegen where you need frontier performance.

ulrikrasmussen · today at 5:14 AM

I agree with your analysis. Platforms are among the most profitable business models because they come with vendor lock-in, but they are always shittier in the long run compared to commodities. Platforms are a way for companies to capture part of the market and decrease competition by increasing the cost of changing vendors.

dbmikus · today at 1:00 AM

We might be building something right up your alley! I wanted an OSS platform that lets me run any coding agent (or multiple agents) in a sandbox and control it either programmatically or via GUI / TUI.

Website is https://amika.dev

And part of our code is OSS (https://github.com/gofixpoint/amika) but we're working on open sourcing more of it: https://docs.google.com/document/d/1vevSJsSCWT_reuD7JwAuGCX5...

We've been signing up private beta users, and also looking for feedback on the OSS plans.

brandensilva · today at 1:27 AM

I'm glad I'm not the only one who feels this way. I've been creating a local-first, open-source piece of software that lets me spin up different agent harnesses with different runtimes. I call it Major Tom because I wanted to be set free from the imprisonment of Claude Code after their DMCA aggression over their own leak, and their actions locking things down against open-source adoption.

"Don't put all your eggs in one basket" has been true for me and my business for ages.

I could really use the open source community to help make this a reality so I'll release this soon hopefully to positive reception from others who want a similar path forward.

gbro3n · yesterday at 8:47 PM

I have heard it said that tokens will become commodities. I like being able to switch between OpenAI's and Anthropic's models, but I feel I'd manage if one of them disappeared. I'd probably even get by with Gemini. I don't want to lock into any one provider any more than I want to lock into my energy provider. I might pay 2x for a better model, but no more, and I can see even that not being the case for much longer.

chinathrow · yesterday at 6:34 PM

Yeah, so better to convert tokens into software that does the job at close to zero cost, running on your own systems.

pjmlp · today at 6:18 AM

I fully agree with you; however, this is basically the fashion in big corporations:

Building businesses on top of SaaS products, iPaaS integrations, and serverless middleware.

idrdex · today at 4:46 AM

The framing is off. AI is a tool that can operate as a human. GOV is how the humans are organized. AI can basically scale GOV. That’s the paradigm shift. Provenance is durable. AI is just the first opportunity we have had to make it scalable.

codebolt · today at 6:15 AM

At some point I think I'd prefer to deploy my own model in Azure or AWS and simply bring the endpoint to the coding harness.

uriegas · yesterday at 11:46 PM

I think AI labs are realizing that they no longer have any competitive advantage other than being the incumbents. Plus hardware improvements might render their models irrelevant for most tasks.

nine_k · yesterday at 9:17 PM

In this regard, the release of open-weight Gemma models that run on reasonable local hardware and are not drastically worse than Anthropic's flagships is quite a punch. An M2 Mac Mini with 32GB costs about 10 months' worth of a Claude Max subscription.

windexh8er · yesterday at 11:01 PM

This 10000%.

Anthropic wants a moat, but that ship has sailed. Now all I keep reading about is: token burn, downtime and... Wait for it, another new product!

Anthropic thinks they are pulling one over on the enterprise, and maybe they are, with annual lock-in akin to Microsoft's. But I really hope enterprise buyers are not this gullible after all these years. At least with Microsoft the product used to be tangible. Now it's... well, non-deterministic, and it's clear providers will gimp models at will.

I had a Pro Max account only for a short period of time and during that short stint Anthropic changed their tune on how I could use that product, I hit limits on a Max account within hours with one CC agent, and experienced multiple outages! But don't worry, Anthropic gave me $200 in credits for OpenClaw. Give me a break.

The current state of LLM providers is the cloud amplified 100x over and in all the worst ways. I had hopes for Anthropic to be the least shitty but it's very clear they've embraced enshittification through and through.

Now I'm spending time looking at how to minimize agent and LLM use, with deterministic automation as the foundation and LLMs only where they need to be, implemented in simple and cost-controllable ways.

simonjgreen · today at 5:29 AM

Completely agree. Use of features like this places one on the wrong side of the vendor's moat, increasing switching costs and decreasing competitive pressure.

ChadMoran · today at 4:10 AM

Agree. I keep my involvement "close to the metal". These higher order solutions seem to cause more noise than provide signal.

cush · yesterday at 7:22 PM

You could so easily build your own /schedule. This is hardly a feature driving lock-in

elias1233 · yesterday at 11:30 PM

Many of the new features in Claude Code have quickly been implemented in other harnesses, for example plugins/skills. After all, it is just a prompt.

jwpapi · yesterday at 10:40 PM

It all went downhill from the moment they changed Reading *.* to reading (*) files.

I can’t use Claude Code at all anymore, not even for simple tasks. The output genuinely disgusts me. Like a friend who constantly stabs you in the back.

My favorite AI feature at the moment is JetBrains' next-edit prediction. It's so fast that I don't lose attention, and I'm still fully in control.

redanddead · today at 6:02 AM

totally agree

they're very shady as well! I can't believe I spent $140 on CC while every day they're adding some "feature flag" to make the model dumber. I'm spending more time fighting the tool than using it. It just doesn't feel good. Enterprises already struggle with lock-in with incumbent clouds; I wanna root for neoclouds, but choices matter, and being shady about this and destroying the tool just doesn't sit right with me. If it's not up to the standard, just kick users off; I would rather know than find out. Give users a choice.

>The flag name is loud_sugary_rock. It's gated to Opus 4.6 only, same as quiet_salted_ember.

Full injected text:

# System reminders User messages include a <system-reminder> appended by this harness. These reminders are not from the user, so treat them as an instruction to you, and do not mention them. The reminders are intended to tune your thinking frequency - on simpler user messages, it's best to respond or act directly without thinking unless further reasoning is necessary. On more complex tasks, you should feel free to reason as much as needed for best results but without overthinking. Avoid unnecessary thinking in response to simple user messages.

@bcherny Seriously? So what's next, we just add another flag to counter that? And the hope is that enough users don't find out or don't bother? That's an ethical choice, man.

tiku · yesterday at 8:02 PM

I believe it doesn't matter; other companies will copy or improve it. The same happened with clawdbot: the amount of clones in a month was insane.

s3p · today at 12:03 AM

Can you explain what you meant by a "dumb pipe"? What does that mean?

wookmaster · yesterday at 9:03 PM

They're trying to find ways to lock you in

sunnybeetroot · yesterday at 7:45 PM

Isn’t that what LangChain/LangGraph is meant to solve? Write workflows/graphs and host them anywhere?

dheera · today at 3:54 AM

> No trust they won't sunset the feature

I've had so many websites break and die because Google or Amazon sunsetted something.

For example, I had a graphing calculator website with 250K monthly active users (mostly school students, I think), and it just vanished one day because Amazon sunsetted EC2-Classic and I didn't have time to deal with it. Hopefully those students found something else to do their homework with that day.

slopinthebag · yesterday at 8:48 PM

They have to become a platform because that is their only hope of locking in customers before the open models catch up enough to eat their lunch. Stuff like Gemma is already good enough to replace ChatGPT for the average consumer, and stuff like GLM 5.1 is not too far off from replacing Claude/Codex for the average developer.

Traubenfuchs · today at 1:07 AM

Right you are! We aren't even in the real squeezing phase yet and everyone's already crying about plan limits and model nerfing.

verdverm · yesterday at 6:37 PM

I fully endorse building a custom stack (1) because you will learn a lot (2) for full control and not having Big Ai define our UX/DX for this technology. Let's learn from history this time around?

Rekindle8090 · today at 1:30 AM

The problem is that without a platform Anthropic has no stack and will just be bought up by Google when the bubble pops. Same with OpenAI: without some sort of moat, their product requires third-party hardware in third-party datacenters, and they'll be bought by Microsoft.

Alphabet doesn't have this issue. Google doesn't need Gemini to win the "AI product" race. It needs Gemini to make Search better at retaining users against Perplexity and ChatGPT search, to make YouTube recommendations and ad targeting more effective, to make Workspace stickier for enterprise customers, to make Cloud more competitive against AWS, and to make Android more useful as a device OS. Every percentage-point improvement in any of those existing businesses generates billions in revenue that never shows up on a "Gemini revenue" line. Any actual Gemini revenue is just a bonus.

Anthropic trains on Google TPUs hosted in Google Cloud. Amazon invested billions and hosts Anthropic's models on Bedrock/AWS. So the two possible outcomes for Anthropic are: succeed as a platform (in which case Google and Amazon extract rent from every inference and training run), or fail as a platform and get acquired (in which case Google or Amazon absorb the talent and IP directly)

Hilariously, if the models were open source, Anthropic, OpenAI, et al. wouldn't be in this situation. Instead, they have no strategic independence to cover for a lack of product independence and have to keep chasing "platforms" and throwing out products no one needs. (People need Claude; that's it.)

SV_BubbleTime · yesterday at 9:56 PM

Without getting too pedantic for no reason… I think it’s important to not call this an LLM.

This isn’t an LLM. It’s a product powered by an LLM. You don’t get access to the model you get access to the product.

An LLM can’t do a web search, an LLM can’t convert Excel files into something and then into PDF. Products do that.

I think it’s a mistake to say "I don’t trust this engine to get me there" rather than "I don’t trust this car." Because for the most part the engine, despite giving you different performance every time, is roughly doing the same thing over and over.

The product is the curious entity you have no control over.

alfalfasprout · yesterday at 10:06 PM

Yep. Trust is easy to lose, hard to earn. A nondeterministic black box that is likely buggy, will almost certainly change, and has a likelihood of getting enshittified is not a very good value proposition to build on top of or invest in.

Increasingly, we're also seeing the moat shrink somewhat. Frontier models are converging in performance (and I bet even Mythos will get matched), and harnesses are improving across the board too (OpenCode and Codex, for example).

I get why they're trying to do that (a perception of a moat bloats the IPO price) but I have little faith there's any real moat at all (especially as competitors are still flush with cash).

andrewmcwatters · yesterday at 6:46 PM

[dead]