Hacker News

znnajdla · yesterday at 6:22 AM · 12 replies

Super useful exercise. My gut tells me that someone will soon figure out how to build micro-LLMs for specialized tasks that have real-world value, and then training LLMs won’t just be for billion-dollar companies. Imagine, for example, a hyper-focused model for a specific programming framework (e.g. Laravel, Django, NextJS), trained only on open-source repositories and documentation and carefully optimized with a specialized harness for one task only: writing code for that framework (perhaps in tandem with a commodity frontier model). Could a single programmer or a small team on a household budget afford to train a model that works better/faster than OpenAI/Anthropic/DeepSeek for specialized tasks? My gut tells me this is possible; and I have a feeling that this will become mainstream, and then custom model training becomes the new “software development”.


Replies

allovertheworld · yesterday at 11:46 AM

It just doesn’t work that way: LLMs need to be generalized a lot to be useful even in specific tasks.

It really is the antithesis of the human brain, which rewards specific knowledge.

teleforce · yesterday at 6:54 AM

This is possible, but through fine-tuning existing open-source models rather than training from scratch.

This can become mainstream, and then custom model fine-tuning becomes the new “software development”.

Please check out this new fine-tuning method for LLMs by MIT and ETH Zurich teams that used a single NVIDIA H200 GPU [1], [2], [3].

Full fine-tuning of the entire model’s parameters was performed using the Hugging Face TRL library.

[1] MIT's new fine-tuning method lets LLMs learn new skills without losing old ones (news):

https://venturebeat.com/orchestration/mits-new-fine-tuning-m...

[2] Self-Distillation Enables Continual Learning (paper):

https://arxiv.org/abs/2601.19897

[3] Self-Distillation Enables Continual Learning (code):

https://self-distillation.github.io/SDFT.html
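For scale, here is a back-of-the-envelope sketch of why one 141 GB H200 is plausible for full fine-tuning. The 16-bytes-per-parameter rule of thumb and the 20% overhead reserve below are my assumptions for illustration, not figures from the paper:

```python
# Rough check that full-parameter fine-tuning fits on a single H200.
# Assumption (common rule of thumb): mixed-precision Adam training needs
# ~16 bytes per parameter -- fp16 weights (2 B) + fp16 grads (2 B)
# + fp32 master weights (4 B) + two fp32 Adam moment buffers (8 B) --
# with activations and framework overhead budgeted separately.

BYTES_PER_PARAM = 16

def max_trainable_params_b(gpu_mem_gb: float, overhead_frac: float = 0.2) -> float:
    """Largest model size (in billions of parameters) whose training state
    fits in GPU memory, reserving overhead_frac for activations/buffers."""
    usable_bytes = gpu_mem_gb * 1e9 * (1.0 - overhead_frac)
    return usable_bytes / BYTES_PER_PARAM / 1e9

# NVIDIA H200: 141 GB of HBM3e
print(f"H200 (141 GB): roughly {max_trainable_params_b(141):.1f}B parameters")
```

Under these assumptions a single H200 comfortably holds the full optimizer state of a ~7B-parameter model, which matches the “fine-tune an open-source model on one GPU” framing.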

ManlyBread · yesterday at 12:29 PM

>someone will soon figure out how to build micro-LLMs for specialized tasks that have real-world value

You've just reinvented machine learning

willio58 · yesterday at 7:53 AM

Hank Green, in collaboration with Cal Newport, just released a video where Cal argues exactly that: for many reasons, not least cost, smaller, more targeted models will become more popular for the foreseeable future. Highly recommend this long video posted today: https://youtu.be/8MLbOulrLA0

ghm2199 · yesterday at 2:03 PM

The economics of producing goods (software code) would dictate that the world settles on a new price per net new "unit" of code, and a production pipeline (some weird, unrecognizable LLM/human combination) to go with it. The price could go to near zero, since the software pipeline could be just AI, with engineers brought in as needed (right now AI is introduced as needed and humans still build the bulk of the system). This would mean software engineering as you know it today no longer exists; it would become more like a vocation, with narrower, more defined training and skills than now. It would be more like how a plumber operates: he comes and fixes things once in a while, as needed. He doesn't actually understand fluid dynamics or structural engineering; the building runs on auto 99% of the time.

Put it another way: do you think people will demand masses of _new_ code just because it becomes cheap? I don't think so. It's just not clear what this would mean for software engineering even 1-3 years from now.

This round of LLM-driven optimization is really and purely about building a monopoly on _labor replacement_ (Anthropic's and OpenAI's code and cowork tools) until there is clear evidence to the contrary: a Jevons-paradox-style massive demand explosion. I don't see that happening for software. If it were true (maybe it will still take a few quarters longer), SaaS company stocks would go through the roof. I mean, they are already tooling up as we speak; SAP is not gonna just sit on its ass and wait for a garage shop to eat their lunch.

asim · yesterday at 8:36 AM

This is my gut feeling also. I forked the project and got Claude to rewrite it in Go as a form of exploration. For a long time I've felt that smaller, useful models could exist, that they could be interconnected and routed via something else if needed, and that they could also support streaming for real-time training or evolution. The large-scale stuff will be dominated by the huge companies, but the "micro" side could be just as valuable.

killerstorm · yesterday at 12:33 PM

You're missing the point.

Karpathy has other projects, e.g. https://github.com/karpathy/nanochat

You can train a model with GPT-2 level of capability for $20-$100.

But, guess what, that's exactly what thousands of AI researchers have been doing for the past 5+ years. They've been training smallish models. And while these smallish models might be good for classification and whatnot, people strongly prefer big-ass frontier models for code generation.
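The $20-$100 figure is roughly consistent with the standard compute estimate FLOPs ≈ 6·N·D (N parameters, D training tokens). A sketch, where the GPU throughput, utilization, and rental price are my assumed illustrative numbers, not quotes:

```python
# Rough training-cost estimate from the common approximation
# total FLOPs ~= 6 * N * D  (N = parameters, D = training tokens).
# Hardware and price figures below are assumptions for illustration.

def training_cost_usd(n_params: float, n_tokens: float,
                      peak_flops: float = 1e15,   # ~H100-class bf16 peak, assumed
                      mfu: float = 0.4,           # assumed model FLOPs utilization
                      usd_per_gpu_hour: float = 2.5) -> float:
    """Estimated rental cost (USD) to train a model of n_params on n_tokens."""
    total_flops = 6 * n_params * n_tokens
    gpu_seconds = total_flops / (peak_flops * mfu)
    return gpu_seconds / 3600 * usd_per_gpu_hour

# GPT-2-small scale: 124M parameters trained on ~10B tokens
print(f"Estimated cost: ~${training_cost_usd(124e6, 10e9):.0f}")
```

Under these assumptions the estimate lands in the tens of dollars, i.e. the low end of the $20-$100 range; real runs pay extra for data loading, evaluation, and imperfect utilization.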

the_arun · yesterday at 6:26 AM

If we could run them on commodity hardware with CPUs, nothing would beat that.

otabdeveloper4 · yesterday at 7:02 AM

We've had good small language models for decades (e.g. BERT).

The entire point of LLMs is that you don't have to spend money training them for each specific case. You can train something like Qwen once and then use it to solve whatever classification/summarization/translation problem in minutes instead of weeks.

npn · yesterday at 6:55 AM

What gut? We're already doing that. There are a lot of "tiny" LLMs that are useful: Microsoft's Phi-4, Gemma 3/3n, Qwen 7B... There are even smaller models, like Gemma 270M, that are fine-tuned for function calling.

They haven't flourished yet for a simple reason: the frontier models are still improving. Currently it is better to use frontier models than to train or fine-tune our own, because by the time we complete the model, the world has already moved on.

Heck, even distillation is a waste of time and money, because newer frontier models yield better outputs.

You can expect the landscape to change drastically in the next few years, once the proprietary frontier models stop making huge improvements with every version upgrade.

maipen · yesterday at 12:11 PM

That would only produce a model that you can ask questions to.
