Hacker News

gkcnlr · yesterday at 3:45 AM · 1 reply

It seems like everybody is focused on "LLMs", a.k.a. Large Language Models. One interesting addition to that is fine-tuned, small-parameter, distilled, context-dependent small language models that:

1- Do a particular task with great capability (thanks to their constrained, limited scope)

2- Integrate so gracefully into your workflow that you never need to know you're using an LM.

There is a difference between outsourcing your workflow to AI and actually utilizing it.

Check this: https://www.distillabs.ai/blog/we-benchmarked-12-small-langu...


Replies

fennecfoxy · yesterday at 9:42 AM

Eh I think the small model thing is kind of a no-go.

The reason is that many AI workloads are dynamically mixed: training from multiple subjects comes into play, and you just can't know ahead of time exactly what mix each task will require.

I was hoping LoRAs would do this for us as well, but they don't really seem to have worked out for LLMs (compared to the image/video diffusion space).
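For readers unfamiliar with the LoRA idea being referenced: instead of fine-tuning a full weight matrix, you train two small low-rank factors on top of the frozen base weights. A minimal NumPy sketch (all names and sizes are illustrative, not any particular library's API):

```python
import numpy as np

# LoRA sketch: freeze base weights W (d_out x d_in) and train only two small
# factors B (d_out x r) and A (r x d_in); inference uses W + B @ A.
rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4               # r << d_in is the low-rank bottleneck

W = rng.standard_normal((d_out, d_in))   # frozen base weights
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero, so the adapter
                                         # initially leaves the base model unchanged

x = rng.standard_normal(d_in)
base_out = W @ x
adapted_out = (W + B @ A) @ x            # adapter applied on top of frozen W

# With B = 0 the adapted model matches the base model exactly.
assert np.allclose(base_out, adapted_out)

# Parameter cost: a full fine-tune touches d_out*d_in weights; LoRA trains
# only r*(d_in + d_out) — here 512 trainable parameters vs 4096.
print(W.size, A.size + B.size)
```

The appeal for the "swap in task knowledge" use case is that adapters are tiny and cheap to merge or hot-swap; the complaint above is that, in practice, this has been more successful for diffusion models than for LLMs.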

Perhaps some future model will have some sort of "core" that can load/unload portions of itself dynamically at runtime. Like going for a very horizontal architecture with hundreds of MoE experts, and loading/unloading those paths/weights once a parent value meets or exceeds some minimum, hmmm.
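The load-on-demand idea in the comment above can be sketched as a toy MoE layer where an expert's weights are only materialized once the router's gate score for it crosses a threshold. Everything here is hypothetical (the threshold, the names, the lazy loader stand-in), just to make the mechanism concrete:

```python
import numpy as np

# Toy sketch: a wide MoE layer that pages experts in lazily. In a real system
# load_expert() would fetch weights from disk or a weight server; here it just
# fabricates them. Threshold-based gating is an illustrative choice, not a
# claim about any existing architecture.
rng = np.random.default_rng(0)
D, N_EXPERTS, THRESHOLD = 16, 8, 0.15

router = rng.standard_normal((N_EXPERTS, D))   # linear gating network
loaded = {}                                    # expert id -> weights, filled lazily

def load_expert(i):
    # Stand-in for paging real expert weights in from storage.
    return rng.standard_normal((D, D)) / np.sqrt(D)

def moe_forward(x):
    logits = router @ x
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                       # softmax over experts
    out = np.zeros(D)
    for i, g in enumerate(gates):
        if g < THRESHOLD:
            continue                           # expert never touched, never loaded
        if i not in loaded:
            loaded[i] = load_expert(i)         # load this path's weights on first use
        out += g * (loaded[i] @ x)
    return out

y = moe_forward(rng.standard_normal(D))
print(f"experts resident: {len(loaded)}/{N_EXPERTS}")
```

Only the experts the router actually routes to ever occupy memory, which is the "unload/load those paths/weights" intuition; the hard open problems (when to evict, how to avoid thrashing, load latency) are exactly what the comment is hand-waving about.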