If you are going to go to the bother of fine tuning for trivial problems like subject classification then I think you'll find Scikit Learn with a SGDClassifier on 2-grams will do probably just as well and be under 1MB for the trained classifier.
You can train it in under a minute, and it will work perfectly well on embedded devices.
Small LLMs are good choices for text classification in two cases:
- If you next to provide in-context examples and classifier based on them.
- Your classification goes beyond simple subject-type classifiers. For example, multiple choice question answering is classification where small LLM will work but traditional ML methods won't/
A small transformer like BERT or variants is a better fit. It only takes a few examples, which can be generated synthetically using an LLM.
Trains quickly and classifies speedily on modern hardware.
Had a lot of fun doing stuff like this years ago, before LLMs were a thing.
there are models between 2-grams and 600m param models that would be good options. i don't expect a 2-gram to do very well here. also i'm not sure why this model isn't a fine choice if it solves their problem
Not with 800 examples. If you are going to consider an ngram model, I think you are better off getting a frontier llm to write you an absurd regex.