Does marginalia_nu not use embedding models as part of search? I guess I assumed it would. If you have embeddings anyway, gradient-boosted decision trees on the embedding vector (e.g. CatBoost) tend to work pretty well. Fine-tuning ModernBERT works even better, but probably won't meet the criteria of "really fast and run well on CPUs". That said, the approach described in the article seems to work well enough, and it obviously provides extremely cheap inference.
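To make concrete what I mean by "decision trees on the embedding vector", here's a minimal sketch: train a CatBoost classifier directly on precomputed embedding vectors. The `embeddings` and `labels` arrays are random stand-ins, not anything from Marginalia or the article:

```python
# Hypothetical sketch: gradient-boosted trees over embedding vectors.
# Assumes you already have an (n_docs, dim) embedding matrix and a
# binary quality/relevance label per document.
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 384)).astype(np.float32)  # stand-in vectors
labels = rng.integers(0, 2, size=1000)                        # stand-in labels

model = CatBoostClassifier(iterations=200, depth=6, verbose=False)
model.fit(embeddings, labels)

scores = model.predict_proba(embeddings)[:, 1]  # P(positive) per document
```

The nice property is that once the embeddings exist, tree inference itself is cheap; the expensive part is computing the embedding in the first place.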
It does not use any transformer models right now. I've experimented with BERT-adjacent methods, but haven't found them fast enough to be useful. Basically, whatever approach is used, it needs to do inference at ~10µs latency to make real-time result filtering viable, or under 1ms to not add unreasonable overhead to processing-time result labeling.
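For anyone wanting to check a candidate model against those budgets, a rough harness like the one below does the job. The dot-product "model" here is just a placeholder for whatever inference call you're measuring, and real numbers will depend heavily on batch size and hardware:

```python
# Rough per-item latency check against the ~10µs / 1ms budgets mentioned above.
import time
import numpy as np

def per_item_latency_us(predict, batch, runs=100):
    """Average per-item inference latency, in microseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        predict(batch)
    elapsed = time.perf_counter() - start
    return elapsed / (runs * len(batch)) * 1e6

batch = np.random.default_rng(0).normal(size=(256, 384)).astype(np.float32)
# Placeholder "model": a single dot product per item.
print(per_item_latency_us(lambda x: x @ np.ones(384, np.float32), batch))
```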