Hacker News

An NSFW filter for Marginalia search

50 points by speckx today at 3:58 PM | 9 comments

Comments

ChadNauseam today at 7:55 PM

Does marginalia_nu not use embedding models as part of search? I guess I assumed it would. If you have embeddings anyway, decision trees on the embedding vector (e.g. CatBoost) tend to work pretty well. Fine-tuning ModernBERT works even better, but probably won't meet the criteria of "really fast and run well on CPUs". That said, the approach described in the article seems to work well enough and obviously provides extremely cheap inference.
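
To make "decision trees on the embedding vector" concrete, here is a minimal sketch. CatBoost trains an ensemble of gradient-boosted trees; this swaps in a single depth-limited CART-style tree with Gini impurity, written in pure Python so it runs without dependencies. The 3-dimensional "embeddings", the label convention (1 = NSFW, 0 = safe) and the names `build_tree` and `predict` are all hypothetical; real sentence embeddings have hundreds of dimensions and you would use a boosted-tree library in practice.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


@dataclass
class Node:
    # Leaf nodes carry a label; internal nodes carry a (feature, threshold) split.
    label: Optional[int] = None
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: List[int]) -> float:
    # Gini impurity: 1 - sum of squared class proportions.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: List[int]) -> int:
    return Counter(labels).most_common(1)[0][0]


def build_tree(X: List[Vector], y: List[int], depth: int = 0,
               max_depth: int = 3, min_samples: int = 2) -> Node:
    # Stop when the node is pure, too small, or the depth limit is reached.
    if depth >= max_depth or len(y) < min_samples or len(set(y)) == 1:
        return Node(label=majority(y))
    best = None  # (weighted_gini, feature, threshold)
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every vector is identical, so no split exists
        return Node(label=majority(y))
    _, f, t = best
    li = [i for i, x in enumerate(X) if x[f] <= t]
    ri = [i for i, x in enumerate(X) if x[f] > t]
    return Node(
        feature=f,
        threshold=t,
        left=build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_samples),
        right=build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_samples),
    )


def predict(node: Node, x: Vector) -> int:
    # Walk from the root to a leaf, one threshold comparison per level.
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Toy data: dimension 1 happens to separate the two classes.
X = [[0.1, 0.9, 0.3], [0.2, 0.8, 0.5], [0.3, 0.85, 0.1],
     [0.2, 0.1, 0.4], [0.1, 0.2, 0.6], [0.3, 0.05, 0.2]]
y = [1, 1, 1, 0, 0, 0]
tree = build_tree(X, y)
```

Inference is a handful of float comparisons per document, which is why tree models on precomputed embeddings stay cheap on CPUs; the expensive part is computing the embedding itself.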

marginalia_nu today at 6:13 PM

This was a very meandering project, and trying to corral it into some sort of coherent narrative was a bit of an undertaking on its own. Hopefully it makes some sense.

8organicbits today at 6:52 PM

Have you seen many examples of websites labeling themselves, perhaps using rating meta tags (<meta name="rating" ...>)? Self-labeling seems valuable in some ways, but I don't think I've seen it catch on.
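
Checking for such a self-label is cheap for a crawler. Below is a minimal sketch using Python's standard `html.parser`. Which `content` values count as "adult" is an assumption here: the sketch accepts `adult` and the RTA ("Restricted To Adults") label string, and ignores everything else. The names `RatingMetaParser` and `is_self_labeled_adult` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed set of rating values treated as an adult self-label (compared lowercased).
ADULT_RATINGS = {"adult", "rta-5042-1996-1400-1577-rta"}


class RatingMetaParser(HTMLParser):
    """Collects the content of every <meta name="rating"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        # A self-closing <meta ... /> also reaches this method via the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").strip().lower()
        if name == "rating":
            self.ratings.append((attr_map.get("content") or "").strip().lower())


def is_self_labeled_adult(html: str) -> bool:
    parser = RatingMetaParser()
    parser.feed(html)
    parser.close()
    return any(rating in ADULT_RATINGS for rating in parser.ratings)
```

Since few sites self-label, a check like this could only serve as a positive signal that short-circuits a classifier; the absence of a tag says nothing about a page.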

GenericDev today at 7:16 PM

[dead]