I'm interested in trying a moderation scheme that puts the power in the receiving client, with opinionated defaults. Let the client filter for itself, mute users for itself, and make users invisible to itself. Have default settings that make sense for the app, but let the user override them.
Use a cheap purpose-built LLM like OpenAI's free moderation endpoint to classify the text. Send the original text plus the classification to clients and let each client choose what to do with it, again with opinionated defaults appropriate to the app.
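To make that concrete, here's a rough sketch of what the client side could look like. Everything in it is made up for illustration: the `Message` shape, the `ClientPolicy` class, the category names and the threshold numbers are all assumptions, not any real API. The only idea it tries to show is that the server attaches scores and the client decides.

```python
from dataclasses import dataclass, field

# Hypothetical app defaults: a message is hidden when a category score
# reaches its threshold. Names and numbers are illustrative assumptions.
DEFAULT_THRESHOLDS = {"harassment": 0.5, "hate": 0.4, "spam": 0.7}


@dataclass
class Message:
    sender: str
    text: str                 # original text, always delivered unmodified
    scores: dict[str, float]  # classification attached by the server


@dataclass
class ClientPolicy:
    overrides: dict[str, float] = field(default_factory=dict)  # user settings
    muted: set[str] = field(default_factory=set)               # user's mute list

    def threshold(self, category: str) -> float:
        # User override wins, then the app default. Categories with neither
        # get an infinite threshold, so they never hide a message.
        default = DEFAULT_THRESHOLDS.get(category, float("inf"))
        return self.overrides.get(category, default)

    def is_visible(self, msg: Message) -> bool:
        if msg.sender in self.muted:
            return False
        return all(score < self.threshold(cat) for cat, score in msg.scores.items())


msg = Message("alice", "some rude text", {"harassment": 0.8})
default_view = ClientPolicy().is_visible(msg)                              # app defaults
permissive_view = ClientPolicy(overrides={"harassment": 1.1}).is_visible(msg)  # user opted out
muted_view = ClientPolicy(overrides={"harassment": 1.1}, muted={"alice"}).is_visible(msg)
```

With the defaults the message is hidden, a user who raises the harassment threshold sees it, and a mute hides the sender regardless of scores. Nothing is decided on the server except the scores themselves.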
Maybe you still need to identify persistent bad actors rather than acting only on content. But still, allow clients to decide what to do with that information.
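One way that could work, as a sketch only: the server keeps a per-sender tally of how often their messages get flagged and ships that ratio to clients as just another piece of information. `SenderStats`, `hide_sender` and the 0.5 default limit are hypothetical names and values I'm assuming for the example.

```python
from collections import Counter


class SenderStats:
    """Tallies, per sender, how many messages were seen and how many were flagged."""

    def __init__(self) -> None:
        self.total = Counter()
        self.flagged = Counter()

    def record(self, sender: str, was_flagged: bool) -> None:
        self.total[sender] += 1
        if was_flagged:
            self.flagged[sender] += 1

    def flagged_ratio(self, sender: str) -> float:
        seen = self.total[sender]
        # Unknown senders have no history, so report 0.0 rather than dividing by zero.
        return self.flagged[sender] / seen if seen else 0.0


def hide_sender(ratio: float, client_limit: float = 0.5) -> bool:
    # Client-side decision. The default limit is an assumed app default
    # that the user can override.
    return ratio >= client_limit


stats = SenderStats()
for was_flagged in (True, True, True, False):
    stats.record("troll", was_flagged)

ratio = stats.flagged_ratio("troll")
hidden_by_default = hide_sender(ratio)
hidden_for_tolerant_user = hide_sender(ratio, client_limit=0.9)
```

The server only reports the ratio; whether a sender with that history becomes invisible is still up to each client's setting.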
I suppose my thinking is that strong default automatic moderation that's invisible to offenders is a requirement for a project like this to be able to offer a welcoming experience to users, but putting the power in an LLM and fixed filter lists feels very wrong. So my thought is to use those things to give the client power. But maybe that makes no difference if nobody changes settings away from defaults anyway.
on the other hand, if someone is making their own site, they may care about the impression their site gives to visitors.
i would think they should have the freedom on the site they build and host to choose the impression they give. and sure, if they choose to let their square be filled with noise rather than signal, that’s absolutely their choice. but they also may choose for it to be filled with signal rather than noise. the key thing is the site owner should have the choice to give whatever impression they want for their creation.
again, if they want that impression to be hijacked by noisy trolls, they can choose that too.