I'm surprised nobody else has commented on this. This is a very straightforward and useful thing for a small locally runnable model to do.
From a compliance POV it's not enough. For example: "<NAME PERSON ONE> is president of the United States" is still identifiable even though the name has been redacted.
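A minimal sketch of that failure mode (the `redact_names` helper, the fixed name list, and "Jane Doe" are all hypothetical stand-ins; a real filter would use NER rather than a known-names list, but the leak is the same):

```python
import re

def redact_names(text: str, names: list[str]) -> str:
    """Replace each known name with a numbered placeholder."""
    for i, name in enumerate(names, start=1):
        text = re.sub(re.escape(name), f"<NAME PERSON {i}>", text)
    return text

sentence = "Jane Doe is president of the United States"
redacted = redact_names(sentence, ["Jane Doe"])
print(redacted)  # <NAME PERSON 1> is president of the United States

# The name is gone, but the remaining context ("president of the
# United States") matches exactly one person, so under GDPR the
# redacted text is still personal data.
```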
Since you can't be 100% certain that a filter redacts all personal data, you'd have to put measures in place that allow OpenAI to legally process personal data on your behalf anyway. Otherwise you'd technically have a data breach (from a GDPR POV).
And if OpenAI can legally process personal data on your behalf, why bother filtering, if processing without filtering is also compliant?
For the confused: this link must have gotten revived or something; I posted this comment a few days ago. Looks like it's now getting the accolades I claimed it deserves.
Same here; this is an incredibly useful thing to have in the toolkit.
And also something that's dangerous to try to do stochastically.