Hacker News

LatencyKills · today at 1:48 AM · 2 replies

Couldn't this be used to locate private data in unstructured text without having to rely on other means of PII detection?

1. Pass the raw text through the filter to obtain the spans.

2. Map all the spans back to the original text.

Now you have the location and content of all the PII.
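Here is a minimal Python sketch of the two steps. It assumes the filter reports each detection as a half-open character range `[start, end)` into the raw text, plus a category label. The `Span` type and the `extract_pii` function are hypothetical, and the spans below are hand-computed stand-ins rather than output from any real filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    # Hypothetical span format: half-open character offsets [start, end)
    # into the raw text, plus the label the filter assigned.
    start: int
    end: int
    label: str


def extract_pii(text: str, spans: list[Span]) -> list[tuple[str, str]]:
    """Map each span back onto the original text and return (label, value) pairs."""
    results = []
    for span in sorted(spans, key=lambda s: s.start):
        # Reject spans that don't fit inside the text instead of silently truncating.
        if not (0 <= span.start <= span.end <= len(text)):
            raise ValueError(f"span out of range: {span}")
        results.append((span.label, text[span.start:span.end]))
    return results


raw = "Contact Jane Doe at jane@example.com."
# Stand-in for whatever the filter returned (offsets computed by hand).
spans = [Span(8, 16, "PERSON"), Span(20, 36, "EMAIL")]
print(extract_pii(raw, spans))
# → [('PERSON', 'Jane Doe'), ('EMAIL', 'jane@example.com')]
```

The point of the sketch is that once a filter exposes offsets, step 2 is only string slicing. Nothing about the slicing depends on how the filter decided what counts as PII.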


Replies

Everdred2dx · today at 2:30 AM

Yep, and it's already been done.

https://github.com/chiefautism/privacy-parser

yjftsjthsd-h · today at 3:57 AM

If you have the redacted and unredacted versions, then you can diff them; that seems unsurprising? Unless I'm really misunderstanding "spans"?
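The diffing idea can be sketched in Python. This sketch does not use a general-purpose diff: a character-level diff can latch onto letters that happen to appear in both the placeholder and the hidden text. Instead, it assumes the redactor swapped each sensitive span for a fixed placeholder (the hypothetical `[REDACTED]`) and copied everything else through unchanged. Under that assumption, the redacted text becomes a regex with one capture group per placeholder. The `recover_redacted` name is illustrative.

```python
import re

# Hypothetical placeholder; real redaction tools use their own markers.
PLACEHOLDER = "[REDACTED]"


def recover_redacted(original: str, redacted: str, placeholder: str = PLACEHOLDER) -> list[str]:
    """Return the pieces of `original` that `redacted` replaced with `placeholder`.

    Assumes everything outside the placeholders was copied through verbatim.
    If several alignments fit, the lazy groups pick the shortest earlier matches.
    """
    # The text between placeholders is what the redactor kept unchanged.
    kept = redacted.split(placeholder)
    # Each placeholder becomes a lazy capture group between the escaped kept segments.
    pattern = "(.+?)".join(re.escape(segment) for segment in kept)
    match = re.fullmatch(pattern, original, flags=re.DOTALL)
    if match is None:
        raise ValueError("redacted text is not a placeholder-substituted copy of the original")
    return list(match.groups())


original = "Contact Jane Doe at jane@example.com."
redacted = "Contact [REDACTED] at [REDACTED]."
print(recover_redacted(original, redacted))
# → ['Jane Doe', 'jane@example.com']
```

This only works with both versions in hand, which matches the reply's point. Recovering PII from the raw text alone needs the spans themselves.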
