Hacker News

nl, today at 3:51 AM

I'm nowhere near as smart as OpenAI of course, but I did build https://tools.nicklothian.com/webner/index.html, which uses a BERT-based named-entity-recognition model running in your browser to do a subset of PII redaction.

It works pretty well for the use cases I was playing with.

The OpenAI model is small enough that I might enhance my tool to use it.


Replies

stingraycharles, today at 4:50 AM

I just used it on a document, but the number of false positives it generates makes it fairly difficult to use.

I fed it a ~100-line markdown document, which took about 10 seconds, and it decided that "matter" (as in frontmatter), "end" (as in frontend), and "MCP" (as in MCP server) are organizations.

Most of them don't even make grammatical sense, e.g. "Following the discussion in <PERSON_1>, blahblah".

Brings me back to what NLP was like a decade ago. I always thought spaCy was a very nice project in that space.
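
For what it's worth, false positives like "matter" inside "frontmatter" are fragments of a longer word, so one cheap mitigation is to drop any predicted span that is glued to letters or digits on either side before redacting. The sketch below is only an illustration under assumptions: the entity dictionaries (`start`, `end`, `label`, `score`), the `min_score` threshold, and the helper names are hypothetical and are not taken from the tool discussed above.

```python
def is_whole_word(text, start, end):
    """True if text[start:end] is not attached to a letter or digit on either side."""
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    return not (before.isalnum() or after.isalnum())


def redact(text, entities, min_score=0.9):
    """Replace accepted entity spans with numbered placeholders such as <PERSON_1>.

    `entities` is a hypothetical list of dicts with character offsets
    (start, end), a label, and a confidence score.
    """
    kept = [
        e for e in entities
        if e["score"] >= min_score and is_whole_word(text, e["start"], e["end"])
    ]
    kept.sort(key=lambda e: e["start"])

    counters = {}      # label -> how many distinct surfaces seen so far
    placeholders = {}  # (label, surface) -> placeholder, so repeats share an id
    out, pos = [], 0
    for e in kept:
        if e["start"] < pos:
            continue  # skip spans overlapping one already redacted
        surface = text[e["start"]:e["end"]]
        key = (e["label"], surface)
        if key not in placeholders:
            counters[e["label"]] = counters.get(e["label"], 0) + 1
            placeholders[key] = f"<{e['label']}_{counters[e['label']]}>"
        out.append(text[pos:e["start"]])
        out.append(placeholders[key])
        pos = e["end"]
    out.append(text[pos:])
    return "".join(out)


text = "The frontmatter was written by Alice Smith. Alice Smith uses the frontend."
first = text.index("Alice Smith")
second = text.index("Alice Smith", first + 1)
entities = [
    # Sub-word false positives of the kind described above.
    {"start": text.index("matter"), "end": text.index("matter") + 6, "label": "ORG", "score": 0.95},
    {"start": text.index("frontend") + 5, "end": text.index("frontend") + 8, "label": "ORG", "score": 0.97},
    # Genuine entities.
    {"start": first, "end": first + 11, "label": "PERSON", "score": 0.99},
    {"start": second, "end": second + 11, "label": "PERSON", "score": 0.99},
]
print(redact(text, entities))
# The frontmatter was written by <PERSON_1>. <PERSON_1> uses the frontend.
```

This would not help with "MCP", which stands alone as a word; that kind of error needs a score threshold, an allow list of known technical terms, or a better model.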
