logoalt Hacker News

Aurornisyesterday at 9:32 PM2 repliesview on HN

> I've seen people claiming that they have a bunch of paid accounts that they use to fetch the pages, which is, of course, ridiculous.

The curious part is that they allow web scraping arbitrary pages on demand. So if a publisher could put in a lot of arbitrary requests to archive their own pages and see them all coming from a single account or small subset of accounts.

I hope they haven't been stealing cookies from actual users through a botnet or something.


Replies

xurukefiyesterday at 9:39 PM

Exactly. If I was an admin of a popular news website I would try to archive some articles and look at the access logs in the backend. This cannot be too hard to figure out.

coppsilgoldtoday at 12:05 AM

You don't even need active measures. If a publisher is serious about tracing traitors there are algorithms for that (which are used by streamers to trace pirates). It's called "Traitor Tracing" in the literature. The idea is to embed watermarks following a specific pattern that would point to a traitor or even a coalition of traitors acting in concert.

It would be challenging to do with text, but is certainly doable with images - and articles contain those.