Hacker News

ralferoo today at 2:32 PM

(complete sidetrack)

I think this graph is a great illustration of how hard anonymising data is. It's very easy to isolate individual authors from this list, because each one shows up as a clear diagonal line: year and age increase in lockstep, so every point for a given author has the same year minus age, i.e. the same birth year. The strong diagonals everywhere also suggest there aren't actually that many authors in this collection.
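To make the diagonal point concrete, here's a rough sketch of how you could pull the authors apart given only (year, age) pairs. The function name and the sample points are made up for illustration and are not taken from the graph. It also assumes age is simply year minus birth year, so a real dataset could be off by one depending on birthdays, and two authors born in the same year would collapse into one group.

```python
from collections import defaultdict


def group_by_birth_year(points):
    """Group (year, age) points by implied birth year, i.e. year - age.

    Points on the same diagonal of a year-vs-age plot share this value,
    so each group is a candidate for "one author".
    """
    groups = defaultdict(list)
    for year, age in points:
        groups[year - age].append((year, age))
    return dict(groups)


# Hypothetical sample points, NOT read off the actual graph.
points = [(1920, 30), (1925, 35), (1931, 41), (1920, 45), (1928, 53)]

diagonals = group_by_birth_year(points)
for birth_year, works in sorted(diagonals.items()):
    print(birth_year, works)
```

With the sample above, the five "anonymous" points collapse into just two implied birth years, which is the same effect as spotting two diagonals by eye.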

There's probably also some erroneous data here: a bunch of points represent material written by people at age 34 between about 1920 and 1940 (an obvious horizontal line), whereas most of the rest of the graph doesn't show any strong horizontal bias towards a specific age.


Replies

Gander5739 today at 3:07 PM

> This also suggests there aren't actually that many authors in this collection

There are 200 according to the website.