Hacker News

teraflop · yesterday at 4:17 PM · 1 reply

We should have learned this lesson 20 years ago when researchers were able to deanonymize a lot of the Netflix Prize dataset, which contained nothing except movie ratings and their associated dates.

https://arxiv.org/abs/cs/0610105

If movie ratings are vulnerable to pattern-matching from noisy external sources, then it should be obvious that location data is enormously more vulnerable.
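The linked paper's attack treats each anonymized record as a sparse set of (movie, rating, date) entries and scores candidate records against the adversary's noisy auxiliary knowledge, weighting matches on rarely-rated movies more heavily. Here is a toy sketch of that idea; the dataset, tolerances, and weighting are invented for illustration and are not the paper's actual parameters:

```python
import math
from collections import Counter

# Toy "anonymized" dataset: record id -> {movie: (rating, day)}
records = {
    "r1": {"A": (5, 10), "B": (3, 12), "C": (4, 40)},
    "r2": {"A": (5, 11), "D": (2, 30)},
    "r3": {"B": (3, 12), "C": (4, 41), "E": (1, 50)},
}

# How many records rate each movie; rare movies are more identifying
support = Counter(m for rec in records.values() for m in rec)

def score(aux, rec, rating_tol=1, date_tol=14):
    """Similarity between auxiliary knowledge and a record.
    A match on a rarely-rated movie contributes more (1/log support)."""
    s = 0.0
    for movie, (rating, day) in aux.items():
        if movie in rec:
            r, d = rec[movie]
            if abs(r - rating) <= rating_tol and abs(d - day) <= date_tol:
                s += 1.0 / math.log(1 + support[movie])
    return s

# Adversary knows a few noisy ratings from a public source (e.g. IMDb)
aux = {"B": (3, 13), "C": (4, 39), "E": (1, 52)}
best = max(records, key=lambda rid: score(aux, records[rid]))  # -> "r3"
```

The real algorithm also checks how far the best score stands above the runner-up before declaring a match, so that an ambiguous tie is not reported as a de-anonymization.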


Replies

totetsu · today at 3:00 AM

> In contrast to previous attacks on micro-data privacy [22], our de-anonymization algorithm does not assume that the attributes are divided a priori into quasi-identifiers and sensitive attributes. Examples include anonymized transaction records (if the adversary knows a few of the individual's purchases, can he learn all of her purchases?), recommendation and rating services (if the adversary knows a few movies that the individual watched, can he learn all movies she watched?), Web browsing and search histories [12], and so on. In such datasets, it is impossible to tell in advance which attributes might be available to the adversary;

Is location data highly dimensional, though?
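In the sparse-record sense the paper uses, a location trace can be viewed as a set of (place, hour) points drawn from a huge space of possible points, so very few points are shared between people. A toy sketch with synthetic traces (all sizes and counts here are made up) illustrating why a handful of spatio-temporal points tends to pin down one trace:

```python
import random

random.seed(0)
N_PEOPLE, N_CELLS, N_HOURS = 1000, 50, 24

# Synthetic traces: each person visits ~20 random (cell, hour) points
traces = [frozenset((random.randrange(N_CELLS), random.randrange(N_HOURS))
                    for _ in range(20)) for _ in range(N_PEOPLE)]

def uniquely_identified(k):
    """Fraction of people pinned down by k random points from their own trace,
    i.e. no other trace also contains all k points."""
    hits = 0
    for t in traces:
        sample = set(random.sample(sorted(t), min(k, len(t))))
        matches = sum(1 for u in traces if sample <= u)
        hits += (matches == 1)
    return hits / N_PEOPLE
```

With these toy numbers, one known point matches many people, but four known points almost always single out exactly one trace, which is the high-dimensional sparsity the de-anonymization argument relies on.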