michaelt • yesterday at 10:32 PM • 2 replies

It's exceptionally difficult to avoid the data being de-anonymised.

If an 'anonymised' medical record says the person was born 6th September 1969, received treatment for a broken arm on 1 April 2004, and received a course of treatment in 2009 after catching the clap on holiday in Thailand - that's enough bits of information to uniquely identify me.

And medical researchers are usually very big on 'fully informed consent', so they can't gloss over that reality, hide it in fine print or obfuscate it with flowery language. They usually have to make sure the participants really understand what they're agreeing to.

It might still work out fine, of course - 95% of people's medical histories don't contain anything particularly embarrassing, so you might be able to get plenty of participants anyway.
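
To make the "enough bits to uniquely identify me" point concrete, here is a rough sketch that counts how many records share each combination of attributes. The records are made-up toy data based on the example above, and `group_sizes` and `unique_rows` are hypothetical helper names, not part of any real de-identification toolkit.

```python
from collections import Counter

# Hypothetical toy records: (birth_date, event_1, event_2). Not real data.
records = [
    ("1969-09-06", "broken arm 2004-04-01", "STI treatment 2009"),
    ("1969-09-06", "broken arm 2004-04-01", "none"),
    ("1969-09-06", "none", "none"),
    ("1971-02-14", "none", "none"),
    ("1971-02-14", "none", "none"),
]


def group_sizes(rows, columns):
    """Count how many rows share each combination of the chosen columns."""
    return Counter(tuple(row[i] for i in columns) for row in rows)


def unique_rows(rows, columns):
    """Return the rows whose combination of the chosen columns occurs once."""
    sizes = group_sizes(rows, columns)
    return [row for row in rows if sizes[tuple(row[i] for i in columns)] == 1]


# Birth date alone: every record shares its date with at least one other.
by_birth = unique_rows(records, [0])

# Birth date plus both events: some records now stand alone.
by_all = unique_rows(records, [0, 1, 2])

print(len(by_birth), "records unique by birth date alone")
print(len(by_all), "records unique by birth date plus events")
```

In this toy set nobody is unique by birth date alone, but adding the two events singles out three of the five records. Each extra attribute can only split the groups further, never merge them.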

Replies

jjgreen • today at 8:43 AM

... received a course of treatment in 2009 after catching the clap on holiday in Thailand

Yeah, sorry about that

yosame • today at 12:26 AM

In my experience with health data, the dates are usually offset by a random but constant amount for each person (e.g. id 12345 will have all their dates shifted by +5 weeks) to avoid identification by dates.

Unfortunately, the sequence of treatments and locations is usually enough to identify someone, especially if it's a rarer condition.
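
A minimal sketch of the per-person date shifting described above, and of why it leaves the sequence intact. Deriving the offset from a keyed hash of the id is my own assumption, chosen so the shift stays constant for each person without storing a lookup table; real pipelines may simply draw and store a random offset. `SECRET_KEY`, `MAX_SHIFT_DAYS`, `offset_for` and `shift_dates` are hypothetical names, and the dates are illustrative only.

```python
import hashlib
import hmac
from datetime import date, timedelta

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, kept private
MAX_SHIFT_DAYS = 182  # hypothetical bound: roughly +/- 26 weeks


def offset_for(person_id: str) -> timedelta:
    """Derive a pseudo-random offset that is constant for a given person."""
    digest = hmac.new(SECRET_KEY, person_id.encode(), hashlib.sha256).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    days = int.from_bytes(digest[:8], "big") % span - MAX_SHIFT_DAYS
    return timedelta(days=days)


def shift_dates(person_id: str, dates: list[date]) -> list[date]:
    """Shift every date for one person by that person's offset."""
    delta = offset_for(person_id)
    return [d + delta for d in dates]


# Illustrative dates only; the second one is invented for the example.
original = [date(2004, 4, 1), date(2009, 7, 15)]
shifted = shift_dates("12345", original)

# The absolute dates change, but the gap between events does not.
gap_before = original[1] - original[0]
gap_after = shifted[1] - shifted[0]
print(gap_before == gap_after)
```

Because every date for a person moves by the same amount, the order of events and the intervals between them pass through unchanged. That is exactly the information the sequence-based re-identification relies on.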
