Why ‘Anonymized Data’ Isn’t So Anonymous
Cleaning data of ‘personally identifying information’ is harder than you might think

In 2015, Latanya Sweeney, a researcher who studies data anonymization and privacy, published research specifically targeting the deanonymization of HIPAA-protected data in Washington. In that state (and many others), companies and individuals can purchase anonymized medical record data. Sweeney bought data through legal channels that included, as she noted, “virtually all hospitalizations occurring in the state in a given year,” along with myriad details about those hospital visits: diagnoses, procedures, the attending physician, a summary of charges, how the bill was paid, and more. The records were anonymous in the sense that they did not contain patients’ names or addresses, but they did include patients’ five-digit U.S. postal codes.
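Why does keeping a postal code matter once names are gone? Fields like ZIP code, age, sex, and admission date are often called quasi-identifiers: harmless alone, but in combination they can single out one person. The sketch below uses a small, entirely hypothetical set of records (the field names and values are illustrative assumptions, not the columns of the Washington dataset) and counts how many records share each combination. A group of size 1 means that record is unique and therefore exposed to linkage.

```python
from collections import Counter

# Hypothetical toy records: names and addresses removed, quasi-identifiers kept.
records = [
    {"zip": "98101", "age": 34, "sex": "F", "admit_month": "2011-03", "diagnosis": "fracture"},
    {"zip": "98101", "age": 34, "sex": "F", "admit_month": "2011-03", "diagnosis": "asthma"},
    {"zip": "98052", "age": 61, "sex": "M", "admit_month": "2011-07", "diagnosis": "stroke"},
    {"zip": "99201", "age": 47, "sex": "M", "admit_month": "2011-11", "diagnosis": "burns"},
]

# Assumed quasi-identifiers for this illustration only.
QUASI_IDENTIFIERS = ("zip", "age", "sex", "admit_month")


def equivalence_class_sizes(rows, keys):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_records(rows, keys):
    """Return records whose quasi-identifier combination appears exactly once."""
    sizes = equivalence_class_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


def k_anonymity(rows, keys):
    """Smallest group size; k == 1 means at least one record is singled out."""
    return min(equivalence_class_sizes(rows, keys).values())


if __name__ == "__main__":
    print(k_anonymity(records, QUASI_IDENTIFIERS))          # 1
    print(len(unique_records(records, QUASI_IDENTIFIERS)))  # 2
```

Here two of the four "anonymous" records are unique on just four fields, even though no name appears anywhere.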
Then, using an archive of Washington state news sources, Sweeney searched for every article published in 2011 that contained the word “hospitalized.” The search turned up 81 articles. By analyzing the newspaper articles alongside the anonymized dataset, Sweeney “uniquely and exactly matched medical records in the state database for 35 of the 81 news stories,” she wrote. Those news stories also named the patients, effectively nullifying the anonymization for these 35 people.
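The matching step is a linkage attack: take the attributes a news story reveals, find every anonymized record consistent with them, and accept the match only when exactly one record fits. The following is a minimal sketch under stated assumptions. All records, names, hospitals, and the `LINK_FIELDS` list are hypothetical, and this is not Sweeney's actual procedure or field set, only an illustration of the general idea.

```python
# Hypothetical anonymized hospital records (no names).
hospital_records = [
    {"record_id": 101, "zip": "98101", "age": 34, "sex": "F",
     "hospital": "Hospital A", "admit_month": "2011-03", "diagnosis": "fracture"},
    {"record_id": 102, "zip": "98101", "age": 34, "sex": "F",
     "hospital": "Hospital B", "admit_month": "2011-03", "diagnosis": "asthma"},
    {"record_id": 103, "zip": "98052", "age": 61, "sex": "M",
     "hospital": "Hospital C", "admit_month": "2011-07", "diagnosis": "stroke"},
]

# Hypothetical facts pulled from news stories, which DO carry a name.
# None marks a detail the story did not report.
news_stories = [
    {"name": "Person X", "zip": "98052", "age": 61, "sex": "M",
     "hospital": "Hospital C", "admit_month": "2011-07"},
    {"name": "Person Y", "zip": "98101", "age": 34, "sex": "F",
     "hospital": None, "admit_month": "2011-03"},
]

# Assumed linking attributes for this illustration.
LINK_FIELDS = ("zip", "age", "sex", "hospital", "admit_month")


def candidates(story, records, fields=LINK_FIELDS):
    """Records that agree with every field the story actually reports."""
    return [
        r for r in records
        if all(story[f] is None or r[f] == story[f] for f in fields)
    ]


def link(stories, records):
    """Attach a name to a record only when exactly one record is consistent."""
    matches = {}
    for story in stories:
        found = candidates(story, records)
        if len(found) == 1:
            matches[story["name"]] = found[0]["record_id"]
    return matches


if __name__ == "__main__":
    print(link(news_stories, hospital_records))  # {'Person X': 103}
```

Person X is re-identified because only one record fits; Person Y stays ambiguous because the story omits the hospital and two records remain. Requiring a unique match mirrors the "uniquely and exactly matched" standard quoted above, at the cost of missing some true matches.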
Data powers the modern world. Data about us controls which news, movies, and advertisements we see. Data determines which of our friends’ posts arrive in our social media feeds. Data drives which potential romantic partners appear in our dating apps. Scientific research, which has long been data-focused, continues to push further into the realm of big data. Researchers compile and process massive datasets, and the platforms of surveillance capitalism are right there with them.
Much of this data is sensitive. Google’s data stockpile can include your complete search history over time. Depending on what you search for, it might reveal a bout of depression, a private kink, a medical condition, and much more. Facebook’s stockpile of our past behavior, comments, and photos is quite revealing for many people. Few of us would be comfortable giving a new acquaintance the complete history of our credit card activity. Our…