Why ‘Anonymized Data’ Isn’t So Anonymous

Cleaning data of ‘personally identifying information’ is harder than you might think

Tyler Elliot Bettilyon
OneZero

Credit: Pattanaphong Khuankaew / EyeEm/Getty Images

In 2015, Latanya Sweeney, a researcher who studies data anonymization and privacy, published research specifically targeting the deanonymization of HIPAA-protected data in Washington. In that state (and many others), it is possible for companies and individuals to purchase anonymized medical record data. Sweeney purchased data through legal channels that included, as she noted, “virtually all hospitalizations occurring in the state in a given year” and myriad details about those hospital visits, including diagnoses, procedures, the attending physician, a summary of charges, how the bill was paid, and more. The records were anonymous in that they did not contain the patients’ names or addresses, but they did include patients’ five-digit U.S. postal codes.

Then, using an archive of Washington state news sources, Sweeney searched for any article printed in 2011 that contained the word “hospitalized.” The search turned up 81 articles. By analyzing the newspaper articles and the anonymized dataset, Sweeney “uniquely and exactly matched medical records in the state database for 35 of the 81 news stories,” she wrote. Those news stories also contained the patient’s name, effectively nullifying the anonymization efforts…
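To make the matching step concrete, here is a minimal Python sketch of this kind of linkage attack. It is not Sweeney's actual method or data: every record below is synthetic, and the choice of shared fields (`zip`, `age`, `sex`, `admit_month`) is an assumption made purely for illustration. The idea is simply that a news story is "re-identified" when exactly one anonymized record agrees with it on every shared field.

```python
from collections import defaultdict

# Fields assumed to appear in BOTH sources (an illustrative choice,
# not the exact fields used in the original research).
QUASI_IDENTIFIERS = ("zip", "age", "sex", "admit_month")

# Synthetic "anonymized" hospital records: no names, but details remain.
hospital_records = [
    {"record_id": 1, "zip": "98101", "age": 61, "sex": "M",
     "admit_month": "2011-03", "diagnosis": "fracture"},
    {"record_id": 2, "zip": "98101", "age": 34, "sex": "F",
     "admit_month": "2011-03", "diagnosis": "concussion"},
    {"record_id": 3, "zip": "99201", "age": 47, "sex": "M",
     "admit_month": "2011-07", "diagnosis": "burns"},
    {"record_id": 4, "zip": "99201", "age": 47, "sex": "M",
     "admit_month": "2011-07", "diagnosis": "fracture"},
]

# Synthetic details pulled from hypothetical news stories, which include names.
news_stories = [
    {"name": "Person A", "zip": "98101", "age": 61, "sex": "M",
     "admit_month": "2011-03"},
    {"name": "Person B", "zip": "99201", "age": 47, "sex": "M",
     "admit_month": "2011-07"},
    {"name": "Person C", "zip": "98052", "age": 29, "sex": "F",
     "admit_month": "2011-05"},
]


def key_of(row):
    """Build the tuple of quasi-identifier values used for matching."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link(stories, records):
    """Return {name: record} for stories that match exactly one record."""
    index = defaultdict(list)
    for record in records:
        index[key_of(record)].append(record)

    matches = {}
    for story in stories:
        candidates = index.get(key_of(story), [])
        # Zero candidates: no match. Several candidates: ambiguous.
        # Only a single candidate counts as a unique, exact match.
        if len(candidates) == 1:
            matches[story["name"]] = candidates[0]
    return matches


reidentified = link(news_stories, hospital_records)
for name, record in reidentified.items():
    print(name, "-> record", record["record_id"], "-", record["diagnosis"])
```

In this toy data only "Person A" is re-identified: "Person B" is consistent with two records, so the match is ambiguous, and "Person C" matches none. The more fields the two sources share, the fewer ambiguous cases remain, which is why detailed "anonymous" records are so easy to link back to names.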

Tyler Elliot Bettilyon
OneZero

A curious human on a quest to watch the world learn. I teach computer programming and write about software’s overlap with society and politics. www.tebs-lab.com