A DNA Database Containing Data From 23andMe and Ancestry Is Vulnerable to Attacks
People uploaded their DNA to GEDmatch to find relatives, but now their personal data could be accessed by hackers
By one estimate, more than 26 million people have mailed their saliva in a plastic tube to get their DNA analyzed by genetic testing companies like 23andMe, AncestryDNA, MyHeritage, and Family Tree DNA. And more than a million of them have also uploaded their genetic information to a popular third-party website called GEDmatch to see what DNA they have in common with others in the database.
Now, computer scientists at the University of Washington have revealed that using GEDmatch comes with serious security risks. In a paper posted this week, researchers demonstrated that it’s possible to extract genetic details of any individual in the database, leaving their data vulnerable to leaks or hacks. As more people take DNA tests and third-party genetic genealogy databases grow, the risk of new kinds of biological and cyber attacks also increases. In the wrong hands, a person’s genetic data can be used for discrimination or extortion, and the implications are even greater if entire databases are leaked.
“GEDmatch has over 1 million genomes in their database, so this creates the potential for a major data leak,” Yaniv Erlich, chief scientific officer at MyHeritage and an associate professor of computer science at Columbia University who was not involved with the work, told OneZero.
GEDmatch, which was launched in 2010 as a free service, was initially used by genealogists to identify relatives. By allowing people to upload the raw DNA file they get from genetic testing companies, the site has helped more than 10,000 adoptees identify their biological parents, according to founder Curtis Rogers. It has also been useful in solving crimes. In April 2018, the database received widespread attention when it became public that law enforcement had used it to solve the high-profile Golden State Killer case.
GEDmatch is designed to show users high-level genetic information, like the general location on a chromosome where two people match. It is not intended to reveal fine-grained details about individual variations in other users’ DNA, known as single nucleotide polymorphisms (SNPs), which carry very personal information about an individual’s health and heritage. GEDmatch uses graphics and visualizations to show how much of a user’s genetic profile matches that of another person. But by downloading these graphics from the site and altering their resolution, the researchers were able to uncover sensitive genetic markers, recovering about 92% of a person’s SNPs with 98% accuracy, according to the analysis.
While the average user might not be able to reveal the data, someone with a background in computer security or genomics could do it fairly easily, says Peter Ney, a co-author of the study and a postdoctoral researcher at the University of Washington. “This is obviously a pretty severe privacy issue,” he said.
Though the federal Genetic Information Nondiscrimination Act protects individuals in the United States from genetic discrimination in health insurance and employment, it doesn’t apply to life insurance, disability insurance, long-term care insurance, or businesses that have fewer than 15 employees. In the hands of a bad actor, genetic data could be used to extort or blackmail individuals. And there are national security concerns if a foreign nation-state were to gain access to a whole database containing the genetic information of hundreds of thousands of people.
“This whole area is so new that it’s hard to tell exactly what might be possible,” Ney says. “This has implications that we don’t understand yet.”
Edward You, a supervisory special agent in biological countermeasures at the FBI, has speculated that genetic databases could be hacked by nation-states to discriminate against certain groups of people, make targeted bioweapons, or get a head start on scientific and medical advancements.
Ney and co-authors Luis Ceze and Tadayoshi Kohno were also able to upload fake genetic profiles to GEDmatch that were designed to look like they came from real people. Currently, there are effectively no restrictions on the kind of data users can upload to the site, according to Ney. This means someone could create a bogus genetic profile to impersonate a relative and use it to defraud victims, or to make a person’s anonymous genetic data more difficult for law enforcement to identify.
These newly revealed security problems, as well as similar weaknesses raised in another paper posted this month by researchers at the University of California, Davis, could deter users from making their profiles available to police.
Ney says he and his co-authors alerted GEDmatch about these security issues a few months ago, but it’s unclear whether the creators of the database have done anything to fix them. GEDmatch did not respond to a request for comment.
Erlich says GEDmatch users can put pressure on the company to fix the problem. They can also activate “research mode” on their DNA kits, which allows them to use the database but hides their data from other users. People also have the option of deleting their data completely from GEDmatch. “The problem is if everyone adopts these strategies, nobody can find anybody on GEDmatch,” Erlich says.
Ney says he has no reason to think that other genetic databases, like those maintained by 23andMe and AncestryDNA, pose the same security risks to users. But he says other third-party services that allow users to upload DNA data should be aware of these vulnerabilities. One possible security measure is for DNA testing companies to put a digital signature on every file they give out to customers. If third-party services accepted only uploaded files that were authentically generated by a testing company, this could help mitigate some of the risks Ney and his co-authors found.
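To make the idea concrete, here is a minimal sketch of how a signed file could be checked before an upload is accepted. It is an illustration, not any company's actual scheme. Python's standard library has no public-key signature primitive, so the sketch substitutes a keyed hash (HMAC-SHA256) as a stand-in; a real deployment would more likely use an asymmetric signature so that third-party services hold only a public verification key. The key, the function names, and the sample file contents are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by the testing company (illustration only).
COMPANY_KEY = b"example-signing-key"


def sign_file(raw_dna: bytes, key: bytes = COMPANY_KEY) -> str:
    """Return a hex tag the testing company would ship alongside the raw DNA file."""
    return hmac.new(key, raw_dna, hashlib.sha256).hexdigest()


def is_authentic(raw_dna: bytes, tag: str, key: bytes = COMPANY_KEY) -> bool:
    """Accept an upload only if its tag matches the file contents."""
    expected = sign_file(raw_dna, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, tag)


# Made-up file contents standing in for a raw DNA export.
original = b"rsid\tchromosome\tposition\tgenotype\nrs0000001\t1\t1000\tAG\n"
tag = sign_file(original)

# A manipulated copy of the same file, with one genotype changed.
tampered = original.replace(b"AG", b"GG")
```

Under these assumptions, `is_authentic(original, tag)` accepts the untouched file, while `is_authentic(tampered, tag)` rejects the altered one, because changing even a single genotype changes the tag the service computes.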
But for now, Ney says, users’ genetic information remains vulnerable. “Services at the moment don’t really have ways to distinguish real files that are generated from testing companies from those that are digitally manipulated.”