Reengineering Life

Researchers Just Took a Major Step Toward Decoding the Entire Human Genome

17 years after the Human Genome Project, researchers unlocked the X chromosome

Reengineering Life is a series from OneZero about the astonishing ways genetic technology is changing humanity and the world around us.

By the time the Human Genome Project ended its 13-year run in 2003, it had mapped about 90% of our entire genetic code. But some of the remaining parts have proved difficult to decode. As DNA reading technology has improved over the years, those gaps have gradually been filling in, and researchers are getting closer to building a complete picture of the genome. But about 100 regions are still incomplete, including a handful of sections in the X chromosome.

Now, for the first time, scientists have produced an end-to-end map of the X chromosome. The achievement, which could help scientists better understand a number of genetic conditions, was published July 14 in the journal Nature.

Mapping these regions in the X chromosome and elsewhere has stumped researchers because they contain lots of repetitive DNA segments, making them a challenge to sequence. These segments can repeat for thousands or even millions of DNA letters, also known as bases.

“Assembling or putting these pieces together was impossible until only recently,” Karen Miga, a DNA biologist at the University of California, Santa Cruz and an author on the new paper, tells OneZero.

Using advanced sequencing technology, researchers focused on decoding the X chromosome because nearly everyone has at least one. People born genetically female typically have two X chromosomes, and those born genetically male usually have an X and a Y.

“The X chromosome is of interest in the human medical genetics and genomics field for having association with a lot of traits and diseases,” Miga says. For instance, the X chromosome is linked to color blindness, Duchenne muscular dystrophy, and hemophilia.


Miga worked with Adam Phillippy, an investigator at the National Human Genome Research Institute, using a technology known as long-read sequencing, which can read long stretches of DNA bases at a time. The human genetic code is incredibly long, about 6 billion bases, and DNA sequencing machines can’t read all those bases at once. Instead, researchers have to chop the genome into smaller pieces, often only hundreds of bases long, and analyze those bits one at a time. Once that’s done, they have to assemble the pieces back together.
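A toy Python sketch can make the chop-and-reassemble step concrete. It assumes error-free, evenly spaced reads and uses a simple greedy overlap merge, a textbook illustration rather than the assembly program Phillippy’s team developed; the function names `chop`, `overlap`, and `greedy_assemble` and the 20-letter sequence are hypothetical.

```python
def chop(genome: str, read_len: int, step: int) -> list[str]:
    """Cut a sequence into overlapping reads of read_len letters, one every `step` letters.

    Assumes (len(genome) - read_len) is a multiple of step so the last read reaches the end.
    """
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if none of at least min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate spot where b's first letters appear in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):  # the rest of a must match the start of b
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap; return the resulting pieces."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the pieces cannot be joined further
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


genome = "ATGGCGTACCTTAGACTGCA"  # hypothetical 20-letter sequence
reads = chop(genome, read_len=8, step=4)
print(reads)                     # ['ATGGCGTA', 'CGTACCTT', 'CCTTAGAC', 'AGACTGCA']
print(greedy_assemble(reads))    # ['ATGGCGTACCTTAGACTGCA']
```

Greedy merging recovers the original here only because no stretch of the toy sequence repeats. On repetitive DNA, reads can overlap in many equally good ways, and this kind of merging can join pieces in the wrong order.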

In the initial stages of the Human Genome Project, scientists could read only about 500 letters at a time. In the mid-2000s, sequencing technology became more accurate, but read lengths shrank to about 100 to 200 letters at a time. By 2010, new technology came onto the market that could read about 10,000 bases at once. With older sequencing technology, these repetitive DNA sections yielded short pieces that looked almost identical, with few clues about how to fit the pieces together.
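The repeat problem can be shown the same way. This Python sketch is again a hypothetical toy: the helper name `unique_fraction` and the sequence are made up, and it assumes perfect, error-free reads starting at every position. It counts how many possible reads match exactly one place in the genome; a read that matches several places gives an assembler no clue where it belongs.

```python
from collections import Counter


def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose read sequence occurs only once in the genome."""
    windows = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(windows)  # how many times each read sequence appears
    return sum(1 for w in windows if counts[w] == 1) / len(windows)


# Hypothetical toy genome: unique flank + "CAG" repeated 4 times (12 letters) + unique flank.
toy_genome = "ACGT" + "CAG" * 4 + "TTGA"

for read_len in (4, 16):
    print(read_len, round(unique_fraction(toy_genome, read_len), 2))
```

With 4-letter reads, only 8 of the 17 possible reads (about 0.47) are unique, because every read inside the CAG repeat also appears elsewhere in it. With 16-letter reads, long enough that each one includes some unique flanking sequence, all of them are.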

Now, improvements in sequencing mean some machines can read about 100,000 or more bases in a single read. Using two different instruments, the team analyzed the X chromosome from a special cell type with two identical X chromosomes. Then Phillippy and his team used a new computer program they developed to assemble the many segments.

On the X chromosome, they were able to fill in a gap located at the center of the chromosome, as well as several stretches made up of repeated gene units. In doing so, the researchers didn’t find any big surprises: there were no new, undiscovered genes, for example. But they did discover many variations that can exist within these repetitive sections of the genome. These include individual DNA letters that might be swapped, deleted, or inserted, or bigger sections of DNA that get copied, moved, or inverted. Such alterations are known to cause genetic diseases.

Further study of these previously unmapped areas could open up new regions of the genome where researchers can search for potential links between these variations and genetic diseases with unknown causes.

“You could be turning a blind eye to some of the richest sequence diversity that exists in the human population, and some of that sequence diversity that you’re not looking at could be correlated with disease in a way we’ve never been able to study before,” Miga says.

The researchers are turning to other incomplete chromosomes next — in hopes of finally assembling the first complete human genome.

Former staff writer at Medium, where I covered biotech, genetics, and Covid-19 for OneZero, Future Human, Elemental, and the Coronavirus Blog.

The undercurrents of the future. A publication from Medium about technology and people.
