Nearly two hundred years ago, Jean-François Champollion and his colleagues used the Rosetta Stone to help them translate previously undecipherable bits of Egyptian hieroglyphics. Today, half a dozen years after the release of the first draft of the human genome, deciphering our own genetic code, and understanding what it means for healthy development, remains a challenge. Researchers from Lawrence Livermore and Lawrence Berkeley national laboratories, the Department of Energy Joint Genome Institute (DOE JGI) and the University of Chicago are making incremental progress in parsing this complex volume of data.

In a study presented in an advance online publication of the journal Genome Research, the scientists describe a novel computational approach to translating DNA sequence data into functional “signatures” corresponding to specific tissues of the body.

“Starting from a single cell, to the development of the entire organism, the three-billion-letter human genetic code carries a wealth of information,” said Ivan Ovcharenko, a bioinformatics scientist in Lawrence Livermore’s Computation Directorate and senior author of the paper. “However, only about two percent of the genome is required to encode the whole spectrum of proteins. So how can we cut through the rest of so-called ‘junk’ DNA to identify regulatory elements concealed in the vast landscape of the human genome and characterize their function?”

Every cell in every tissue carries the same genetic code, which dictates and precisely regulates how the genome's 30,000-some genes carry out the myriad functions of the organism. Much of this orchestration, however, happens outside the boundaries of genes, in what is called "noncoding" DNA. Once dismissed as "junk," these regions are now known to be home to critical regulatory elements such as enhancers, which interact with genes to increase their expression. But tracking down these enhancers can be difficult.

“These non-functional stretches present one of the major challenges of the post-genome sequencing era,” said Ovcharenko, “because enhancers can be found all over the place – inside the genes they regulate or barricaded before or after the genes they control.” What makes it additionally puzzling, he said, is that in some instances enhancers can be located millions of nucleotide bases (units of DNA) away from the genes they regulate.

“Evolutionary studies based on comparing DNA from different species – humans, mice and other vertebrates – provide us with some clues on how to identify gene regulatory elements, but how to understand the function of gene regulatory elements based purely on their sequence has remained an open question,” he said.

So Ovcharenko and his colleagues (Len Pennacchio of DOE JGI, Gabriela Loots of Livermore's Biosciences and Biotechnology Division and Marcelo Nóbrega of the Genomics Division at Lawrence Berkeley) turned the computational resources of the national laboratory system to analyzing a massive gene expression (microarray) dataset generated by the Genomics Institute of the Novartis Research Foundation in San Diego. By conducting sequence-pattern searches, they were able to decipher the sequence code shared by groups of tissue-specific regulatory elements hidden in the noncoding part of the human genome and to identify signatures associated with specific tissues.

"We devised a form of genetic encryption for elements regulating the expression of human genes and developed a method to detect these elements de novo directly from the genomic sequence," said Ovcharenko. By combining three separate analytic factors, the team computed a score, called "Enhancer Identification," that expresses the level of confidence that a particular signature is associated with a particular genomic location and gene expression event.

“For the first time,” said Ovcharenko, “we’ve shown that it is possible to identify signatures of tissue-specific gene regulatory elements located at large distances away from genes they regulate. This is a first, but very promising step on our way towards deciphering the gene regulatory landscape of the human and other complex genomes.”

The team generated functional signatures for almost 80 different human tissues, including heart, liver, and brain.

"You can think of these tissue-specific gene regulatory elements as a treasure map for uncovering the link between human gene regulatory networks, signaling pathways, and mutations perturbing protein expression, which could lead to disease," said Ovcharenko.

“While work remains to further refine the power of such methodologies, identifying noncoding sequences with high confidence predictions of where they will express in the human body is likely to accelerate various fields forward,” said lead author Len Pennacchio. “For instance, the human genetics research community is excited by the prospect of generating genome sequence of individuals in a clinical setting to see what changes might explain a particular condition, such as heart disease. Currently, most of the efforts are focused on coding sequence, the familiar, well-characterized features.

“Gene regulatory elements are as likely to harbor mutations that play a role in human disease as the genes themselves, but we’ve been stymied in our efforts to locate them,” he said. “This method opens up the dark recesses of the genome and highlights functional aspects in noncoding DNA. This method may ultimately have relevance for mutation screening, helping to parse the more cryptic elements of the human genetic code and to reveal their role in disease.”

Contact: Gabriela Loots, 925-423-0923,