Nielsen's Lab:

What is the difference between a human and a chimpanzee? What plants comprised the ancient flora of Greenland? When and where did the woolly mammoth roam? And what do these questions have to do with one another? To start with, they can all be answered using slender strands of DNA. And using computational methods devised by IB professor Rasmus Nielsen.

"I'm interested how you can use statistics and computational tools to elucidate evolution," says Nielsen. In evolutionary biology, "you want to know about what happened in the past. In some sense, you can formulate this as a classical statistical problem of inference"--you start with some data, and you draw conclusions from those data. "I wanted to learn about evolution," says Nielsen, "but I realized many evolutionary problems are really fundamentally statistical problems."

Nielsen develops computational methods for comparing DNA sequences, and applies these methods to a variety of genomic research projects.

"In genomics there's tons of data right now," says Nielsen, because DNA sequencing has recently become very inexpensive. Says Nielsen, "you can re-sequence a human genome for a few thousand dollars these days. The question is: what to do with that data... What we can learn about evolution from data like that?"

Nielsen recently worked on a project comparing human and chimpanzee genomes, to see which parts of the genomes have evolved since humans and chimps diverged, about 5 million years ago. He and his collaborators identified the genes that have undergone selection. These genes have evolved "fast, very fast, since the divergence of humans and chimpanzees," and have much higher mutation rates than we would expect.

Several types of genes showed evidence of selection. Genes involved in morphological changes, such as hair structure, have evolved since humans and chimps diverged. Immune-defense-related genes have evolved--this is not surprising, since immune systems and pathogens are in an ongoing evolutionary arms race. Nielsen found that genes related to sensory perception, including olfaction, have also evolved; these genes are very important to organisms' fitness because they concern the ability to locate changing food sources. Some of the results were "a bit surprising," says Nielsen; "We didn't find that the genes that are active in the brain evolved very fast. They are in fact very conserved... The changes in our cognitive abilities probably have to do with the regulation of the genes, rather than the proteins themselves."

Nielsen's current project looks at recent evolution in humans--"recent" meaning within the last 10,000 years. So far, the entire genomes of three humans have been sequenced. There are plans to sequence a thousand more, but for now, a sample size of three can tell us quite a bit: "The fantastic and remarkable thing is that we can actually say a lot about [human evolution], just from these few individuals," says Nielsen. This is because "you have the whole genomes," and that is a whole lot of data.

Of the three human genomes, two are of European descent and one is of Asian descent--and they can tell us a lot about the history of human migration. Many people believe that Asia and Europe have not shared a gene pool for 65,000 years, since the last wave of people left Africa. By comparing the three genomes and looking at the distribution and lengths of the sequences that are identical, Nielsen and his collaborators will be able to tell whether there has been human migration, via trade routes or across the grasslands of northern Asia, over the past 5,000 years.
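To give a rough sense of what "the distribution and lengths of the sequences that are identical" means, here is a minimal sketch in Python. It is not the lab's method: the two short sequences and the function name `identical_run_lengths` are invented for illustration, and a real analysis would work on whole genomes and account for sequencing error and recombination. The underlying intuition is that recombination breaks up shared segments over generations, so longer identical stretches generally point to more recent common ancestry.

```python
def identical_run_lengths(seq_a, seq_b):
    """Return the lengths of maximal runs of consecutive matching positions.

    Assumes the two sequences are already aligned position by position.
    """
    runs = []
    current = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == base_b:
            current += 1
        else:
            # A mismatch ends the current identical stretch, if any.
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Made-up toy sequences standing in for two aligned genomes.
genome_a = "ACGTACGTACGTTTGA"
genome_b = "ACGTACCTACGTTTGC"

print(identical_run_lengths(genome_a, genome_b))  # [6, 8]
```

Summarizing many such run lengths across the genome gives the kind of distribution the comparison relies on.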

This branch of Nielsen's research uses DNA from the present to elucidate the evolutionary past. But Nielsen also analyzes ancient DNA--from ice cores drilled by his collaborators nearly 3 kilometers below the surface of Greenland's ice sheet. The silty ice at the bottom of the ice cores contains organic material, which has been ground up and compressed over hundreds of thousands of years. There are no recognizable cells or individual pollen grains, but there are fragments of DNA, which can reveal which species lived in Greenland over a million years ago.

"It's like a time machine," says Nielsen. "You can go back and look at what the environment was, without ever having any fossils, just from having the DNA." Ancient Greenland, it turns out, was once a northern Boreal forest, home to alder, spruce, pine and yew trees, quite different from the vast ice sheet that covers Greenland today.

Nielsen is also examining ancient mammoth DNA, preserved in the permafrost soils of Siberia. He and his collaborators reconstructed the distribution of mammoths using DNA in soil samples, and determined when mammoths went extinct in different locations throughout their range. But determining whether soil samples contain mammoth DNA is complicated. Says Nielsen, "Given all the things that we know can happen to DNA" during the sequencing process, such as contamination with PCR fragments drifting from the elephant lab down the hall, and "given all the other animals that have been [where the sample was collected], how certain are we that it's mammoth DNA? That's where I come in." Nielsen uses computational statistics to determine just how certain we are that a DNA sample is actually mammoth DNA. His methods are an alternative to a widely used technique called BLAST--Basic Local Alignment Search Tool.

BLAST takes an unidentified DNA sequence, say a possible mammoth sample, and compares it to thousands and thousands of sequences in GenBank, a giant public database of DNA sequences. It figures out which sequences are similar. If the closest match came from a mammoth, then the sample in question is likely from a mammoth too. But BLAST can make mistakes.

To look for mammoth DNA, scientists cut out a particular section of DNA using mammoth-specific primers--they are like tiny scissors that search for a section of DNA that looks like mammoth. The primers snip on either side of this section, so by necessity the beginning and end of this DNA section look like mammoth. Just by chance, the middle of the section might look like mammoth too--leading to a BLAST misidentification.
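The closest-match idea behind this kind of identification can be illustrated with a small Python sketch. This is a deliberately simplified stand-in, not BLAST itself: it compares pre-aligned, equal-length sequences by simple percent identity, whereas BLAST performs local alignment against a large database. The reference sequences, the query, and the function names are all made up for the example.

```python
def percent_identity(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this toy comparison assumes equal-length, pre-aligned sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def closest_match(query, references):
    """Return (label, identity) for the reference most similar to the query."""
    scores = {label: percent_identity(query, seq) for label, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference sequences; real ones would come from a database.
references = {
    "mammoth": "ACGTTGCAAC",
    "elephant": "ACGTAGCTAC",
    "horse": "TCGAAGCTTC",
}
query = "ACGTTGCAAT"

print(closest_match(query, references))  # ('mammoth', 0.9)
```

Note that a nearest-match rule like this one returns a label and a similarity score, but says nothing about how confident we should be in the label.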

Nielsen and his collaborators have developed a new method that uses Bayesian statistics to build a phylogenetic tree, using the putative mammoth sample DNA sequence and the sequences in GenBank. It calculates the probability that the sample sequence belongs in the mammoth clade, taking into account uncertainties in the phylogeny, the model of evolution, and any missing data. This method quantifies the statistical uncertainty that the DNA sample is actually from a mammoth.
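One common way to turn a Bayesian phylogenetic analysis into a probability of this kind is to sample many plausible trees and count how often the sample groups with the clade of interest. The Python sketch below illustrates that counting step only, under stated assumptions: the four "sampled" trees are invented by hand rather than produced by a real sampler, trees are written as nested tuples, and the names `sister_leaves` and `clade_posterior` are hypothetical. It is not a description of the lab's actual software.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    names = set()
    for child in tree:
        names |= leaves(child)
    return names


def sister_leaves(tree, query):
    """Leaves of the smallest clade containing `query`, excluding the query.

    Returns None if the query is not found in this subtree.
    """
    if isinstance(tree, str):
        return None
    for child in tree:
        if child == query:
            return leaves(tree) - {query}
        found = sister_leaves(child, query)
        if found is not None:
            return found
    return None


def clade_posterior(sampled_trees, query, clade_taxa):
    """Fraction of sampled trees in which the query groups only with `clade_taxa`."""
    hits = 0
    for tree in sampled_trees:
        sisters = sister_leaves(tree, query)
        if sisters and sisters <= clade_taxa:
            hits += 1
    return hits / len(sampled_trees)


# Hand-made stand-ins for trees drawn from a posterior distribution.
sampled_trees = [
    ((("mammoth_A", "sample"), "mammoth_B"), ("elephant_A", "elephant_B")),
    ((("mammoth_A", "mammoth_B"), "sample"), ("elephant_A", "elephant_B")),
    (("mammoth_A", "mammoth_B"), (("elephant_A", "sample"), "elephant_B")),
    ((("mammoth_A", "sample"), "mammoth_B"), ("elephant_A", "elephant_B")),
]
mammoth_taxa = {"mammoth_A", "mammoth_B"}

print(clade_posterior(sampled_trees, "sample", mammoth_taxa))  # 0.75
```

In this toy case the sample groups with mammoths in three of four trees, so the estimated probability is 0.75. Unlike a single best match, the answer carries its own measure of uncertainty.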

As DNA sequencing continues to get cheaper and easier, it will soon be possible to determine an organism's identity based solely on its DNA, using a technique called DNA barcoding. Nielsen's computational work will enable scientists to determine the statistical probability that their sample is what they think it is.

Says Nielsen, "When the math and the biology really meet each other, you feel that you do something that is truly new, because you can pull in new mathematical, computational tools to attack a biological problem. That is really the most exciting moment in my research: to find the interface between the mathematical and the biological."


Nielsen will teach a course on evolutionary genetics and population genetics in IB. He will also teach a course for freshmen on genetic and evolutionary thinking. It will examine how genetics and evolutionary theory have affected society, and include discussions of nature versus nurture, eugenics, and whether there is a genetic basis for the concept of race.

Other Links:

Graduate Group in Computational and Genomic Biology:
