A preprint that is attracting the community’s attention

“Game-changing…” “Truly mind-blowing…” “A fascinating investigation…” These are just a few of the comments that followed the recent publication, as a preprint on the bioRxiv server, of the findings van Nimwegen presented in his talk. Read the preprint and join the discussion at https://www.biorxiv.org/content/10.1101/601914v1
What is a preprint? Typically, a preprint is a version of a paper made publicly available before it is submitted to a scientific journal and formally peer-reviewed. Among other things, preprints allow researchers to share their findings quickly and easily with the scientific community and to receive input from their peers.

Phylogenetic trees built from the genomes of strains of a bacterial species do not mean what most people think they do. This is what Erik van Nimwegen, SIB Group Leader and Associate Professor in Computational Systems Biology at the Biozentrum (University of Basel), sets out to demonstrate in his intriguing and thought-provoking talk (see below) on how he understands phylogenetic trees with regard to prokaryotic genomes. “Prokaryotic genome evolution is the topic that got me into biology,” says van Nimwegen, who began his scientific career as a theoretical physicist. “With time, however, I became uneasy with the lack of connection between the theory of evolution and the experimental data.”

Van Nimwegen begins by discussing how, even after almost a century of developing evolutionary theories, it is still extremely rare for any of these theories to make quantitative predictions about evolution in the wild. He quotes the American theoretical physicist Richard Feynman on the necessity of defining measurable quantities and then looking for laws that connect them. “In evolution,” says van Nimwegen, “there are hardly any empirical laws regarding observable quantities in evolving populations in nature.”


BOX 1. In bacteria, genetic recombination occurs when DNA is transferred from one organism to another, a process also called horizontal gene transfer (HGT). The resulting bacterial recombinants carry both genes inherited from their parent cells and genes introduced into their genomes by HGT.

Current phylogenetic trees are built using algorithms which – from the outset – assume that the evolution of a species’ genome stems from an ancestral cell that has divided unceasingly, collecting mutation upon mutation along the way. “But this assumption is not accurate,” explains van Nimwegen. “The evolution of a species is not only due to mutations that are passed down generations, but also to DNA that is picked up from the environment, by way of viruses carrying DNA from other bacteria for example.” This process is known as horizontal gene transfer, or HGT (BOX 1). “In fact,” continues van Nimwegen, “many current phylogenetic approaches treat HGT as a mere perturbation to a backbone of clonal evolution.”

To prove their case, van Nimwegen and his team studied strains of E. coli collected at the same spot on the shores of Lake Minnesota over the course of a month. “We wanted to find out how E. coli really evolves in the wild by taking into account HGT,” explains van Nimwegen. To begin with, they cut the strains’ core genomes – i.e. the parts of the genome that are shared among all the strains of E. coli – into blocks and inferred a phylogeny for each block. The phylogenies turned out to differ significantly from one block to the next. “This suggests that there is no clear consensus phylogeny,” explains van Nimwegen, “and that a ‘classical’ clonal phylogeny cannot be built.” However, when the team considered not just one block but – say – a large collection of up to 50% of the blocks, the trees began to converge. Would this, then, represent the clonal phylogeny?
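The block-by-block comparison described above can be illustrated with a toy example. The short Python sketch below is not the team's method: it does not infer trees at all, but uses the closest pair of strains in each block as a crude stand-in for a block phylogeny. The strain names, sequences and block length are invented for illustration.

```python
from itertools import combinations


def split_into_blocks(alignment, block_len):
    """Cut every aligned sequence into consecutive blocks of block_len columns."""
    length = len(next(iter(alignment.values())))
    return [
        {name: seq[start:start + block_len] for name, seq in alignment.items()}
        for start in range(0, length, block_len)
    ]


def hamming(a, b):
    """Number of positions at which two equally long sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(block):
    """Pair of strains with the fewest differences within one block."""
    return min(
        combinations(sorted(block), 2),
        key=lambda pair: hamming(block[pair[0]], block[pair[1]]),
    )


# Hypothetical toy alignment of four strains (not real data).
toy_alignment = {
    "A": "ACGTACGT",
    "B": "ACGTTTTT",
    "C": "TTTTACGT",
    "D": "GGGGGGGG",
}

blocks = split_into_blocks(toy_alignment, block_len=4)
closest_per_block = [closest_pair(block) for block in blocks]
print(closest_per_block)
```

In this toy data set, strain A is most similar to B in the first block and to C in the second, so the two blocks disagree about who A's nearest relative is. A real analysis would infer a full tree for each block and compare the tree topologies.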


BOX 2. SNPs, or single-nucleotide polymorphisms, are variations that occur at a single nucleotide at a specific position in a genome. Each SNP provides a piece of information about the phylogeny for that position in the genome.

To find an explanation, van Nimwegen and his team used single-nucleotide polymorphisms, or SNPs (BOX 2), in their Minnesota E. coli strains to study systematically the phylogenetic structures evident in the data. Their results? “Although one can of course construct a background phylogeny from the entire core genome,” explains van Nimwegen, “we found that almost 75% of the SNPs are not compatible with it.” He goes on to demonstrate that, for any pair of strains whose genomes diverge by more than 0.1% from one another, the fraction of DNA inherited from their common ancestral cell drops swiftly and disappears completely at around 1% divergence.
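What it means for a SNP to be “compatible” with a phylogeny can be made concrete with a small sketch. One common definition, assumed here, is that a two-allele SNP is compatible with a tree if a single mutation on one branch can explain it, i.e. if the strains sharing one of the two alleles form a complete clade of the tree. The tree, strain names and SNPs below are invented for illustration and are not taken from the preprint.

```python
def clades(tree):
    """Return (leaves, all_clades) for a tree written as nested tuples of strain names."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, {leaf}
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def snp_is_compatible(snp, tree_clades):
    """snp maps strain -> allele at one genome position (two alleles expected)."""
    alleles = set(snp.values())
    if len(alleles) != 2:
        raise ValueError("expected a SNP with exactly two alleles")
    one_allele = next(iter(alleles))
    group = frozenset(s for s, a in snp.items() if a == one_allele)
    other = frozenset(snp) - group
    # Compatible if either allele group is exactly one clade of the tree.
    return group in tree_clades or other in tree_clades


# Hypothetical background tree: A and B are sisters, C and D are sisters.
toy_tree = (("A", "B"), ("C", "D"))
_, toy_clades = clades(toy_tree)

toy_snps = [
    {"A": "G", "B": "G", "C": "T", "D": "T"},  # splits {A,B} from {C,D}
    {"A": "G", "B": "T", "C": "G", "D": "T"},  # groups A with C
]
results = [snp_is_compatible(snp, toy_clades) for snp in toy_snps]
incompatible_fraction = results.count(False) / len(results)
print(results, incompatible_fraction)
```

Here the first SNP fits the toy tree and the second does not, because no single branch separates A and C from B and D. Counting such clashes across all SNPs gives a fraction of incompatible SNPs, the kind of quantity the 75% figure above refers to.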

So? Van Nimwegen and his team carried out the same tests on other bacterial species and obtained the same results: for most pairs of strains, none of their current core DNA stems from their clonal ancestor. “Our new methods show unambiguously that recombination is what drives genome evolution in these species,” he states. “It introduces genetic differences at a rate ten times higher than mutations.” Moreover, it is generally not possible to reconstruct the clonal phylogeny of E. coli strains from their DNA sequences. What, then, do phylogenetic trees represent? “We don’t have a complete answer to that yet,” says van Nimwegen. “In my understanding, phylogenetics doesn’t illustrate the ancestry of a gene but rather reflects the rates at which different lineages have recombined with each other.”

Erik van Nimwegen is currently Associate Professor in Computational Systems Biology at the Biozentrum of the University of Basel. Following undergraduate studies in theoretical physics at the University of Amsterdam (NL), he obtained his PhD in 1999 at the Santa Fe Institute (USA) and at the Theoretical Biology/Bioinformatics Dept. at the University of Utrecht (NL). He went on to complete one year of postdoctoral research at the Santa Fe Institute before becoming a fellow at the Center for Studies in Physics and Biology at the Rockefeller University, NY (USA). In 2003, he was appointed assistant professor at the Biozentrum of the University of Basel and became group leader at SIB in 2004. Since 2008, he has been Associate Professor and has led SIB’s Genome Systems Biology Group.