Trees for describing relationships between organisms
Scientists use sequences, which are stretches of DNA taken from an organism’s genome, to determine the genealogy connecting all living species, from bacteria to plants and animals. Such trees of life – also known as phylogenetic trees – show when and how life evolved from the simplest early lifeforms to increasingly complex organisms.

New software to account for a previously disregarded phenomenon

Although phylogenetic trees (see box) are widely used in many fields of research, including epidemiology, crop development and drug discovery, there is still a lot of discussion about the best way to estimate the relationships between organisms.
The group led by Nicolas Salamin and the Bioinformatics Core Facility Group have developed new software that revisits the tree of life as we know it.

“In order to build phylogenetic trees, we generally assume that the units building the genetic data, such as the nucleotides in our DNA, evolve independently of one another. However, this is often not the case,” explains Nicolas Salamin. “In many instances,” continues Linda Dib, co-author of the study and Senior Scientist at SIB’s Bioinformatics Core Facility Group, “the evolution of one nucleotide must be followed by a complementary change in another nucleotide to maintain a functional organism: this process is called molecular coevolution. In other words, in order to run faster you may need to change your shoes to roller skates, but the ‘evolution’ of the left shoe into a skate will not be functional unless it is associated with a similar change in the right shoe.”
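The idea of two positions changing together can be illustrated with a toy calculation. The sketch below is not the CoevRJ method: it uses mutual information, a simple and widely used measure of covariation between alignment columns, and it ignores the tree entirely. The alignment and the function name `mutual_information` are invented for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # Compare the observed pair frequency with what independence predicts.
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Hypothetical toy alignment: one row per organism, one column per position.
alignment = [
    "ACGT",
    "ACGT",
    "GCCT",
    "GCCA",
    "ACGA",
    "GCCT",
]
columns = list(zip(*alignment))

# Positions 0 and 2 always change together (A with G, G with C).
paired = mutual_information(columns[0], columns[2])
# Positions 0 and 3 vary independently of one another in this toy data.
unpaired = mutual_information(columns[0], columns[3])

print(f"positions 0 and 2: {paired:.2f} bits")
print(f"positions 0 and 3: {unpaired:.2f} bits")
```

In this invented example the first pair of positions shares one full bit of information, like the left and right shoe changing together, while the second pair shares none. A score of this kind cannot tell coevolution apart from similarity that is simply inherited from a common ancestor, which is one way to see why the tree and the coevolving pairs are worth estimating together.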

The new method (CoevRJ) developed by the team reveals that assuming nucleotides evolve independently can lead to incorrect estimates of genealogies. It also provides a way to correct this by accounting for coevolution, thereby yielding better estimates of phylogenetic relationships than other state-of-the-art software.

Revisiting the tree of life as we know it

The researchers applied their method to an extensive dataset of 146 organisms spanning the three domains of life, and recovered relationships among major groups of bacteria that differ from those previously accepted. They also uncovered a significantly more recent time of origin for plants, animals and other groups than previously thought.

“The complexity of this problem [of estimating the tree of life] is one of the reasons molecular coevolution has so far been ignored in phylogenetic inference,” says SIB’s Xavier Meyer, who led the research and has since joined the University of California, Berkeley, USA. “We implemented extremely sophisticated methods as well as very powerful computing technologies to achieve our goal of estimating phylogenetic relationships and coevolution at the same time. Our software will allow researchers to revisit their own data more easily.”

On the left, the tree of life spanning all three domains (archaea, bacteria and eukarya), obtained with the new method described in the paper. On the right, the same tree of life using current state-of-the-art inference methods. Dotted lines highlight the changes in the respective position of each taxon with the new method.

From the tree of life to virus transmission and protein structure

“Estimating the tree of life remains a very difficult task,” explains SIB’s Daniele Silvestro, now a Senior Scientist at the University of Gothenburg and co-author of the study, “and this is expected, considering that we are analyzing the results of more than 2.5 billion years of evolution. However, we think our method will improve our understanding of such a long evolutionary history, as well as much shorter ones.” Indeed, the method proposed here could be applied to a wide range of data, such as the evolution of HIV or influenza in the context of vaccine development and epidemiology, and can also help to better understand the function and structure of proteins.

Meyer X, Dib L, Silvestro D, Salamin N. Simultaneous Bayesian inference of phylogeny and molecular coevolution. PNAS