This week, several deep learning methods have led to remarkable advances at the global event for the evaluation of protein structure prediction methods: the 13th CASP experiment. Torsten Schwede and Matteo Dal Peraro, two SIB Group Leaders involved on the organizational and evaluation side of CASP13, tell us more.

With countless conformational possibilities, even for small protein sequences of a few hundred amino acids, predicting the atomic 3D structure of a protein is one of the most daunting computational challenges.

The problem becomes much simpler when scientists can rely on evolutionarily related proteins whose structure is already known – an approach known as homology modelling and used, for instance, in the SIB Resource SWISS-MODEL. But in cases where no such proteins are detectable, the problem becomes much harder, and sophisticated de novo prediction methods are needed.

1. CASP: a global community experiment to benchmark protein structure prediction techniques

Every second year since 1994, computational structural biologists from around the globe enter the Critical Assessment of protein Structure Prediction (CASP) experiment. Over several months – culminating with a conference – teams attempt to predict the 3D structure of unknown proteins on the basis of a sequence of amino acids sent by CASP’s organizers. Independent assessors then compare the anonymized predictions with the experimentally determined “gold standard” (resolved prior to the contest but not made public). With these blind prediction experiments, CASP aims to establish objectively the current state of the art in protein structure prediction, to identify what progress has been made over the last two years, and to highlight where future effort may be most productively focused.

Every two years, the global structure modelling community meets at CASP (see 1) to compare current state-of-the-art techniques and to discuss the latest developments. At the recent CASP13 meeting, significant progress was observed. In particular, deep learning methods (see 2) had a significant impact.

 

A leap forward for deep learning approaches at CASP13

Nearly a hundred teams took part in this year’s CASP protein-folding challenge, which culminated in a meeting held on 1-4 December in Mexico. “We had a very exciting CASP13 meeting with a lot of new developments”, says SIB Group Leader Torsten Schwede (University of Basel), co-organizer of the CASP experiment. “From continued progress arising from contact and distance prediction methods, the steep rise of deep learning methods applied to various tasks in structure prediction, to Google DeepMind entering the arena of protein folding. In the field of free modelling in particular, we have seen complex proteins successfully predicted by these methods.”

2. What is deep learning?

Deep learning is a type of machine learning, which can be applied to a wide array of fields including face, speech or audio recognition, drug design or medical image analysis. A computer algorithm is trained to recognize certain patterns or features – here structural properties of a protein from its amino-acid sequence – based on previous examples – in our case experimentally determined structures. Recent developments include for instance generative adversarial networks (GANs), which can learn to mimic any distribution of data to create realistic representations of the object of interest, such as the conformations of local protein structure.

“This year’s CASP contest has seen a flourishing of deep learning methods, leading to a leap forward for protein structure prediction”, says SIB Group Leader Matteo Dal Peraro (EPFL), one of the assessors in charge of evaluating the teams’ submissions (Figure 1).


Figure 1. Progress in CASP’s best model predictions over the years. Machine learning techniques combined with coevolution-based contact prediction methods have increasingly contributed to progress in recent years. (GDT_TS is the Global Distance Test Total Score, an established CASP metric to evaluate prediction performance; analysis by Luciano Abriata.)

“DeepMind (A7D) topped the CASP13 ranking with its AlphaFold program, followed closely by other groups that also used deep learning methods” (Figure 2). “In particular, the Zhang and Multicom teams both heavily use machine learning – and deep learning in particular – in their software, a trend which started several years ago and has now become impressively successful.”


Figure 2. Official CASP13 ranking. The ranking for each team is based on a composite score derived from two official CASP metrics: the Global Distance Test Total Score (GDT_TS), which measures protein structure similarity, and the Quality Control Score (QCS), a metric that better weights the global topology of secondary structure elements and correlates well with scoring based on manual inspection of predicted models (see complete assessment by Luciano Abriata and Matteo Dal Peraro).
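
For readers who want a feel for what GDT_TS measures, here is a minimal Python sketch. It assumes the model and experimental Cα coordinates have already been superposed, and it uses the commonly cited 1, 2, 4 and 8 Å distance cutoffs. The official CASP evaluation additionally searches over many superpositions to maximise the residue count at each cutoff, a step this sketch leaves out. The function name `gdt_ts` and the toy coordinates are illustrative only and are not taken from the CASP assessment.

```python
from math import dist


def gdt_ts(model_ca, reference_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for two already-superposed C-alpha traces in Å."""
    if len(model_ca) != len(reference_ca) or not reference_ca:
        raise ValueError("traces must be non-empty and of equal length")
    # Distance between each pair of equivalent C-alpha atoms.
    deviations = [dist(m, r) for m, r in zip(model_ca, reference_ca)]
    # Fraction of residues within each cutoff, averaged over the cutoffs.
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues deviating by 0.5, 1.5, 3.0 and 9.0 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]
print(gdt_ts(model, reference))
```

In this toy case one, two, three and three of the four residues fall within the 1, 2, 4 and 8 Å cutoffs respectively, giving a score of 56.25; a perfect prediction would score 100.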

 

The future of protein structure prediction

Is the protein folding problem about to be solved? “Many challenges are still ahead”, tempers Dal Peraro. “Especially for de novo modelling, where one does not rely on similar sequences to infer the new structure. The progress has been sensational, with the fold of many difficult targets predicted at near-atomic resolution (Figure 3), but on the other hand, we see that none of the teams, including AlphaFold, were able to predict all the targets or resolve with atomic accuracy the full protein structure including sidechains.”


Figure 3. Best near-atomic structure predictions at CASP13. For 12 of the 49 very difficult targets of this round, predictors were able to calculate the backbone folding topology (in grey) at near-atomic resolution, i.e. at <3 Å root-mean-square deviation from the experimental structure (in rainbow colors).
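
The <3 Å root-mean-square deviation criterion in the caption can be illustrated with a short Python sketch. It assumes the predicted and experimental backbone coordinates are already optimally superposed and listed in the same residue order; in practice a superposition step (for example the Kabsch algorithm) would come first. The names `rmsd` and `is_near_atomic` and the toy coordinates are hypothetical, chosen for illustration only.

```python
from math import sqrt


def rmsd(model, reference):
    """RMSD in Å between two equally long, already-superposed coordinate lists."""
    if len(model) != len(reference) or not reference:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared differences over every x, y and z component.
    squared = sum(
        (a - b) ** 2
        for m, r in zip(model, reference)
        for a, b in zip(m, r)
    )
    return sqrt(squared / len(reference))


def is_near_atomic(model, reference, cutoff=3.0):
    """True if the model lies within the caption's <3 Å RMSD threshold."""
    return rmsd(model, reference) < cutoff


# Toy example: four atoms displaced by 1, 2, 2 and 1 Å along the y axis.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0), (11.4, 1.0, 0.0)]
print(round(rmsd(predicted, experimental), 2), is_near_atomic(predicted, experimental))
```

Because RMSD squares each deviation before averaging, a few badly placed atoms weigh more heavily than they do in a cutoff-based score such as GDT_TS.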

“Nobody has cracked the protein folding problem yet”, he continues, “but scientists found new ways to get closer to solving it by using deep learning. While this is a major step in the discovery process, we need to increase the accuracy of these models for them to be useful for biomedical applications, for example by model refinement of sidechains.”

Do we now have to wait until the next CASP experiment to know whether the success story continues? “Not really”, says Torsten Schwede. “At SIB, we are providing a service called CAMEO – continuous automated model evaluation – which performs a ‘mini CASP experiment’ every week for automated server methods. This allows method developers to continuously test and benchmark their latest software developments. Several of the successful groups used CAMEO to get ready for CASP – so it is a bit like looking into the crystal ball to see the future coming.”

And for now, deep learning methods seem to shape the future of structure modelling.