For decades, researchers have used experimental techniques such as X-ray crystallography or cryo-electron microscopy (cryo-EM) to decipher the 3D structures of proteins. However, these methods can take months or years and do not always work. Of the more than 200 million proteins found across life forms, only about 170,000 have had their structures solved.
In the 1960s, researchers realized that if they could work out all the individual interactions within a protein's sequence, they could predict its 3D shape. But each protein has hundreds of amino acids, and each pair of amino acids can interact in many ways, so the number of possible structures for each sequence is astronomical. Computational scientists jumped on the problem, but progress was slow.
In 1994, Moult and his colleagues launched CASP, which is held every two years. Entrants receive the amino acid sequences of about 100 proteins whose structures are unknown. Some groups compute a structure for each sequence, while others determine it experimentally. The organizers then compare the computed predictions with the laboratory results and give each prediction a Global Distance Test (GDT) score. Moult said that, on the scale from zero to 100, a score of 90 or more is considered comparable to experimental methods.
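To make the GDT score more concrete, here is a minimal sketch of the commonly described GDT_TS variant: the percentage of residues whose predicted position lies within 1, 2, 4, and 8 angstroms of the experimental position, averaged over those four cutoffs. The function name `gdt_ts` and the toy coordinates are hypothetical, and the sketch assumes the two structures have already been superimposed; actual CASP scoring also searches over many superpositions, which is omitted here.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    Assumes both inputs are equal-length lists of (x, y, z) positions
    (one per residue, in angstroms) that are ALREADY superimposed.
    Real GDT also optimizes the superposition; this sketch does not.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")

    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, q) for p, q in zip(predicted, experimental)]

    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # Average the four fractions and express the result as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy four-residue example (made-up coordinates, for illustration only).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]

score = gdt_ts(predicted, experimental)
print(score)  # 56.25
```

In this toy case the per-residue errors are 0.5, 1.5, 3.0, and 9.0 angstroms, so one, two, three, and three of the four residues fall within the respective cutoffs, giving 56.25. A prediction that matches the experimental positions exactly scores 100.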
Even in 1994, the predicted structures of small, simple proteins could match the experimental results. But for larger, more challenging proteins, the computed structures earned GDT scores of only about 20, which was "a complete disaster," said Andrei Lupas, a CASP judge and evolutionary biologist at the Max Planck Institute for Developmental Biology. By 2016, competing groups had reached scores of about 40 on the hardest proteins, mainly by drawing insights from known structures of proteins closely related to the CASP targets.
When DeepMind first entered the competition in 2018, its algorithm, AlphaFold, relied on this comparative strategy. But AlphaFold also incorporated a computational approach called deep learning, in which software is trained on a huge trove of data (in this case, the sequences and structures of known proteins) and learns to spot patterns. DeepMind won easily, beating its competitors by an average of 15% on each structure and reaching GDT scores of up to about 60 on the most difficult targets.
However, John Jumper, who leads AlphaFold's development at DeepMind, said those predictions were still too rough to be useful. "We knew how far we were from biological relevance." To do better, Jumper and his colleagues combined deep learning with an "attention algorithm" that mimics the way a person assembles a jigsaw puzzle: first connecting pieces into small clumps (in this case, clusters of amino acids), and then looking for ways to join the clumps into a larger whole. Working on a modest computer network of 128 processors, they trained the algorithm on all of the roughly 170,000 known protein structures.
And it worked. Across the target proteins in this year's CASP, AlphaFold achieved a median GDT score of 92.4. For the most challenging proteins, AlphaFold scored a median of 87, which is 25 points higher than the next best predictions. It even excelled at solving the structures of proteins wedged in cell membranes, which are central to many human diseases but notoriously difficult to solve with X-ray crystallography. Venki Ramakrishnan, a structural biologist at the Medical Research Council Laboratory of Molecular Biology, called the result "a surprising improvement in protein folding."
Moult said that all of the teams in this year's competition improved. But Lupas said that with AlphaFold, "the game has changed." The organizers even worried that DeepMind might have been cheating in some way. So Lupas set a special challenge: a membrane protein from a species of archaea. For ten years, his research team had tried every trick in the book to obtain an X-ray crystal structure of the protein. "We couldn't solve it."
But AlphaFold had no trouble. It returned a detailed image of a three-part protein with two long helical arms in the middle. The model enabled Lupas and his colleagues to make sense of their X-ray data. Within half an hour, they had fit their experimental results to AlphaFold's predicted structure. "It's almost perfect," Lupas said. "They can't have cheated on this. I don't know how they did it."
As a condition of entering CASP, DeepMind, like all groups, agreed to disclose enough detail about its method for other groups to reproduce it. That will be good news for experimentalists, who will be able to use accurate structure predictions to make sense of opaque X-ray and cryo-EM data. Moult said it could also allow drug designers to quickly work out the structure of every protein in new and dangerous pathogens (such as SARS-CoV-2), which is a key step in finding molecules to block them.
However, AlphaFold does not yet do everything well. In the competition, it faltered noticeably on one protein, which is composed of 52 small repeating segments that distort one another's positions as they assemble. Jumper said the team now hopes to train AlphaFold to solve such structures, as well as the structures of protein complexes that work together to carry out key cell functions.
Even though one major challenge has been met, others will undoubtedly arise. "This is not the end of things," Thornton said. "This is the beginning of many new things."