DeepMind plans to release millions of protein structures for free
AI research lab DeepMind has used artificial intelligence to create the most comprehensive map of human proteins ever. The company, a subsidiary of Google-parent Alphabet, is releasing the data for free, with some scientists comparing the potential impact of the work to the Human Genome Project, an international effort to map every human gene.
Proteins are long, complex molecules that perform many functions in the body, from building tissues to fighting disease. Their purpose is dictated by their structure, which folds like origami into complex and irregular shapes. Understanding how a protein folds helps explain its function, which in turn helps scientists with many tasks, from conducting fundamental research on how the body works to designing new drugs and therapies.
Previously, determining the structure of a protein relied on expensive and time-consuming experiments. But last year DeepMind showed it could accurately predict the structure of proteins using AI software called AlphaFold. Now, the company is releasing hundreds of thousands of predictions made by the program to the public.
“I see this as the culmination of DeepMind’s full 10-year lifespan,” said company CEO and co-founder Demis Hassabis. “From the beginning, this is what we set out to do: to make breakthroughs in AI, test it on games like Go and Atari, [and] apply it to real-world problems to see if we can accelerate scientific breakthroughs and use them to benefit humanity.”
There are currently approximately 180,000 protein structures available in the public domain, each produced by experimental methods and accessible through the Protein Data Bank. DeepMind is releasing predictions for the structure of approximately 350,000 proteins across 20 different organisms, including animals such as mice and fruit flies, and bacteria such as E. coli. (There is some overlap between DeepMind’s data and pre-existing protein structures, but exactly how much is difficult to quantify due to the nature of the model.) Most importantly, the release includes predictions for 98 percent of all human proteins, approximately 20,000 distinct structures, known collectively as the human proteome. It is not the first public dataset of human proteins, but it is the most comprehensive and accurate.
If they want, scientists can download the entire human proteome for themselves, says John Jumper, AlphaFold’s technical lead. “A HumanProteome.zip is effectively, I think, about 50 gigabytes in size,” Jumper says. “You can put it on a flash drive if you want, though it won’t do you much good without a computer for analysis!”
After launching this first tranche of data, DeepMind plans to keep adding to the store of protein structures, which will be maintained by Europe’s leading life science laboratory, the European Molecular Biology Laboratory (EMBL). By the end of the year, DeepMind expects to release predictions for 100 million protein structures, a dataset that will be “transformative for our understanding of how life works,” according to EMBL Director General Edith Heard.
The data will be free forever, for both scientific and commercial researchers, Hassabis says. “Anyone can use it for anything,” the DeepMind CEO said in a press briefing. “They just need to give credit to the people involved in the citation.”
Benefits of protein folding
Understanding the structure of proteins is useful to scientists in many fields. The information could help design new drugs, synthesize novel enzymes that break down waste materials, and create crops that are resistant to viruses or extreme weather. Already, DeepMind’s protein predictions are being used for medical research, including studying the workings of SARS-CoV-2, the virus that causes COVID-19.
The new data will accelerate these efforts, but scientists note that it will still take a long time to convert this information into real-world results. “I don’t think it’s going to be something that changes the way patients are treated within the year, but it will certainly make a huge impact for the scientific community,” said Professor Marcelo C. Sousa of the University of Colorado’s Department of Biochemistry.
DeepMind senior research scientist Kathryn Tunyasuvunakool says scientists will need to get used to having this kind of information at their fingertips. “As a biologist, I can confirm that we don’t have a playbook for looking at 20,000 structures, so this [amount of data] is extremely unexpected,” said Tunyasuvunakool. “Analyzing hundreds of thousands of structures, this is madness.”
Notably, however, DeepMind’s software produces predictions of protein structures rather than experimentally determined models, which means that in some cases more work will be needed to verify a structure. DeepMind says it spent a lot of time building accuracy metrics into its AlphaFold software, which rates how confident it is in each prediction.
However, predictions of protein structures are still extremely useful. Determining the structure of proteins through experimental methods is costly, time-consuming, and relies on a lot of trial and error. This means that even low-confidence predictions can save scientists years of work by pointing them in the right direction for research.
Helen Walden, Professor of Structural Biology at the University of Glasgow, says that DeepMind’s data would “significantly ease” research barriers, but that the “painful, resource-draining task of conducting biochemistry and biological evaluations of, for example, pharmaceutical work” would remain.
Sousa, who has previously used AlphaFold’s data in his work, says the impact will be felt immediately by scientists. “In the collaboration we had with DeepMind, we had a dataset containing a protein sample that we had for 10 years, and we never got to the point of developing a model that fit,” says Sousa. “DeepMind agreed to provide us with a structure, and we were able to solve the problem in 15 minutes after sitting on it for 10 years.”
Why is protein folding so difficult?
Proteins are built from chains of amino acids, which come in 20 different varieties in the human body. Since any individual protein can consist of hundreds of amino acids, each of which can twist and bend in different directions, the final structure of a molecule has an incredibly large number of possible configurations. One estimate is that the typical protein can be folded in 10^300 ways, which is 1 followed by 300 zeros.
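As a rough back-of-the-envelope illustration of where a figure like 10^300 can come from, the sketch below multiplies out the choices along a chain. The inputs are assumptions made purely for illustration and do not come from DeepMind or the article: a chain of 300 amino acids, about 10 possible shapes per amino acid, and a hypothetical search that checks 10^13 configurations per second. Real proteins are not this simple, since neighboring amino acids constrain one another.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_configurations(num_residues: int, states_per_residue: int) -> float:
    """Return log10 of the number of chain configurations.

    Assumes (hypothetically) that every amino acid independently adopts
    one of `states_per_residue` shapes, so the total is
    states_per_residue ** num_residues.
    """
    return num_residues * math.log10(states_per_residue)


def log10_years_to_enumerate(log10_configs: float, log10_rate_per_second: float) -> float:
    """Return log10 of the years needed to try every configuration once."""
    return log10_configs - log10_rate_per_second - math.log10(SECONDS_PER_YEAR)


# Illustrative inputs only: 300 amino acids, 10 shapes each, 10^13 checks/second.
exponent = log10_configurations(num_residues=300, states_per_residue=10)
years_exponent = log10_years_to_enumerate(exponent, log10_rate_per_second=13)

print(f"configurations: about 10^{exponent:.0f}")
print(f"years to enumerate them all: about 10^{years_exponent:.1f}")
```

Working in logarithms keeps the numbers manageable. Under these assumptions, an exhaustive search would take vastly longer than the age of the universe, which is why brute force is not an option for predicting structures.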
Because proteins are too small to be examined under a microscope, scientists have to determine their structure indirectly, using expensive and complex methods such as nuclear magnetic resonance and X-ray crystallography. The idea of determining the structure of a protein simply by reading a list of its constituent amino acids has long been theorized but difficult to achieve, leading many to describe it as a “grand challenge” of biology.
In recent years, however, computational methods – particularly those using artificial intelligence – have suggested that such analysis is possible. With these techniques, AI systems are trained on datasets of known protein structures and use this information to make their predictions.
Several groups have been working on this problem for years, but DeepMind’s access to a deep bench of AI talent and computing resources allowed it to accelerate progress dramatically. Last year, the company took part in an international protein-folding competition known as CASP and blew the competition away. Its results were so accurate that one of the co-founders of CASP, computational biologist John Moult, stated that “in some sense the problem [of protein folding] has been resolved.”
DeepMind’s AlphaFold program has been upgraded since last year’s CASP competition and is now 16 times faster. “We can fold an average protein in a matter of minutes, seconds in most cases,” says Hassabis. The company also released the code for AlphaFold as open source last week, allowing others to build on its work in the future.
Professor Liam McGuffin from the University of Reading, who developed some of the UK’s leading protein-folding software, praised AlphaFold’s technical genius, but also noted that the program’s success depends on decades of prior research and public data. “DeepMind has vast resources to keep this database up to date and is in a better position to do so than any single academic group,” McGuffin said. “I think academia will eventually get there, but it will be slow because we don’t have as many resources.”
Why does DeepMind care?
Many scientists noted DeepMind’s generosity in releasing this data for free. After all, the lab is owned by Google-parent Alphabet, which is pouring vast amounts of resources into commercial health projects. DeepMind itself loses a lot of money every year, and there have been various reports of tension between the company and its parent firm on issues such as research autonomy and commercial viability.
Hassabis, however, says that the company has always planned to make this information freely available, and that doing so fulfills DeepMind’s founding ethos. He stressed that DeepMind’s work is used in many places at Google (“almost anything you use, we have some technology that’s part of it under the hood”) but that the company’s primary goal has always been fundamental research.
“The agreement when we were acquired is that we are here primarily to advance state-of-the-art AGI and AI technologies and then use that to accelerate scientific breakthroughs,” says Hassabis. “[Alphabet] has plenty of departments focused on making money,” he said, noting that DeepMind’s focus on research “brings all kinds of benefits in terms of reputation and goodwill to the scientific community. There are many ways in which value can be derived.”
Hassabis predicts that AlphaFold is a sign of things to come: a project that shows the enormous potential of artificial intelligence to handle messy problems like human biology.
“I think we’re in a really exciting moment,” he says. “Over the next decade, we, and others in the AI field, look forward to producing amazing breakthroughs that will really accelerate the solutions to the big problems we have here on Earth.”