On 9 October 2024, in a high-profile vote of confidence for the promise of using artificial intelligence (AI) in scientific discovery, the Royal Swedish Academy of Sciences awarded Demis Hassabis (co-founder and chief executive officer) and John M. Jumper (director) of Google DeepMind (London, UK) the 2024 Nobel Prize in Chemistry for their pioneering work in developing the AI-powered protein structure prediction model AlphaFold2 (AF2)
[1]. Also sharing the prize was David Baker (half to Hassabis and Jumper; half to Baker), professor of biochemistry at the University of Washington (Seattle, WA, USA), for his work on computational protein design that started with the mid-1990s development of Rosetta, a since-evolving suite of software tools that model protein structures using physical principles
[2]—and now also AI
[3].
With unprecedented swiftness following its open-access release in July 2021
[4], AF2 has dramatically advanced the field of structural biology and opened new doors in pharmacology. “There was no doubt in anyone’s mind that AlphaFold was going to win the Nobel Prize at some point,” said Stephanie Wankowicz, assistant professor of molecular physiology and biophysics at Vanderbilt University (Nashville, TN, USA). “It has been such a game changer for trying to understand the structures of proteins.”
Additionally, earlier in 2024, on May 8, DeepMind released AlphaFold3 (AF3), accompanied by a report describing this newest iteration published in
Nature [5]. The model includes the ability to predict the structures of proteins during interactions with non-protein molecules like DNA and RNA, key information for illuminating the specific roles of proteins in cells (
Fig. 1)
[5]. In addition to providing an improved understanding of cellular dynamics, AF3 could help scientists to design drugs that more effectively block or enhance the function of disease-related proteins
[6].
For decades before the debut of AF2, the only way to determine a protein’s folded shape was through experimental techniques including nuclear magnetic resonance imaging, X-ray crystallography, and cryo-electron microscopy. However, such methods are time-consuming and not easily accessed by most scientists because the equipment needed is so expensive. In addition, many proteins are not amenable to being studied with these tools.
By the mid-1990s, scientists had developed enough nascent computational methods for predicting protein structures that a biennial contest, Critical Assessment of Protein Structure Prediction (CASP), was created to evaluate and rank their effectiveness
[7]. Since the first CASP contest, in 1994, improvement in the models progressed at a slow but steady pace until 2018, when DeepMind entered and won the competition by a wide margin with its first version of AlphaFold
[8]. At the next CASP in 2020, DeepMind submitted AF2, a complete overhaul of its 2018 model, beating the competition even more convincingly
[7],
[9].
While the initial AlphaFold model used separate steps to predict protein structure, AF2 combined all the steps into a single workflow, implemented as a neural network, the performance of which could be improved by adding training data. DeepMind published a report describing AF2 in
Nature in July 2021
[4],
[9] and concurrently published the model’s entire source code and weights, freely accessible to all, on GitHub
[10].
AF2 quickly proved its value. Concomitantly with the July 2021
Nature report, DeepMind partnered with the European Molecular Biology Laboratory to launch the AlphaFold Protein Structure Database, which initially included protein structures from 21 model organisms
[11]. In January 2022, the protein structures from another 27 model organisms were added to the database
[9]. By mid-2022, the database contained predicted structures for more than 220 million proteins from about one million species
[12],
[13].
The decision to release the AF2 code as open access has also born fruit by allowing other researchers to easily build on it. One early addition, for example, allowed users to predict interactions between multiple proteins
[14], a capability that DeepMind included in an AF2 update that was announced in a 16 March 2023 post on the social media service X (previously Twitter).
“Once AlphaFold2 wowed everyone at the 2020 CASP competition, the excitement, the adoption, and the creativity it sparked among computational scientists and wet lab bench scientists across many different fields—genetics, structural biology, chemistry—was profound,” said Anthony Gitter, associate professor of biostatistics and medical informatics at the University of Wisconsin, Madison (Madison, WI, USA). “You saw computational scientists take that core piece of software and remix it and reuse it in ways that were not previously intended by its creators, which is the great thing about having scientific software openly available.”
Compared with AF2, AF3 advances very little in terms of protein folding prediction, according to David Jones, professor of bioinformatics at University College London (London, UK) and one of the co-authors on the first AlphaFold paper
[8]. The main difference in the models, he said, is how they generate structures. “The AlphaFold2 model uses a hand-coded method designed specifically for proteins, whereas AlphaFold3 produces structures using a diffusion process that allows proteins and other molecules to be modeled in the exact same way, treating the whole system as a cloud of atoms of different types,” Jones said. “Interestingly, because diffusion processes are hard to constrain in useful ways such as maintaining chirality or even ensuring that atoms do not overlap, it is possible that AlphaFold3 could be worse than AlphaFold2 for predicting some protein structures.”
While praise for AF2 has been widespread
[13], the outcry of disappointment from the scientific community around the release of AF3 was equally ubiquitous
[15]. In contrast to AF2, which DeepMind made freely available, without restrictions, AF3 was initially limited to non-commercial use through a DeepMind website server
[15]. Additionally, DeepMind decided not to publish the code for AF3 as it had for AF2. DeepMind’s AF3
Nature report did not offer a justification for this, but simply noted, “Code is not provided”—an omission that appears to violate
Nature’s policy about the openness of its scientific reports
[15]. “Based on the pretty sharp reaction from people working in the field, this decision seemed to catch a lot of us off guard,” Gitter said. “The predecessor was quite open.”
DeepMind also limited access to the AF3 server. Users are restricted to 20 predictions a day
[15] (it was initially 10), and they also face limitations on the molecules they can analyze
[6]. It is not possible to use the AF3 server to predict interactions between proteins and novel drugs, for example, possibly to avoid competition with the drug discovery efforts of the Hassabis-founded DeepMind spinoff company Isomorphic Labs (London, UK)
[15]. “It is obviously a business decision rather than a scientific one,” Jones said. “The functionality that is restricted on the AlphaFold3 web server is the exact functionality that you would need to model the binding of a drug molecule to a protein.”
An open letter dated 11 May 2024 quickly garnered more than 650 signatures within the first week following the release of the AF3
Nature report without the underlying code
[16]. The letter’s authors, including Wankowicz and Gitter, wrote, “While companies have the right to capitalize on their innovations, using the imprimatur of academic publications without the possibility of reproducing the results, far less building on them, subverts the enterprise”
[16].
Shortly after the open letter was posted, DeepMind researchers indicated that more information on AF3 would be forthcoming. Pushmeet Kohli, DeepMind’s vice president of research, announced in a 13 May 2024 post on X that DeepMind was “working on releasing the AF3 model” for academic use within six months. Kohli also confirmed that the company implemented the restrictions so as not to compromise Isomorphic Labs’ commercial drug discovery plans
[6],
[17].
Almost exactly six months later, on 11 November 2024, Kohli posted on X that DeepMind had released the AF3 code for download and non-commercial use
[18]. But what has been made available is not the same as with AF2. “AlphaFold3 is now far more accessible after the code release, though still not fully open source,” Gitter said. “The terms of use and licensing are more restrictive than we saw with AlphaFold2.” Adds Jones: “Even with this newly released code, those aiming to replicate the model's development will still be frustrated.”
Because of the way AF3 was released, Gitter expects it to have a weaker impact than AF2. “When AlphaFold2 came out, it became quite dominant in the scientific community. Now, I am seeing fragmentation,” he said. While some researchers will be fine running 20 jobs a day on DeepMind’s web server, “others are attempting to replicate AlphaFold3’s code as best as possible,” he said, with many research groups around the world working to develop open-source versions of AF3
[18]. Still others, he said, will switch to using alternative models like Baker’s RoseTTAFold All-Atom
[19].
Despite the more restrictive approach DeepMind has taken with AF3, Jones believes the company’s work has already left a profound legacy, one that has quickly merited a Nobel Prize. “AlphaFold has greatly increased the profile of computational biology and machine learning, particularly in the wider biological community,” Jones said. “Although AlphaFold has not actually removed the need to do experimental work in structural biology, it has at least raised the question about whether it might—that represents a huge change in the mindset of the field.”