In May 2024, scientists reacted with excitement, but also disappointment, when Google DeepMind launched AlphaFold3 (AF3), the latest version of its artificial intelligence (AI) program for predicting protein structures [1, 2, 3]. Although the model was more powerful than its predecessor AlphaFold2 (AF2), DeepMind, a London-based research division of Alphabet, Google’s parent company, did not publish the computer code for the new release [1, 3, 4]. Unlike with AF2, researchers could not run AF3 themselves and could perform only a limited number of protein structure predictions per day, 10 at first and now 20, through a website [3, 4]. DeepMind also withheld other vital details, including the model weights, the numerical values that an AI system learns from its training data and that determine its behavior [4, 5].
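To illustrate what such weights are, the sketch below trains a toy linear model with NumPy; the array w plays the role of the weights that a developer may or may not release. This is an illustrative example only, with made-up data, and has no connection to AlphaFold3's actual code.

```python
# A minimal sketch (not DeepMind's code) of what "model weights" are:
# the numbers a model learns from training data. Releasing the weights
# lets others run the trained model without redoing the training.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy training inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)   # toy training targets

w = np.zeros(3)                               # the model's weights, initially arbitrary
for _ in range(500):                          # training: adjust weights to fit the data
    grad = X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad

np.save("weights.npy", w)                     # "releasing the weights"
w_loaded = np.load("weights.npy")             # anyone with this file can reuse the model
print(w_loaded)                               # without ever seeing the training data
```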
After criticism from researchers, DeepMind published the AF3 source code in November 2024 [1, 3, 4]. But only non-commercial users can run the program, and academic credentials are necessary to obtain the model weights [4].
What information developers should release about their AI systems, and what limits they should place on use of the software, are contentious questions [6]. Advocates of open-source AI argue that full transparency and unrestricted use are a boon, allowing creative people to develop new products and applications by extending or adapting current AI models [7]. “I am opposed to having this proprietary and closed-loop development, in which something as important as AI is in the hands of a few players,” said Chirag Shah, professor of information science at the University of Washington in Seattle, WA, USA. The US government has endorsed at least some degree of AI openness [8], as has the European Union (EU), whose strict new AI regulations go easier on open-source systems [9, 10].
At the other end of the spectrum are experts who contend that open-source AI is dangerous because it can easily be misused [11]. “There are some technologies we do not want to be open source,” said David E. Harris, a chancellor's public scholar at the University of California, Berkeley, CA, USA, and former research manager for responsible AI at the Silicon Valley tech giant Meta. Making the issue even more complicated is the absence of an agreed-upon definition of what constitutes open-source AI [12].
Open-source software that anyone can use or modify has been around for decades [13]. Prominent examples include the Linux operating system and the Mozilla Firefox web browser [14]. Developers have built on such open-source programs to create other powerful software, such as the Linux-based Android operating system for cell phones [14]. Open-source AI, proponents contend, would spur innovation, speed development of new models, open additional uses for the technology, boost scientific research, and provide other benefits (Fig. 1). By broadening access to models, open-source AI would, as The Economist put it, help to ensure that its power “is not concentrated in the hands of a few Californian firms” [15].
AI transparency has also gained government backing. A 2024 report by the US National Telecommunications and Information Administration advocates open-weight models, in which developers publicly release the weights that a model learned during its training [16]. The EU AI Act that entered into force in 2024 goes further. For example, the rules require developers of general-purpose AI models, such as the famous chatbot ChatGPT, to spell out how they trained and tested their systems [9, 17]. Makers of open systems are exempt from those requirements [18]. However, the EU regulators punted on specifying the criteria for open-source AI. “We need good ways to judge whether models really deserve to be exempted,” said Mark Dingemanse, an associate professor of language and communication at Radboud University, Nijmegen, the Netherlands.
Applying the open-source concept to AI is tricky because a model is not just software. “With AI, code is not the ‘meat,’” said David Widder, a postdoctoral researcher at Cornell University’s Cornell Tech in New York, NY, USA. A model’s output also depends on the data it learned from, the training weights that shape what inferences it draws, and other factors. Moreover, said Shah, “the term ‘open AI’ has lost its meaning in some respects.” For instance, OpenAI, the company headquartered in San Francisco, CA, USA, that produces ChatGPT, is no longer open with its technology. As a result, tech titan Elon Musk, one of OpenAI’s co-founders, has filed at least two lawsuits against the company for violating its original principles [19].
Several companies and non-profits have released AI models that they describe as open. Meta’s Llama 3, a large language model that debuted in April 2024, rivals more restrictive systems such as Google's Gemini and OpenAI's GPT-4o [20]. Other companies that have unveiled self-described open AI models include Mistral AI of Paris, France [21], and Alibaba of Hangzhou, China [22]. A non-corporate example is GPT-J, a large language model aimed at AI researchers that was launched by the non-profit EleutherAI (Washington, DC, USA) [23]. But because the AI field lacks a consensus definition for the term, it is difficult to assess which, if any, of these systems is truly open source and which are simply promoted as such [24].
In August 2024, the influential Open Source Initiative (OSI), a non-profit located in West Hollywood, CA, USA, that sets the standards for open-source software, published a first draft of its definition of open-source AI [25, 26]. To qualify, a model must be freely usable, modifiable, and shareable for any purpose [25]. Potential users must have full access to its source code and training weights [25]. The OSI definition does not require the release of training datasets; the reasoning is that AI models can learn from confidential sources such as medical records [26]. However, the model’s creator must provide enough information about the training data “so that a skilled person can build a substantially equivalent system” [25].
Almost none of the so-called open AI models meets the OSI’s criteria. Meta’s Llama 3, for instance, fails the test because Meta does not reveal the model’s training data and requires some large potential users to apply for a license before running the model [27, 28]. Not surprisingly, Meta takes issue with the OSI draft definition [27].
Although OSI’s proposed criteria are a good start, they include a “crucial loophole,” said Dingemanse, because developers do not have to release their full training datasets. “This is an open invitation to play fast and loose with dataset sharing, and it makes it too easy to just claim something about your data’s availability,” he said. In a study published in June 2024, Dingemanse and his colleague Andreas Liesenfeld, an assistant professor of language and communication, also at Radboud University, evaluated 40 text-generating AI models, including ChatGPT and a version of Llama 3, against a broader set of 14 openness criteria [18]. Some of their stipulations, such as release of the model’s code and training weights, are the same as the OSI’s. But they added other requirements that would allow a user to understand and run the system, such as availability of the model as a software package and publication of a peer-reviewed paper describing it. By their ranking, ChatGPT was the most closed system, and the version of Llama 3 they analyzed was fourth from the bottom [18]. The most open was a version of OLMo 7B from the non-profit Ai2 in Seattle, WA, USA [18]. Dingemanse said that many of the companies that create so-called open AI models are guilty of “openwashing,” claiming a greater degree of transparency than they deliver. “If exemptions like those in the EU AI Act are to have any meaningful effect in fostering innovation, which is why they were introduced, then they should work to reward the most open and innovative model providers,” he said.
Even the most open AI models can still be effectively closed to users, Widder and colleagues argue in an article published in Nature in November 2024 [29]. Running the models requires vast amounts of computing power, which is usually available only at substantial cost. And a variety of obstacles can prevent users from developing their own open-source models. Building them requires access to enormous datasets for training and a workforce with AI expertise. Shah noted that even more labor is necessary during the fine-tuning that models undergo after training. The idea that “open AI somehow democratizes AI” is a misconception, said Widder. “When you need access to expensive resources to build or use otherwise ‘open’ AI, it is not actually very open.”
One way to make models more open, some researchers say, is to shrink them, thus reducing users’ dependence on big tech for computing resources. The Llama 3 version that Meta released in July 2024 contains more than 400 billion parameters, the variables that the system learns during its training and that determine its output [30, 31]. But researchers have created an assortment of models that are less than one-tenth that size and can run on a laptop or, in some cases, a cell phone [32]. Apple and Microsoft are among the companies that have released scaled-down models that include the training weights [32, 33]. And in February 2024, Ai2 published a series of small models that meet the OSI criteria for open source [34]. Although such models can be used by more people, they still fail to provide broad accessibility, Widder said: “small models often still end up being made by well-resourced institutions.”
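As a rough illustration of what running a small open-weight model locally can look like, the sketch below uses the Hugging Face transformers library to download published weights and generate text on an ordinary machine. The model identifier shown is a placeholder assumption; actual names, licenses, and hardware requirements vary from release to release.

```python
# Minimal sketch: running a small open-weight language model locally.
# The model name below is a placeholder; substitute any small model
# whose weights the developer has published under a suitable license.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMo-1B-hf"  # assumed example identifier; availability may change

tokenizer = AutoTokenizer.from_pretrained(model_name)     # downloads tokenizer files
model = AutoModelForCausalLM.from_pretrained(model_name)  # downloads the released weights

inputs = tokenizer("Open-source AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)     # run the model on local hardware
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```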
Experts are not just divided on what counts as open-source AI. They also disagree about its dangers. Critics like Harris contend that open-source AI is too easy to use for harmful purposes [11]. The models can be altered to create child pornography and to produce deepfakes (manipulated and deceptive videos and images), Harris said. Authorities might employ them with unintended, or intended, harmful consequences for innocent people. Terrorists might use them to produce biological or chemical weapons. AI models’ ability to design new, potentially harmful proteins is particularly worrying, researchers say [35]. Harris likens AI to nuclear technology and favors stricter regulations that would, for example, require companies to assume liability for misuse of their models and to clearly label AI-created content so that viewers know it is machine-generated [11]. The new EU regulations are a step in the right direction, he said. “The EU AI Act is the most significant piece of legislation about AI in the world today.”
Shah argues that locking away AI is not the solution. AI is not like nuclear technology, he said. “One can build it on a laptop. It is not something you could stop in the same way.” Widder notes that some AI companies, such as OpenAI, favor regulation in the name of safety, but he thinks their support has an ulterior motive—to protect their products. “A lot of rhetoric around ‘unsafe’ AI is simply a thinly disguised attempt to build a regulatory moat.”
In any case, regulators have stepped into the breach, not waiting for the companies and AI experts to resolve their differences about the potential dangers of AI, whether open-source or not. The EU is now drafting codes of practice that will spell out specific rules that AI developers will have to follow [36], and other efforts to regulate the technology appear to be gaining steam. In the US, 45 states considered bills to regulate AI in 2024, and legislatures in two states, California and Colorado, approved new laws, although California’s governor vetoed its bill [37].