Open-Source Artificial Intelligence—How Open? How Safe?

Mitch Leslie

Engineering ›› 2025, Vol. 47 ›› Issue (4): 9–12. DOI: 10.1016/j.eng.2025.03.002

News & Highlights

In May 2024, scientists were excited but disappointed when Google DeepMind (USA) launched AlphaFold3 (AF3), the latest version of its artificial intelligence (AI) program for predicting protein folding [1–3]. Although the model was more powerful than its predecessor AlphaFold2 (AF2), DeepMind, a London-based research division of Alphabet, Google’s parent company, did not publish the computer code for the new release [1,3,4]. Unlike with AF2, researchers could not run AF3 themselves and could only perform up to 10 (now 20) limited protein structure predictions per day through a website [3,4]. DeepMind also withheld other vital details, notably the model weights: the values that determine what an AI system learns from its training data [4,5].
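The model weights at the center of the dispute are simply the numbers a model learns from its training data. The toy sketch below (a hypothetical illustration, unrelated to AlphaFold’s actual architecture) fits a one-weight linear model by gradient descent to show why source code alone is not enough to reproduce a trained system:

```python
# Minimal illustration of "model weights": the values learned from training
# data. A one-weight linear model y = w * x is fitted by gradient descent;
# the learned w is what a developer could publish (or withhold) alongside
# the source code. (Toy example; real models have billions of weights.)
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # training pairs with y = 2x

w = 0.0  # before training, the code alone encodes nothing about the data
for _ in range(200):  # gradient descent on mean squared error
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.05 * grad  # learning rate 0.05

print(round(w, 3))  # the learned weight; converges to 2.0
```

Publishing this script without the final value of `w` would force every user to redo the training, which for a frontier model means repeating a compute run costing millions of dollars.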
After criticism from researchers, DeepMind published the AF3 source code in November 2024 [1,3,4]. But only non-commercial users can run the program, and academic credentials are necessary to obtain the model weights [4].
What information developers should release about their AI systems and what limits they should place on use of the software are controversial [6]. Advocates of open-source AI argue that full transparency and unrestricted use are a boon, allowing creative people to develop new products and applications by extending or adapting current AI models [7]. “I am opposed to having this proprietary and closed-loop development, in which something as important as AI is in the hands of a few players,” said Chirag Shah, professor of information science at the University of Washington in Seattle, WA, USA. The US government has endorsed at least some degree of AI openness [8], as has the European Union (EU), whose strict new AI regulations go easier on open-source systems [9,10].
At the other end of the spectrum are the experts who contend that open-source AI is dangerous because it can easily be misused [11]. “There are some technologies we do not want to be open source,” said David E. Harris, a chancellor's public scholar at the University of California in Berkeley, CA, USA, and former research manager for responsible AI at the Silicon Valley tech giant Meta. Making the issue even more complicated is the absence of an agreed-upon definition of what constitutes open-source AI [12].
Open-source software that anyone can use or modify has been around for decades [13]. Prominent examples include the Linux operating system and the Mozilla Firefox web browser [14]. Developers have built on such open-source programs to create other powerful software such as the Linux-based Android operating system for cell phones [14]. Open-source AI, proponents contend, would spur innovation, speed development of new models, open additional uses for the technology, boost scientific research, and provide other benefits (Fig. 1). By broadening access to models, open-source AI would, as The Economist put it, help to ensure that its power “is not concentrated in the hands of a few Californian firms” [15].
AI transparency has also gained government backing. A 2024 report by the US National Telecommunications and Information Administration advocates open-weight models, in which developers release the weights they used during training [16]. The EU AI Act that entered into force in 2024 goes further. For example, the rules require developers of general-purpose AI models, such as the famous chatbot ChatGPT, to spell out how they trained and tested their systems [9,17]. Makers of open systems are exempt from those requirements [18]. However, the EU regulators punted on specifying the criteria for open-source AI. “We need good ways to judge whether models really deserve to be exempted,” said Mark Dingemanse, an associate professor of language and communication at Radboud University, Nijmegen, the Netherlands.
Applying the open-source concept to AI is tricky because a model is not just software. “With AI, code is not the ‘meat,’” said David Widder, a postdoctoral researcher at Cornell University’s Cornell Tech in New York, NY, USA. A model’s output also depends on the data it learned from, the training weights that shape what inferences it draws, and other factors. Moreover, said Shah, “the term ‘open AI’ has lost its meaning in some respects.” For instance, OpenAI, the company headquartered in San Francisco, CA, USA, that produces ChatGPT, is no longer open with its technology. As a result, tech titan Elon Musk, one of OpenAI’s co-founders, has filed at least two lawsuits against the company for violating its original principles [19].
Several companies and non-profits have released AI models that they describe as open. Meta’s Llama 3, which debuted in April 2024, is a large language model that is a rival to more restrictive systems like Google's Gemini and OpenAI's GPT-4o, for example [20]. Other companies that have unveiled self-described open AI models include Mistral AI of Paris, France [21], and Alibaba of Hangzhou, China [22]. A non-corporate example is GPT-J, a large language model aimed at AI researchers that was launched by the non-profit EleutherAI (Washington, DC, USA) [23]. But because the AI field lacks a consensus definition for the term, it is difficult to assess which, if any, of these systems are truly open source and which are simply promoted as such [24].
In August 2024, the influential Open Source Initiative (OSI), a non-profit located in West Hollywood, CA, USA, that sets the standards for open-source software, published a first draft of its definition for open-source AI [25,26]. To qualify, a model must be freely available for any purpose, modifiable for any purpose, and shareable for any purpose [25]. Potential users must have full access to its source code and training weights [25]. The OSI definition does not require the release of training datasets—the reasoning is that AI models can learn from confidential sources such as medical records [26]. However, the model’s creator must provide enough information about the training data “so that a skilled person can build a substantially equivalent system” [25].
Almost none of the so-called open AI models meets the OSI’s criteria. Meta’s Llama 3, for instance, fails the test because Meta does not reveal the model’s training data and requires some large potential users to apply for a license before running the model [27,28]. Not surprisingly, Meta takes issue with the OSI draft definition [27].
Although OSI’s proposed criteria are a good start, they include a “crucial loophole,” said Dingemanse, as developers do not have to release their full training datasets. “This is an open invitation to play fast and loose with dataset sharing, and it makes it too easy to just claim something about your data’s availability,” he said. In a study published in June 2024, Dingemanse and his colleague Andreas Liesenfeld, an assistant professor of language and communication, also at Radboud University, evaluated 40 text-generating AI models, including ChatGPT and a version of Llama 3, on a broader set of 14 openness criteria [18]. Some of their stipulations, such as release of the model’s code and training weights, are the same as the OSI’s. But they added other requirements that would allow a user to understand and run the system, such as its availability as a software package and publication of a peer-reviewed paper describing the model. By their ranking, ChatGPT was the most closed system, and the version of Llama 3 they analyzed was fourth from the bottom [18]. The most open was a version of OLMo 7B from the non-profit Ai2 in Seattle, WA, USA [18]. Dingemanse said that many of the companies that create so-called open AI models are guilty of “openwashing,” claiming a greater degree of transparency than they deliver. “If exemptions like those in the EU AI Act are to have any meaningful effect in fostering innovation, which is why they were introduced, then they should work to reward the most open and innovative model providers,” he said.
Even the most open AI models can still be effectively closed to users, Widder and colleagues argue in an article published in Nature in November 2024 [29]. Running the models requires vast amounts of computing power, which is usually only available at a substantial cost. And a variety of obstacles can prevent users from developing their own open-source models. Building them requires access to enormous data sets for training and a workforce with AI expertise. Shah noted that even more labor is necessary during the fine-tuning that models undergo after training. The idea that “open AI somehow democratizes AI” is a misconception, said Widder. “When you need access to expensive resources to build or use otherwise ‘open’ AI, it is not actually very open.”
One way to make models more open, some researchers say, is to shrink them, thus reducing users’ dependence on big tech for computing resources. The Llama 3 version that Meta released in July 2024 contains more than 400 billion parameters, the variables that the system learns during its training and that determine its output [30,31]. But researchers have created an assortment of models that are less than one-tenth that size and can run on a laptop or, in some cases, a cell phone [32]. Apple and Microsoft are among the companies that have released scaled-down models that include the training weights [32,33]. And in February 2024, Ai2 published a series of small models that meet the OSI criteria for open source [34]. Although such models can be used by more people, they still fail to provide broad accessibility, Widder said. “Small models often still end up being made by well-resourced institutions.”
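The scale gap can be made concrete with back-of-envelope arithmetic. The sketch below is my own illustration, not from the article: the 8-billion-parameter figure for a “small” model and the 2 bytes per parameter (16-bit weights) are assumptions, and real memory use varies with quantization and runtime overhead.

```python
def weight_memory_gb(n_params, bytes_per_param=2):
    """Rough memory needed just to hold a model's weights, in gigabytes,
    assuming 16-bit (2-byte) storage per parameter."""
    return n_params * bytes_per_param / 1e9

# A 400-billion-parameter model vs a small model in the single-digit billions
print(weight_memory_gb(400e9))  # 800.0 GB: data-center territory
print(weight_memory_gb(8e9))    # 16.0 GB: within reach of consumer hardware
```

The roughly 50-fold difference in weight storage is what moves a model from clusters of specialized accelerators to an ordinary laptop, which is why shrinking models is pitched as a route to broader access.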
Experts are not just divided on what counts as open-source AI. They also disagree about its dangers. Critics like Harris contend that open-source AI is too easy to use for harmful purposes [11]. The models can be altered to create child pornography and produce deepfakes—manipulated and deceptive videos and images, Harris said. Authorities might employ them with unintended—or intended—harmful consequences for innocent people. Terrorists might use them to produce biological or chemical weapons. AI models’ ability to design new, potentially harmful proteins is particularly worrying, researchers say [35]. Harris likens AI to nuclear technology and favors stricter regulations that would require, for example, companies to assume liability for misuse of their models and to clearly label AI-created content so that viewers know it is machine-generated [11]. The new EU regulations are a step in the right direction, he said. “The EU AI Act is the most significant piece of legislation about AI in the world today.”
Shah argues that locking away AI is not the solution. AI is not like nuclear technology, he said. “One can build it on a laptop. It is not something you could stop in the same way.” Widder notes that some AI companies, such as OpenAI, favor regulation in the name of safety, but he thinks their support has an ulterior motive—to protect their products. “A lot of rhetoric around ‘unsafe’ AI is simply a thinly disguised attempt to build a regulatory moat.”
In any case, regulators have stepped into the breach, not waiting for the companies and AI experts to resolve their differences about the potential dangers of AI, whether open-source or not. The EU is now drafting codes of practice that will spell out specific rules that AI developers will have to follow [36], and other efforts to regulate the technology appear to be gaining steam. In the US, 45 states considered bills to regulate AI in 2024, and legislatures in two states, California and Colorado, approved new laws, although California’s governor vetoed its bill [37].

References

[1] Offord C. Google DeepMind releases code behind its most advanced protein prediction program [Internet]. Washington, DC: Science; 2024 Nov 11 [cited 2024 Dec 1]. Available from: https://www.science.org/content/article/google-deepmind-releases-code-behind-its-most-advanced-protein-prediction-program.

[2] O'Neill S. Machine learning turbocharges structural biology. Engineering 2022;12(5):9–11.

[3] Palmer C. AlphaFold wins Nobel Prize, gains functionality, drops open access. Engineering 2025;45:6–8.

[4] Callaway E. AI protein-prediction tool AlphaFold3 is now more open. Nature 2024;635:531–532.

[5] Jung M. AI essentials: what are model weights? [Internet]. San Francisco: Medium; 2024 Oct 10 [cited 2024 Dec 1]. Available from: https://engineadvocacyfoundation.medium.com/ai-essentials-what-are-model-weights-2e5b47ec77a1.

[6] Lohr S. An industry insider drives an open alternative to big tech’s A.I. [Internet]. New York City: The New York Times; 2023 Oct 19 [cited 2024 Dec 1]. Available from: https://www.nytimes.com/2023/10/19/technology/allen-institute-open-source-ai.html.

[7] Brooks B. Open-source AI is good for us [Internet]. New York City: IEEE Spectrum; 2024 Feb 8 [cited 2024 Dec 1]. Available from: https://spectrum.ieee.org/open-source-ai-good.

[8] Vaughan-Nichols S. A new White House report embraces open-source AI [Internet]. New York City: ZDNet; 2024 Jul 31 [cited 2024 Dec 1]. Available from: https://www.zdnet.com/article/a-new-white-house-report-embraces-open-source-ai/.

[9] Palmer C. European Union issues world’s first comprehensive regulations for artificial intelligence. Engineering 2024;38(7):5–7.

[10] Gibney E. Not all 'open source' AI models are actually open: here's a ranking [Internet]. London: Nature; 2024 Jun 19 [cited 2024 Dec 1]. Available from: https://www.nature.com/articles/d41586-024-02012-5.

[11] Harris DE. Open-source AI is uniquely dangerous [Internet]. New York City: IEEE Spectrum; 2024 Jan 12 [cited 2024 Dec 1]. Available from: https://spectrum.ieee.org/open-source-ai-2666932122.

[12] Wiggers K. We finally have an 'official' definition for open source AI [Internet]. San Francisco: TechCrunch; 2024 Oct 28 [cited 2024 Dec 1]. Available from: https://techcrunch.com/2024/10/28/we-finally-have-an-official-definition-for-open-source-ai/.

[13] Susnjara S, Smalley I. What is open source software? [Internet]. Armonk: IBM; [cited 2024 Dec 1]. Available from: https://www.ibm.com/topics/open-source.

[14] The Economist. A battle is raging over the definition of open-source AI [Internet]. London: The Economist; 2024 Nov 6 [cited 2024 Dec 1]. Available from: https://www.economist.com/science-and-technology/2024/11/06/a-battle-is-raging-over-the-definition-of-open-source-ai.

[15] The Economist. Why open-source AI models are good for the world [Internet]. London: The Economist; 2024 Nov 7 [cited 2024 Dec 1]. Available from: https://www.economist.com/leaders/2024/11/07/why-open-source-ai-models-are-good-for-the-world.

[16] National Telecommunications and Information Administration. Dual-use foundation models with widely available model weights [Internet]. Washington, DC: National Telecommunications and Information Administration; 2024 Jul [cited 2024 Dec 1]. Available from: https://www.ntia.gov/sites/default/files/publications/ntia-ai-open-model-report.pdf.

[17] Chan K. Europe's world-first AI rules get final approval from lawmakers. Here's what happens next [Internet]. New York City: Associated Press; 2024 Mar 13 [cited 2024 Dec 1]. Available from: https://apnews.com/article/ai-act-european-union-chatbots-155157e2be2e42d0f1acca33983d8c82.

[18] Liesenfeld A, Dingemanse M. Rethinking open source generative AI: open-washing and the EU AI act. In: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency; 2024 Jun 3–6; Rio de Janeiro, Brazil. New York City: Association for Computing Machinery; 2024. p. 1774–87.

[19] Acton M, Kinder T. Elon Musk files new lawsuit against OpenAI and Sam Altman [Internet]. London: Financial Times; 2024 Aug 5 [cited 2024 Dec 1]. Available from: https://www.ft.com/content/bcfc3cc8-6465-4fd3-b44a-bdf22996fc3a.

[20] Knight W. Meta's open source Llama 3 is already nipping at OpenAI's heels [Internet]. San Francisco: Wired; 2024 Apr 25 [cited 2024 Dec 1]. Available from: https://www.wired.com/story/metas-open-source-llama-3-nipping-at-openais-heels/.

[21] Zeff M. Mistral's Large 2 is its answer to Meta and OpenAI's latest models [Internet]. San Francisco: TechCrunch; 2024 Jul 24 [cited 2024 Dec 1]. Available from: https://techcrunch.com/2024/07/24/mistral-releases-large-2-meta-openai-ai-models/.

[22] Yang Z. Why Chinese companies are betting on open-source AI [Internet]. Cambridge: MIT Technology Review; 2024 Jul 24 [cited 2024 Dec 1]. Available from: https://www.technologyreview.com/2024/07/24/1095239/chinese-companies-open-source-ai/.

[23] VentureBeat. Open source NLP is fueling a new wave of startups [Internet]. San Francisco: VentureBeat; 2021 Dec 23 [cited 2024 Dec 1]. Available from: https://venturebeat.com/ai/open-source-nlp-is-fueling-a-new-wave-of-startups/.

[24] Gent E. The tech industry can't agree on what open-source AI means. That's a problem [Internet]. Cambridge: MIT Technology Review; 2024 Mar 25 [cited 2024 Dec 1]. Available from: https://www.technologyreview.com/2024/03/25/1090111/tech-industry-open-source-ai-definition-problem/.

[25] Open Source Initiative. The open source AI definition—1.0 [Internet]. West Hollywood: Open Source Initiative; 2024 [cited 2024 Dec 1]. Available from: https://opensource.org/ai/open-source-ai-definition.

[26] Williams R, O'Donnell J. We finally have a definition for open-source AI [Internet]. Cambridge: MIT Technology Review; 2024 Aug 22 [cited 2024 Dec 1]. Available from: https://www.technologyreview.com/2024/08/22/1097224/we-finally-have-a-definition-for-open-source-ai/.

[27] Robison K. Open-source AI must reveal its training data, per new OSI definition [Internet]. New York City: The Verge; 2024 Oct 29 [cited 2024 Dec 1]. Available from: https://www.theverge.com/2024/10/28/24281820/open-source-initiative-definition-artificial-intelligence-meta-llama.

[28] Robison K. Meta’s new A.I. is an open-source breakthrough with fine print to freeze out competitors [Internet]. Sunnyvale: Yahoo Finance; 2023 Jul 18 [cited 2024 Dec 7]. Available from: https://finance.yahoo.com/news/meta-open-source-breakthrough-fine-025211942.html.

[29] Widder DG, Whittaker M, West SM. Why 'open' AI systems are actually closed, and why this matters. Nature 2024;635:827–833.

[30] TechCrunch. Meta releases its biggest ‘open’ AI model yet [Internet]. San Francisco: TechCrunch; 2024 Jul 23 [cited 2024 Dec 1]. Available from: https://techcrunch.com/2024/07/23/meta-releases-its-biggest-open-ai-model-yet/.

[31] Peters J. AI is confusing—here's your cheat sheet [Internet]. New York City: The Verge; 2024 Jul 22 [cited 2024 Dec 1]. Available from: https://www.theverge.com/24201441/ai-terminology-explained-humans.

[32] Hutson M. Forget ChatGPT: why researchers now run small AIs on their laptops. Nature 2024;633:728–729.

[33] Agarwal S. Apple, Microsoft shrink AI models to improve them [Internet]. New York City: IEEE Spectrum; 2024 Jun 20 [cited 2024 Dec 1]. Available from: https://spectrum.ieee.org/small-language-models-apple-microsoft.

[34] Wiggers K. AI2 open sources text-generating AI models—and the data used to train them [Internet]. San Francisco: TechCrunch; 2024 Feb 1 [cited 2024 Dec 1]. Available from: https://techcrunch.com/2024/02/01/ai2-open-sources-text-generating-ai-models-and-the-data-used-to-train-them/.

[35] Callaway E. Could AI-designed proteins be weaponized? Scientists lay out safety guidelines. Nature 2024;627:478.

[36] Coulter M. Tech giants push to dilute Europe's AI Act [Internet]. London: Reuters; 2024 Sep 20 [cited 2024 Dec 1]. Available from: https://www.reuters.com/technology/artificial-intelligence/tech-giants-push-dilute-europes-ai-act-2024-09-20/.

[37] Curry R. How AI regulation in California, Colorado and beyond could threaten U.S. tech dominance [Internet]. New York City: CNBC; 2024 Nov 21 [cited 2024 Dec 1]. Available from: https://www.cnbc.com/2024/11/21/how-ai-laws-in-california-states-threaten-us-tech-dominance.html.
