The December 2022 paper in the cancer journal Oncoscience appeared to be a conventional discussion of the pros and cons of treating patients with the drug rapamycin [1]. But the article was written using artificial intelligence (AI) and listed the AI chatbot ChatGPT as its lead author. The large language model (LLM) built by OpenAI (San Francisco, CA, USA) had made its sensational public debut less than a month before [2], and the paper was one of the first scientific publications to credit it as an author [3].
Shortly after the article appeared, most scientific publishers announced that they would not permit papers to list ChatGPT or other similar LLMs as co-authors [4]. However, top scientific journals such as Science and Nature now allow scientists to use LLMs to help them write papers, providing they acknowledge the assistance [5, 6]. More and more researchers appear to be taking advantage of the opportunity. One study found that at least 1% of scientific papers published in 2023 already appeared to include some AI-modified text [7, 8].
The growing reliance on AI for scientific writing could be a boon, some experts argue, boosting researchers’ productivity and providing many other benefits [9-11]. But the realization that large numbers of scientists are delegating writing duties to AI also worries many researchers. The models are mistake-prone, raising fears that they could create or perpetuate errors in the scientific literature. “I often compare ChatGPT to a sloppy, unreliable research assistant,” said Sandra Wachter, professor of technology and regulation at the University of Oxford in Oxford, UK. Researchers also fret that AI will churn out biased prose, plagiarize from other works, and worsen the problem of paper mills, unscrupulous publishers that flood the scientific literature with phony or shoddy articles [12, 13].
Scientists face intense pressure to publish, however, and are unlikely to stop using AI as a writing aid. “What we need to do is work out what rules and regulations are necessary” so that authors can get the most out of AI, said Julian Koplin, lecturer in bioethics at Monash University in Melbourne, Australia. “There is a role for these tools as long as they are controlled and managed carefully.”

Several AI models have already made a splash in science, including the protein structure predictor AlphaFold, whose creators shared the 2024 Nobel Prize in Chemistry [14, 15]. ChatGPT (Fig. 1) and related LLMs such as Gemini from Google (Mountain View, CA, USA) and Claude from the AI company Anthropic (San Francisco, CA, USA) offer a different capability for researchers: they can rapidly churn out long passages of seemingly coherent prose [2]. The models assemble sentences by deducing the most likely sequence of words based on statistical patterns in their training data [2]. AI models’ output depends on this data, and it increasingly includes scientific articles. Several major scientific publishers, including Wiley (Hoboken, NJ, USA) and Taylor and Francis (Milton Park, Oxfordshire, UK), have started licensing papers from their journals for inclusion in AI training sets [16].
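To make the idea of next-word prediction concrete, consider a minimal Python sketch of a toy bigram model, the simplest possible statistical language model. Everything in it is illustrative: real LLMs operate on sub-word tokens with billions of neural-network parameters, not word-pair counts, but the principle of emitting the statistically most likely continuation is the same.

```python
# A toy bigram "language model": pick the most likely next word
# based on word-pair counts. The corpus is invented for illustration;
# real LLMs use neural networks over tokens, not these counts.
from collections import Counter, defaultdict

corpus = (
    "the model predicts the next word the model learns patterns "
    "from the training data the model predicts likely words"
).split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word: str) -> str:
    """Return the most frequent successor of `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

# Generate text by repeatedly choosing the most likely next word.
word, sentence = "the", ["the"]
for _ in range(5):
    word = most_likely_next(word)
    sentence.append(word)
print(" ".join(sentence))
```

Run on the toy corpus, the generator loops fluently through “the model predicts the model predicts”: well-formed in shape but empty of verified content, which is precisely the failure mode critics of LLM-written prose describe later in this piece.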
AI models that could generate scientific prose were around before ChatGPT, but the chatbot was much more accessible than its predecessors [17]. Researchers agree that the use of AI for writing assistance surged after the release of ChatGPT, but how many authors rely on the technology remains unclear. A 2025 survey by the journal Nature found that 57% of scientists admitted to seeking writing help from AI in the last two years, and 72% said they wanted to do so in the next two years [18]. However, the survey did not determine what percentage of the respondents actually used AI to prepare papers.
Gauging how much scientists are using AI for scientific writing is difficult because prose fashioned by LLMs is hard to detect. AI companies hone their models so that the output appears to have come from a human. A team from Stanford University (Palo Alto, CA, USA) developed an approach to estimate the frequency of scientific writing that includes AI-modified text [19, 20]. They asked ChatGPT to write sample paragraphs and compared the results to human-authored paragraphs on the same subjects. Certain words appeared much more commonly in prose generated by AI, including “commendable,” “innovative,” “meticulous,” and “intricate.” Measuring the frequencies of these verbal markers allowed the researchers to estimate what fraction of texts had likely been altered by AI. Their results, reported in two 2024 preprints, suggest that the percentage is surprisingly high and increasing. For instance, the researchers analyzed peer reviews for AI conferences that took place before and after ChatGPT’s release. Between 6.5% and 16.9% of the reviews from post-ChatGPT-release meetings showed signs of being “substantially modified” by AI, meaning the changes went beyond spelling corrections or minor stylistic improvements [19]. And the share of computer science abstracts with indications of AI use jumped to 17.5% in 2023 and early 2024, the scientists reported [20].
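The gist of the marker-word method can be captured in a few lines of Python. This is an illustrative toy, not the Stanford team’s actual pipeline: the marker list comes from the article, the sample texts are invented, and the real analysis fits a statistical model of vocabulary shifts across whole corpora rather than scoring individual documents.

```python
# Illustrative sketch of the marker-word idea: compare how often
# AI-associated words appear in a text, normalized per 1,000 words.
# Marker words are from the article; sample texts are invented.
import re

AI_MARKERS = {"commendable", "innovative", "meticulous", "intricate"}

def marker_rate(text: str) -> float:
    """Occurrences of AI-associated marker words per 1,000 words."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for w in words if w in AI_MARKERS)
    return 1000 * hits / max(len(words), 1)

pre_chatgpt = "The study design was sound and the analysis careful."
post_chatgpt = ("This commendable and innovative study offers a "
                "meticulous analysis of an intricate problem.")

print(f"pre-2023 sample:  {marker_rate(pre_chatgpt):.1f} per 1,000 words")
print(f"post-2023 sample: {marker_rate(post_chatgpt):.1f} per 1,000 words")
```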
Andrew Gray, a bibliometrics support officer at University College London (London, UK), took this approach further to estimate the prevalence of AI-modified papers across a range of disciplines [8]. “I got terrifyingly clear results,” he said of his 2024 study. Like the Stanford group, Gray found that the frequency of AI’s favored words shot up after ChatGPT’s introduction. For instance, the use of the word “intricate” more than doubled in 2023 over previous years [8]. His calculations suggested that authors had received AI assistance to write at least 1% of the papers published in 2023 [8]. That more scientists are seeking help from AI makes sense, Gray said. “People need to write a lot, and they want to streamline it.” Still, he described his results as “quite alarming.” How much AI contributed to the published texts is not clear, he said. Scientists might only resort to LLMs for polishing. Alternatively, they could be allowing AI to produce complete papers. “People do not say what they are using the models for.”
Despite Gray’s misgivings, the London, UK-based magazine The Economist cheered his results and similar findings that suggest a growing role for AI in scientific publishing [9]. By automating many of the time-consuming steps of writing papers, AI could free scientists to spend more time on their research, the magazine argued [9]. Some scientists tout other potential upsides from the technology. AI can translate papers and help authors who are not well versed in English, the international language of science, craft better articles [10]. AI might also speed peer review, the vetting process for scientific publications [11].
But many researchers remain wary, and a big reason is AI’s well-documented unreliability. “The fundamental thing we need to be trying to address is errors entering the scholarly record,” said Koplin. ChatGPT and its counterparts are infamous for making up results, or hallucinating [2, 15, 21]. Their propensity to invent answers means that any AI-generated text could be wildly wrong. The upside of hallucinations is that they are often outlandish, which makes them easy to recognize, Wachter noted. But chatbots also make subtle mistakes that can slip past even well-informed people. In the end, she said, the errors stem from how the chatbots operate. AI “gives the illusion that it understands us and can give us back a well-thought-out answer.” However, chatbots are statistical models, so they do not necessarily return the right answer, only the most likely one, she said. “A model that can predict the next word in a sequence will be good at that; it will not be good at telling the truth.”
A 2023 experiment by Melissa Kacena, professor of orthopaedic surgery at the Indiana University School of Medicine in Indianapolis, IN, USA, and colleagues illustrates the benefits and costs of allowing AI to write scientific text. The researchers asked ChatGPT to generate review papers on three topics in bone health. The LLM had to analyze the literature in the field, create an outline, and then write the paper. Kacena and colleagues then compared the results to human-written versions and to reviews co-authored by AI; for the latter of the three sets of papers, the team provided the references and an outline for the chatbot to follow [22]. Using AI slashed preparation and writing time. AI working solo completed a draft review on neural regulation of fracture healing in about one-fourth as much time as human authors [23]. The chatbot also delivered clear, well-structured text, said Kacena. Peer reviewers said that only two of the pieces needed to be reorganized, one human-written and the other co-authored by AI [22]. “Sometimes AI is smarter than we are,” Kacena said.
Still, the references in the papers written solely by AI were especially riddled with errors. In the AI-only-written review on neural regulation of fracture healing, for instance, six of the 55 cited papers did not exist and another 43 were considered irrelevant for the context, meaning only six papers were correctly cited [23]. Even when the researchers gave ChatGPT the references it was supposed to cite, it got some of them wrong [23]. The chatbot’s tendency to blunder created more work for the team, Kacena and colleagues determined. Researchers had to spend an additional 27 hours fact-checking the AI-only-written review on neural regulation of fracture healing. The AI-only-written review on the effects of Alzheimer’s disease on bone did not require as much vetting, but the researchers still had to put in a further 8 hours of work [24]. The final verdict, Kacena said, was that the chatbot “was not perfect, and it required a lot of human oversight.”
Publishers and the scientific community are still trying to decide what uses of AI are acceptable. “Acknowledging how widespread it is is a big step,” said Gray. But the choice of whether to apply AI in scientific writing, and whether to admit doing it, comes down to individual authors, he said. Researchers draw the line at different uses. Wachter said that AI is only valuable in situations where authors provide the information for it to digest, such as creating a summary of text they know is accurate [25]. Koplin is more open to the technology. “I do not think any use is off-limits,” he said. But users must remember that AI might be feeding them nonsense, he added. “At every stage, you should treat it like you would treat a friend who has had several beers and is holding court at the pub.”