Born and raised in Japan, Tatsuya Amano started learning English in the 7th grade. But after finishing his doctorate at the University of Tokyo in 2006, he began traveling to international scientific conferences. To his dismay, he discovered his English was not good enough to allow him to converse with many of the other participants. “I found so many barriers to communicating with other people in English,” recalled Amano, now an associate professor in the School of the Environment at the University of Queensland in Brisbane, Australia.
Many other scientists who did not grow up speaking English also face professional limitations. A survey of more than 900 environmental science researchers that Amano and his colleagues conducted in 2023 revealed that non-native English speakers are more than twice as likely to have had a paper rejected because of writing quality, something that recently happened to Amano [1]. Non-native speakers are less likely to attend scientific conferences. They may also be less productive because they must spend more time on activities that require knowledge of English: reading papers takes up to 91% longer and writing papers up to 51% longer, the survey revealed [1].
But Amano and others are hopeful that one type of technology will help many non-native English speakers overcome these obstacles: artificial intelligence (AI). The quality of AI translations has dramatically improved in just over a decade; for some widely used languages they are now almost as good as translations produced by human professionals [2]. Myriad AI translation tools are now available, and they are smoothing communication not just in science, but also in tourism, business, and many other areas of society. “Machine translation is the big success story in AI,” said Philipp Koehn, professor of computer science at Johns Hopkins University (JHU) in Baltimore, MD, USA. “It works, and it is useful.”
AI, however, is not quite ready to take over the world’s translating duties. Because it frequently stumbles, human translators still need to check its work, said Andy Benzo, the San Diego-based president-elect of the American Translators Association (Alexandria, VA, USA), which has almost 7000 members. “We are in a transition, but it will not replace the translator.” Moreover, AI can produce fluent results in only a tiny minority of the world’s 7000 languages [3, 4]. “The quality of the tools is not guaranteed for most of the world’s languages,” said Amano.
Regardless, AI-generated translations are now ubiquitous. Chatbots such as ChatGPT provide them, joining specialty websites such as Google Translate, which made its official debut in 2006. Cell phones feature translation apps, some of which can handle more than 100 languages with varying levels of proficiency. Travelers can wear smart glasses that translate text and speech (Fig. 1 [5, 6]). Video conferencing services such as Google Meet and Microsoft Teams translate participants’ speech in near real time and then dub their voices in another language [7, 8]. Some publishers, including UK-based academic publishing giant Taylor & Francis, have announced plans to use AI to translate portions of their catalogs [9, 10].
Today’s multitude of translation options is the fruit of research that began more than 70 years ago, when scientists inspired by the code-breaking devices of World War II began trying to develop machines that could translate [11]. They made enough progress that by the 1970s companies such as SYSTRAN of San Diego, CA, USA, were offering machine translation commercially [12]. Starting in the 1990s, the internet made automated translation widely available. Google Translate was not the first web-based service, but it became the go-to choice for many people because it was free, easy to use, and reasonably accurate in many languages [13, 14]. The initial versions broke text into phrases and then used an approach known as statistical machine translation to deduce, based on translated transcripts from the United Nations and the European Parliament, the most likely corresponding string of words in the target language [13, 14].
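In outline, a phrase-based statistical system picks, for each source phrase, the target phrase most often aligned with it in those parallel corpora. The Python sketch below is a toy illustration with an invented two-entry phrase table; real systems learned millions of phrase probabilities and combined them with language models and reordering penalties.

```python
# Toy sketch of phrase-based statistical machine translation.
# The phrase table and its probabilities are invented for
# illustration; real systems estimated them from parallel corpora
# such as UN and European Parliament transcripts.

# P(target phrase | source phrase)
PHRASE_TABLE = {
    "la maison": {"the house": 0.7, "the home": 0.3},
    "bleue":     {"blue": 0.9, "sad": 0.1},
}

def translate(source_phrases):
    """Pick the most probable target phrase for each source phrase."""
    return " ".join(
        max(PHRASE_TABLE[p], key=PHRASE_TABLE[p].get)
        for p in source_phrases
    )

print(translate(["la maison", "bleue"]))  # -> "the house blue"
```

Note the word-order error in the output: without the language-model and reordering steps that full systems add, phrase-by-phrase lookup cannot prefer “the blue house,” which hints at why this approach produced its share of blunders.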
While functional, that earliest iteration of Google Translate delivered plenty of howlers, particularly when translating to or from less common languages. Its results for Welsh were wrong so often that people in Wales adopted the term Scymraeg, meaning “scummy Welsh,” to describe its inaccurate output [15]. One reason for the high error rate was that Google Translate first converted text into English and only then into the target language [16].
The accuracy of AI translation has recently soared due to several key developments, said Kenton Murray, a research scientist at JHU who studies natural language processing by computer systems. That surge in performance is apparent in the results of a competition, now held during the annual Conference on Machine Translation (WMT, an acronym inherited from the meeting’s earlier incarnation as the Workshop on Machine Translation), that has pitted translation algorithms against each other for nearly 20 years [2]. Accuracy scores began to climb in 2014, when the AI models known as neural networks began to replace statistical machine translation among the competition entries, said Murray, who co-authored an analysis of results from the most recent WMT. Unlike the early incarnations of Google Translate, neural networks translate directly from one language to another rather than going through English first. The models also boasted two advantages over statistical machine translation systems, said Koehn, one of WMT’s founders. When trying to determine the meaning of a particular word, statistical machine translation models consider only a few nearby words; neural networks can analyze much longer sequences of text, gathering more clues about each word’s meaning. In addition, neural networks are better at generalizing, Koehn said: once they have deciphered a phrase in one passage, they can understand it when they encounter it again. Google Translate switched to neural networks in 2016, and users reported overnight improvements in translation quality [13, 17].
Murray said that a second big jump in translation performance began in 2017, with the introduction of neural networks known as transformers. These models boast a feature known as self-attention that enables them to analyze an entire sequence of text at once, rather than word by word as most earlier neural networks did [16, 18]. This ability “makes them more sensitive to context,” said Koehn, and it leads to substantial improvements in accuracy. Well-known large language models (LLMs) such as GPT-4 from OpenAI (San Francisco, CA, USA), Claude from Anthropic (also based in San Francisco), and Llama from Meta (Menlo Park, CA, USA) are transformers. LLMs can perform a range of tasks, not just translation, but they often produce more fluent translations than specialized translation models. “One hundred percent it is due to data,” said Koehn. Specialized translation models are trained on billions of words, whereas LLMs are trained on trillions, he said.
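Self-attention itself reduces to a compact computation. The numpy sketch below (toy dimensions and random stand-in weights; illustrative only, not any production model) shows a single attention head comparing every word’s query against every word’s key, so the whole sequence is weighed at once:

```python
# Minimal sketch of single-head self-attention, the transformer
# mechanism that lets a model weigh every word in a sequence when
# interpreting each word. All dimensions and weights are toy values.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                    # 5 "words", 8-dim embeddings
x = rng.normal(size=(seq_len, d_model))    # stand-in word embeddings

# Learned projection matrices (random here) map embeddings to
# queries, keys, and values.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Every query is scored against every key, so the model sees the
# entire sequence at once rather than word by word.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
output = weights @ V                       # context-aware representations

print(weights.round(2))  # each row sums to 1: attention over all words
```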
LLMs showed their prowess at the most recent WMT, held in November 2024 in Miami, FL, USA [2]. Twenty-eight teams of researchers submitted models for evaluation, most of which were LLMs, and the conference also assessed four web-based translation services and eight publicly available LLMs, including versions of GPT-4, Claude, and Llama. For seven of the 11 translation tasks evaluated, AI systems were as good as or better than humans. For example, the machines proved superior at translating English into Chinese, Ukrainian, and Hindi, while humans topped the machines at translating English into Spanish and into Russian.
AI translations are getting better, but can they really capture the experience of reading text in the original language? Whether human translations can do that has long been a matter of debate, particularly for literary works. “It is a fuzzy problem to decide if a translation is good. We still do not have benchmarks in literature,” said Katherine Elkins, professor of comparative literature and humanities at Kenyon College in Gambier, OH, USA, who studies how AI affects culture. To address the quality question, Elkins turned the tables, using AI tools to judge three human translations of the first volume of À la recherche du temps perdu, a literary masterwork by the French novelist Marcel Proust [19]. AI uncovered large-scale patterns in the prose that none of the translations reproduced, she found: patterns in the density and diversity of its language, and in the emotional ups and downs that keep readers invested in the story. The result does not imply the translations were inadequate, said Elkins, but they did miss some elements present in the original French. That AI can discern patterns in the work that have eluded Proust scholars suggests that it could also do the same when translating, Elkins added. “We are already within that realm” where AI could soon produce a readable, compelling translation of a literary work, she said.
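Elkins’s actual pipeline is more sophisticated than anything shown here, but two of the signals she describes, lexical texture and emotional arc, can be approximated with very simple proxies. The Python sketch below (hypothetical file names, a deliberately tiny stand-in sentiment lexicon) hints at how such large-scale patterns might be compared across translations of the same work:

```python
# Rough, illustrative proxies (not Elkins's actual method) for
# comparing large-scale patterns across texts: lexical diversity
# and a crude sentiment "arc". The tiny lexicon is a stand-in for
# the far richer AI models used in real analyses.
import re

STAND_IN_LEXICON = {"joy": 1, "love": 1, "delight": 1,
                    "grief": -1, "loss": -1, "regret": -1}

def tokens(text):
    return re.findall(r"[\w'-]+", text.lower())

def lexical_diversity(text):
    """Type-token ratio: unique words / total words."""
    words = tokens(text)
    return len(set(words)) / len(words) if words else 0.0

def sentiment_arc(text, window=500):
    """Mean stand-in sentiment over successive windows of words."""
    words = tokens(text)
    chunks = [words[i:i + window] for i in range(0, len(words), window)]
    return [sum(STAND_IN_LEXICON.get(w, 0) for w in c) / len(c)
            for c in chunks]

# Hypothetical files holding two English translations of the same work.
for path in ("translation_a.txt", "translation_b.txt"):
    text = open(path, encoding="utf-8").read()
    print(path, round(lexical_diversity(text), 3), sentiment_arc(text)[:5])
```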
The success of machine translation has prompted some commentators to question whether learning another language is worthwhile [20] and to declare that AI translation is “almost solved” [21]. Murray and Koehn are among the experts who disagree. “They are doing very well,” said Murray. “But that is not to say that everything is solved.” AI translators can struggle with slang [5], miss cultural references, use insensitive language [3, 22], and make other mistakes. They often cannot replicate the tone and style of a passage [22]. LLMs pose a further problem as translators because they are more likely to hallucinate, or invent results, than bespoke translation models [23]. Such errors are instructive, Murray pointed out. “Showing where the models fail is very important.” But the consequences of blunders can be disastrous: inaccurate AI translations have, in extreme cases, led to mistaken arrests and even deaths [24].
These failings explain why Benzo is not worried about the future of her profession. The United States Bureau of Labor Statistics agrees, predicting that the number of jobs for translators will increase 4% over the next 10 years [22]. Humans do not need to verify every AI translation, she said, but they should check critical legal, medical, financial, and educational documents. Readers should also know how text was translated, she said. The European Union already requires that AI-generated content be labeled as such [25]. Benzo and her colleagues are pushing for international standards that would include voluntary labeling for AI-translated material. “If you use AI, okay, but at least tell me,” she said.
Another limitation of AI translation is that it works well only for languages with large amounts of training data. Tech companies are trying to make the tools more inclusive. In 2025 Meta launched an open-source model that can translate text and speech among as many as 101 languages [4, 26]. Google Translate now handles 249 languages and language varieties, and the company aims to boost that number to 1000 [27].
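Open models of this kind can be run locally. As a sketch of the workflow, the snippet below uses Meta’s earlier, text-only NLLB-200 model, which is publicly available through the Hugging Face transformers library, rather than the speech-capable 2025 model mentioned above; fittingly, its roughly 200 written languages include Welsh:

```python
# Sketch: running an open translation model locally with the
# Hugging Face transformers library. NLLB-200 is an earlier open
# Meta model covering ~200 written languages (text only), used
# here to illustrate the general workflow.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Machine translation is the big success story in AI."
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start in the target language (here: Welsh).
target_id = tokenizer.convert_tokens_to_ids("cym_Latn")
outputs = model.generate(**inputs, forced_bos_token_id=target_id,
                         max_length=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```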
The potential benefits of AI translation for scientists are large, said Amano. When writing a paper, they can seek help from AI if they have trouble composing a sentence. Researchers can also use it to check the prose in papers before submitting them. AI could also change scientific publishing to make papers more accessible, he noted. Instead of publishing every article in English, journals could start publishing each article in its authors’ native language; scientists could then read it in their own language with help from AI [28].
Still, reliance on AI for translation could have downsides in science and other fields. Although some translation tools, such as Google Translate, are free, others are not and remain out of reach for scientists with limited budgets. Moreover, researchers who do not speak languages that AI can translate could be shut out of scientific communication. “The long-term effect of using AI tools is something we need to think about and discuss in the scientific community,” Amano said.