Monday, 27 January 2025, was a dismal day for technology investors. The share price of Nvidia (Santa Clara, CA, USA), the world’s largest manufacturer of the graphics processing units (GPUs) that underlie artificial intelligence (AI), plummeted 17% [1]. The fall sliced 589 billion USD from the company’s value, the biggest single-day decrease in history [1]. The stocks of other AI heavyweights such as Meta (Menlo Park, CA, USA) and Alphabet (Mountain View, CA, USA), Google’s parent company, also sank [2]. Power companies suffered as well. Shares of Constellation Energy (Baltimore, MD, USA), the country’s largest owner of nuclear plants, fell by 20% [3].
The trigger for this sell-off was the announcement by DeepSeek, a start-up in Hangzhou, China, that it had released an AI reasoning model that matched or beat the performance of rival models but cost much less to build, train, and operate [4]. Big technology players such as OpenAI, the organization based in San Francisco, CA, USA, that produced ChatGPT, had already unveiled reasoning models [5,6]. “The idea was that they were impossible without significant amounts of resources,” said Subbarao Kambhampati, professor of computing and augmented intelligence at Arizona State University (Tempe, AZ, USA). For a small, almost unknown company to release a model of this type was “a shot across the bow,” he added. “They said, ‘hey, we can catch up.’”
That prospect scared investors in big technology and power companies, who feared that DeepSeek’s feat could start a trend toward smaller, cheaper AI models that gobbled less electricity [7]. Much of the initial reaction to DeepSeek’s news was due to “obsessive hype,” as one technology research company put it [8]. Still, experts say that DeepSeek deserves credit for using creative approaches to develop its reasoning model [9]. It produced the model, known as R1, even though it did not have access to Nvidia’s most advanced GPUs—US technology export restrictions prohibit their sale to Chinese companies [10]. “They were very clever about how they used the compute power they had,” said Anthony Cohn, professor of automated reasoning at the University of Leeds (UK) and a researcher at the Alan Turing Institute in London. In addition, industry observers say, the company’s achievement is a testament to the growing AI capability of China, which is rapidly gaining on the companies from Western countries that have dominated the field [10,11].
The performance of standard large language models (LLMs) such as OpenAI’s GPT-4 may be plateauing, so large reasoning models, or LRMs, could be the next big thing for the AI industry [12,13]. If these models can “think through” questions, they may be better at tasks such as solving complex mathematical problems and writing computer code. Specialized models are necessary because standard LLMs are poor at reasoning, said Kambhampati. OpenAI debuted the first LRM, dubbed o1, in September 2024 and set the standard for the field [5,14]. By the time of R1’s release, companies such as Google had jumped in with their own products [6]. Whether these models reason in the same way as humans do is controversial [15], but they outperform ordinary LLMs on a variety of problems that require logic to solve.
Although R1 was not the first LRM, it made a splash because DeepSeek seemed to do more with less. The company did not specify how many GPUs it required to train R1, but it did reveal that only about 2000 Nvidia H800 GPUs were necessary to train V3, the LLM that R1 is based on [10]. In contrast, Meta enlisted 16 000 of the more powerful Nvidia H100 GPUs to train a comparable LLM [10].
DeepSeek also appeared to achieve high performance at a surprisingly low cost. The company said that it spent only about 6 million USD to train V3 (it did not provide an amount for R1) [16]. Big technology companies are tight-lipped about their costs, but estimates suggest that training some LLMs may now require 100 million to 1 billion USD [17]. DeepSeek’s 6 million USD figure was likely the amount needed just for V3’s final training, said Kambhampati. Still, it suggested that DeepSeek had managed to economize. R1’s price tag remains unclear, however, because the company has not disclosed its total development cost [7].
Researchers know a lot about how R1 works because DeepSeek published a preprint that described the model, made the source code available for free, and provided the model weights that shape what R1 learns during training [18-20]. The model shows “a lot of very careful engineering,” said Cohn. One example is how it learns [21]. After standard LLMs go through initial training on gigantic amounts of text, they undergo further refinement to improve their answers, including a stage known as reinforcement learning from human feedback (RLHF), during which they tailor their responses based on ratings by human evaluators [21]. DeepSeek’s approach also included an initial training period, but the reinforcement learning stage did not rely on human feedback. Instead, R1 attempted to solve problems with known solutions, and algorithms graded its answers [21,22]. During the process, the model generates long sequences of so-called intermediate tokens, or raw output [22]. Some researchers refer to these sequences as reasoning traces or chains of thought, although Kambhampati and colleagues object to that terminology because it implies that they are comparable to the steps of human reasoning [23]. The model can produce pages and pages of intermediate tokens for each problem it attempts to solve [23]. As Kambhampati and his coauthors put it, they resemble “better formatted and spelled human scratch work” [23]. However, they serve a useful purpose because they are the fodder for further training that allows the model to improve its odds of getting the right answers [22]. One way DeepSeek’s approach to training may have permitted the company to cut development costs is by reducing the number of workers needed to refine the model [21].
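The grading scheme described above—reinforcement learning in which a rule, rather than a human rater, scores the model’s answers against known solutions—can be illustrated with a toy sketch. Everything here is an assumption for illustration: the “Answer:” marker, the function names, and the binary reward are hypothetical and do not reflect DeepSeek’s actual code or reward design.

```python
import re

def extract_final_answer(output: str) -> str:
    """Pull the text after an 'Answer:' marker from the model's raw
    output (long intermediate tokens followed by a final answer)."""
    match = re.search(r"Answer:\s*(.+)", output)
    return match.group(1).strip() if match else ""

def grade_answer(model_output: str, known_solution: str) -> float:
    """Rule-based reward: 1.0 if the final answer matches the known
    solution, else 0.0. No human evaluator is involved."""
    return 1.0 if extract_final_answer(model_output) == known_solution else 0.0

# Example: a math problem with a known solution. The pages of
# "scratch work" before the marker are ignored by the grader.
output = "Let x = 12 * 7. Compute step by step... Answer: 84"
print(grade_answer(output, "84"))  # -> 1.0
```

Because the reward comes from a mechanical check like this, graded attempts can be fed back into training at scale without a pool of human evaluators, which is consistent with the cost-cutting the article describes.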
DeepSeek’s achievement could open the way for a range of powerful new models, said Fei Wu, professor of computer science and director of the Artificial Intelligence Research Institute at Zhejiang University in Hangzhou, China. “Building upon DeepSeek’s work, we can develop numerous domain-specific small models to accomplish tasks within particular fields,” he said. “By then integrating these specialized models, we can create a highly versatile general-purpose model capable of performing tasks across different domains.” Moreover, Wu said, several large Chinese technology companies—including Baidu (Beijing), Tencent/WeChat (Shenzhen), and ByteDance (Beijing)—are integrating DeepSeek’s models into their operations. “DeepSeek’s open-source strategy is accelerating the universalization of AI technology.”
DeepSeek jolted stock markets, outraged pundits and politicians who lamented that the United States was falling behind technologically, and sparked plenty of breathless news coverage. Its application (app) quickly became a bestseller in Apple’s App Store (Fig. 1). However, whether DeepSeek’s approach will revolutionize the AI industry remains uncertain. For instance, it is not a given that other companies will pursue leaner, cheaper models because of DeepSeek. Competitors may stick with the standard paradigm for AI advances, which emphasizes ever-larger and increasingly expensive models [7]. “It is not clear whether small models will dominate,” said Kambhampati.
Moreover, DeepSeek’s approach may not curb AI’s growing appetite for energy, as some experts prophesied [24-27]. Although R1 appears to have been cheaper to train, the reasoning it performs is computationally more demanding and requires more power to answer questions than an LLM [28]. Even if companies can cut the power use of their models, they might respond by making yet larger ones, thus negating any energy savings [24].
And R1 does not stand apart from its competitors. DeepSeek’s announcement claimed that the model topped o1 on three benchmarks and almost matched it on two others [29]. However, there are no standard benchmarks for evaluating the performance of AI models [30], and companies tend to cherry-pick the ones that show their models in the best light [31]. Now that researchers have had months to put R1 through its paces on many different challenges, they can say it excels in some areas but falls behind in others.
A study published in April 2025 [32], led by Xueyan Mei, an instructor in biomedical engineering and imaging at the Icahn School of Medicine at Mount Sinai (New York City, NY, USA), and Zahi Fayad, a professor of radiology and medicine at the same school, illustrates this mixed record. Mei, Fayad, and colleagues compared three models—R1, o1, and a version of Meta’s Llama—on four medically relevant tasks. The Llama version they used, which was an older, smaller model, got the lowest score each time. They found that o1 topped R1 on a multiple-choice exam that all United States doctors must pass. The two models were about equal at classifying tumors and making diagnoses from case descriptions. However, R1 provided clearer reasoning for its diagnoses. And o1 outperformed R1 at writing summaries of imaging studies. The researchers asked radiologists to grade the imaging summaries, and R1 scored lower because it was verbose and, in some instances, hallucinated, or made up, answers, said Mei. Overall, she said, “it has very good common sense, but I would be very careful about the outputs.”
Cohn and his team have also been evaluating the model’s capabilities on various tasks, including spatial reasoning. They found that o1 was superior. R1 also tends to be slow, Cohn said. His take on the model is that “it was a wake-up call, but in the long run I do not think it is going to make a huge difference.” Investors now seem to agree. Within a month, Nvidia’s stock had bounced back from its DeepSeek-prompted tumble [33]. And power companies say they still anticipate large increases in electricity demand from AI [34].
R1 suffers from a further drawback. Many organizations—including several government agencies in the United States, a number of universities, and companies such as Microsoft—have banned the company’s models from their systems because of concerns over data security [35]. Mei and Fayad were only able to analyze R1’s capabilities in their study by running it on an isolated platform.
R1 may be most important for what it suggests about the assumed leadership of Western companies in AI development. Despite its limitations, the model shows that Chinese companies are making rapid advances in the field. And given China’s growing competitiveness in AI research, Cohn said, “it will be a challenge for the West to keep up.”