1. Introduction
In the field of artificial intelligence (AI) [1], [2], [3], [4], large language models (LLMs) have revolutionized progress by achieving success across multimodal tasks. Models such as OpenAI o1 have demonstrated strong capabilities in natural language understanding, generation, and reasoning. They have been instrumental in advancing applications ranging from conversational agents to complex problem-solving systems. Despite these achievements, LLMs face significant challenges that limit their effectiveness and applicability in certain domains:
• LLMs suffer from inherent limitations that necessitate knowledge empowerment. Their training primarily on large-scale, unsupervised text corpora results in models that encode knowledge implicitly within a vast number of parameters, leading to several issues, including stale or outdated information, hallucinations and inaccuracies, an inability to reason over structured data, and a lack of interpretability. These shortcomings highlight the need to integrate explicit knowledge sources to strengthen LLMs’ factual accuracy, reasoning capabilities, and interpretability.
• LLMs often struggle with efficiency and domain-specific accuracy. LLMs require significant resources, making them impractical for certain applications and environments. Moreover, their decision-making processes can be opaque, limiting their interpretability. Model collaboration leverages the complementary strengths of various models, integrating the capabilities of large models with the efficiency and specialization of smaller or different functional models. This approach enhances performance, usability, and transparency, addressing the inherent limitations of LLMs.
• LLMs exhibit limitations in adaptability and continual learning. They are generally trained on static datasets and may not effectively incorporate new information or adapt to evolving tasks without extensive retraining. To overcome these issues, models need to promote each other’s learning processes in order to achieve co-evolution, which enables them to stay current and effective in dynamic environments.
To address these challenges, integrating external knowledge—which provides semantically rich representations of entities and relationships—into LLMs has emerged as a promising direction
[5],
[6],
[7]. This integration occurs through several technical approaches:
(1) Integrating external knowledge into training objectives. These methods design knowledge-aware loss functions by assigning higher masking probabilities to important entities or balancing token-level and entity-level losses.
(2) Incorporating knowledge into LLM inputs. These methods inject relevant subgraphs into input sequences using mechanisms such as visible matrices to mitigate knowledge noise.
(3) Knowledge-empowered instruction tuning. In this approach, LLMs are fine-tuned to comprehend external knowledge by converting it into natural language prompts and employing self-supervised tasks.
(4) Retrieval-augmented knowledge fusion during inference. These methods combine non-parametric retrieval modules with LLMs to dynamically fetch and incorporate pertinent knowledge information.
(5) Knowledge prompting. This approach transforms external knowledge into textual prompts for LLMs without retraining, although it often requires manual prompt engineering.
To improve efficiency and domain-specific accuracy, some works
[8],
[9],
[10] have explored the collaborative interplay between LLMs and smaller models (SMs). In the context of this survey, we use the term “SMs” to refer to models that have significantly fewer parameters than LLMs and lack the latter’s emergent properties. Model collaboration involves the interaction of AI models with varying architectures, sizes, and functionalities to enhance their overall performance. This approach allows models to combine their strengths—such as the efficiency of SMs with the powerful capabilities of larger models—to improve their accuracy, interpretability, and computational efficiency. Model collaboration can be categorized into strategies such as model merging and functional model collaboration. These methods enable the integration of diverse AI techniques, leading to better performance and adaptability across tasks.
To advance adaptability and continual learning, model co-evolution harnesses the mutual evolutionary processes between LLMs and SMs to enhance performance and computational efficiency in multimodal tasks. Model co-evolution refers to the simultaneous evolution of multiple models that influence each other’s development over time while working together to solve complex and diverse tasks in various environments. In this dynamic process, models influence one another by sharing knowledge, parameters, and learning strategies, which helps them adapt to heterogeneous conditions such as different architectures, tasks, and data distributions. Through co-evolution, models can balance the need for specialization and generalization, making them more robust and efficient—particularly in decentralized and federated learning settings where privacy and resource constraints are critical.
Together, knowledge empowerment, collaboration, and co-evolution form an interconnected framework to enhance AI capabilities beyond individual models by achieving levels of reasoning, accuracy, and adaptability unattainable by isolated models. The functionality of each component may depend on the functionality of the others. Knowledge empowerment sometimes relies on collaboration to effectively integrate and utilize external knowledge sources. Collaboration acts as a catalyst for co-evolution, as interacting models influence each other’s development. In turn, co-evolution enhances both knowledge empowerment and collaboration by fostering continuous adaptation and learning.
Furthermore, the ternary space made up of cyberspace, the physical world, and human society (CPH) has expanded the interplay among science, engineering, and society, leading to new dimensions of interaction and development. All these advancements are inseparable from the technologies of the post-LLM era—specifically, knowledge-empowered, collaborative, and co-evolving AI models—which further improve and facilitate these complex interactions. As depicted in
Fig. 1, in the post-LLM era, the integration of such techniques has the potential to tackle complex challenges in hypothesis development, problem formulation, problem-solving, and interpretability. Hypothesis development now leverages domain-specific knowledge within AI models to improve accuracy and reliability. Problem formulation has advanced through the modeling of entities, environments, and laws using multi-agent systems, such as simulating personalized roles in educational settings to uncover pedagogical principles and employing physics-informed neural networks (PINNs) to incorporate physical laws for improved predictive accuracy in, for example, fluid mechanics and heat conduction
[11],
[12]. In problem-solving, the shift from symbolic logic reasoners to large-scale neural networks has enabled models to either retrieve knowledge from databases or memorize and generate complete solutions, with collaborative agent systems enhancing mathematical problem-solving through the separation of computation and verification tasks. Interpretability has been improved by integrating standardized operating procedures (SOPs) into multi-agent workflows for better task decomposition and coordination, and through enhanced human–computer interaction, with multiple LLM-based agents collaborating via natural language and programming exchanges to refine software development processes.
In this light, this review aims to examine post-LLM techniques, to address ongoing challenges in science, engineering, and society, and to illuminate future pathways for further advancing AI applications.
Fig. 2 depicts the overall outline of this survey. In Section 2, we introduce the current challenges presented by AI in the areas of knowledge empowerment, collaboration, and co-evolution. Section 3 gives an overview of knowledge-empowered LLMs. In Section 4, we present cutting-edge methods in model collaboration. Section 5 delves into recent techniques for model co-evolution. In Section 6, we explore how knowledge-empowered, collaborative, and co-evolved AI advances science, engineering, and society. Section 7 showcases potential future advancements and applications of knowledge-empowered, collaborative, and co-evolved AI. Finally, in Section 8, we summarize the key insights from this survey.
2. Challenges
In this section, we identify four major types of challenges for current AI models: task heterogeneity, model heterogeneity, data heterogeneity, and security and privacy concerns.
2.1. Task heterogeneity
Existing AI models are primarily developed for distinct tasks, scenarios, and applications with differing or even conflicting optimization objectives and evaluation metrics, which results in theoretical and practical challenges regarding collaboration and co-evolution among these task-specific models. We identify three types of research challenges in task heterogeneity. First, disparities in training objectives may hinder a model’s evolutionary process, particularly during the training phase. A notable example is optimizing generative adversarial networks, in which a generator and a discriminator are jointly optimized in an adversarial manner, making it extremely difficult to reach an equilibrium. Thus, balancing divergent objectives and stabilizing the training dynamics is a challenging problem. Second, the lack of shared knowledge can prevent collaboration and co-evolution: while models built for completely different tasks may develop unique expertise, such knowledge cannot be easily leveraged across tasks without a common framework. Third, it is difficult for models to reach consensus due to communication barriers between different models, which makes interpreting and acting on outputs from other models challenging.
2.2. Model heterogeneity
Model heterogeneity mainly refers to drastic architecture discrepancies between different AI models, which lead to crucial challenges hindering model synergy. Typical examples range from collaboration between models with different architectures and levels of complexity to models with divergent learning paradigms. First, differing input and output representations may make it difficult to align features well. For example, two convolutional backbones may differ significantly in depth and width (e.g., hybrid model collaboration), resulting in differing numbers of neurons and feature maps and ultimately leading to inflexibility in model collaboration. In addition, owing to incompatibility in intermediate representations, two fundamentally different learning paradigms may pose a major challenge to model collaboration. An example is the question of how to enable collaboration between symbolic AI and connectionist AI, which requires translating logical rules into numerical formats or vice versa. Therefore, effectively extracting, transferring, and aligning shared knowledge between heterogeneous models will promote better model utilization.
2.3. Data heterogeneity
In real-world scenarios, data from different devices or sources are often not independent and identically distributed (non-IID), resulting in significant variations in data distributions. For example, data collected from different end users, functionally different sensors in embodied AI, different patients, or different enterprises may differ in features, labels, and domains, leading to phenomena including class imbalance, covariate shift, and concept drift that remarkably affect the generalization performance of model coordination. Moreover, data from multiple sources may be inconsistently labeled or may exhibit varying levels of annotation quality. For example, some data might have been mislabeled or contain noise, which is likely to introduce performance degradation during model collaboration or co-evolution. Data modality differences introduce additional challenges, such as inconsistent data representations (e.g., spatial images versus temporal audio) and imbalanced data modalities (e.g., missing or sparse modality). Data heterogeneity introduces particular challenges due to varying data distributions, modality-specific challenges, fusion difficulties, and training complexities. It is essential to effectively address these challenges in order to build collaborative and robust systems that can handle complex, real-world tasks involving diverse data sources.
2.4. Security and privacy
As protected by laws and regulations (e.g., the General Data Protection Regulation (GDPR)), data security and privacy are critical concerns in model collaboration, especially in distributed and decentralized machine learning systems, where multiple entities (e.g., devices and organizations) contribute to training a global model without sharing raw data. Even though raw data is not directly shared, model updates through gradients or features can still inadvertently expose sensitive information about the underlying data. For example, certain patterns in gradients can be reverse-engineered to reconstruct the original data. Collaboratively trained models are also vulnerable to inference attacks that exploit the learned model or its outputs to deduce information about the training data, with typical threats including model inversion attacks and membership inference attacks. Although mechanisms such as differential privacy offer a potential solution to guarantee privacy, the excessive noise injected by such frameworks can reduce overall model performance. Thus, protecting against such privacy attacks while maintaining the utility of the model is a significant challenge in collaborative environments. Moreover, models are vulnerable to poisoning attacks, in which adversaries attempt to corrupt the global model by injecting malicious updates. For example, adversaries can send malicious model updates that deliberately degrade the performance of the global model, often targeting specific sub-tasks or objectives. While robust aggregation mechanisms (e.g., Byzantine-resilient algorithms) can detect malicious clients, designing such mechanisms is complex, especially in environments where clients’ contributions are diverse and their trustworthiness cannot be assumed. As collaborative AI systems continue to grow, security and privacy concerns will remain central to the development of secure and privacy-preserving model-collaboration paradigms.
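The clip-and-noise mechanism at the heart of differential privacy can be sketched in a few lines. This is a minimal, illustrative recipe only (the function name, clipping bound, and noise scale are assumptions, not a reference implementation): each client update is first bounded in norm so no single client can dominate the aggregate, and Gaussian noise is then added to mask individual contributions.

```python
import math
import random

def dp_sanitize_update(grad, clip_norm=1.0, noise_std=0.5, rng=None):
    """Clip a client's model update to a norm bound, then add Gaussian
    noise, so that individual data cannot easily be reconstructed from
    the shared update. All parameter values here are illustrative."""
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / (norm + 1e-12))
    clipped = [g * scale for g in grad]
    return [g + rng.gauss(0.0, noise_std * clip_norm) for g in clipped]

update = [3.0, 4.0]                                    # norm 5 -> clipped to norm ~1
sanitized = dp_sanitize_update(update, noise_std=0.0)  # noise off: pure clipping
print(sanitized)
```

As the section notes, the trade-off is visible in the `noise_std` parameter: larger values give stronger privacy but degrade the utility of the aggregated model.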
3. Knowledge-empowered LLMs
Because LLMs rely on unsupervised training over large-scale corpora, they are often devoid of practical, real-world knowledge, which limits their applicability in knowledge-intensive tasks. To bridge this gap, researchers have explored various strategies to empower LLMs with external knowledge sources. These approaches involve integrating knowledge during pre-training through specialized training objectives, augmenting model inputs with relevant information, and leveraging knowledge during instruction tuning and inference. This section delves into these methodologies, outlining how they enhance LLMs’ capabilities by making the models more knowledgeable and effective.
3.1. Knowledge-empowered LLM pre-training
Existing LLMs mostly rely on unsupervised training on a large-scale corpus and thus lack practical real-world knowledge. Previous works that integrate knowledge into LLMs can be categorized into two parts: ① integrating knowledge into training objectives, and ② knowledge-empowered instruction tuning.
3.1.1. Integrating knowledge into training objectives
Zhou et al.
[13] constructed a minimal, high-quality dataset and fine-tuning protocol to align pre-trained models with user interaction style, leveraging stylistically coherent yet topically diverse prompts and responses. Akyürek et al.
[14] employed the technique of integrating domain knowledge into training objectives by comparing and contextualizing two types of training data attribution (TDA) methods (gradient-based and embedding-based methods), which analyze model behavior at different stages of the training process to assess influence on predictions, alongside a baseline information retrieval method (BM25) that uses lexical similarity for fact tracing without model dependency. The research efforts in this category focus on designing knowledge-aware training objectives. For example, Shen et al.
[15] leveraged a knowledge graph (KG) structure to assign a masking probability. Entities that can be reached within a certain number of hops are considered important and are given a higher masking probability during pre-training. Zhang et al.
[16] further controlled the balance between token-level and entity-level training losses. Tian et al.
[17] followed a similar fusion approach to inject sentiment knowledge during LLM pre-training, identifying words with positive or negative sentiment and assigning a higher masking probability to those identified as sentiment words. This method also feeds both sentences and their corresponding entities into the LLM and trains the model to predict alignment links between textual tokens and entities in KGs. Gao
[18] enhanced input tokens by incorporating entity embeddings and included an entity prediction pre-training task. Wang et al.
[19] directly applied both a KG embedding training objective and a masked-token pre-training objective to a shared transformer-based encoder. The deterministic LLM
[20] focused on pre-training language models to capture deterministic factual knowledge. It masks only spans whose answers are deterministic entities given the remaining context and introduces additional clue contrastive learning and clue classification objectives. Xiong et al.
[21] first replaced entities in the text with other entities of the same type, then fed the text into LLMs and pre-trained the model to distinguish whether the entities had been replaced.
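The knowledge-aware masking objectives above share a simple core idea: tokens linked to KG entities receive a higher masking probability than ordinary tokens. A minimal sketch follows, in which a plain set-membership test stands in for the KG-derived importance scores used by the cited methods, and the probability values are illustrative:

```python
import random

def entity_aware_mask(tokens, entities, p_base=0.15, p_entity=0.5, seed=0):
    """Mask tokens for pre-training, boosting the masking probability
    of entity tokens relative to ordinary tokens."""
    rng = random.Random(seed)
    return ["[MASK]" if rng.random() < (p_entity if t in entities else p_base)
            else t for t in tokens]

tokens = "marie curie discovered radium in paris".split()
print(entity_aware_mask(tokens, entities={"curie", "radium", "paris"}))
```

Over many pre-training batches, entity tokens are masked roughly `p_entity / p_base` times more often, so the model is pushed to recover factual content rather than easy function words.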
3.1.2. Knowledge-empowered instruction tuning
Ji et al.
[22] demonstrated elasticity within LLMs, where the model’s alignment can be inversely adjusted through a compression-based protocol, revealing a resistance to alignment that favors the retention of broader pre-training distributions over fine-tuning adjustments. Zhang et al.
[23] shed some light on various kinds of instruction-tuning techniques. Gekhman et al.
[24] examined how the inclusion of “Unknown” examples within the fine-tuning dataset affects a model’s performance; the researchers found that an increased proportion of these examples not only risks overfitting but also hampers the model’s generalization, while “MaybeKnown” examples prove most beneficial for balanced performance across knowledge types. KG instruction tuning utilizes facts and the structure of KGs to create instruction-tuning datasets. LLMs fine-tuned on these datasets can extract both factual and structural knowledge from KGs, enhancing their reasoning ability. Wang et al.
[25] first designed several prompt templates to convert structural graphs into natural language text and then proposed two self-supervised tasks to fine-tune LLMs. OntoPrompt
[26] proposed ontology-enhanced prompt tuning, which places entity knowledge into the context of LLMs and fine-tunes them on several downstream tasks. Luo et al.
[27] fine-tuned LLMs on a KG structure to generate logical queries. Luo et al.
[28] presented a planning-retrieval-reasoning framework that fine-tunes LLMs on KG structure to generate relation paths and uses these paths to retrieve valid reasoning paths from the KGs, allowing LLMs to conduct faithful reasoning and generate interpretable results.
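The KG instruction-tuning methods above all begin by verbalizing triples into natural language training examples. A toy sketch of that conversion step is shown below; the question template and relation names are illustrative stand-ins for the prompt templates designed in the cited works:

```python
def triples_to_instructions(triples):
    """Convert KG (head, relation, tail) triples into
    (instruction, response) pairs using a fixed question template."""
    return [(f"What is the {relation.replace('_', ' ')} of {head}?", tail)
            for head, relation, tail in triples]

kg = [("Marie Curie", "birthplace", "Warsaw"),
      ("Ada Lovelace", "occupation", "mathematician")]
for question, answer in triples_to_instructions(kg):
    print(question, "->", answer)
```

Pairs produced this way can be fed directly into a standard supervised fine-tuning loop, letting the LLM absorb both the facts and, with path-based templates, the structure of the KG.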
3.2. Knowledge-empowered LLM inference
While the methods described in Section 3.1 can effectively fuse knowledge into LLMs, they are limited because real-world knowledge changes over time and these methods do not permit updates without retraining. Thus, recent research has focused on keeping the knowledge and text spaces separate during inference, particularly for question answering (QA) tasks.
3.2.1. Retrieval-augmented knowledge fusion
Ovadia et al.
[29] evaluated leveraging an auxiliary knowledge base to retrieve relevant information for a given query, combining it with the pre-existing model context to enhance a language model’s responses on knowledge-intensive tasks. This approach outperforms traditional fine-tuning by offering dynamic, contextually enriched knowledge integration. Retrieval-augmented generation (RAG) combines non-parametric and parametric modules. Yang et al.
[30] involved an iterative multi-stage process, IM-RAG, in which a reasoner, retriever, refiner, and progress tracker collaborate through reinforcement learning and supervised fine-tuning; this combination enables an LLM to construct, refine, and finalize answers by progressively retrieving, refining, and synthesizing relevant information in a structured, retrieval-augmented reasoning loop. Given input text, we can retrieve relevant documents via maximum inner product search
[31], treat them as hidden variables, and feed them into the output generator as additional context. The model presented in Lewis et al.
[32] outperforms other baseline models in open-domain QA and can generate more specific and factual text. Story-fragments improves the architecture by adding a module to determine salient knowledge entities. Wu et al.
[33] improved efficiency by encoding external knowledge into memory and using fast search. Guu et al.
[34] proposed a knowledge retriever for the pre-training stage to improve open-domain QA. Logan et al.
[35] selected facts from a KG using the current context to generate sentences. Zhang et al.
[36] leveraged multimodal large language models (MLLMs) in conjunction with a neural combinatorial optimization solver to address the combinatorial explosion challenge of ancient manuscript restoration, implementing a two-stage pipeline in which MLLMs perform initial fragment matching while a neural solver optimizes candidate fragment selection, particularly in open-world settings with outliers. Sun et al.
[37] represented a KG triple as a sequence of tokens and concatenated it with the input sentences, randomly masking either the relation token in the triple or tokens in the sentences. However, this approach may introduce knowledge noise. Sun et al.
[38] used a unified word-knowledge graph to further reduce knowledge noise. Zhang et al.
[39] aimed to improve LLMs’ representations of long-tail entities by identifying such entities and replacing them with pseudo token embeddings. Yu et al.
[40] leveraged external dictionaries to improve the representation quality of rare words by appending their definitions from the dictionary at the end of input text and training the language model to align rare word representations and discriminate whether the input text and definition are correctly mapped.
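The retrieval step common to these systems, maximum inner product search over document embeddings followed by prompt assembly, can be sketched with toy vectors standing in for learned embeddings. The embedding values, document texts, and prompt template below are all illustrative:

```python
def retrieve_mips(query_vec, doc_vecs, k=1):
    """Return the indices of the k documents whose embeddings have the
    largest inner product with the query embedding."""
    scores = [(sum(q * d for q, d in zip(query_vec, vec)), i)
              for i, vec in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

docs = ["LLMs encode knowledge implicitly.",
        "Radium was discovered in 1898.",
        "MoE routes inputs to experts."]
doc_vecs = [[1.0, 0.1], [0.0, 1.0], [0.5, 0.5]]  # toy document embeddings
query_vec = [0.1, 0.9]                            # toy query embedding
best = retrieve_mips(query_vec, doc_vecs, k=1)[0]
prompt = f"Context: {docs[best]}\nQuestion: When was radium discovered?"
print(prompt)
```

In a real RAG system, the brute-force scoring loop is replaced by an approximate nearest-neighbor index so retrieval stays fast over millions of documents, and the assembled prompt is handed to the generator as additional context.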
3.2.2. Knowledge-empowered prompting
Knowledge-empowered prompting designs a prompt to convert structured knowledge into text sequences for LLMs during inference. Li et al.
[41] used a predefined template to convert KG triples into short sentences. Luo et al.
[42] sampled relation paths from KGs, verbalized them, and fed them into LLMs to generate logical rules. Chain-of-knowledge (CoK)
[43] uses a sequence of triples for prompting to elicit LLMs’ reasoning ability. KG prompting is a simple way to combine LLMs and KGs without retraining, but the prompt is usually manually designed and thus requires a great deal of effort.
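A minimal version of this triple-based prompting serializes a chain of triples into an evidence prefix without any retraining. The template wording below is an assumption, not the exact format of any cited method:

```python
def knowledge_prompt(triples, question):
    """Serialize KG triples into a textual evidence block prepended to
    the question, in the spirit of chain-of-knowledge prompting."""
    evidence = " ".join(f"({h}, {r}, {t})" for h, r, t in triples)
    return f"Knowledge: {evidence}\nQuestion: {question}\nAnswer:"

chain = [("Curie", "born_in", "Warsaw"), ("Warsaw", "located_in", "Poland")]
print(knowledge_prompt(chain, "Which country was Curie born in?"))
```

This illustrates the trade-off noted above: the approach needs no retraining, but the template itself is hand-designed, and its wording can materially affect the elicited reasoning.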
4. Model collaboration
Research on collaboration between AI models is an increasingly prominent field, centered on the cooperation of models with different sizes, structures, or functions. The goal is to leverage the models’ respective strengths to achieve performance or efficiency superior to that of a single model. This collaborative approach not only focuses on the complementarity between large and small models but also involves the integration of different types of models, such as deep learning models and traditional machine learning models, to harness the powerful capabilities of large models alongside the efficiency and interpretability of small models. With the rapid advancement of deep learning technology, large models have gained significant attention due to their outstanding performance; however, they often require substantial computational resources, which limits their application in resource-constrained environments and increases their opacity, making their decision-making process difficult to understand. Therefore, exploring the collaborative modes of models to enhance performance and usability has become a research hotspot.
Model collaboration can be categorized into two types based on the collaboration strategy. The first type is model merging, exemplified by the mixture of experts (MoEs)
[44], which combines several relatively small expert models to achieve or even surpass the performance of a large model. The second type involves the collaboration of different functional models, such as using a large model agent to coordinate specialized small models to complete specific tasks
[45].
4.1. Collaboration based on model merging
In the field of machine learning, a single model often struggles to achieve optimal performance. Model merging is an effective strategy to improve prediction accuracy and robustness; it enhances performance by combining the prediction results, structures, or parameters of multiple models to mitigate the shortcomings of individual models.
4.1.1. Model ensembling
One type of model merging, known as model ensembling, is performed by aggregating the predictions of individual models
[46]. The most straightforward ensemble approach is simple averaging, where the final prediction is obtained by averaging the prediction results of all models. However, this method is only reasonable when the performance of each classifier is similar; if one classifier performs significantly worse than the others, the final prediction may not be as good as that of the best classifier in the group. A better approach to ensembling classifiers is weighted averaging, where the weights are learned from the validation set. For classification problems, voting
[47] is a commonly used model ensemble strategy. The final prediction is selected by having multiple models vote on the predicted classes, with the class receiving the most votes being chosen. Voting can involve either running different models or running a single model multiple times. Stacking
[48] is a more complex model ensemble method that uses the prediction results of multiple different models as inputs to train a new model, which then produces the final prediction. This approach effectively leverages the predictive capabilities of different models.
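The averaging and voting strategies described above can be sketched in a few lines. The weights below are illustrative rather than learned from a validation set, and plain lists stand in for real model outputs:

```python
from collections import Counter

def weighted_average(predictions, weights):
    """Weighted-average ensemble: combine per-model scores, with the
    weights normally learned on a validation set."""
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(len(predictions[0]))]

def majority_vote(predictions):
    """Hard-voting ensemble: for each example, the class predicted by
    the most models wins."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

# Three models scoring two examples, and their class predictions
scores = [[0.9, 0.2], [0.7, 0.4], [0.5, 0.9]]
print(weighted_average(scores, weights=[3, 2, 1]))
print(majority_vote([["cat", "dog"], ["cat", "cat"], ["dog", "dog"]]))
```

Stacking goes one step further than either function: instead of fixed weights or counts, the per-model predictions become the input features of a second-level model that learns how to combine them.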
4.1.2. Model fusion
MoE
[49] is a sparsely gated deep learning model consisting of two key components: a gate network (GateNet) and expert networks (Experts). The gate network is responsible for dynamically deciding which expert model should be activated based on the input data’s characteristics in order to generate the best prediction. Experts are a group of independent models, each specialized in handling a specific sub-task. Through the gate network, the input data are allocated to the most suitable expert model for processing, and the outputs of different models are weighted and fused to obtain the final prediction result. For example, Mixtral 8×7B
[50], a modification of the Mistral 7B model, is a sparse MoE model that includes eight experts per layer. The result is a 47B-parameter model that can rival or outperform larger models such as Llama2 70B on several benchmarks
[51].
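The gate-then-combine computation can be sketched as follows, with plain functions standing in for expert networks and a linear softmax gate; all weights and expert behaviors here are illustrative:

```python
import math

def moe_forward(x, experts, gate_weights, top_k=2):
    """Sparsely gated MoE step: score experts with a linear softmax
    gate, run only the top_k experts, and combine their outputs using
    the renormalized gate probabilities."""
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    probs = [e / sum(exps) for e in exps]
    chosen = sorted(range(len(experts)), key=lambda i: -probs[i])[:top_k]
    z = sum(probs[i] for i in chosen)
    return sum(probs[i] / z * experts[i](x) for i in chosen)

experts = [lambda x: sum(x), lambda x: max(x), lambda x: x[0]]  # toy experts
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]                   # toy gate weights
print(moe_forward([2.0, 1.0], experts, gate, top_k=2))
```

Because only `top_k` experts run per input, the compute cost scales with the active experts rather than the total parameter count, which is how a model like Mixtral keeps per-token cost far below that of a dense model of the same size.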
Model collaborative computing based on model merging can integrate the strengths and expertise of various models and reduce the bias and errors that may arise from a single model, thereby improving the accuracy and reliability of decisions. Moreover, model fusion can enhance models’ interpretability and transparency. For example, in an MoE system, each expert model’s role in and contribution to specific tasks can be clearly identified, providing clearer explanations for the final decision.
4.2. Collaboration based on different functional models
Another typical model collaboration approach is an intelligent agent system composed of multiple functional models. While large models provide broad knowledge and advanced reasoning capabilities, such as mathematical reasoning, programming, and task planning
[52], they may be less accurate in handling domain-specific tasks compared with smaller, specialized models. Thus, an effective mechanism is needed to integrate the general capabilities of large models with the specialized expertise of small models, ensuring that the agent system can flexibly handle different tasks and environments.
Collaboration based on different functional models can be divided into two types. In one type of collaboration, LLMs act as intelligent agents, serving as task managers that call upon various specialized models to accomplish different tasks. In the other type, LLMs work together with other specialized models, such as diffusion models, to complete a specific task. With the support of LLMs, the task can be executed more effectively.
4.2.1. LLM agent as task manager
Researchers have begun building intelligent agent systems based on the collaboration between LLMs and small specific models
[53]. Specifically, they use LLMs as the brains or controllers of these agents, extending the perception and action space by scheduling SMs. Early works were aimed at enhancing the tool-learning capabilities of LLMs. For example, both tool augmented language models (TALMs)
[54] and Toolformer
[55] fine-tune language models to learn to use external tool application programming interfaces (APIs). HuggingGPT
[56] further utilizes LLMs as the brain and SMs as tools, solving complex problems through collaboration between LLMs and SMs.
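The controller-plus-tools pattern used by these systems reduces to a dispatch loop. In the toy sketch below, a fixed plan stands in for the LLM controller's decisions and plain functions stand in for specialized SMs; all names are hypothetical:

```python
def run_agent(task, tools, plan):
    """Dispatch loop of a task manager: execute each planned step by
    calling the corresponding registered tool on the current state."""
    state = task
    for tool_name in plan:
        state = tools[tool_name](state)
    return state

tools = {
    "normalize": str.lower,   # stand-in text-normalization SM
    "tokenize": str.split,    # stand-in tokenizer SM
    "count": len,             # stand-in counting SM
}
result = run_agent("Count The Words Here", tools,
                   plan=["normalize", "tokenize", "count"])
print(result)  # 4
```

In a real system such as HuggingGPT, the plan is produced by the LLM itself from the task description, and each tool call invokes a specialized model rather than a string operation, but the control flow is the same.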
Chain of thought (CoT)
[45], tree of thoughts (ToT)
[57], and graph of thoughts (GoT)
[58] techniques enable LLM-based agents to demonstrate reasoning and planning capabilities comparable to those of symbolic and reinforcement learning-based agents
[59]. These systems can also learn from feedback and execute new actions, gaining the ability to interact with their environment
[60]. LLM-based agents can interact seamlessly, forming multi-agent systems that promote collaboration and competition between multiple agents
[61].
4.2.2. Collaboration of functional models for one task
LLMs can help specialized models perform specific tasks more effectively. For example, in image-generation tasks, while Stable Diffusion
[62] can generate high-quality images, it struggles to control the output strictly based on the prompts. LLMs can better understand prompts and guide the behavior of the generation model, leading to improved controllability in the image-generation process. Wu et al.
[63] proposed a framework that generates an image from the input prompt, assesses its alignment with the prompt, and performs self-corrections on the inaccuracies in the generated image. Steered by an LLM controller, this framework turns text-to-image generation into an iterative closed-loop process, ensuring correctness in the resulting image. Wang et al.
[64] proposed a training-free method for text-to-image generation and editing. It utilizes the reasoning ability of MLLMs to improve compositionality in diffusion models. This method breaks down complex image generation into simpler tasks for different sub-regions using regional diffusion. It integrates text-guided generation and editing in a closed-loop system, improving its generalization capabilities.
Some specialized SMs can also enhance the capabilities of MLLMs. For example, Sachin et al.
[65] used visual models such as semantic segmentation and instance segmentation to improve MLLMs’ performance in object-counting tasks.
5. Model co-evolution
Model co-evolution refers to a dynamic process in which multiple models evolve together to solve complex, heterogeneous tasks and share insights across diverse environments. In this context, models not only adapt and improve based on their individual learning paths but also influence each other’s development, ensuring efficient cross-task generalization, parameter sharing, and knowledge transfer. This process becomes essential in scenarios characterized by varied architectures, task requirements, or data distributions, as co-evolution enables models to collaboratively address the heterogeneity by balancing specialization and generalization. The resulting co-adaptation yields models that are more robust, efficient, and capable of solving a wider array of tasks, especially under the constraints of resource limitations and privacy concerns, which are typical of decentralized and federated environments.
This section is organized into three subsections that explore the co-evolution of models under different types of heterogeneity—namely, model, task, and data heterogeneity. Section 5.1 focuses on co-evolution under model heterogeneity, discussing techniques such as parameter sharing, dual knowledge distillation (KD), and hypernetwork-based parameter projection. Section 5.2 addresses co-evolution under task heterogeneity, examining methods such as dual learning, adversarial learning, and model merging. Lastly, Section 5.3 explores co-evolution under data heterogeneity, with a focus on federated learning and out-of-distribution (OOD) KD. Each section examines specialized strategies for optimizing model collaboration and efficiency in diverse environments.
5.1. Co-evolution under model heterogeneity
5.1.1. Parameter sharing under sub-model homogeneity
In the context of parameter sharing under sub-model homogeneity, recent works have significantly advanced the balance between model-specific learning and shared parameter efficiency. Haller et al.
[66] introduced “sparse sharing,” which utilizes overlapping subnetworks within a larger model, improving parameter efficiency through iterative magnitude pruning (IMP), based on the Lottery Ticket Hypothesis. Ding et al.
[67] extended this idea by proposing the multiple-level sparse sharing model (MSSM), which enables more granular control through task-specific and shared features at different network levels. Wang et al.
[68] introduced multitask prompt tuning (MPT), which distills shared knowledge into transferable prompts for efficient adaptation across LLMs. Zhang et al.
[69] employed a shared encoder across tasks in their contrastive learning model for blind image-quality assessment, dynamically adjusting shared parameters to boost performance. In a different domain, Chen et al.
[70] introduced the group detection transformer, which applies a group-wise parameter-sharing mechanism across object queries, significantly improving the efficiency of detection transformers. Ghosh et al.
[71] proposed the iterative federated clustering algorithm (IFCA), in which shared representation layers are employed across user clusters in federated learning, enabling parameter sharing across distributed environments while preserving cluster-specific learning. Lastly, Ye et al.
[72] presented OpenFedLLM, a federated learning framework for LLMs, in which parameter sharing across decentralized systems is achieved through federated instruction tuning and value alignment, allowing collaborative learning without exposing raw data. These works collectively highlight the power of parameter sharing to enhance model efficiency, reduce redundancy, and enable robust performance across heterogeneous task settings and domains.
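To make the sparse-sharing idea concrete, the following minimal sketch applies IMP-style pruning to a toy linear model: train, zero out the smallest-magnitude weights, and keep training only the surviving subnetwork. All data, hyperparameters, and function names here are illustrative assumptions, not details of the cited works.

```python
# Toy iterative magnitude pruning (IMP): repeatedly train, prune the
# smallest-magnitude surviving weights, and retrain the masked subnetwork.

def train_masked(w, mask, data, lr=0.05, epochs=100):
    """SGD on squared error for a masked linear model."""
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * mi * xi for wi, mi, xi in zip(w, mask, x))
            err = pred - y
            w = [wi - lr * 2 * err * xi * mi for wi, mi, xi in zip(w, mask, x)]
    return w

def imp(data, n_weights, prune_rounds=2, prune_frac=0.5):
    w = [0.1] * n_weights
    mask = [1.0] * n_weights
    for _ in range(prune_rounds + 1):
        w = train_masked(w, mask, data)
        # Prune the smallest-magnitude weights that are still alive.
        alive = sorted((abs(wi), i) for i, wi in enumerate(w) if mask[i])
        for _, i in alive[: int(len(alive) * prune_frac)]:
            mask[i] = 0.0
    return w, mask

# Target uses only feature 0 (y = 2*x0); features 1 and 2 carry noise.
data = [([1.0, 0.1, -0.1], 2.0), ([2.0, -0.1, 0.1], 4.0), ([0.5, 0.05, 0.0], 1.0)]
w, mask = imp(data, 3)
```

Pruning here recovers the single informative weight, mirroring how sparse sharing isolates compact subnetworks inside a larger shared model.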
5.1.2. Dual KD
Dual KD has emerged as a pivotal strategy under the paradigm of model co-evolution, particularly addressing the challenges of model heterogeneity. In this approach, models simultaneously assume the dual roles of both student and teacher, fostering bidirectional knowledge transfer and enhancing learning efficacy across diverse architectures. Unlike traditional unidirectional distillation, dual KD leverages mutual learning, as demonstrated in frameworks such as mutual contrastive learning (MCL)
[73], adaptive cross-architecture mutual knowledge distillation (ACMKD)
[74], and all-in-one knowledge distillation (AIO-KD)
[75]. For example, AIO-KD enables the simultaneous optimization of multiple student models through dynamic gradient detaching and mutual learning strategies, optimizing knowledge exchange without sacrificing the teacher model’s performance. Similarly, in the context of semi-supervised learning, multistage collaborative knowledge distillation (MCKD)
[76] refines pseudo labels iteratively across multiple student models, preventing overfitting and fostering generalization in sequence-generation tasks. This duality is also critical in tasks such as text-to-image synthesis, in which an adaptive teacher–student collaboration
[77] refines student outputs through iterative guidance by means of an oracle mechanism. Additionally, frameworks such as Selective-FD
[78] ensure that knowledge sharing is efficient and accurate, selectively filtering ambiguous or OOD predictions in federated learning environments. Collectively, these methods demonstrate the power of dual KD for addressing both architectural and domain-specific discrepancies and thus enhancing model performance and generalization through iterative and collaborative learning processes.
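The bidirectional transfer at the heart of dual KD can be sketched in logit space: two peer models are each nudged toward the ground-truth label and toward the other's softened prediction. This is a simplified, deep-mutual-learning-style update under assumed hyperparameters, not the exact losses of MCL, ACMKD, or AIO-KD.

```python
import math

def softmax(logits, t=1.0):
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """KL divergence between two probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_step(logits_a, logits_b, target, lr=0.5, t=2.0):
    """Each model's logits move toward the one-hot target AND toward the
    peer's temperature-softened prediction (both act as student and teacher)."""
    pa, pb = softmax(logits_a, t), softmax(logits_b, t)
    new_a = [za + lr * ((1.0 if i == target else 0.0) - pa[i]) + lr * (pb[i] - pa[i])
             for i, za in enumerate(logits_a)]
    new_b = [zb + lr * ((1.0 if i == target else 0.0) - pb[i]) + lr * (pa[i] - pb[i])
             for i, zb in enumerate(logits_b)]
    return new_a, new_b

a, b, y = [0.2, 0.1, 0.0], [0.0, 0.3, 0.1], 0
before = kl(softmax(a), softmax(b))   # initial disagreement between peers
for _ in range(50):
    a, b = mutual_step(a, b, y)
after = kl(softmax(a), softmax(b))    # peers converge while fitting the label
```

After training, both peers agree on the correct class and their predictive distributions are far closer than before, which is the essence of mutual distillation.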
5.1.3. Hypernetwork-based parameter projection
The concept of hypernetwork-based parameter projection has emerged as a robust strategy for addressing model heterogeneity in co-evolutionary systems, particularly when dealing with large-scale models such as pre-trained language models. Hypernetworks, originally introduced to generate weights for target networks, can facilitate the transfer of information across heterogeneous models by learning a mapping from a shared latent space to the diverse parameter spaces of different models. This projection technique is especially beneficial in scenarios where models have been fine-tuned on distinct tasks or domains and a unified mechanism is required to harmonize the varied representations. By utilizing hypernetworks, it becomes feasible to dynamically generate task-specific parameters for a target model, effectively adapting the model to different inputs or tasks without the need for exhaustive retraining. In the context of knowledge fusion, hypernetworks allow for the seamless integration of heterogeneous model outputs, as demonstrated by approaches such as knowledge fusion of large language models (FUSELLM)
[79] and mixture-of-adaptations (AdaMix)
[80], in which the alignment of tokenizations or adaptation modules is a critical factor. This method aligns well with other model averaging techniques, such as model soups
[81] and ensemble strategies
[82], by enhancing the parameter space exploration while preserving the unique characteristics of each model through modularity. Additionally, methods such as regression mean (RegMean)
[83] and ranking-based merging (RankMean)
[84], which focus on parameter fusion without requiring downstream data, highlight the flexibility of hypernetwork-based projection in optimizing the fusion of diverse model parameters. By effectively navigating the parameter projection space, hypernetworks can create a more coherent and efficient model co-evolution process in heterogeneous environments.
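The core mechanism—one shared network generating the parameters of per-task target models from task embeddings—can be sketched at toy scale. Here a linear hypernetwork maps one-hot task embeddings to the single weight of a target model y = w·x; the architecture, data, and learning rates are illustrative assumptions.

```python
import random

def hyper_w(H, e):
    """Generate a target-model weight from task embedding e (dot product)."""
    return sum(h * ei for h, ei in zip(H, e))

def train(H, tasks, lr=0.05, steps=2000):
    """SGD through the hypernetwork: the loss gradient wrt the generated
    weight w is chained back to H via dw/dH_i = e_i."""
    random.seed(0)
    for _ in range(steps):
        e, data = random.choice(tasks)
        x, y = random.choice(data)
        w = hyper_w(H, e)
        grad_w = 2 * (w * x - y) * x
        H = [h - lr * grad_w * ei for h, ei in zip(H, e)]
    return H

# Two heterogeneous tasks with one-hot embeddings:
# task 0 is fit by w = 2, task 1 by w = -1.
tasks = [
    ([1.0, 0.0], [(1.0, 2.0), (2.0, 4.0)]),
    ([0.0, 1.0], [(1.0, -1.0), (2.0, -2.0)]),
]
H = train([0.0, 0.0], tasks)
```

A single shared H now projects each task embedding to the parameters that task needs, without retraining a separate model per task—the property that makes hypernetworks attractive for harmonizing heterogeneous models.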
5.2. Co-evolution under task heterogeneity
5.2.1. Dual learning
Dual learning has emerged as a powerful paradigm for tackling task heterogeneity in model co-evolution by leveraging the intrinsic duality between paired tasks to enhance learning efficiency and performance across diverse domains. For unbiased learning to rank (ULTR), Yu et al.
[85] proposed the contextual dual learning algorithm with listwise distillation (CDLA-LD), which combines a listwise-input ranking model employing self-attention to capture local context with a pointwise-input model for distilling relevance judgments, outperforming existing methods on the Baidu-ULTR dataset by mitigating position and contextual biases. For constrained optimization, Park and Van Hentenryck
[86] introduced self-supervised primal-dual learning (PDL), a method that jointly trains primal and dual networks without pre-solved instances by mimicking the augmented Lagrangian method to balance optimality and feasibility, achieving negligible constraint violations and minor optimality gaps. Fei et al.
[87] enhanced dual learning by aligning structural information between tasks, introducing syntactic structure co-echoing and cross-reconstruction in text-to-text generation, and using syntactic–semantic alignment in text-to-non-text scenarios, thus significantly improving performance across tasks such as machine translation and image captioning. For video captioning, Ji et al.
[88] developed an attention-based dual learning (ADL) approach that establishes a bidirectional flow between videos and captions using a multi-head attention mechanism to focus on effective information, resulting in more accurate and coherent captions. Li et al.
[89] presented a multi-pass dual learning (MPDL) framework for stylized dialogue generation, leveraging mappings among the context and responses of different styles and incorporating discriminators to ensure stylistic consistency, and achieved state-of-the-art results. Additionally, frameworks such as dual learning enhanced auto-reflective translation (DUAL-REFLECT)
[90] enhance LLMs for reflective translation through dual learning feedback mechanisms, while the dual learning with dynamic KD (DL-DKD) framework
[91] integrates contrastive language–image pre-training (CLIP) models into partially relevant video-retrieval tasks by employing a teacher–student network with dynamic KD, further demonstrating the merits of dual learning for addressing task heterogeneity under model collaboration.
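The closed-loop signal these methods exploit can be distilled into a toy example: a forward model f(x) = a·x and a backward model g(y) = c·y, where f receives a few labeled pairs while g is trained only through the reconstruction constraint g(f(x)) ≈ x. All values here are illustrative assumptions, not any cited method's setup.

```python
# Toy dual learning: the backward model receives NO labels; it learns
# purely from the duality (reconstruction) feedback g(f(x)) ~ x.

def dual_train(labeled, unlabeled, lr=0.05, epochs=300):
    a, c = 0.5, 0.1   # forward weight a, backward weight c
    for _ in range(epochs):
        for x, y in labeled:                 # supervised step for f
            a -= lr * 2 * (a * x - y) * x
        for x in unlabeled:                  # duality step for g: recon loss only
            recon = c * (a * x)
            grad_c = 2 * (recon - x) * (a * x)
            c -= lr * grad_c
    return a, c

# Forward task: doubling (y = 2x). The dual task g should learn halving.
a, c = dual_train([(1.0, 2.0), (2.0, 4.0)], [0.5, 1.0, 1.5])
```

The backward model converges to the inverse mapping (c ≈ 0.5) without ever seeing a labeled pair for its own task—the leverage that dual learning provides between paired tasks.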
5.2.2. Adversarial learning
Adversarial learning is pivotal in model co-evolution under task heterogeneity. It frames objectives as adversarial games between competing models or components, thereby enhancing robustness, alignment, and performance across diverse tasks. An LLM-enhanced adversarial editing system for lexical simplification employs confusion and invariance losses to predict lexical edits, effectively distinguishing complex words from simple ones while preserving semantics
[92]. Latent adversarial training removes undesirable behaviors in LLMs by using targeted adversaries to elicit and mitigate harmful outputs
[93]. In AI-text detection, adversarial learning between a paraphraser and a detector enhances robustness against paraphrasing attacks
[94]. Worst-class adversarial training addresses class imbalance in adversarial robustness by focusing on improving the worst-performing classes using no-regret dynamics
[95]. In weakly supervised semantic segmentation, adversarial learning between a classifier and a reconstructor improves segmentation precision by encouraging the classifier to produce more accurate class activation maps
[96]. By fostering adversarial interactions, these methods effectively tackle the challenges of task heterogeneity in collaborative models.
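The adversarial-game structure can be illustrated with the simplest robust-training loop: at each step an "adversary" perturbs the input within a budget to hurt the current model, and the model trains on the perturbed point. This is a generic FGSM-style sketch for a 1D linear classifier under assumed values, not any cited system.

```python
# Minimal adversarial training: score = w*x + b, labels in {-1, +1}.
# The inner adversary shifts x by eps against the model; the outer loop
# takes a hinge-loss subgradient step on the perturbed point.

def adv_train(points, eps=0.2, lr=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in points:
            # Worst-case perturbation in [-eps, eps]: decrease y * score,
            # i.e. move x along -y * sign(w).
            s = 1.0 if w > 0 else -1.0 if w < 0 else 0.0
            x_adv = x - eps * y * s
            margin = y * (w * x_adv + b)
            if margin < 1.0:                 # hinge-loss violation
                w += lr * y * x_adv
                b += lr * y
    return w, b

data = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)]
w, b = adv_train(data)
```

After this two-player game converges, every training point classifies correctly even under the adversary's worst eps-perturbation—robustness that plain training on clean points does not guarantee.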
5.2.3. Model merging
Model merging under task heterogeneity is a technique for creating a unified model that can handle heterogeneous tasks while minimizing interference. Basic methods such as parameter averaging
[97], despite being straightforward, often result in suboptimal performance due to task conflicts. To address this, weighted-based approaches, such as spherical linear interpolation
[98], optimize merging coefficients by evaluating the importance of each model or task vector, with some techniques extending this to layer-wise or parameter-specific weighting using methods such as layer-wise adaptive model merging
[99] or merging models with fisher-weighted averaging
[100]. Subspace-based methods, including trim, elect sign and merge (TIES-MERGING)
[101] and drop and rescale (DARE)
[102], focus on pruning unimportant parameters and leveraging the over-parameterized nature of neural networks to merge sparse subspaces, thereby reducing task interference. Routing-based strategies dynamically adjust merging during inference, thus adapting to input-specific variations. Examples include twin-merging
[103] and weight-ensembling MoE
[104], which use routing networks to guide the merging process. Finally, post-calibration techniques, such as representation surgery
[105], address representation bias in merged models by aligning the representations of merged and independent models to enhance performance. Together, these methods provide a sophisticated toolkit for merging models in multi-task learning environments in order to optimize performance while addressing the complexities introduced by task heterogeneity.
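The trim–elect–merge steps can be rendered schematically on flat parameter lists; the actual TIES-MERGING operates on task vectors of full networks with further refinements, so treat this as a hedged sketch with illustrative numbers.

```python
# TIES-style merging sketch: (1) form task vectors (finetuned - base),
# (2) trim small-magnitude entries, (3) elect a per-coordinate sign by
# total magnitude, (4) average only entries agreeing with the elected sign.

def ties_merge(base, finetuned_models, keep_frac=0.5):
    task_vecs = [[f - b for f, b in zip(ft, base)] for ft in finetuned_models]
    trimmed = []
    for tv in task_vecs:
        k = max(1, int(len(tv) * keep_frac))
        thresh = sorted((abs(v) for v in tv), reverse=True)[k - 1]
        trimmed.append([v if abs(v) >= thresh else 0.0 for v in tv])
    merged = []
    for i, b in enumerate(base):
        vals = [tv[i] for tv in trimmed]
        # Elect sign by magnitude mass, then mean over agreeing entries only.
        sign = 1.0 if sum(v for v in vals if v > 0) >= -sum(v for v in vals if v < 0) else -1.0
        agree = [v for v in vals if v * sign > 0]
        merged.append(b + (sum(agree) / len(agree) if agree else 0.0))
    return merged

base = [0.0, 0.0, 0.0, 0.0]
m1 = [1.0, 0.1, -1.0, 0.0]    # task A
m2 = [1.2, 0.0, 1.0, 0.05]    # task B: conflicts with A on coordinate 2
merged = ties_merge(base, [m1, m2])
```

On coordinate 2, where the two task vectors disagree in sign, naive averaging would cancel them to zero; sign election instead keeps the majority-mass direction, which is precisely how this family of methods reduces task interference.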
5.3. Co-evolution under data heterogeneity
5.3.1. Federated learning
Federated learning addresses the challenges of data heterogeneity in model co-evolution by leveraging the powerful multimodal capabilities of LLMs and the low computational requirements and swift response times of SMs. The essence of federated learning lies in enabling LLMs to enhance the performance of SMs in domain-specific tasks while rigorously protecting data privacy. To augment LLMs’ performance through training, OpenFedLLM
[72] provides a comprehensive pipeline that includes federated instruction tuning (FedIT) and federated value alignment (FedVA), which optimize instruction adherence and model alignment while safeguarding data privacy. Zhang et al.
[106] introduced multimodal large language model assisted federated learning (MLLM-FL) to bolster federated learning performance on heterogeneous and long-tailed data distributions through global multimodal pretraining, federated finetuning, and global alignment, effectively mitigating data heterogeneity while minimizing privacy risks and computational burdens on client devices. Bai et al.
[107] developed a federated learning scheme for fine-tuning LLMs that dynamically adjusts the low-rank adaptation (LoRA) ranks based on individual client resources, thus enhancing the effective use of diverse client capabilities and improving generalization across heterogeneous tasks and resources. For collaborative model performance enhancement, FedMKT
[108] is a framework for federated selective mutual knowledge transfer and token alignment using a minimum edit distance, which enhances the performance of both LLMs and SMs. To improve the effectiveness of SMs through LLMs, Li et al.
[109] extracted generalized and domain-specific knowledge from LLMs via synthetic data generation and then transferred this knowledge to local SMs while preserving privacy. Fan et al.
[110] developed PDSS, a framework that employs the step-by-step distillation of LLMs to augment the capabilities of SMs, utilizing advanced strategies for prompt and rationale encoding to maintain information integrity during the perturbation and subsequent distillation of domain-specific knowledge.
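The aggregation step underlying such federated schemes can be sketched with the classic FedAvg loop: clients train locally on private data, and the server combines client weights proportionally to local dataset sizes without ever seeing raw data. The models, data, and hyperparameters below are illustrative assumptions.

```python
# Minimal federated averaging (FedAvg): each client fits a scalar linear
# model y = w*x locally; the server takes a size-weighted average.

def local_train(w, data, lr=0.1, epochs=20):
    """One client's local SGD on squared error (raw data stays local)."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

def fed_avg(global_w, client_datasets, rounds=5):
    """Server loop: broadcast global weight, local train, weighted average."""
    for _ in range(rounds):
        sizes = [len(d) for d in client_datasets]
        total = sum(sizes)
        local_ws = [local_train(global_w, d) for d in client_datasets]
        global_w = sum(n * lw for n, lw in zip(sizes, local_ws)) / total
    return global_w

# Two clients with heterogeneous sample counts, both following y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
w = fed_avg(0.0, clients)
```

Only model parameters cross the client–server boundary; the weighted average recovers the shared underlying relationship, which is the privacy-preserving principle the cited frameworks build on.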
5.3.2. OOD KD
KD involves training computationally efficient specialized models as student models to replicate the performance of more powerful LLMs as teacher models. This process reduces resource demands without significantly impacting performance, thereby facilitating broader deployment of LLMs. Traditional distillation techniques using synthetic or data-free approaches often suffer performance declines in OOD scenarios. To address these challenges, Gholami et al.
[111] used a task-agnostic framework for OOD KD, which iteratively leverages feedback from LLMs to refine the specialized models, thus enhancing their generalizability. Li et al.
[112] targeted OOD distillation challenges in vision language models (VLMs) by improving prompt coherence and enriching language representations in teacher models in order to better align vision-language tasks between teacher and student models. Agarwal et al.
[113] developed generalized knowledge distillation (GKD), which uses reinforcement-learning-based fine-tuning to align the training and inference distributions, informed by the teacher model’s feedback on the student models’ outputs. Chen et al.
[114] used a perturbation distillation approach that integrates modifications at the score, class, and instance levels to distill knowledge to SMs, specifically addressing domain generalization challenges.
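The objective these methods extend—fitting a student's temperature-softened distribution to the teacher's—can be sketched as plain gradient descent on a KL divergence in logit space. This is the generic distillation objective under assumed values, not the specific OOD mechanisms of the cited works.

```python
import math

def softmax(z, t=1.0):
    m = max(z)                       # subtract max for numerical stability
    e = [math.exp((v - m) / t) for v in z]
    s = sum(e)
    return [x / s for x in e]

def distill(teacher_logits, student_logits, t=2.0, lr=1.0, steps=500):
    """Fit student logits to the teacher's temperature-softened targets."""
    target = softmax(teacher_logits, t)
    s = list(student_logits)
    for _ in range(steps):
        p = softmax(s, t)
        # Gradient of KL(target || softmax(s/t)) wrt s_i is (p_i - target_i)/t.
        s = [si - lr * (pi - ti) / t for si, pi, ti in zip(s, p, target)]
    return s

teacher = [3.0, 1.0, 0.2]
student = distill(teacher, [0.0, 0.0, 0.0])
```

The temperature t > 1 exposes the teacher's relative confidence over non-argmax classes ("dark knowledge"), which is exactly the signal that degrades when student inputs drift out of distribution—hence the corrective feedback loops discussed above.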
6. AI for science, engineering, and society
The post-LLM era marks a significant shift in the role of AI across multiple domains—particularly in science, engineering, and society. These domains share common challenges and unique characteristics that necessitate the tailored application of AI methodologies.
Fig. 3 depicts the outline of this section, which elaborates on hypothesis development, problem formulation, problem-solving, and the interpretability of AI applications in science, engineering, and society, exploring how knowledge, collaboration, and co-evolution underpin these advances.
6.1. Hypothesis development
The development of hypotheses is a foundational challenge shared across science, engineering, and society domains. Hypotheses can take on various forms, depending on the domain. In science, hypotheses often serve as theoretical propositions aimed at explaining natural phenomena and are typically crafted to be empirically tested
[29]. For example, hypotheses in scientific research might predict the effects of a specific variable on a biological process or forecast the outcome of a chemical reaction under certain conditions. In engineering, hypotheses often manifest as objectives designed to achieve specific goals or meet operational constraints
[115]. For example, the operation of complex systems such as power grids, space stations, or autonomous vehicles often requires setting hypotheses related to efficiency goals and safety constraints. These hypotheses are more practical and serve as a basis for system design and decision-making, helping engineers determine the optimal settings and controls for achieving the desired performance under the given limitations. In societal contexts, hypotheses are often related to behavioral or policy outcomes
[22]. For example, an AI model might hypothesize that specific interventions (e.g., public awareness campaigns or infrastructure adjustments) could lead to better outcomes in areas such as healthcare accessibility or traffic management. These hypotheses are typically tested in simulations or pilot programs prior to broader implementation. Despite the diversity in hypothesis types across these domains, there are shared categories of hypotheses, such as predictions about system behavior under various scenarios or predictions that serve as the basis for simulation models in order to validate different configurations before real-world implementation. These shared and unique hypotheses guide subsequent formulation and problem-solving processes in all three domains.
In the post-LLM era, knowledge-empowered AI models are instrumental in crafting these hypotheses, as they incorporate domain-specific expertise and thus enhance both accuracy and reliability. For example, advanced meteorological AI models such as Pangu
[116], FengWu
[117], and FuXi
[118] could be integrated with domain-specific knowledge to improve renewable energy (e.g., wind and solar) forecasting, which is crucial for the integration of renewable energy sources into power systems. Collaboration among multiple smaller AI models also plays a critical role in validating hypotheses by cross-verifying outcomes from diverse perspectives, thereby enhancing the robustness of the hypotheses. This collaborative approach helps mitigate biases and provides a more holistic understanding of the problem space. Moreover, co-evolution fosters the iterative refinement of hypotheses. Through ongoing learning from both successes and failures, models can evolve to develop more nuanced and effective hypotheses. In this way, the post-LLM advancements contribute significantly to transforming hypothesis development, enabling deeper theoretical reasoning and more extensive data-driven exploration across science, engineering, and societal applications. The iterative process of co-evolution leads to hypotheses that are more adaptive to changing environments, better aligned with domain-specific challenges, and ultimately more capable of driving meaningful advancements in each respective field.
6.2. Problem formulation
The application of large models for modeling the real world is currently a focal point of science, engineering, and society research. There are three types of modeling in this research domain: the modeling of objective entities, the modeling of objective environments, and the modeling of objective laws.
For entity modeling, multi-agent systems are introduced to effectively simulate personalized roles such as students and teachers in educational scenarios
[119]. Integrating simulation rules with large-model agents has propelled the educational field forward by helping uncover principles of education and teaching.
For environment modeling, the key challenge is realizing organized interaction among multiple agents. One proposal is a virtual classroom platform that leverages the power of multi-agent systems: the platform applies large-model agents to simulate multiple students and explores the cultivation of their academic abilities. Yue et al.
[119] integrated domain knowledge about the teaching process into the classroom simulation. Using well-crafted role simulators, this work explores classroom teaching processes through the orchestration of meaningful role interactions.
Exploring objective laws is an important goal for the development of AI
[120]. To this end, PINNs
[11] have been proposed, which utilize physical laws to improve a model’s predictive accuracy and generalization ability for the physical world. Compared with traditional neural networks, PINNs can achieve predictive outcomes that adhere to physical laws using a more modest amount of training data, and they exhibit enhanced resilience to noise and other interference. PINNs have been widely applied in many fields of physics research, such as fluid mechanics and heat conduction
[12]. In the study of heat conduction, they can help analyze physical-world objective phenomena such as heat diffusion
[121]. Although research on PINNs has made great progress, problems such as slow training and difficulty in convergence still remain. Furthermore, PINNs perform poorly when processing high-dimensional data and solving high-dimensional equations.
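The physics-residual loss that distinguishes PINNs from ordinary networks can be illustrated for the 1D heat equation u_t = αu_xx. A real PINN differentiates a neural network with autodiff; here, as a dependency-free sketch under assumed constants, closed-form candidates and central finite differences stand in for the network and its derivatives.

```python
import math

ALPHA, K = 0.1, 2.0   # illustrative diffusivity and wavenumber

def u_exact(x, t):
    # Analytic heat-equation solution: its PDE residual should vanish.
    return math.exp(-ALPHA * K * K * t) * math.sin(K * x)

def u_wrong(x, t):
    # A candidate that ignores diffusion: a large residual is expected.
    return math.sin(K * x)

def pde_residual(u, x, t, h=1e-3):
    """u_t - ALPHA * u_xx via central finite differences."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h)
    return u_t - ALPHA * u_xx

def physics_loss(u, points):
    # Mean squared PDE residual over collocation points: the extra loss
    # term a PINN adds to its ordinary data-fitting loss.
    return sum(pde_residual(u, x, t) ** 2 for x, t in points) / len(points)

pts = [(0.3, 0.1), (0.7, 0.5), (1.1, 0.2)]
loss_exact = physics_loss(u_exact, pts)
loss_wrong = physics_loss(u_wrong, pts)
```

Because the residual is evaluated at arbitrary collocation points rather than labeled samples, the physics term supervises the model everywhere in the domain—this is why PINNs need less training data, though minimizing such losses is also the source of their slow, sometimes non-convergent training.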
6.3. Problem-solving
The application of AI has undergone extensive development for problem-solving in the domains of science, engineering, and society
[122]. When symbolicism prevailed in the development of AI, many studies
[123] designed various logical automatic reasoners using first-order logic and higher-order logic for scientific research, such as automatic mathematical provers
[124] and automatic physical reasoners
[125]. However, the amount of knowledge (i.e., logical rules) stored in these manually designed reasoners is often limited, and they may perform unsatisfactorily on more complex science problems. With the rise of deep learning, researchers
[123] turned their attention to large-scale neural networks with greater knowledge retention and utilization capabilities. Such studies are roughly divided into two categories based on the function of neural networks. ① One way is to design deep learning models as retrievers
[122]. The deep learning models are responsible for retrieving the knowledge needed for each reasoning step from knowledge databases, thus assisting in the step-by-step solution of a science problem. ② Another way is to regard deep learning models as pure memorizers
[126]. During training, deep learning models fully memorize knowledge; during subsequent inference, they directly produce a complete solution without needing to query knowledge databases.
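The retriever pattern (①) reduces to scoring knowledge entries against the current reasoning step and fetching the best match. Real systems use learned dense retrievers over embeddings; the token-overlap scorer and the knowledge entries below are purely illustrative stand-ins.

```python
# Toy "model as retriever": score each knowledge entry by word overlap
# with the current reasoning step and return the best match.

def retrieve(query, knowledge_base):
    q = set(query.lower().split())
    def score(entry):
        return len(q & set(entry.lower().split()))
    return max(knowledge_base, key=score)

kb = [
    "pythagorean theorem relates the sides of a right triangle",
    "the derivative of sin is cos",
    "newton's second law states force equals mass times acceleration",
]
fact = retrieve("what force acts on the mass", kb)
```

Each reasoning step issues such a query, so the solver's knowledge lives in the external database rather than in the model's parameters—the opposite trade-off from the memorizer pattern (②).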
LLMs designed for real-world problem-solving, such as the mathematical model DeepSeek-Prover-V1.5
[126], push neural networks’ ability to induce and store domain knowledge to the extreme. With the development of large models, the collaboration of multiple agents has also gradually been applied to scientific research. For example, mirroring the real-world separation of computation and verification, separate mathematical problem-solving agents and conclusion-verification agents are set up
[127]. The effective collaboration of these two types of agents has achieved more accurate solutions to mathematical problems.
6.4. Interpretability
In AI-driven research, aside from reaching conclusions, explaining the reasoning process is an important issue. To this end, a meta-programming framework, MetaGPT
[128], has been proposed, which integrates SOPs into the workflows of multi-agent systems. This framework is designed to enhance task decomposition and coordination, which are critical for managing complexity in software engineering projects. By encoding SOPs into prompt sequences, MetaGPT allows agents to operate with human-like domain expertise, verifying intermediate results and reducing errors. By mimicking the behavior of human experts, this approach of integrating SOPs increases the interpretability of model operations. Enhancing a model’s capability for human–computer interaction is another way to improve interpretability. Building on this concept, Qian et al.
[129] introduced a framework for software development driven by multiple LLM-based agents. These agents collaborate through natural language and programming language exchanges, guided by chat chains and a dehallucination mechanism to improve software completeness, executability, and consistency.
Supporting hypotheses is another important goal in realizing model interpretability. Fang et al.
[130] presented KANO, a KG-enhanced molecular contrastive learning method that integrates chemical domain knowledge to provide interpretable molecular representations and superior prediction performance. KANO generates functional prompts that evoke downstream task-related knowledge, thus enhancing the interpretability of the model’s predictions. Li et al.
[131] introduced modSAR, an optimization-based quantitative structure–activity relationship (QSAR) modeling technique that offers transparent and explainable predictions by pinpointing key breakpoint features and crafting piecewise linear regression equations. The model’s ability to generate clear rules and assign Shapley additive explanations (SHAP) values to molecular fragments enhances its justification of its predictions, making it a valuable tool for drug discovery.
7. Future directions and emerging applications
7.1. Future lines of research
Beyond the topics covered above, several important and relevant areas warrant further exploration.
(1) Embodied AI. Embodied AI is a promising post-LLM direction. Collecting high-quality robotic datasets is labor intensive, and over-reliance on simulation data intensifies the sim-to-real gap, requiring collaborative dataset creation and improved simulators. Efficiently integrating human demonstration data and advancing cognition in complex environments are also critical in building adaptive models. Additionally, enabling causal reasoning, continual learning, and unified evaluation benchmarks will be essential for robust, scalable, and generalizable embodied AI systems.
(2) Brain-like AI. AI systems and algorithms inspired by the structure and functions of the human brain seek to emulate the brain’s parallel processing, adaptability, and efficiency to enhance computational models. Interdisciplinary integration with neuroscience could yield AI models that closely mirror human cognitive functions by adopting insights into brain-based learning, memory, and decision-making processes. Advances in neuroscience could also inspire robust brain-like AI models capable of naturalistic emotional and contextual responses, enhancing the potential for human–computer empathy and adaptability. Moreover, significant opportunities lie in developing scalable, efficient, and responsible AI frameworks that can operate reliably in real-world applications, especially in resource-constrained or sensitive domains. By integrating insights from the structure and adaptability of neural circuits, researchers can enhance the resilience, efficiency, and transparency of brain-like AI models, ultimately moving closer to designing AI that is responsible, adaptable, and responsive to complex human-centered needs.
(3)
Non-transformer foundation models. Despite the prominence of transformer architectures in large foundational models, several alternative architectures show promise as potential replacements. Hyena
[132] introduces an efficient structure by integrating data-controlled gating with implicitly parametrized long convolutions, providing a subquadratic solution to large-scale sequence processing. Other models leverage state space models (SSMs)
[133] to achieve linear scaling and improved efficiency over traditional transformers. RetNet
[134], which replaces multi-head attention with a multi-scale retention mechanism, captures sequence information effectively while reducing memory usage and significantly accelerating training. Thus, these models can be seen as viable and efficient transformer alternatives.
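The efficiency argument shared by these architectures rests on replacing pairwise attention with a linear recurrence. A scalar caricature of the state-space idea—h_k = A·h_{k−1} + B·x_k, y_k = C·h_k, processed in a single O(L) scan with O(1) state—is sketched below; the scalars stand in for the learned matrices of a real SSM, and the values are assumptions for illustration.

```python
# Scalar state-space recurrence: one linear pass over the sequence,
# versus attention's O(L^2) pairwise interactions.

def ssm_scan(xs, A=0.9, B=0.1, C=1.0):
    h, ys = 0.0, []
    for x in xs:          # O(L) time, O(1) recurrent state
        h = A * h + B * x
        ys.append(C * h)
    return ys

# With A=0.9, B=0.1 this particular recurrence is an exponential moving
# average: on a constant input it rises toward the steady state B/(1-A)=1.
ys = ssm_scan([1.0] * 50)
```

Because the recurrence is linear, such models can also be trained in parallel as a long convolution and deployed sequentially with constant memory—the dual view that Hyena-style convolutions and retention mechanisms exploit.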
(4)
LLM-involved model generation. Leveraging LLMs to generate small, task-specific models by summarizing user requirements and a small amount of in-domain data into latent variables, which are then decoded to produce tailored AI models directly usable for prediction
[135], can be a promising post-LLM direction.
7.2. Emerging applications
In the post-LLM landscape, the next generation of AI, characterized by knowledge empowerment, model collaboration, and co-evolution, will redefine the capabilities of AI and reshape our perceptions of these new AI systems. Its continually evolving nature will bring new possibilities to our real-world society, meeting the highly complex demands of more specialized, adaptive, and human-aligned applications.
Knowledge empowerment suggests that post-LLM AI systems will increasingly emphasize the fusion of more specialized, factual, and structured information, significantly enhancing their expertise in specific fields with precision and logical reasoning and eventually surpassing today’s general-purpose AI models. In particular, with the integration of rich knowledge sources accumulated from science, engineering, and human society, next-generation AI is expected to delve deeper into exploring scientific laws, generating new hypotheses for scientific research and discoveries, and predicting the trajectory of events. For example, in the field of mathematics, the integration of AI will become more widespread, with large-scale neural networks being utilized to store mathematical knowledge and to conduct reasoning, and with the accuracy of problem-solving being improved through multi-agent collaboration. This will benefit other emerging AI interdisciplinary fields as well, such as online education, physics, and more. For example, the personalized application of AI will appear in the field of education, enriching teaching interactions and experiences by simulating and integrating interactive insights between students and teachers. In the realm of physics, technologies such as PINNs will leverage the laws of physics to enhance models’ predictive accuracy and generalization capabilities.
Model collaboration in the post-LLM era will involve deeper collaboration among both heterogeneous data and heterogeneous models. By fusing data from multiple sources (e.g., text, images, audio, and sensory signals), omni-modal AI systems will gain a more holistic understanding of the physical world, which will be particularly useful in fields such as autonomous vehicles, cross-media content generation, and digital twins. Collaboration between large (general-purpose) and small (specialized) models is another emerging trend in collaborative AI. Large models exhibit strong capabilities in generation, reasoning, and knowledge integration, while SMs display merits such as efficiency, low latency, security, and privacy. Achieving deeper collaboration between large and small models is a future development trend, involving not only effective data exchange but also knowledge sharing and task decomposition to address complex task scenarios, particularly in areas such as embodied intelligence. As application scenarios expand, personalized and adaptive collaborative systems incorporating large and small models will become a significant development direction in areas including intelligent assistants and service robots.
Model co-evolution, inspired by biological ecosystems, is another key ingredient in establishing the next generation of AI, in which AI models evolve collectively, learning and adapting in an interdependent process. This dynamic and continually evolving relationship between collective models is expected to remarkably advance the intelligence level and adaptation capability of AI systems, enhancing their robustness to dynamic and unknown physical-world changes. Merging diverse functional models may be a potential approach to co-evolving AI systems, as this can synthesize information from multiple models into a cohesive, unified framework. Nevertheless, several crucial challenges remain to be resolved, including the lack of a deeper theoretical understanding of the merging mechanism—especially in scenarios involving models trained on different datasets or for different tasks—and the high computational and memory costs of merging schemes, among other issues. Co-evolved AI models have broad potential applications, such as autonomous driving, mining robotics, and industrial manufacturing.
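The simplest instance of the merging mechanism discussed above is parameter averaging across checkpoints that share one architecture (a "model soup"-style scheme). The sketch below is a toy illustration under that assumption, using scalar parameters where real checkpoints would hold tensors; it is not a method prescribed by the survey.

```python
def merge_models(state_dicts, weights=None):
    """Merge models that share an architecture by (optionally weighted)
    averaging of their parameters, soup-style."""
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n          # uniform average by default
    assert abs(sum(weights) - 1.0) < 1e-9, "merge weights must sum to 1"
    merged = {}
    for name in state_dicts[0]:
        # Convex combination of the same parameter across all checkpoints.
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two fine-tuned checkpoints (scalar parameters for illustration only).
model_a = {"layer.weight": 0.2, "layer.bias": 1.0}
model_b = {"layer.weight": 0.6, "layer.bias": 3.0}

soup = merge_models([model_a, model_b])                  # uniform merge
skewed = merge_models([model_a, model_b], [0.75, 0.25])  # trust model_a more
```

Even this toy version surfaces the open problems the text notes: the convex combination is only meaningful when the checkpoints lie in compatible regions of weight space (e.g., fine-tuned from a shared initialization), and for models trained on different datasets or tasks there is no comparable theoretical guarantee.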
Knowledge-empowered, collaborative, and co-evolving AI will likely bring AI systems to a new level with higher intelligence, resilience, and autonomy, expanding AI’s capability to handle complex real-world applications such as scientific discoveries, engineering design, personalized education, manufacturing, and more. Meanwhile, the increasing autonomy and interconnectivity of AI models may also present challenges in terms of safety and societal impacts, requiring mechanisms to monitor and control AI systems in order to prevent unforeseen behaviors.
8. Conclusions
In this survey, we explored the evolving landscape of AI beyond LLMs, with a focus on the paradigms of knowledge-empowered AI, model collaboration, and the co-evolution of AI systems. While LLMs have significantly advanced AI capabilities, they present inherent challenges in scalability and adaptability. To address these limitations, we discussed a range of post-LLM techniques and applications aimed at building more robust, scalable, and adaptable AI models. This survey outlines a potential roadmap for researchers and practitioners in the post-LLM era. The ongoing evolution of AI requires continued innovation and interdisciplinary collaboration to build systems that are not only powerful but also adaptable and aligned with human values.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (62441605).
Compliance with ethics guidelines
Fei Wu, Tao Shen, Thomas Bäck, Jingyuan Chen, Gang Huang, Yaochu Jin, Kun Kuang, Mengze Li, Cewu Lu, Jiaxu Miao, Yongwei Wang, Ying Wei, Fan Wu, Junchi Yan, Hongxia Yang, Yi Yang, Shengyu Zhang, Zhou Zhao, Yueting Zhuang, and Yunhe Pan declare that they have no conflict of interest or financial conflicts to disclose.