Generative Semantic Communication: Architectures, Technologies, and Applications

Jinke Ren , Yaping Sun , Hongyang Du , Weiwen Yuan , Chongjie Wang , Xianda Wang , Yingbin Zhou , Ziwei Zhu , Fangxin Wang , Shuguang Cui

Engineering, 2026, 56(1): 45–61. DOI: 10.1016/j.eng.2025.07.022

Abstract

Semantic communication (SemCom) has emerged as a transformative paradigm for future wireless networks, aiming to improve communication efficiency by transmitting only the semantic meaning (or its encoded version) of the source data rather than the complete set of bits (symbols). However, traditional deep-learning-based SemCom systems present challenges such as limited generalization, low robustness, and inadequate reasoning capabilities, primarily due to the inherently discriminative nature of deep neural networks. To address these limitations, generative artificial intelligence (GAI) is seen as a promising solution, offering notable advantages in learning complex data distributions, transforming data between high- and low-dimensional spaces, and generating high-quality content.

This paper explores the applications of GAI in SemCom and presents a comprehensive study. It begins by introducing three widely used SemCom systems enabled by classical GAI models: variational autoencoders, generative adversarial networks, and diffusion models. For each system, the fundamental concept of the GAI model, the corresponding SemCom architecture, and a literature review of recent developments are provided. Subsequently, a novel generative SemCom system is proposed, incorporating cutting-edge GAI technology—large language models (LLMs). This system features LLM-based artificial intelligence (AI) agents at both the transmitter and receiver, which act as “brains” to enable advanced information understanding and content regeneration capabilities, respectively. Unlike traditional systems that focus on bitstream recovery, this design allows the receiver to directly generate the desired content from the coded semantic information sent by the transmitter. As a result, the communication paradigm shifts from “information recovery” to “information regeneration,” marking a new era in generative SemCom. A case study on point-to-point video retrieval is presented to demonstrate the effectiveness of the proposed system, showing a 99.98% reduction in communication overhead and a 53% improvement in average retrieval accuracy compared to traditional communication systems. Furthermore, four typical application scenarios for generative SemCom are described, followed by a discussion of three open issues for future research. In summary, this paper provides a comprehensive set of guidelines for applying GAI in SemCom, laying the groundwork for the efficient deployment of generative SemCom in future wireless networks.

Keywords

Semantic communication / Generative artificial intelligence / Large language model / Variational autoencoder / Generative adversarial network / Diffusion model



1. Introduction

The rapid advancements in artificial intelligence (AI) and big data technologies are driving unprecedented transformations in communication networks. Unlike the previous five generations of wireless networks, which focused on rate-centric metrics such as throughput, latency, spectral efficiency, and connection density, sixth-generation (6G) wireless networks will integrate advanced information processing technologies to redefine global information transmission, ushering in a new era of interconnectedness among humans, machines, and environments [1]. However, the classical Shannon-based communication paradigm emphasizes the accurate transmission of data at the bit (symbol) level, ensuring that the receiver can precisely reconstruct the original data, regardless of its meaning or utility [2]. This separation of data communication from data application often results in the transmission of redundant information, leading to inefficiencies in resource utilization. As a result, it falls short of meeting the stringent transmission requirements of future intelligent applications, such as the metaverse, autonomous driving, smart cities, and holographic communication.

Semantic communication (SemCom) has emerged as a promising technology poised to revolutionize the communication paradigm. First proposed by Warren Weaver in 1953, the concept of SemCom focuses on transmitting information at the semantic level rather than at the bit (symbol) level to improve communication efficiency [3]. In recent years, the academic community has advanced this idea by developing specialized designs for specific tasks, a field known as task-oriented communication [4,5]. The typical operation of a SemCom system involves four stages: First, the receiver sends task requirements to the transmitter via a backward channel. Then, the transmitter extracts task-relevant semantic information from the source data and encodes it into signals suited to the channel conditions. Next, the transmitter sends these encoded signals to the receiver through a forward channel. Finally, the receiver decodes the noisy received signals and reconstructs semantically equivalent information. By filtering out task-irrelevant information before transmission, SemCom can significantly reduce communication overhead [6].

Driven by groundbreaking advances in deep learning (DL) technologies, recent years have seen a surge in research on DL-based SemCom, covering key areas such as text transmission [7], image reconstruction [8], and speech recognition [9]. Despite these notable achievements, the deployment of such systems in real-world scenarios faces three major challenges: ① limited generalization capability: Existing approaches often rely on hand-engineered neural networks designed for specific data modalities and tasks. These “one-by-one” solutions require substantial human effort for implementation and validation, and tend to perform poorly in multimodal and multitask settings; ② low robustness: Current SemCom systems typically require end-to-end training under fixed network conditions, making them unable to adapt quickly to dynamic environments or respond effectively to unseen data distributions; and ③ insufficient reasoning ability: DL-based SemCom systems are constrained by the diversity of their training data and often lack sufficient background knowledge, which limits their capacity for contextual reasoning and semantic calibration at the receiver.

To address these challenges, generative artificial intelligence (GAI) has been identified as a promising technology that offers new opportunities for advancing SemCom. Broadly defined, GAI refers to a type of AI technology capable of generating original content, such as text, images, and videos, by learning intrinsic patterns and latent features from large-scale data. Its remarkable capabilities stem from the deep integration of data and algorithms. On one hand, through self-supervised learning mechanisms, GAI can leverage vast amounts of unlabeled data to acquire rich general knowledge, thereby demonstrating strong generalization across cross-modal and multitask scenarios. On the other hand, GAI’s robust network architectures effectively capture semantic richness for both information understanding and content generation, enabling adaptive solutions to complex tasks across diverse contexts. Given these two key advantages, there is a growing consensus supporting the application of GAI in SemCom [[10], [11], [12], [13], [14]]. Notably, our previous work [14] introduced the concept of “generative SemCom” for the first time and constructed three GAI-empowered semantic knowledge bases (KBs) to facilitate efficient semantic coding and transmission.

The aforementioned studies highlight the potential of GAI to enhance the efficiency and flexibility of SemCom. Building on this foundation, this paper aims to elucidate the intrinsic mechanisms through which GAI empowers SemCom. We first provide a systematic review of SemCom research based on three classical GAI models: variational autoencoders (VAEs), generative adversarial networks (GANs), and diffusion models (DMs). Building upon this, we incorporate cutting-edge GAI technology, large language models (LLMs), to develop a novel LLM-native generative SemCom system. This system presents a unified design that leverages the powerful capabilities of LLMs in information understanding and content generation to fulfill the objectives of SemCom. At its core, the system moves beyond the traditional reconstruction-based “encode-before-decode” paradigm, adopting a new “understand-before-regenerate” philosophy. In this approach, the transmitter first performs a deep understanding of the original data, and the receiver directly generates the desired content based on the transmitter’s interpreted output. This paradigm shift is expected to fundamentally reshape the architecture of future wireless networks and bridge the final gap toward the practical realization of 6G.

The remainder of this paper is organized as follows. Section 2 introduces three types of SemCom systems enabled by VAEs, GANs, and DMs, respectively. Section 3 reviews the fundamental concepts of LLMs and their current applications in SemCom, and then presents the proposed LLM-native generative SemCom system. Section 4 provides a case study to demonstrate the effectiveness of the proposed system. Sections 5 and 6 elaborate on potential application scenarios and future research directions for generative SemCom, respectively. Finally, Section 7 concludes the paper.

2. Traditional SemCom empowered by GAI models

In this section, we will introduce three prominent SemCom systems, each built upon a distinct GAI model (i.e., VAEs, GANs, and DMs).

2.1. VAE-enabled SemCom

(1) Introduction to VAEs: VAEs are well-known probabilistic generative models introduced by Kingma and Welling in 2013 [15]. The core idea behind VAEs is to learn the probabilistic distribution of training data in order to generate new samples as variations of the input. A standard VAE consists of a probabilistic encoder and decoder, both typically implemented using neural networks. The encoder compresses input data into a low-dimensional latent space, parameterized by the mean and standard deviation that capture the statistical properties of the input. The decoder then samples latent variables from this space to reconstruct the original data. A distinguishing feature of VAEs is their ability to quantify data uncertainty during training via the Kullback-Leibler divergence term. This term regularizes the latent space to approximate a standard normal distribution, thereby ensuring a continuous and smooth latent space conducive to coherent sample generation.

The robust data generation capabilities of VAEs have led to their widespread adoption across various domains, including image generation [16] and drug discovery [17]. Notably, the “compress-before-reconstruct” approach of VAEs aligns well with the principles of SemCom, enabling a significant reduction in communication overhead.

(2) VAE-enabled SemCom architecture: The probabilistic coding scheme of VAEs enables efficient feature extraction and data reconstruction, which forms the basis of the VAE-enabled SemCom architecture, as illustrated in Fig. 1. Typically, the encoder and decoder of the VAE are deployed at the transmitter and receiver, respectively, serving as the semantic encoder and semantic decoder.

Semantic encoding. At the transmitter, the VAE encoder performs semantic encoding by extracting essential statistical features from the input data. Specifically, it first estimates the mean μ and standard deviation σ of the latent space for the input data x. Then, using the reparameterization trick, the transmitter samples a random vector $\epsilon$ from a standard normal distribution $\mathcal{N}(0,I)$ and computes a low-dimensional latent vector z by

$z=\mu +\epsilon \odot \sigma$

where I is an identity matrix and ⊙ denotes element-wise multiplication. Consequently, the distribution of the latent vector approximates a normal distribution $\mathcal{N}(\mu,{{\sigma }^{2}})$. Notably, the latent vector z captures the statistical properties of the data distribution, thereby encapsulating the semantic information of the input data.
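For concreteness, the following PyTorch sketch illustrates this encoding step; the backbone network, input dimension, and latent dimension are illustrative assumptions rather than settings from the cited works.

```python
import torch
import torch.nn as nn

class VAESemanticEncoder(nn.Module):
    """Illustrative VAE semantic encoder: maps input x to a latent vector z
    via the reparameterization trick (z = mu + eps * sigma)."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)       # estimates mu
        self.fc_logvar = nn.Linear(256, latent_dim)   # estimates log(sigma^2)

    def forward(self, x):
        h = self.backbone(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        sigma = torch.exp(0.5 * logvar)
        eps = torch.randn_like(sigma)   # eps ~ N(0, I)
        z = mu + eps * sigma            # reparameterization trick
        return z, mu, logvar
```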

Semantic decoding. At the receiver, the VAE decoder performs semantic decoding to reconstruct the original data. This involves mapping the received noisy latent vector $\tilde{z}$, output by the channel decoder, back into the high-dimensional data space, thereby completing the communication task.

(3) Literature review: In recent years, VAEs and their variants have been widely used in SemCom, primarily for semantic coding and joint source-channel coding (JSCC).

Standard VAEs for coding. Following the introduced architecture, Saidutta et al. [18] proposed a VAE-based JSCC scheme for distributed Gaussian sources over a multiple access additive white Gaussian noise (AWGN) channel. This approach eliminated the need for the joint distribution of all sources, which is required in traditional JSCC schemes. Erdemir et al. [19] developed a privacy-aware JSCC scheme for wireless wiretap channels, leveraging the VAE encoder to map source data into a latent probability distribution rather than a deterministic representation, thereby obscuring privacy-sensitive attributes from potential eavesdroppers. Building on these efforts, Alawad et al. [20] explored a short-packet communication system that transmitted only the statistical parameters of packets, enhancing spectral efficiency. In addition, Xi et al. [21] and Yao et al. [22] investigated text and speech transmission scenarios, respectively. Xi et al. [21] combined variational neural inference with conventional explicit semantic decoding to improve text decoding accuracy. Yao et al. [22] employed nonlinear transformations and variational modeling to capture dependencies between speech frames and estimate the probabilistic distribution of speech features, optimizing both encoding and reconstruction. Furthermore, a VAE-based joint coding-modulation framework was introduced in Ref. [23], which mapped source data to discrete constellation symbols in a probabilistic manner, addressing non-differentiability challenges in digital modulation. Several recent studies have also incorporated deep reinforcement learning (DRL) [24] and semantic-aware codebooks [25] to provide side information that further enhances semantic coding performance.

VAE variants for coding. To meet diverse task requirements, several pioneering efforts have employed variants of VAEs, such as hierarchical VAEs (HVAEs) [26,27], conditional VAEs (CVAEs) [28,29], β-VAEs [30], and vector quantized VAEs (VQ-VAEs) [31–35], to develop advanced semantic coding schemes. HVAEs learn hierarchical data representations using multiple latent variables, thereby enhancing the flexibility of semantic coding. For example, Chen and Hua [26] used an HVAE to map source data into low-level and high-level latent variables, capturing local detail features and global abstract features, respectively. These latent variables were flexibly combined to improve the accuracy of remote wireless control tasks. Similarly, Zhang et al. [27] applied an HVAE to autoregressively learn multiple latent representations of images using a combination of bottom-up and top-down pathways. These representations were then mapped to varying numbers of channel symbols for transmission, achieving favorable rate-distortion performance. CVAEs introduce external conditioning variables to enable fine-grained control over the data generation process. For instance, Li et al. [28] and Xie et al. [29] incorporated image category labels and channel state information (CSI) into the semantic coding process, thereby improving reconstruction accuracy and system robustness under noisy channel conditions. In β-VAEs, the introduction of a hyperparameter β during training allows the latent variables to be disentangled into a set of task-relevant independent components, enhancing the interpretability of SemCom systems [30]. To align with digital communication systems, many studies have adopted VQ-VAEs for SemCom [31–35]. These models learn discrete latent representations of source data and map them to codewords in a shared codebook accessible at both the transmitter and receiver. This approach significantly reduces communication overhead by transmitting only the indices of selected codewords. Enhancements to VQ-VAEs have included adversarial training [32], DRL algorithms [33], and fine-tuning techniques [34,35], all of which improve robustness against channel impairments and enhance generalization to unseen data distributions.

2.2. GAN-enabled SemCom

(1) Introduction to GANs: GANs have emerged as prominent models in GAI, capable of producing realistic data by learning the statistical properties of real-world datasets. Originally conceptualized by Ian Goodfellow and colleagues in 2014 [36], a standard GAN consists of two neural networks—a generator and a discriminator—that are trained concurrently in an adversarial manner. The generator takes random noise as input and is trained to synthesize data that can mislead the discriminator into classifying the generated samples as real. Conversely, the discriminator is trained to distinguish between real data and the synthetic data produced by the generator. This adversarial training can be mathematically formulated as a two-player minimax game, wherein the generator aims to minimize the discriminator’s ability to differentiate real from synthetic data, while the discriminator aims to maximize its classification accuracy. The training process continues until a Nash equilibrium is reached, at which point the generator is able to produce highly realistic data that the discriminator can only correctly classify with approximately 50% accuracy.

The effectiveness of GANs stems from their ability to model complex data distributions without requiring explicit likelihood functions or prior assumptions. Moreover, the “evolutionary arms race” between the generator and discriminator introduces new methodologies for SemCom, allowing the receiver to generate data that accurately reflects the characteristics of the original source.

(2) GAN-enabled SemCom architecture: Fig. 2 illustrates a typical GAN-enabled SemCom architecture, in which GANs are employed for semantic encoding at the transmitter and semantic decoding at the receiver.

Semantic encoding. The generator in a GAN can serve as the semantic encoder by extracting semantic information from input data via an inversion mechanism [37]. Specifically, the transmitter samples a latent vector z from a predefined semantic space and inputs it into the generator G to produce synthetic data G(z). The latent vector z is then iteratively optimized to minimize the discrepancy between the synthetic data G(z) and the input data x. This discrepancy is typically measured using metrics such as mean squared error (MSE) or learned perceptual image patch similarity (LPIPS). When using MSE, the optimization over z is expressed as

$z^{\star}=\arg \min_{z}\|G(z)-x\|_{2}^{2}$

where $\|\cdot\|_{2}$ denotes the $L_{2}$ norm and $z^{\star}$ is seen as the semantic information.
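A minimal sketch of this inversion procedure is given below, assuming a pre-trained generator G; the latent dimension, step count, and learning rate are illustrative choices, not values from the cited works.

```python
import torch

def gan_inversion(G, x, latent_dim=128, steps=500, lr=0.05):
    """Extract semantic information z* by inverting a pre-trained
    generator G: minimize ||G(z) - x||_2^2 over z (MSE criterion)."""
    z = torch.randn(1, latent_dim, requires_grad=True)  # initial latent sample
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((G(z) - x) ** 2)  # MSE between synthetic and input data
        loss.backward()
        opt.step()
    return z.detach()  # z* serves as the semantic information
```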

Semantic decoding. At the receiver, the generator of the GAN serves as the semantic decoder, reconstructing the data based on the received noisy latent vector $\tilde{z}$. During the training phase, the synthetic data based on $\tilde{z}$, denoted by $G(\tilde{z})$, is fed into the discriminator for authenticity assessment. The output is then used to update the generator, enhancing its data generation capabilities. This adversarial training process is formalized as a minimax game with the value function $V(D,G)$, as given by

$\min_{G}\max_{D}V(D,G)=\mathbb{E}_{x\sim p_{\text{data}}(x)}\left[ \log D(x) \right]+\mathbb{E}_{\tilde{z}\sim p_{\tilde{z}}(\tilde{z})}\left[ \log \left( 1-D(G(\tilde{z})) \right) \right]$

where $\mathbb{E}$ is the expected value operator, D is the discriminator, ${{p}_{\text{data}}}\left( x \right)$ denotes the ground-truth data distribution, and ${{p}_{{\tilde{z}}}}\left( {\tilde{z}} \right)$ is the distribution of the noisy latent vector. Through iterative optimization, the generator can achieve efficient data reconstruction in the inference phase.
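The following sketch shows one training iteration of this minimax game, assuming a generator G, a discriminator D with sigmoid output, and their optimizers; for numerical stability, the generator update uses the common non-saturating variant of the objective rather than minimizing log(1 − D(G(z̃))) directly.

```python
import torch

def gan_training_step(G, D, x_real, z_noisy, opt_G, opt_D):
    """One adversarial update: D maximizes log D(x) + log(1 - D(G(z~)));
    G is updated with the non-saturating objective -log D(G(z~))."""
    # Discriminator step: distinguish real data from reconstructions
    x_fake = G(z_noisy).detach()
    loss_D = -(torch.log(D(x_real)).mean() + torch.log(1 - D(x_fake)).mean())
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
    # Generator step: fool the discriminator
    loss_G = -torch.log(D(G(z_noisy))).mean()
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```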

(3) Literature review: Existing research on GAN-enabled SemCom can be broadly categorized into two directions: leveraging GANs to support the encoding process at the transmitter and the decoding process at the receiver.

GANs for encoding. The application of GANs to the transmitter’s encoding process shows significant potential, particularly for image-related tasks. For example, Huang et al. [38] proposed a novel coarse-to-fine semantic encoding approach by integrating the generator of a GAN with better portable graphics (BPG) residual encoding technology. This method extracted multi-level semantic features from input images, enabling the receiver to reconstruct high-quality images with fine details. Subsequently, Han et al. [37] and Tang et al. [39] employed the inversion mechanism of StyleGAN to extract semantic features from input images. In particular, Han et al. [37] introduced a privacy filter to remove sensitive information from the semantic features, while Tang et al. [39] developed a semantic cache module to reduce redundant semantic transmission. In addition, a CycleGAN-based data adaptation module was proposed in Ref. [40] to align input data with pre-stored empirical data in the semantic KB. This approach allowed the transmitter to perform adaptive semantic encoding without requiring retraining.

GANs for decoding. The adversarial interaction between the generator and discriminator offers distinct advantages in the decoding process at the receiver. Specifically, the generator functions as the semantic decoder or JSCC decoder, reconstructing data from the received noisy semantic information, while the discriminator participates in adversarial training to enhance reconstruction performance [41–43]. Building on this concept, Wang et al. [42] introduced both perceptual and adversarial loss functions during training to capture global semantic and local texture information. Xin et al. [43] and Tan et al. [44] further explored the trade-offs among signal distortion, perceptual quality, and transmission rate. Additionally, Erdemir et al. [45] proposed two StyleGAN-based JSCC schemes called InverseJSCC and GenerativeJSCC. InverseJSCC modeled the conventional DeepJSCC process [46] as an unsupervised inverse problem and employed the StyleGAN generator to address distortion issues. GenerativeJSCC, by contrast, achieved high-quality image reconstruction through end-to-end training of the StyleGAN-based JSCC encoder and decoder. Yu et al. [47] presented a two-way SemCom system that leveraged weight reciprocity between the transmitter and receiver, thereby eliminating the need for feedback during training. A conditional GAN was also used in this system to estimate channel distributions. In parallel, several studies have adopted semantic slicing [48], semantic segmentation [49], and vector-quantized (VQ) semantic codebooks [50–52] to enable flexible semantic coding and adaptive image reconstruction. Finally, Mao et al. [53] applied GANs to text transmission, designing a GAN-based distortion suppression module to mitigate signal distortion caused by the absence of CSI.

2.3. DM-enabled SemCom

(1) Introduction to DMs: DMs are a class of GAI models inspired by non-equilibrium thermodynamics. They have attracted considerable attention for their ability to model complex data distributions and generate high-fidelity samples [54]. DMs typically encompass three main formulations: denoising diffusion probabilistic models (DDPMs) [55], score-based generative models (SGMs) [56], and score-based stochastic differential equations (Score SDEs) [57]. DDPMs operate via a two-step process comprising a forward diffusion phase and a reverse denoising phase. In the forward process, noise is progressively added to the original data, converting it into a noise distribution. The reverse process then removes the noise to reconstruct the original data distribution. In contrast, SGMs generate new samples by estimating the score function of the noisy data distribution, without explicitly performing denoising. Score SDEs further generalize these approaches to continuous time by modeling both data perturbation and generation as solutions to well-defined stochastic differential equations.

DMs are particularly effective at handling high-dimensional data and have demonstrated impressive performance in a range of applications, including image generation [58] and time series forecasting [59]. Notably, their inherent denoising capability makes them highly suitable for mitigating channel noise effects in SemCom.

(2) DM-enabled SemCom architecture: Fig. 3 illustrates a typical DM-enabled SemCom architecture, in which DMs enhance the entire communication process through three key components: semantic encoding, channel modeling and equalization, and semantic decoding.

Semantic encoding. At the transmitter, DMs are used to construct the semantic encoder, which generates a compact semantic representation of the input data via the diffusion process. This involves gradually adding Gaussian noise to the input data x over T timesteps, following a forward Markov chain. At the timestep $t\;\left( t\in \left[ 1,T \right] \right)$, noise is added according to

$\boldsymbol{x}_{t}=\sqrt{\alpha_{t}} \boldsymbol{x}_{t-1}+\sqrt{1-\alpha_{t}} \epsilon_{t-1} $

where the initial point $x_{0}=x$, $\alpha_{t}\in (0,1)$ is the noise schedule at the tth timestep, and $\epsilon_{t-1}$ is the noise sampled from a Gaussian distribution at the tth timestep. Notably, the number of timesteps in the diffusion process can be flexibly adjusted to balance data compression and semantic fidelity. More importantly, DM-based semantic encoders demonstrate strong robustness to input variations due to the progressive nature of the diffusion process.
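A minimal sketch of this forward process is shown below; the linear noise schedule and number of timesteps are illustrative assumptions.

```python
import torch

def forward_diffusion(x, alphas):
    """Iterative forward diffusion (semantic encoding): at each timestep t,
    x_t = sqrt(alpha_t) * x_{t-1} + sqrt(1 - alpha_t) * eps_{t-1}."""
    x_t = x  # x_0 = x
    for alpha_t in alphas:  # alphas[t] in (0, 1) is the noise schedule
        eps = torch.randn_like(x_t)  # Gaussian noise sampled at this timestep
        x_t = alpha_t.sqrt() * x_t + (1 - alpha_t).sqrt() * eps
    return x_t  # compact noisy representation x_T passed to the channel

# Example: an illustrative linear schedule over T = 100 timesteps
# alphas = 1.0 - torch.linspace(1e-4, 0.02, 100)
```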

Channel modeling and equalization. The output of the semantic encoder, $x_{T}$, is susceptible to various channel impairments during wireless transmission, such as fading, interference, and noise. To mitigate these effects, DMs can be employed for channel modeling and equalization. Specifically, channel impairments can be interpreted as an additional noise source affecting the transmitted signal. As such, DMs can be used to generate network-level solutions by modeling channel distributions and estimating network parameters, thereby supporting the design of effective channel equalization strategies.

Semantic decoding. At the receiver, DMs are employed to construct the semantic decoder, which reconstructs the data through the reverse denoising process. Specifically, this denoising process can be formulated as a reverse Markov chain, defined as:

$\widetilde{\boldsymbol{x}}_{t-1}=\frac{1}{\sqrt{\alpha_{t}}}\left(\boldsymbol{x}_{t}-\frac{1-\alpha_{t}}{\sqrt{1-\bar{\alpha}_{t}}} \boldsymbol{\epsilon}_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t}, t\right)\right) $

where $\tilde{x}_{t-1}$ is the mean of the reconstructed data distribution at the tth timestep, $\bar{\alpha}_{t}=\prod_{i=1}^{t}\alpha_{i}$, and $\epsilon_{\theta}(\cdot,\cdot)$ is a neural network with a parameter set θ for predicting the noise component. Through multiple denoising steps, the semantic decoder progressively removes noise from the received signal and reconstructs the original data. DM-based semantic decoders have demonstrated strong performance in generating high-dimensional data, such as images [60–63] and audio [64].
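The sketch below implements the mean update above, assuming a noise-prediction network eps_model(x_t, t); the stochastic sampling term used in full DDPM sampling is omitted for brevity.

```python
import torch

@torch.no_grad()
def reverse_denoising(x_T, eps_model, alphas):
    """Reverse denoising (semantic decoding): iteratively apply
    x_{t-1} = (x_t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps_theta(x_t, t))
              / sqrt(alpha_t), starting from the received noisy signal."""
    alpha_bars = torch.cumprod(alphas, dim=0)  # alpha_bar_t = prod_{i<=t} alpha_i
    x_t = x_T
    for t in reversed(range(len(alphas))):
        eps_hat = eps_model(x_t, t)  # network predicts the noise component
        x_t = (x_t - (1 - alphas[t]) / (1 - alpha_bars[t]).sqrt() * eps_hat) \
              / alphas[t].sqrt()
    return x_t  # reconstructed data
```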

(3) Literature review: Recent studies on DM-enabled SemCom primarily focus on three key areas: multimodal semantic coding, channel modeling and enhancement, and secure and efficient transmission.

Multimodal semantic coding. For image transmission, Grassucci et al. [65] deployed a DM at the receiver to generate semantically consistent images from received noisy semantic information. This concept was later extended to multiuser scenarios to address the problem of information loss due to limited subcarrier availability [66]. Building on these studies, Pignata et al. [67] applied post-training quantization to DMs, reducing computational complexity and memory usage for on-device deployment. Zhang et al. [68] employed a compact DM to compute conditional vectors, thereby improving decoding efficiency. In addition, recent studies have utilized semantic feature decomposition [69,70] and reinforcement learning (RL) [71] to extract diverse image features, such as texture, color, and object characteristics, enabling receiver-side DMs to perform flexible and controllable image reconstruction. Several works have also incorporated semantic segmentation models at the transmitter to extract essential semantic maps [72–74], which serve as guidance to enhance the quality of image generation by DMs. Notably, the deployment of pre-trained semantic KBs at transceivers has been shown to further improve image quality [75]. Beyond image transmission, the robust denoising and generative capabilities of DMs have led to their broader adoption in practical SemCom scenarios, including speech synthesis [76], three-dimensional (3D) object generation in the metaverse [77], scene understanding [78], and panoramic image reconstruction in virtual reality [79].

Channel modeling and enhancement. DM-based channel modeling and enhancement have demonstrated significant advantages in addressing channel impairments. Specifically, Xu et al. [80] developed a latent diffusion-based denoising framework to eliminate noise interference during transmission. Building on this, Wu et al. [81] introduced a channel denoising DM as an add-on module to mitigate channel noise. Li and Deng [82] proposed a stable diffusion-based denoiser that removes noise by learning the distribution of channel gains. In Ref. [83], a consistency distillation strategy was introduced to transform the multistep denoising process into a single step or a few deterministic steps, enabling real-time channel denoising. Several recent studies have also utilized DMs for CSI estimation and refinement, thereby enhancing decoding performance at the receiver [84,85]. In parallel, Duan et al. [86] proposed a plug-in module that combines DMs with singular value decomposition techniques for precoding and channel equalization in multiple-input multiple-output (MIMO) channels. Furthermore, rate-adaptive mechanisms [87] and hybrid digital-analog approaches [88] have been integrated with DMs to effectively balance data rate and semantic distortion.

Secure and efficient transmission. DMs hold substantial potential for enhancing the security and efficiency of SemCom systems. For example, a DM-based secure transmission scheme was proposed to address security risks arising from semantic-oriented attacks [89]. By introducing Gaussian noise to original images and incorporating a denoising module at the receiver, the scheme significantly improved robustness against adversarial perturbations. Du et al. [90] proposed a DM-enabled AI-generated contract mechanism to incentivize secure and efficient semantic sharing in full-duplex device-to-device SemCom systems. In addition, a DM-based defense mechanism was developed to counter adversarial attacks in semantic Internet-of-Things (IoT) scenarios [91]. This mechanism iteratively added and removed noise to neutralize adversarial perturbations in images, striking a balance between computational efficiency and network security. Furthermore, several studies have investigated radio resource management strategies aimed at optimizing system-level efficiency in SemCom applications [92–94].

3. LLM-driven generative SemCom

In this section, we explore SemCom systems powered by cutting-edge GAI models, specifically LLMs. We begin by outlining the key components of LLMs and reviewing their current applications in SemCom. Building upon this foundation, we then propose a novel LLM-native generative SemCom architecture designed to fully leverage the generative capabilities of LLMs.

3.1. Preliminaries of LLMs

LLMs are a class of deep neural networks trained on massive datasets and characterized by an extensive number of parameters. Compared to traditional AI models, LLMs not only possess a broad base of world knowledge but also demonstrate advanced understanding and reasoning capabilities. These attributes enable LLMs to perform complex tasks based on human instructions and generate coherent, human-like responses [95]. To date, LLMs have achieved remarkable success across numerous research domains and are widely regarded as a driving force behind ongoing technological innovation [96].

The predominant architectures of LLMs can be broadly categorized into three types. The encoder-only architecture, exemplified by bidirectional encoder representations from transformers (BERT) [97], is optimized for text understanding. The decoder-only architecture, represented by the generative pre-trained transformer (GPT) [98], is specialized for content generation. Lastly, the encoder-decoder architecture, typified by the text-to-text transfer transformer (T5) [99], is capable of handling both text understanding and content generation tasks. All three architectures are built upon a fundamental unit known as the transformer block. As illustrated in Fig. 4, a transformer block comprises three core components: a multi-head self-attention (MHSA) mechanism, a feedforward neural network (FFN), and two layer normalization modules, each integrated with residual connections.

•The MHSA mechanism is the core component of the transformer block. It captures dependencies among different positions in the input sequence using multiple attention heads [100]. Each attention head independently computes attention by applying linear transformations followed by scaled dot-product attention operations on three matrices, that is, the query Q, key K, and value V, which can be mathematically expressed as

$\text{Attention}(Q,K,V)=\text{softmax}\left( \frac{QK^{\top}}{\sqrt{d_{K}}} \right)V$

where $d_{K}$ is the dimension of K. The outputs of all attention heads are then concatenated and passed through a linear transformation to produce the final attention output. The parallel computation enabled by the MHSA mechanism allows the model to simultaneously learn multiple representations across diverse subspaces, thereby significantly enhancing the expressive power of LLMs. A minimal sketch of this computation is given after this list.

•The FFN comprises two linear transformations separated by a non-linear activation function (e.g., rectified linear unit (ReLU)). It complements the MHSA mechanism by extracting deep features from each positional embedding, thereby aiding LLMs in capturing global dependencies. The FFN and the MHSA mechanism are often collectively referred to as a “sublayer.”

•Layer normalization typically follows both the MHSA mechanism and the FFN. It normalizes the inputs to each sublayer, thereby stabilizing and accelerating the training process. Residual connections are employed to mitigate the vanishing gradient problem by adding the inputs of each sublayer to its outputs, thus preserving essential input features.
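As referenced above, the following sketch implements the scaled dot-product attention of a single head; the per-head linear projections and output concatenation of MHSA are summarized in comments rather than implemented.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Single attention head: Attention(Q, K, V) = softmax(QK^T / sqrt(d_K)) V."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / (d_k ** 0.5)  # pairwise dependency scores
    return F.softmax(scores, dim=-1) @ V

# In MHSA, h heads run this in parallel on linearly projected Q, K, and V;
# the head outputs are concatenated and passed through a final linear layer.
```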

The synergistic operation of the three components described above enables the transformer block to effectively capture global information and contextual dependencies within input sequences. By stacking multiple transformer blocks, LLMs gain the capacity to accurately interpret contextual information and generate content aligned with specific requirements. This characteristic endows LLMs with significant potential to support SemCom, providing advanced capabilities in both semantic understanding and content generation.

3.2. Existing applications of LLMs in SemCom

The academic community has recently explored various approaches to integrating LLMs into SemCom, which can be broadly categorized into two major applications: ① the direct use of LLMs for semantic coding, and ② the use of LLMs as auxiliary tools to support and enhance semantic coding.

The direct use of LLMs for semantic coding. LLMs possess strong capabilities in semantic understanding and content generation, making them suitable for direct use in semantic coding. For example, Wang et al. [101] employed two pre-trained LLMs, bidirectional and auto-regressive transformers (BART) and GPT-2, as semantic codecs. A rate adaptation module was developed to align the output of these models with the rate requirements of various channel codecs. Building upon this, Chang et al. [102] adopted BART to perform bidirectional semantic coding. By capturing correlations between consecutive tokens, BART enabled the receiver to recover semantically similar tokens, even when some were corrupted or lost during transmission. In contrast, Wang et al. [103] leveraged the tokenizer training of LLMs for semantic encoding and utilized unsupervised pre-training to build a KB, thereby equipping the receiver with prior knowledge for effective semantic decoding. Several recent studies have also explored the use of multimodal vision-language models (VLMs), such as bootstrapping language-image pre-training (BLIP) [104–107] and contrastive language-image pre-training (CLIP) [108], to facilitate efficient modality transformation between images and text. By converting input images into concise textual prompts for transmission, these approaches significantly reduce communication overhead. Furthermore, Ren et al. [109] proposed an edge-device collaborative SemCom framework that uses VLMs to generate textual prompts via visual captioning or question answering, enabling ultra-low-rate communications. Additionally, Cao et al. [110] introduced a privacy-preserving, multimodal LLM-based SemCom system that uses text semantics as the central medium, allowing unified semantic conversion across diverse modalities without requiring additional KB alignment.

The use of LLMs as auxiliary tools to support and enhance semantic coding. The extensive world knowledge embedded in LLMs can serve as valuable prior information to facilitate semantic coding [111,112]. For instance, Guo et al. [113] employed BERT to perform importance-aware semantic understanding at the transmitter and leveraged ChatGPT for error calibration at the receiver. Jiang et al. [114] constructed a cross-modal KB using BLIP and stable diffusion, enabling accurate text extraction from input images to support semantic coding. Similarly, a pre-trained large speech model, WavLM, was used to construct a semantic KB at the transmitter, allowing efficient semantic encoding while reducing communication overhead [115]. In addition, several pioneering studies [116–120] have proposed multimodal LLM-enabled SemCom frameworks. In these frameworks, multimodal LLMs such as GPT-4 functioned as semantic KBs to support tasks including task decomposition, semantic representation, knowledge distillation, content calibration, transmission optimization, and out-of-distribution generalization. These frameworks contribute significantly to standardized semantic encoding and personalized semantic decoding. Notably, by integrating conditional GANs for channel estimation [116] and RL for adaptive semantic offloading [117], these systems demonstrated improved reliability and scalability. Finally, LLMs have shown notable advantages in practical SemCom scenarios, including satellite communications [121], underwater communications [122], and edge IoT networks [123], where they were employed for semantic feature extraction, semantic importance assessment, and radio resource management, respectively.

3.3. Challenges and opportunities

Thus far, we have introduced four types of SemCom systems based on different GAI models: VAEs, GANs, DMs, and LLMs. Table 1 [8,18–35,37–45,47–53,60–94,101–123] summarizes the key characteristics of these models alongside relevant literature. Among them, there is a growing consensus within the research community that LLMs represent the most promising technology for advancing intelligent SemCom. However, despite preliminary progress, applying LLMs to SemCom still faces four critical challenges:

Large modality gap. SemCom often involves multimodal data, such as images, audio, and videos, that goes beyond plain text. Each modality exhibits unique data patterns, resulting in significant modality gaps that hinder LLMs from processing them efficiently. Although some studies have attempted to incorporate multimodal LLMs (e.g., GPT-4), the range of supported modalities remains limited, primarily to images and audio. Directly applying these models to process higher-dimensional data, such as videos and 3D point clouds, remains a significant challenge.

High adaptation cost. Most existing LLMs are primarily designed for natural language tasks and do not explicitly account for the unique characteristics of SemCom, such as channel fading, interference, and noise. Directly applying these LLMs to SemCom may introduce significant domain-specific biases. While fine-tuning can help bridge this domain gap, adapting all model parameters entails high computational costs, limiting the practical deployment of LLMs in resource-constrained edge devices.

Limited generative capability. In receiver design, most prior studies adhere to the traditional communication paradigm focused on information recovery, often employing specialized AI models (e.g., DMs) rather than adopting a fundamentally generative approach. This limits the ability to fully leverage the generative strengths of LLMs, thereby restricting the system’s generalization capacity in complex scenarios with diverse task requirements.

Inconsistent design philosophy. Prior works primarily employ LLMs for semantic encoding or the construction of semantic KBs, typically achieving incremental performance gains. However, these approaches lack a unified guiding framework, resulting in fragmented methodologies that impede the integration of research insights and limit the scalability of future SemCom systems.

To address these challenges, we propose a novel LLM-native generative SemCom architecture in the following subsection. The key innovations are summarized as follows:

Mitigating the modality gap. We introduce a perception encoder that projects multimodal input data into a shared feature space compatible with language tokens. This enables LLMs to effectively understand and process diverse data types.

Reducing the adaptation cost. We introduce a lightweight fine-tuning module that functions as a “thin wrapper” for LLMs. By freezing the full parameters of the pre-trained LLM and training only this fine-tuning module, the general knowledge of the LLM is preserved while efficiently acquiring SemCom-specific domain knowledge.

Enhancing the generative capability. We integrate both LLMs and specialized AI models in the receiver design. LLMs handle content generation, while specialized AI models refine outputs to reduce uncertainty and hallucination. This hybrid design leverages the strengths of both model types, maximizing the generative potential at the receiver.

Unifying the design philosophy. We develop a universal AI agent framework for both transmitter and receiver, where LLMs serve as the core components for either information understanding or content generation. This framework ensures the deep integration of LLMs within SemCom systems, moving beyond their current role as auxiliary tools.

Before presenting the detailed designs, we emphasize that the proposed generative SemCom architecture represents a paradigm shift in communication, moving from the traditional focus on “information recovery” to a new emphasis on “information regeneration.” By leveraging the generative capabilities of LLMs at the receiver, our architecture establishes a coherent and universal design methodology that supports a “one architecture for all scenarios” vision, thereby promoting the sustainable development of generative SemCom systems.

3.4. LLM-native generative SemCom architecture

(1) Architecture overview: As illustrated in Fig. 5, the LLM-native generative SemCom system consists of four building blocks, with both the transmitter and receiver equipped with an LLM-based AI agent and a channel adaptation module. Analogous to human-to-human conversation, the two LLM-based AI agents function as the “brains” of the interlocutors, while the channel adaptation modules serve as their “mouths and ears.” The operational mechanism of the system is described as follows:

•Initially, the LLM-based AI agent at the transmitter comprehends the input data, along with the channel sensing data and task requirements, producing a compact semantic representation called the understanding embedding. This embedding encapsulates the semantic information essential for accomplishing the task at the receiver.

•Next, the channel adaptation module at the transmitter converts the understanding embedding into a signal suitable for wireless transmission.

•Upon receiving the noisy signal, the channel adaptation module at the receiver converts it back into the understanding embedding.

•Finally, the recovered understanding embedding is input to the LLM-based AI agent at the receiver, which directly generates the task-oriented content. Since the understanding embedding guides the generation process, it effectively serves as a prompt for the LLM-based AI agent.
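To make this data flow concrete, the following pseudocode traces the four steps end to end; all module names and interfaces are illustrative placeholders rather than a prescribed implementation.

```python
def generative_semcom_pipeline(x, channel, tx_agent, rx_agent,
                               tx_adapt, rx_adapt,
                               channel_sensing, task_requirements):
    """End-to-end data flow of the LLM-native generative SemCom system.
    All arguments are illustrative placeholders for the building blocks."""
    # 1) Understanding AI agent: input data -> understanding embedding
    u = tx_agent.understand(x, channel_sensing, task_requirements)
    # 2) Transmitter-side channel adaptation: embedding -> transmittable signal
    s = tx_adapt(u)
    # 3) Wireless transmission over the forward channel (adds impairments)
    s_noisy = channel(s)
    # 4) Receiver-side channel adaptation: noisy signal -> recovered embedding
    u_hat = rx_adapt(s_noisy)
    # 5) Generating AI agent: the recovered embedding acts as a prompt for
    #    direct, task-oriented content generation
    return rx_agent.generate(u_hat, task_requirements)
```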

In the following, we elaborate on the detailed design of each building block.

(2) LLM-based AI agent: There is a growing consensus that LLM-based AI agents represent the next frontier in GAI [124]. Leveraging their robust capabilities in information analysis and human-like responses, we deploy two LLM-based AI agents at the transmitter and receiver for information understanding and content generation, respectively referred to as the understanding AI agent and generating AI agent. As shown in Fig. 6, both AI agents share the same backbone structure composed of five components: a perception encoder, data-driven LLM adaptation, a memory system, tool integration, and a functional head. The two AI agents are distinguished by an embedded “role prompt” and can achieve multi-agent collaboration through end-to-end training. The functions of each component are described below.

•Perception encoder is responsible for sensing the communication environment and preprocessing input data to enable efficient utilization by LLMs. Its inputs include multimodal data (e.g., images, audio, videos, or their understanding embeddings), channel sensing data (e.g., CSI and environmental point cloud data), and task requirements (e.g., scene rendering, robotic control, and video analytics). Since most LLMs accept only plain text as input, all data must be projected into a unified feature space as language tokens and represented as token embeddings for LLM processing. A straightforward approach is to use existing well-designed encoders tailored for specific modalities, such as the vision transformer (ViT) for images and the C-Former for audio. Additionally, recent studies have proposed unified encoders that can handle multiple modalities, such as ImageBind [125] and Sber-modulated vector quantized (MVQ) GAN [126], achieving “one encoder for all modalities.” Reusing these off-the-shelf models significantly reduces engineering costs compared to building perception encoders from scratch. It is important to note that the understanding embedding generated by the understanding AI agent may not align with the token space of the LLM at the receiver (e.g., LLaMA 2’s input dimension is 4096 [127]). To address this issue, the perception encoder at the receiver must learn an efficient mapping from the embedding space to the token space, producing token embeddings directly usable by the LLM. This mapping can be implemented using linear layers with layer normalization; a minimal sketch of this projection is given after this list.

•Data-driven LLM adaptation is responsible for performing information understanding at the transmitter and content generation at the receiver. It consists of a pretrained LLM (e.g., ChatGPT-3.5 and LLaMA 3.2) combined with a fine-tuning module. The pretrained LLM provides powerful capabilities in understanding data and generating new content, while the fine-tuning module adapts the LLM specifically for the SemCom domain. Popular parameter-efficient fine-tuning techniques, such as low-rank adaptation, prompt tuning, prefix tuning, and adapter tuning [128], can be employed. By freezing the core LLM parameters and training only a small set of external parameters through a data-driven pipeline, the fine-tuning module enables efficient acquisition of SemCom-specific knowledge at minimal computational cost. The adaptation objective is guided by a role prompt. For example, the transmitter’s role prompt could be: “[understanding prompt] = please act as a (domain) expert, understand the input information, and provide an accurate description.” At the receiver, the role prompt might be: “[generating prompt] = please generate content based on the input information to meet the task requirement.” For complex tasks, the LLM can exploit its chain-of-thought reasoning to decompose tasks hierarchically and process them iteratively, thereby enhancing system efficiency.

•Memory system functions similarly to the hippocampus in the human brain, accumulating and organizing knowledge during training to support the LLM in rapid information understanding and content generation. The memory system is generally divided into two types based on the recency of stored knowledge: short-term memory and long-term memory. Short-term memory holds contextual information from recent or ongoing communications to maintain content coherence. This is commonly implemented through mechanisms such as caching and recurrent neural networks. Long-term memory captures and distills general knowledge from historical data, organizing it in structured formats such as knowledge graphs and vector databases. These two memory types complement each other to boost the AI agents’ efficiency. For instance, when the system encounters repeated or similar tasks, the LLMs at the transmitter and receiver can quickly retrieve relevant knowledge fragments from their memory systems, enabling lightweight or even zero-cost inference. Additionally, the memory system supports continuous self-evolution by refining knowledge based on user preferences and behavior patterns, thereby enhancing personalized communication experiences.

•Tool integration acts as the “extended arm” of the LLM, incorporating various specialized tools and application programming interfaces (APIs), such as search engines and data access services, to enhance the LLM’s capabilities in domain-specific information understanding and content generation. For example, in video transmission scenarios, a “video intelligence API” can be used to extract key scenes, events, and target objects from the original video. This processed information helps the LLM at the transmitter produce a more accurate and concise video summary. At the receiver, the LLM refines this summary and generates representative video keyframes. Subsequently, tools like RunwayML can leverage these keyframes to synthesize the desired video content. Such tool integrations mitigate the limitations of LLMs, including understanding biases and hallucination effects, especially in specialized domains. This significantly improves the accuracy, reliability, and quality of generated content. Since the understanding and generating AI agents have different roles, they are typically configured with distinct sets of tools and APIs tailored to their specific responsibilities.

•Functional head is responsible for processing the LLM’s output features to fulfill the AI agent’s ultimate objectives. For the understanding AI agent, the functional head generates a deep semantic representation of the input data. A practical method is to use a classical language modeling head that predicts tokens autoregressively, eventually combining them into a compact understanding embedding. This embedding serves as a concise language-based summary that is significantly smaller than the original data. The “next-token prediction” approach offers a highly compatible representation applicable to diverse any-to-any tasks [126]. For the generating AI agent, the functional head transforms the LLM’s output features into the task-oriented content. Depending on the specific task, specialized generative models can be integrated, such as stable diffusion for image reconstruction, AudioLDM for text-to-audio generation, and Zeroscope for video prediction [129]. These models typically produce the final content in a single generation step, although this may introduce some additional storage overhead.
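As noted in the perception encoder description, the receiver must map received understanding embeddings into the LLM's token space. The sketch below shows one such projection built from linear layers with layer normalization; the embedding dimension, the 4096-dimensional token space (matching LLaMA 2), the depth, and the activation choice are illustrative assumptions.

```python
import torch.nn as nn

class EmbeddingToTokenProjector(nn.Module):
    """Illustrative receiver-side projection: maps a recovered understanding
    embedding into the LLM's token space using linear layers with layer
    normalization, as described in the perception encoder component."""
    def __init__(self, embed_dim=512, token_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(embed_dim, token_dim),
            nn.LayerNorm(token_dim),
            nn.GELU(),
            nn.Linear(token_dim, token_dim),
            nn.LayerNorm(token_dim),
        )

    def forward(self, embedding):
        return self.proj(embedding)  # token embeddings consumable by the LLM
```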

(3) Channel adaptation module: To ensure the reliable transmission of the understanding embedding, we deploy two channel adaptation modules, one at the transmitter and the other at the receiver. The transmitter’s channel adaptation module maps the understanding embedding into a signal optimized for wireless transmission. Conversely, the receiver’s channel adaptation module converts the received noisy signal back into the understanding embedding. These two mapping processes can be implemented using classical discriminative models, such as convolutional neural networks (CNNs). However, traditional discriminative models often lack adaptability and generalization, making it difficult to handle the dynamic nature of wireless channels (e.g., fading) and the variety of communication scenarios (e.g., MIMO systems). To overcome these limitations, an alternative solution is to leverage GAI models—particularly GANs and DMs. Thanks to their strong self-learning capabilities, GAI-enabled channel adaptation modules can continuously adapt to changing communication environments. Furthermore, by harnessing self-supervised learning, these modules can perform tasks such as assessing information importance, optimizing wireless resource allocation, and fine-tuning network parameters, thereby improving communication efficiency and increasing task success rates [12].
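A minimal sketch of such a channel adaptation pair over an AWGN channel is given below; the linear-layer design, dimensions, and signal-to-noise ratio are illustrative, and a CNN- or GAI-based mapping could be substituted as discussed above.

```python
import torch
import torch.nn as nn

class ChannelAdaptation(nn.Module):
    """Illustrative channel adaptation pair: the transmitter side maps the
    understanding embedding to channel symbols; the receiver side inverts
    the mapping after AWGN corruption."""
    def __init__(self, embed_dim=512, symbol_dim=256):
        super().__init__()
        self.tx = nn.Linear(embed_dim, symbol_dim)  # embedding -> signal
        self.rx = nn.Linear(symbol_dim, embed_dim)  # noisy signal -> embedding

    def forward(self, u, snr_db=10.0):
        s = self.tx(u)
        # AWGN channel: add Gaussian noise at the given signal-to-noise ratio
        noise_power = s.pow(2).mean() / (10 ** (snr_db / 10))
        s_noisy = s + noise_power.sqrt() * torch.randn_like(s)
        return self.rx(s_noisy)
```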

In summary, the two LLM-based AI agents work collaboratively to perform information understanding at the transmitter and content generation at the receiver. Their synergy enables efficient transformation between high-dimensional input data and low-dimensional semantic embeddings, substantially reducing communication overhead while enhancing the SemCom system’s generalization capability. Meanwhile, the two channel adaptation modules serve as reliable “bridges” between the AI agents, ensuring robust information transmission over wireless channels. This seamless integration fully leverages the power of LLMs, establishing a strong foundation for developing efficient generative SemCom systems in future wireless networks.

4. A case study of LLM-native generative SemCom

Building upon the system design, this section provides a case study to validate the effectiveness of the proposed generative SemCom system.

4.1. Experimental settings

(1) Task requirements: We consider a point-to-point communication system tailored for a video retrieval task. In this setup, the transmitter holds a long video, while the receiver aims to identify and retrieve the specific clips that contain particular objects of interest, such as people and vehicles. This task is common in applications such as security surveillance, traffic monitoring, and remote sensing object detection. Given the massive data volume of the original video and the receiver’s interest in only a few select clips, SemCom can be leveraged to greatly improve system efficiency.

(2) System configuration: The system’s default configuration is as follows. The transmitter utilizes the YOLOv8-DeepSORT object tracking model [130] as the perception encoder, which performs object tracking across the entire video and extracts corresponding keyframes. Additionally, the transmitter employs a multimodal LLM—InternVL-1.5 [131]—to process each keyframe and generate a sequence of descriptive entries, collectively forming the understanding embedding. To ensure that InternVL-1.5 produces standardized and consistent entries, we design the following understanding prompt.

[Understanding Prompt]: You are a video attribute extraction expert. Your task is to analyze the given video frame and accurately extract the attributes related to a person’s appearance. Please provide the output as a concise list of keywords in the following format, ensuring that the output is a single line and does not span multiple lines: [gender], [shirt color], [shirt length], [pants color], [pants length], [shoe color].

At the receiver, another multimodal LLM—Qwen [132]—is deployed to generate a comprehensive description of each keyframe based on the received entries. To ensure standardized outputs from Qwen, we utilize the following generating prompt.

[Generating Prompt]: You are a semantic expansion assistant. Given an input in the format: [time], [location], [gender], [shirt color], [shirt length], [pants color], [pants length], [shoe color], your task is to expand it into a complete sentence using the following structure: “This is a [gender] person wearing a [shirt color] [shirt length] shirt, [pants color] [pants length] pants, and [shoe color] shoes. [He/She] appeared at [location] at [time].”

Additionally, the receiver uses the all-MiniLM-L6-v2 model [133] as the functional head to generate retrieval results, specifically determining whether the target object is present in each keyframe. The transmitter and receiver communicate over an AWGN channel, with their channel adaptation modules implemented as two linear mapping layers. It is worth noting that, in practical scenarios, impairments such as channel fading and frequency offsets can be mitigated by incorporating channel equalization techniques within the channel adaptation modules.
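The receiver-side retrieval decision can be sketched with the sentence-transformers release of all-MiniLM-L6-v2 as follows; the similarity threshold, the example description, and the query wording are illustrative assumptions, not the paper’s configuration.

```python
from sentence_transformers import SentenceTransformer, util

head = SentenceTransformer("all-MiniLM-L6-v2")   # the retrieval functional head

def retrieve(descriptions, query, threshold=0.7):
    """Flag keyframes whose Qwen-expanded description matches the user's
    target-object query by cosine similarity in the embedding space."""
    d = head.encode(descriptions, convert_to_tensor=True)
    q = head.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(q, d)[0]
    return [i for i, s in enumerate(scores) if s.item() >= threshold]

hits = retrieve(
    ["This is a male person wearing a white short shirt, black long pants, "
     "and black shoes. He appeared at the station entrance at 10:15."],
    "a man wearing a white short-sleeved shirt, black pants, and black shoes")
```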

(3) Baseline schemes: To demonstrate the superiority of the proposed LLM-native generative SemCom system, four baseline schemes are adopted for performance comparison, including:

•MPEG + 1/2 LDPC + 16QAM: This scheme employs the moving picture experts group (MPEG) compression method for source coding, combined with a rate-1/2 low-density parity-check (LDPC) code for channel coding and 16-ary quadrature amplitude modulation (16QAM) for modulation.

•VAE-enabled scheme, with a pre-trained VAE [134] as the backbone for JSCC.

•GAN-enabled scheme, with a pre-trained GAN [135] as the backbone for JSCC.

•DM-enabled scheme, with a pre-trained DM [136] as the backbone for JSCC.

In each baseline scheme, after reconstructing the keyframes, the receiver applies a widely used object retrieval algorithm [137] to make the final decision. The first baseline is referred to as the traditional communication scheme, while our approach is denoted as the LLM-native generative scheme for brevity.
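To make the traditional communication scheme concrete, below is a minimal numpy sketch of its last two stages only, Gray-mapped 16QAM and the AWGN channel; the unit-power normalization is a standard assumption, and the MPEG and LDPC stages are omitted for brevity.

```python
import numpy as np

GRAY = np.array([-3, -1, 3, 1])           # Gray-coded PAM levels for 2 bits

def qam16_modulate(bits):
    """Map groups of 4 bits to one 16QAM symbol (2 bits -> I, 2 bits -> Q)."""
    b = np.asarray(bits).reshape(-1, 4)
    i = GRAY[b[:, 0] * 2 + b[:, 1]]
    q = GRAY[b[:, 2] * 2 + b[:, 3]]
    return (i + 1j * q) / np.sqrt(10)     # normalize to unit average power

def awgn(symbols, snr_db):
    """Add complex Gaussian noise at the given per-symbol SNR."""
    sigma = np.sqrt(10 ** (-snr_db / 10) / 2)
    noise = sigma * (np.random.randn(*symbols.shape)
                     + 1j * np.random.randn(*symbols.shape))
    return symbols + noise
```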

(4) Performance metrics: In this case study, we use three metrics to evaluate system performance: ① retrieval accuracy, which refers to the proportion of keyframes containing the target object among all retrieved keyframes; ② robustness, measured by the standard deviation of retrieval accuracy across varying signal-to-noise ratios (SNRs), reflecting the system’s sensitivity to channel noise; and ③ communication overhead, quantified by the total number of bits transmitted.
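The three metrics reduce to simple statistics; the sketch below computes them over the SNR sweep, with randomly generated hit/miss flags and a toy text entry standing in for real measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
snrs = np.arange(3, 31, 3)                    # the 3-30 dB sweep, step 3 dB

# Placeholder flags: 1 if a retrieved keyframe contains the target object.
# Real flags come from the retrieval pipeline at each SNR point.
flags = {snr: rng.integers(0, 2, size=50) for snr in snrs}

acc_per_snr = np.array([flags[s].mean() for s in snrs])   # (1) retrieval accuracy
robustness = acc_per_snr.std()                            # (2) std across SNRs
entries = ["male, white, short, black, long, black"]      # transmitted entries
overhead_bits = sum(len(e.encode("utf-8")) * 8 for e in entries)  # (3) overhead
```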

4.2. Results and analysis

We conduct experiments across different SNR regimes, ranging from 3 to 30 dB in increments of 3 dB. Fig. 7 shows the performance of the five schemes, where the height of each bar represents the average retrieval accuracy, and the error bars indicate the standard deviation of the retrieval accuracy. A longer error bar signifies greater variability in retrieval accuracy. Based on Fig. 7, we observe the following key insights.

(1) Retrieval accuracy: The LLM-native generative scheme achieves an average retrieval accuracy of 93.03%, significantly outperforming all baseline schemes. In comparison, the traditional communication scheme (i.e., MPEG + 1/2 LDPC + 16QAM) achieves an average retrieval accuracy of 39.39%, resulting in an approximate gain of 53% for the LLM-native generative scheme. This improvement is attributed to the LLM’s strong generalization capability. Specifically, unlike the small-scale object retrieval algorithm used in the baseline schemes, LLMs can identify more accurate attributes of the target object in the keyframes. Additionally, the standardized outputs of the two LLMs further enhance retrieval accuracy. Notably, the VAE-enabled scheme exhibits the lowest average retrieval accuracy. This is due to the VAE’s probabilistic generation method, which produces low-quality keyframes and subsequently degrades retrieval performance.

(2) Robustness: The LLM-native generative scheme has the smallest error bar, demonstrating superior robustness against channel noise. This robustness is attributed to the LLM’s extensive knowledge, which effectively compensates for the effects of noise. In contrast, the traditional communication scheme exhibits the largest error bar, indicating the weakest resistance to noise. This is due to the “cliff effect,” where the receiver fails to recover keyframes when the SNR falls below a certain threshold (9 dB in our experiment). Conversely, when the SNR exceeds this threshold, the scheme restores keyframes effectively, resulting in a bimodal performance distribution. Additionally, the DM-enabled scheme demonstrates greater robustness than the VAE-enabled and GAN-enabled schemes, owing to its stable training process.

(3) Communication overhead: Table 2 presents the communication overhead for all schemes. The LLM-native generative scheme exhibits a communication overhead that is orders of magnitude lower, reducing the transmission volume by 99.98% compared to the traditional communication scheme. This result is intuitive, as the LLM-native generative scheme transmits only concise textual entries rather than the raw video or high-dimensional keyframes.

(4) Visualization results: Fig. 8 presents retrieval samples from each scheme, with the SNR set to 9 dB. The task is to identify the man wearing a white short-sleeved shirt, black pants, and black shoes in the video. The keyframes retrieved by the LLM-native generative scheme consistently contain the target object (highlighted in red), whereas other schemes produce retrieval errors (highlighted in green). Additionally, the traditional communication scheme fails to recover keyframes at the receiver due to the low SNR.

These experimental results validate that the LLM-native generative scheme outperforms the baseline schemes in terms of retrieval accuracy, robustness, and communication overhead. These findings highlight the potential of LLM-native generative SemCom for a wide range of applications in future wireless networks.

5. Promising applications of generative SemCom

In this section, we summarize four practical applications of generative SemCom: industrial IoT (IIoT), vehicle-to-everything (V2X), the metaverse, and the low-altitude economy.

5.1. Industrial IoT

The IIoT enables flexible resource configuration and sustainable manufacturing through the collaboration and interoperability of various industrial elements, including humans, sensors, and machines [138]. This integration requires the exchange of large volumes of data, such as production indicators, environmental information, and machinery operating statuses. However, existing industrial communication technologies are unable to efficiently align manufacturing intentions with data communication, resulting in excessive signaling overhead, low collaboration efficiency, and significant security risks. By employing the proposed generative SemCom architecture, the transmitter on the production line can leverage an LLM-based AI agent to deeply understand the input data and produce semantically meaningful information aligned with the manufacturing objective. Based on this understanding, the receiver can directly generate appropriate operational instructions, such as adjusting production schedules or performing machine maintenance, without needing to reconstruct the original data. This convergence of communication and computation significantly streamlines manufacturing processes and enhances production efficiency, thereby promoting the intelligent transformation of industrial manufacturing.

5.2. Vehicle-to-everything

V2X communication aims to enhance driving comfort, safety, and traffic efficiency through interconnections among vehicles, humans, and infrastructure. Implementing V2X in practical systems requires ultra-low communication latency, real-time data processing, and robust privacy protections. Generative SemCom offers a promising solution to meet these stringent demands [139]. By leveraging local LLM-based AI agents, vehicles can process multimodal sensor data—including camera feeds and light detection and ranging (LiDAR) signals—for real-time road condition analysis, such as obstacle detection and traffic light recognition. These analyses enable the generation of control commands for steering, acceleration, and braking, thereby supporting autonomous driving. Simultaneously, AI agents can interact with users to enhance their driving experience. Furthermore, generative SemCom facilitates instantaneous communication between vehicles and infrastructure. Vehicles can broadcast surrounding information to assist nearby vehicles with route optimization and roadside infrastructure with traffic control and scheduling. These capabilities are especially valuable in latency-sensitive scenarios, such as highways and congested intersections. Notably, because raw data is not transmitted, user privacy and security are also enhanced.

5.3. Metaverse

The metaverse represents a seamless integration of the physical and digital worlds, aiming to create a virtual space that enables immersive interaction and collaboration among users. Achieving this vision requires advanced communication technologies capable of delivering millisecond latency and secure information exchange [140]. Traditional communication systems struggle to meet these demands, especially when handling multimodal interactions and dynamic environmental updates. Generative SemCom offers a promising solution by deploying LLM-based AI agents within metaverse devices [141]. Since only semantic information is exchanged between devices, communication efficiency and security are significantly improved. For example, during human-avatar interactions, the transmitter’s AI agent analyzes user actions and abstracts them into semantic descriptions for transmission, while the receiver’s AI agent generates smooth, contextually consistent actions in real time based on the received descriptions. Similarly, in virtual environment creation, users exchange concise scene descriptions rather than detailed geometric data, enriching the collaborative experience. Generative SemCom can also support other metaverse applications, such as intelligent interactions with non-player characters and personalized digital content generation by mining user preferences [142].

5.4. Low-altitude economy

The low-altitude economy leverages the airspace below 1000 m for economic activities, primarily through the use of flying devices such as unmanned aerial vehicles (UAVs). It is projected that tens of thousands of such devices will operate in urban airspaces, creating a need for ultra-low latency, extensive connectivity, and anti-interference communication services to ensure efficient and orderly management. Generative SemCom is well-suited to meet these demands by deploying LLM-based AI agents on flying devices, which convert massive traffic data into concise semantic information, thereby reducing the overall traffic load on low-altitude networks. For example, in smart logistics, UAVs can send processed environmental data to a centralized platform, supporting real-time scheduling and decision-making. In video surveillance, UAVs analyze footage based on user requirements (e.g., searching for individuals) and transmit only task-relevant information for immediate response. Additionally, in smart agriculture, AI agents on UAVs assess crop health and identify areas of concern by analyzing multispectral images. This information enables ground equipment to generate precise intervention instructions such as pesticide application, improving the timeliness and effectiveness of agricultural practices.

6. Open issues and future directions

Generative SemCom is still in its early stages, with numerous technical and engineering challenges yet to be addressed. In this section, we outline three primary challenges associated with generative SemCom and discuss potential solutions to overcome them.

6.1. Deployment of LLMs on resource-constrained edge devices

The deployment of LLMs on resource-constrained edge devices is a major challenge for generative SemCom due to substantial storage requirements, high energy consumption, and long computational latency. First, the parameter count of existing LLMs ranges from billions to trillions, for example, 1.23 billion parameters for LLaMA 3.2-1B and 1.75 trillion for GPT-4, requiring tens of gigabytes to several terabytes of storage for on-device deployment. This demand far exceeds the typical storage capacity of edge devices, which generally have only a few hundred gigabytes available. Second, LLM training and inference involve extensive floating-point operations, resulting in energy consumption that is prohibitive for battery-powered edge devices. Additionally, the limited computational power of edge devices causes significant latency (often several seconds) during information understanding and content generation, which fails to meet the real-time demands of latency-sensitive applications such as holographic communication. To address these challenges, future research should prioritize the development of lightweight on-device LLMs. Potential solutions include designing advanced model compression techniques such as quantization (e.g., SmoothQuant [143]), pruning (e.g., LLM-pruner [144]), knowledge distillation (e.g., MiniLLM [145] and DeepSeek [146]), and low-rank factorization [147] to reduce model size while preserving performance. Another approach is to leverage collaborative storage and computational resources at edge servers for distributed LLM deployment [148]. Furthermore, customizing LLMs for specific application scenarios could help balance model size, computational efficiency, and overall system performance [149].
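As a minimal sketch of the model-compression route, the snippet below applies PyTorch’s post-training dynamic int8 quantization to whatever nn.Linear layers a model exposes; GPT-2 is only a stand-in for an edge-scale LLM, and SmoothQuant-style methods, pruning, or distillation from the text above would be applied analogously.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in for an edge LLM
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Post-training dynamic quantization: int8 weights for nn.Linear layers,
# with activations quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

ids = tok("Describe the keyframe:", return_tensors="pt").input_ids
out = quantized.generate(ids, max_new_tokens=16, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```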

6.2. Dynamic evolution of AI agents at transceivers

The collaboration between the understanding AI agent and the generating AI agent is crucial to the performance of generative SemCom systems. However, the variability inherent in wireless networks—characterized by multimodal data, dynamic channels, and diverse tasks—poses significant challenges to this collaboration. On one hand, the uncertainty in channels and the complexity of tasks may introduce biases in the understanding AI agent, compromising the quality of content generated by the generating AI agent. On the other hand, knowledge discrepancies between the two AI agents are likely to worsen over time, negatively impacting system performance. These challenges call for the dynamic evolution of both AI agents. One approach is to incorporate popular continual learning methods, such as experience replay and meta-learning [150]. These techniques enable ongoing refinement by learning from newly acquired data, thereby enhancing adaptability to changing communication environments and task requirements. Additionally, developing new knowledge-sharing protocols and bidirectional feedback mechanisms shows promise for achieving efficient knowledge synergy between the transmitter and receiver [151]. For example, the transmitter might use knowledge distillation to selectively share critical insights with the receiver, assisting its AI agent to generate more effective content. In turn, the receiver could provide feedback on the quality of the generated content to the transmitter, prompting its AI agent to optimize the understanding process, thereby fostering a cycle of continuous self-improvement within the system.
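As a minimal illustration of the continual-learning route, the sketch below keeps an experience replay buffer and mixes replayed samples into each adaptation step; the buffer size, eviction rule, and replay ratio are illustrative assumptions.

```python
import random

class ReplayBuffer:
    """Fixed-capacity store of past (input, target) samples."""
    def __init__(self, capacity: int = 10_000):
        self.buf, self.cap = [], capacity

    def add(self, sample):
        if len(self.buf) >= self.cap:                  # evict a random old sample
            self.buf.pop(random.randrange(len(self.buf)))
        self.buf.append(sample)

    def sample(self, k: int):
        return random.sample(self.buf, min(k, len(self.buf)))

def adaptation_batch(new_samples, buffer: ReplayBuffer, replay_ratio: float = 0.5):
    """Mix fresh channel/task samples with replayed ones so the agent adapts
    without forgetting earlier conditions; the ratio is an assumption."""
    replayed = buffer.sample(int(len(new_samples) * replay_ratio))
    for s in new_samples:
        buffer.add(s)
    return list(new_samples) + replayed
```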

6.3. Privacy and security concerns during transmission

The integration of LLMs into SemCom systems raises significant privacy and security concerns, primarily due to LLMs’ tendency to memorize data during training [152]. For example, the understanding AI agent at the transmitter may unintentionally encode fragments of training data into its output—that is, the understanding embedding. Consequently, unauthorized third parties could intercept communication signals and use the same LLM to reconstruct training data or launch membership inference attacks. Additionally, the generative nature of these systems introduces risks of exploitation by malicious actors. Specifically, such actors might manipulate the understanding embedding via poisoning attacks, causing the receiver to generate erroneous or misleading content. These vulnerabilities have severe implications for security-sensitive applications, such as virtual medical consultations and confidential business communications [153]. To mitigate these risks, it is crucial to develop advanced encryption technologies, including quantum-resistant algorithms and efficient privacy-preserving mechanisms (e.g., differential privacy), to protect the understanding embedding from unauthorized access. Incorporating adversarial training methods during LLM training may also enhance resilience against poisoning attacks [154]. Furthermore, physical layer security technologies, such as secure beamforming and covert communication algorithms [155], can reinforce system confidentiality. Future research should also focus on creating a secure framework for multiuser generative SemCom, leveraging blockchain technology for secure recording and robust verification of semantic transmission to safeguard against potential threats [156].
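As one concrete privacy-preserving mechanism, the sketch below applies the standard Gaussian mechanism of differential privacy to the understanding embedding before transmission; the clipping bound and the (epsilon, delta) values are illustrative assumptions, not the paper’s design.

```python
import numpy as np

def dp_gaussian_embedding(embedding, sensitivity=1.0, epsilon=1.0, delta=1e-5):
    """Clip the embedding to bound its L2 sensitivity, then add Gaussian noise
    calibrated as sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon."""
    e = np.asarray(embedding, dtype=float)
    norm = np.linalg.norm(e)
    if norm > sensitivity:
        e = e * (sensitivity / norm)                   # L2 clipping
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
    return e + np.random.normal(0.0, sigma, size=e.shape)
```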

7. Conclusions

This paper presented a comprehensive overview of, and research outlook on, the integration of GAI with SemCom. We began by introducing three types of SemCom systems enabled by classical GAI models, including VAEs, GANs, and DMs. Next, we proposed a novel LLM-native generative SemCom system, featuring two LLM-based AI agents at the transmitter and receiver, marking a paradigm shift from “information recovery” to “information regeneration.” To validate this design, we conducted a case study demonstrating the benefits of leveraging LLMs. We also highlighted four practical applications of generative SemCom (IIoT, V2X, the metaverse, and the low-altitude economy), illustrating its real-world potential. Finally, we discussed three open challenges along with preliminary solutions. We believe that integrating advanced GAI models, particularly LLMs, promises to redefine the communication paradigm from a generative perspective. This study aims to serve as a valuable reference and provide insightful guidance for further in-depth research on generative SemCom within the context of 6G.

CRediT authorship contribution statement

Jinke Ren: Writing - review & editing, Writing - original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Yaping Sun: Writing - review & editing, Writing - original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Hongyang Du: Writing - original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Weiwen Yuan: Methodology, Investigation, Formal analysis, Data curation. Chongjie Wang: Visualization, Validation, Software, Methodology. Xianda Wang: Visualization, Validation, Software. Yingbin Zhou: Investigation, Data curation. Ziwei Zhu: Visualization, Validation, Software. Fangxin Wang: Writing - review & editing, Methodology, Conceptualization. Shuguang Cui: Writing - review & editing, Supervision, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work was supported in part by the Basic Research Project of Hetao Shenzhen–Hong Kong Science and Technology Innovation Cooperation Zone (HZQB-KCZYZ-2021067), the National Natural Science Foundation of China (62293482, 62301471, and 62471423), the Shenzhen Outstanding Talents Training Fund (202002), the Guangdong Research Projects (2017ZT07X152 and 2019CX01X104), the Guangdong Provincial Key Laboratory of Future Networks of Intelligence (2022B1212010001), the Shenzhen Key Laboratory of Big Data and Artificial Intelligence (ZDSYS201707251409055), and the National Science and Technology Major Project—Mobile Information Networks (2024ZD1300700).

References

[1]

W. Saad, M. Bennis, M. Chen. A vision of 6G wireless systems: applications, trends, technologies, and open research problems. IEEE Netw, 34 (3) (2019), pp. 134-142.

[2]

C.E. Shannon. A mathematical theory of communication. Bell Syst Tech J, 27 (3) (1948), pp. 379-423.

[3]

W. Weaver. Recent contributions to the mathematical theory of communication. ETC, 1 (1953), pp. 261-281.

[4]

D. Gündüz, Z. Qin, I.E. Aguerri, H.S. Dhillon, Z. Yang, A. Yener, et al. Beyond transmitting bits: context, semantics, and task-oriented communications. IEEE J Sel Areas Commun, 41 (1) (2022), pp. 5-41.

[5]

D. Wen, P. Liu, G. Zhu, Y. Shi, J. Xu, Y.C. Eldar, et al. Task-oriented sensing, computation, and communication integration for multi-device edge AI. IEEE Trans Wirel Commun, 23 (3) (2023), pp. 2486-2502.

[6]

W. Yang, H. Du, Z.Q. Liew, W.Y. Lim, Z. Xiong, D. Niyato, et al. Semantic communications for future Internet: fundamentals, applications, and challenges. IEEE Commun Surv Tutor, 25 (1) (2022), pp. 213-250.

[7]

H. Xie, Z. Qin, G.Y. Li, B.H. Juang. Deep learning enabled semantic communication systems. IEEE Trans Signal Process, 69 (2021), pp. 2663-2675.

[8]

D. Huang, F. Gao, X. Tao, Q. Du, J. Lu. Toward semantic communications: deep learning-based image semantic coding. IEEE J Sel Areas Commun, 41 (1) (2022), pp. 55-71.

[9]

Z. Weng, Z. Qin, X. Tao, C. Pan, G. Liu, G.Y. Li. Deep learning enabled semantic communications with speech recognition and synthesis. IEEE Trans Wirel Commun, 22 (9) (2023), pp. 6227-6240.

[10]

W. Yang, Z. Xiong, T.Q. Quek, X. Shen. Streamlined transmission: a semantic-aware XR deployment framework enhanced by generative AI. IEEE Netw, 38 (6) (2024), pp. 29-38.

[11]

Xia L, Sun Y, Liang C, Zhang L, Imran MA, Niyato D. Generative AI for semantic communication: architecture, challenges, and outlook. 2023. arXiv:2308.15483.

[12]

C. Liang, H. Du, Y. Sun, D. Niyato, J. Kang, D. Zhao, et al. Generative AI-driven semantic communication networks: architecture, technologies and applications. IEEE Trans Cogn Commun Netw, 11 (1) (2024), pp. 27-47.

[13]

Grassucci E, Park J, Barbarossa S, Kim SL, Choi J, Comminiello D. Generative AI meets semantic communication: evolution and revolution of communication tasks. 2024. arXiv:2401.06803.

[14]

J. Ren, Z. Zhang, J. Xu, G. Chen, Y. Sun, P. Zhang, et al. Knowledge base enabled semantic communication: a generative perspective. IEEE Wirel Commun, 31 (4) (2024), pp. 14-22.

[15]

Kingma DP, Welling M. Auto-encoding variational Bayes. 2013. arXiv:1312.6114.

[16]

Razavi A, Van den Oord A, Vinyals O. Generating diverse high-fidelity images with VQ-VAE-2. In:Proceedings of International Conference on Neural Information Processing Systems; 2019 Dec 8-14; Vancouver, BC, Canada. New York City: Curran Associates Inc.; 2019. p. 14866-76.

[17]

T. Li, X.M. Zhao, L. Li. Co-VAE: drug-target binding affinity prediction by co-regularized variational autoencoders. IEEE Trans Pattern Anal Mach Intell, 44 (12) (2021), pp. 8861-8873

[18]

Saidutta YM, Abdi A, Fekri F. VAE for joint source-channel coding of distributed Gaussian sources over AWGN MAC. In:Proceedings of International Workshop on Signal Processing Advances in Wireless Communications; 2020 May 26-29; Atlanta, GA, USA. New York City: IEEE; 2020. p. 1-5.

[19]

Erdemir E, Dragotti PL, Gündüz D. Privacy-aware communication over a wiretap channel with generative networks. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2022 May 23-27; Singapore. New York City: IEEE; 2022. p. 2989-93.

[20]

M.A. Alawad, M.Q. Hamdan, K.A. Hamdi. Innovative variational autoencoder for an end-to-end communication system. IEEE Access, 11 (2022), pp. 86834-86847

[21]

Z. Xi, Z. Yiqian, L. Congduan, M. Xiao. Variational neural inference enhanced text semantic communication system. China Commun, 21 (7) (2024), pp. 50-64.

[22]

Yao S, Xiao Z, Wang S, Dai J, Niu K, Zhang P. Variational speech waveform compression to catalyze semantic communications. In:Proceedings of IEEE Wireless Communications and Networking Conference; 2023 Mar 26-29; Glasgow, UK. New York City: IEEE; 2023. p. 1-6.

[23]

Y. Bo, Y. Duan, S. Shao, M. Tao. Joint coding-modulation for digital semantic communications via variational autoencoder. IEEE Trans Commun, 72 (9) (2024), pp. 5626-5640.

[24]

J. Seon, S. Lee, J. Kim, S.H. Kim, Y.G. Sun, H. Seo, et al. Deep reinforced segment selection and equalization for task-oriented semantic communication. IEEE Commun Lett, 28 (8) (2024), pp. 1865-1869.

[25]

H. Zhang, M. Tao, Y. Sun, K.B. Letaief. Improving learning-based semantic coding efficiency for image transmission via shared semantic-aware codebook. IEEE Trans Commun, 73 (2) (2024), pp. 1217-1232.

[26]

Chen D, Hua W. Hierarchical VAE based semantic communications for POMDP tasks. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2024 Apr 14-19; Seoul, Republic of Korea. New York City: IEEE; 2024. p. 5540-4.

[27]

Zhang G, Li H, Cai Y, Hu Q, Yu G, Zhang R. Learned image transmission with hierarchical variational autoencoder. 2024. arXiv:2408.16340.

[28]

Li S, Sun Y, Zhang J, Cai K, Cui S, Xu X. End-to-end generative semantic communication powered by shared semantic knowledge base. 2024. arXiv:2405.05738.

[29]

Xie B, Wu Y, Shi Y, Zhang W, Cui S, Debbah M. Robust image semantic coding with learnable CSI fusion masking over MIMO fading channels. 2024. arXiv:2406.07389.

[30]

S. Ma, W. Qiao, Y. Wu, H. Li, G. Shi, D. Gao, et al. Task-oriented explainable semantic communications. IEEE Trans Wirel Commun, 22 (12) (2023), pp. 9248-9262.

[31]

Nemati M, Park J, Choi J. VQ-VAE empowered wireless communication for joint source-channel coding and beyond. In:Proceedings of IEEE Global Communications Conference; 2023 Dec 4-8; Kuala Lumpur, Malaysia. New York City: IEEE; 2023. p. 3155-60.

[32]

Q. Hu, G. Zhang, Z. Qin, Y. Cai, G. Yu, G.Y. Li. Robust semantic communications with masked VQ-VAE enabled codebook. IEEE Trans Wirel Commun, 22 (12) (2023), pp. 8707-8722.

[33]

P. Talli, F. Pase, F. Chiariotti, A. Zanella, M. Zorzi. Effective communication with dynamic feature compression. IEEE Trans Commun, 72 (9) (2024), pp. 5595-5610.

[34]

J. Choi, J. Park, S.W. Ko, J. Choi, M. Bennis, S.L. Kim. Semantics alignment via split learning for resilient multi-user semantic communication. IEEE Trans Veh Technol, 73 (10) (2024), pp. 15815-15819.

[35]

P. Si, R. Liu, L. Qian, J. Zhao, K.Y. Lam. Post-deployment fine-tunable semantic communication. IEEE Trans Wirel Commun, 240 (1) (2024), pp. 35-50.

[36]

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In:Proceedings of International Conference on Neural Information Processing Systems; 2014 Dec 8-13; Montreal, QC, Canada. New York City: Curran Associates Inc.; 2014. p. 2672-80.

[37]

Han T, Tang J, Yang Q, Duan Y, Zhang Z, Shi Z. Generative model based highly efficient semantic communication approach for image transmission. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2023 Jun 4-10; Rhodes Island, Greece. New York City: IEEE; 2023. p. 1-5.

[38]

Huang D, Tao X, Gao F, Lu J. Deep learning-based image semantic coding for semantic communications. In:Proceedings of IEEE Global Communications Conference; 2021 Dec 7-11; Madrid, Spain. New York City: IEEE; 2021. p. 1-6.

[39]

Tang S, Yang Q, Gündüz D, Zhang Z. Evolving semantic communication with generative model. 2024. arXiv:2403.20237.

[40]

H. Zhang, S. Shao, M. Tao, X. Bi, K.B. Letaief. Deep learning-enabled semantic communication systems with task-unaware transmitter and dynamic data. IEEE J Sel Areas Commun, 41 (1) (2022), pp. 170-185

[41]

He Q, Yuan H, Feng D, Che B, Chen Z, Xia XG. Robust semantic transmission of images with generative adversarial networks. In:Proceedings of IEEE Global Communications Conference; 2022 Dec 4-8; Rio de Janeiro, Brazil. New York City: IEEE; 2022. p. 3953-8.

[42]

Wang J, Wang S, Dai J, Si Z, Zhou D, Niu K. Perceptual learned source-channel coding for high-fidelity image semantic transmission. In:Proceedings of IEEE Global Communications Conference; 2022 Dec 4-8; Rio de Janeiro, Brazil. New York City: IEEE; 2022. p. 3959-64.

[43]

Xin G, Fan P, Letaief KB, Peng C. Deep conditional generative semantic communication for image transmission. In:Proceedings of IEEE International Conference on Communications Workshops; 2024 Jun 9-13; Denver, CO, USA. New York City: IEEE; 2024. p. 1073-8.

[44]

K. Tan, J. Dai, Z. Liu, S. Wang, X. Qin, W. Xu, et al. Rate-distortion-perception controllable joint source-channel coding for high-fidelity generative semantic communications. IEEE Trans Cognit Commun Netw, 11 (2) (2024), pp. 672-689.

[45]

E. Erdemir, T.Y. Tung, P.L. Dragotti, D. Gündüz. Generative joint source-channel coding for semantic image transmission. IEEE J Sel Areas Commun, 41 (8) (2023), pp. 2645-2657

[46]

E. Bourtsoulatze, D. Burth Kurka, D. Gündüz. Deep joint source-channel coding for wireless image transmission. IEEE Trans Cogn Commun Netw, 5 (3) (2019), pp. 567-579

[47]

K. Yu, Q. He, G. Wu. Two-way semantic communications without feedback. IEEE Trans Veh Technol, 73 (6) (2024), pp. 9077-9082

[48]

C. Dong, H. Liang, X. Xu, S. Han, B. Wang, P. Zhang. Semantic communication system based on semantic slice models propagation. IEEE J Sel Areas Commun, 41 (1) (2022), pp. 202-213

[49]

M.U. Lokumarambage, V.S. Gowrisetty, H. Rezaei, T. Sivalingam, N. Rajatheva, A. Fernando. Wireless end-to-end image transmission system using semantic communications. IEEE Access, 11 (2023), pp. 37149-37163

[50]

Miao Y, Yan J, Wang Y, Li Z, Hu D. A semantic communication system based on vector quantization and generative model. In:Proceedings of IEEE International Conference on Communications, Information System and Computer Engineering; 2024 May 10-12; Guangzhou, China. New York City: IEEE; 2024. p. 46-50.

[51]

Zhou Y, Sun Y, Chen G, Xu X, Chen H, Huang B, et al. MOC-RVQ: multilevel codebook-assisted digital generative semantic communication. 2024. arXiv:2401.01272.

[52]

Q. Fu, H. Xie, Z. Qin, G. Slabaugh, X. Tao. Vector quantized semantic communication system. IEEE Wirel Commun Lett, 12 (6) (2023), pp. 982-986.

[53]

J. Mao, K. Xiong, M. Liu, Z. Qin, W. Chen, P. Fan, et al. A GAN-based semantic communication for text without CSI. IEEE Trans Wirel Commun, 23 (10) (2024), pp. 14498-14514.

[54]

L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, et al. Diffusion models: a comprehensive survey of methods and applications. ACM Comput Surv, 56 (4) (2023), pp. 1-39.

[55]

Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. In:Proceedings of International Conference on Neural Information Processing Systems; 2020 Dec 6-12; Vancouver, BC, Canada. New York City: Curran Associates Inc.; 2020. p. 6840-51.

[56]

Song Y, Ermon S. Generative modeling by estimating gradients of the data distribution. In:Proceedings of International Conference on Neural Information Processing Systems; 2019 Dec 8-14; Vancouver, BC, Canada. New York City: Curran Associates Inc.; 2019. p. 11918-30.

[57]

Song Y, Sohl-Dickstein J, Kingma DP, Kumar A, Ermon S, Poole B. Score-based generative modeling through stochastic differential equations. In:Proceedings of International Conference on Learning Representations; 2021 May 3-7; Vienna, Austria. New York City: Curran Associates Inc.; 2021. p. 37799-812.

[58]

Rombach R, Blattmann A, Lorenz D, Esser P, Ommer B. High-resolution image synthesis with latent diffusion models. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022 Jun 19-24; New Orleans, LA, USA. New York City: IEEE; 2022. p. 10684-95.

[59]

Kollovieh M, Ansari AF, Bohlke-Schneider M, Zschiegner J, Wang H, Wang YB. Predict, refine, synthesize:self-guiding diffusion models for probabilistic time series forecasting. In:Proceedings of the 37th International Conference on Neural Information Processing Systems; 2023 Dec 10-15; New Orleans, LA, USA. New York City: Curran Associates Inc.; 2023. p. 28341-64.

[60]

Jiang Z, Liu X, Yang G, Li W, Li A, Wang G. DIFFSC:semantic communication framework with enhanced denoising through diffusion probabilistic models. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2024 Apr 14-19; Seoul, Republic of Korea. New York City: IEEE; 2024. p. 13071-5.

[61]

Guo L, Chen W, Sun Y, Ai B, Pappas N, Quek T. Diffusion-driven semantic communication for generative models with bandwidth constraints. 2024. arXiv:2407.18468.

[62]

Chen J, You D, Gündüz D, Dragotti PL. CommIN:semantic image communications as an inverse problem with INN-guided diffusion models. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2024 Apr 14-19; Seoul, Republic of Korea. New York City: IEEE; 2024. p. 6675-9.

[63]

Yuan W, Ren J, Wang C, Zhang R, Wei J, Kim DI, et al. Generative semantic communication for joint image transmission and segmentation. 2024. arXiv:2411.18005.

[64]

Qiang C, Li H, Ni H, Qu H, Fu R, Wang T, et al. Minimally-supervised speech synthesis with conditional diffusion model and language model:a comparative study of semantic coding. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2024 Apr 14-19; Seoul, Republic of Korea. New York City: IEEE; 2024. p. 10186-90.

[65]

Grassucci E, Barbarossa S, Comminiello D. Generative semantic communication: diffusion models beyond bit recovery. 2023. arXiv:2306.04321.

[66]

Grassucci E, Choi J, Park J, Gramaccioni RF, Cicchetti G, Comminiello D. Rethinking multi-user semantic communications with deep generative models. 2024. arXiv:2405.09866.

[67]

Pignata G, Grassucci E, Cicchetti G, Comminiello D. Lightweight diffusion models for resource-constrained semantic communication. 2024. arXiv:2410.02491.

[68]

Zhang K, Li L, Lin W, Yan Y, Cheng W, Han Z. SC-CDM: enhancing quality of image semantic communication with a compact diffusion model. 2024. arXiv:2410.02121.

[69]

Fan S, Bao Z, Dong C, Liang H, Xu X, Zhang P. Semantic feature decomposition based semantic communication system of images with large-scale visual generation models. 2024. arXiv:2410.20126.

[70]

L. Qiao, M.B. Mashhadi, Z. Gao, C.H. Foh, P. Xiao, M. Bennis. Latency-aware generative semantic communications with pre-trained diffusion models. IEEE Wirel Commun Lett, 13 (10) (2024), pp. 2652-2656.

[71]

Wang Y, Yang W, Xiong Z, Zhao Y, Mao S, Quek TQ, et al. FAST-GSC: fast and adaptive semantic transmission for generative semantic communication. 2024. arXiv:2407.15395.

[72]

Pezone F, Musa O, Caire G, Barbarossa S. Semantic-preserving image coding based on conditional diffusion models. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2024 Apr 14-19; Seoul, Republic of Korea. New York City: IEEE; 2024. p. 13501-5.

[73]

Fu W, Xu L, Wu X, Wei H, Wang L. Multimodal generative semantic communication based on latent diffusion model. In:Proceedings of IEEE International Workshop on Machine Learning for Signal Processing; 2024 Sep 22-25; London, UK. New York City: IEEE; 2024. p. 1-6.

[74]

C. Mingkai, L. Minghao, Z. Zhe, X. Zhiping, W. Lei. Task-oriented semantic communication with foundation models. China Commun, 21 (7) (2024), pp. 65-77.

[75]

Jiang P, Wen CK, Li X, Jin S. Adaptive semantic image transmission using generative foundation model. In:Proceedings of IEEE International Workshop on Machine Learning for Signal Processing; 2024 Sep 22-25; London, UK. New York City: IEEE; 2024. p. 1-6.

[76]

Grassucci E, Marinoni C, Rodriguez A, Comminiello D. Diffusion models for audio semantic communication. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2024 Apr 14-19; Seoul, Republic of Korea. New York City: IEEE; 2024. p. 13136-40.

[77]

M. Chen, M. Liu, C. Wang, X. Song, Z. Zhang, Y. Xie, et al. Cross-modal graph semantic communication assisted by generative AI in the metaverse for 6G. Research, 7 (2024), p. 0342.

[78]

Yang M, Gao D, Xie F, Li J, Song X, Shi G. SG2SC:a generative semantic communication framework for scene understanding-oriented image transmission. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2024 Apr 14-19; Seoul, Republic of Korea. New York City: IEEE; 2024. p. 13486-90.

[79]

Zhang H, Bao Z, Liang H, Liu Y, Dong C, Li L. Diffusion-based wireless semantic communication for VR image. In:Proceedings of IEEE/CIC International Conference on Communications in China Workshop; 2024 Aug 7-9; Hangzhou, China. New York City: IEEE; 2024. p. 639-44.

[80]

Xu B, Meng R, Chen Y, Xu X, Dong C, Sun H. Latent semantic diffusion-based channel adaptive de-noising SemCom for future 6G systems. In:Proceedings of IEEE Global Communications Conference; 2023 Dec 4-8; Kuala Lumpur, Malaysia. New York City: IEEE; 2023. p. 1229-34.

[81]

T. Wu, Z. Chen, D. He, L. Qian, Y. Xu, M. Tao, et al. CDDM: channel denoising diffusion models for wireless semantic communications. IEEE Trans Wirel Commun, 23 (9) (2024), pp. 11168-11183.

[82]

Li N, Deng Y. Goal-oriented semantic communication for wireless image transmission via stable diffusion. 2024. arXiv:2408.00428.

[83]

Pei J, Feng C, Wang P, Tabassum H, Shi D. Latent diffusion model-enabled real-time semantic communication considering semantic ambiguities and channel noises. 2024. arXiv:2406.06644.

[84]

Zeng Y, He X, Chen X, Tong H, Yang Z, Guo Y, et al. DMCE:diffusion model channel enhancer for multi-user semantic communication systems. In:Proceedings of IEEE International Conference on Communications; 2024 Jun 9-13; Denver, CO, USA. New York City: IEEE; 2024. p. 855-60.

[85]

Jiang F, Peng Y, Dong L, Wang K, Yang K, Pan C, et al. Large generative model assisted 3D semantic communication. 2024. arXiv:2403.05783.

[86]

Duan Y, Wu T, Chen Z, Tao M. DM-MIMO:diffusion models for robust semantic communications over MIMO channels. In:Proceedings of IEEE/CIC International Conference on Communications in China; 2024 Aug 7-9; Hangzhou, China. New York City: IEEE; 2024. p. 1609-14.

[87]

Yang P, Zhang G, Cai Y. Rate-adaptive generative semantic communication using conditional diffusion models. 2024. arXiv:2409.02597.

[88]

Xie H, Qin Z, Han Z, Letaief KB. Hybrid digital-analog semantic communications. 2024. arXiv:2405.12580.

[89]

Ren X, Wu J, Xu H, Chen X. Diffusion model based secure semantic communications with adversarial purification. In:Proceedings of the IEEE International Conference on Big Data Security on Cloud; 2024 May 10-23; New York City, NY, USA. New York City: IEEE; 2024. p. 130-4.

[90]

H. Du, J. Wang, D. Niyato, J. Kang, Z. Xiong, D.I. Kim. AI-generated incentive mechanism and full-duplex semantic communications for information sharing. IEEE J Sel Areas Commun, 41 (9) (2023), pp. 2981-2997

[91]

J. Zheng, B. Du, H. Du, J. Kang, D. Niyato, H. Zhang. Energy-efficient resource allocation in generative AI-aided secure semantic mobile networks. IEEE Trans Mobile Comput, 23 (12) (2024), pp. 11422-11435

[92]

Liu J, Xiao M, Wen J, Kang J, Zhang R, Zhang T, et al. Optimizing resource allocation for multi-modal semantic communication in mobile AIGC networks: a diffusion-based game approach. 2024. arXiv:2409.17506.

[93]

B. Du, H. Du, H. Liu, D. Niyato, P. Xin, J. Yu, et al. YOLO-based semantic communication with generative AI-aided resource allocation for digital twins construction. IEEE Internet Things J, 11 (5) (2023), pp. 7664-7678.

[94]

Xu C, Mashhadi MB, Ma Y, Tafazolli R. Semantic-aware power allocation for generative semantic communications with foundation models. 2024. arXiv:2407.03050.

[95]

Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, et al. A survey of large language models. 2023. arXiv:2303.18223.

[96]

Wan Z, Wang X, Liu C, Alam S, Zheng Y, Liu J, et al. Efficient large language models: a survey. 2023. arXiv:2312.03863.

[97]

Devlin J, Chang MW, Lee K, Toutanova K. BERT:pre-training of deep bidirectional transformers for language understanding. In:Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2019 Jun 2-7; Minneapolis, MN, USA. Stroudsburg: Association for Computational Linguistics; 2019. p. 4171-86.

[98]

OpenAI; Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, et al. GPT-4 technical report. 2023. arXiv:2303.08774.

[99]

C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res, 21 (140) (2020), pp. 1-67.

[100]

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In:Proceedings of International Conference on Neural Information Processing Systems; 2017 Dec 4-9; Long Beach, CA, USA. New York City: Curran Associates Inc.; 2017. p. 5998-6008.

[101]

Wang Y, Sun Z, Fan J, Ma H. On the uses of large language models to design end-to-end learning semantic communication. In:Proceedings of the IEEE Wireless Communications and Networking Conference; 2024 Apr 21-24; Dubai, United Arab Emirates. New York City: IEEE; 2024. p. 1-6.

[102]

M.K. Chang, C.T. Hsu, G.C. Yang. GenSC: generative semantic communication systems using BART-like model. IEEE Commun Lett, 28 (10) (2024), pp. 2298-2302.

[103]

Wang Z, Zou L, Wei S, Liao F, Zhuo J, Mi H, et al. Large language model enabled semantic communication systems. 2024. arXiv:2407.14112.

[104]

Wei X, Tong H, Yang N, Yin C. Language-oriented semantic communication for image transmission with fine-tuned diffusion model. 2024. arXiv:2409.17104.

[105]

Nam H, Park J, Choi J, Bennis M, Kim SL. Language-oriented communication with semantic coding and knowledge distillation for text-to-image generation. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2024 Apr 14-19; Seoul, Republic of Korea. New York City: IEEE; 2024. p. 13506-10.

[106]

Y. Zhao, Y. Yue, S. Hou, B. Cheng, Y. Huang. LaMoSC: large language model-driven semantic communication system for visual transmission. IEEE Trans Cognit Commun Netw, 10 (6) (2024), pp. 2005-2018.

[107]

Du H, Liu G, Niyato D, Zhang J, Kang J, Xiong Z, et al. Generative AI-aided joint training-free secure semantic communications via multi-modal prompts. In:Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 2024 Apr 14-19; Seoul, Republic of Korea. New York City: IEEE; 2024. p. 12896-900.

[108]

Wang X, Ye D, Feng C, Yang HH, Chen X, Quek TQ. Trustworthy image semantic communication with GenAI: explainability, controllability, and efficiency. 2024. arXiv:2408.03806.

[109]

Ren M, Qiao L, Yang L, Gao Z, Chen J, Mashhadi MB, et al. Generative semantic communication via textual prompts: latency performance tradeoffs. 2024. arXiv:2409.09715.

[110]

Cao D, Wu J, Bashir AK. Multimodal large language models driven privacy-preserving wireless semantic communication in 6G. In:Proceedings of IEEE International Conference on Communications Workshops; 2024 Jun 9-13; Denver, CO, USA. New York City: IEEE; 2024. p. 171-6.

[111]

F. Zhao, Y. Sun, L. Feng, L. Zhang, D. Zhao. Enhancing reasoning ability in semantic communication through generative AI-assisted knowledge construction. IEEE Commun Lett, 28 (4) (2024), pp. 832-836

[112]

F. Jiang, Y. Peng, L. Dong, K. Wang, K. Yang, C. Pan, et al. Large AI model-based semantic communications. IEEE Wirel Commun, 31 (3) (2024), pp. 68-75.

[113]

Guo S, Wang Y, Ye J, Zhang A, Xu K. Semantic importance-aware communications with semantic correction using large language models. 2024. arXiv:2405.16011.

[114]

Jiang F, Tang C, Dong L, Wang K, Yang K, Pan C. Visual language model based cross-modal semantic communication systems. 2024. arXiv:2407.00020.

[115]

Zheng J, Ren J, Xu P, Yuan Z, Xu J, Wang F, et al. Generative semantic communication for text-to-speech synthesis. 2024. arXiv:2410.03459.

[116]

F. Jiang, L. Dong, Y. Peng, K. Wang, K. Yang, C. Pan, et al. Large AI model empowered multimodal semantic communications. IEEE Commun Mag, 63 (1) (2025), pp. 76-82.

[117]

Yang W, Xiong Z, Mao S, Quek TQ, Zhang P, Debbah M, et al. Rethinking generative semantic communication for multi-user systems with multi-modal LLM. 2024. arXiv:2408.08765.

[118]

Xie H, Qin Z, Tao X, Han Z. Towards intelligent communications: large model empowered semantic communications. 2024. arXiv:2402.13073.

[119]

P. Jiang, C.K. Wen, X. Yi, X. Li, S. Jin, J. Zhang. Semantic communications using foundation models: design approaches and open issues. IEEE Wirel Commun, 31 (3) (2024), pp. 76-84.

[120]

Zhang F, Du Y, Chen K, Shao Y, Liew SC. Addressing out-of-distribution challenges in image semantic communication systems with multi-modal large language models. 2024. arXiv:2407.15335.

[121]

Jiang P, Wen CK, Li X, Jin S, Li GY. Semantic satellite communications based on generative foundation model. 2024. arXiv:2404.11941.

[122]

Chen W, Xu W, Chen H, Zhang X, Qin Z, Zhang Y, et al. Semantic communication based on large language model for underwater image transmission. 2024. arXiv:2408.12616.

[123]

Kalita A. Large language models (LLMs) for semantic communication in edge-based IoT networks. 2024. arXiv:2407.20970.

[124]

Durante Z, Huang Q, Wake N, Gong R, Sung Park J, Sarkar B, et al. Agent AI: surveying the horizons of multimodal interaction. 2024. arXiv:2401.03568.

[125]

Girdhar R, El-Nouby A, Liu Z, Singh M, Alwala KV, Joulin A, et al. ImageBind:one embedding space to bind them all. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023 Jun 18-22; Vancouver, BC, Canada. New York City: IEEE; 2023. p. 15180-90.

[126]

Wang X, Zhang X, Luo Z, Sun Q, Cui Y, Wang J, et al. Emu3: next-token prediction is all you need. 2024. arXiv:2409.18869.

[127]

Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, et al. LLaMA 2: open foundation and fine-tuned chat models. 2023. arXiv:2307.09288.

[128]

Han Z, Gao C, Liu J, Zhang J, Zhang SQ. Parameter-efficient fine-tuning for large models: a comprehensive survey. 2024. arXiv:2403.14608.

[129]

Zhang D, Yu Y, Dong J, Li C, Su D, Chu C, et al. MM-LLMs: recent advances in multimodal large language models. 2024. arXiv:2401.13601.

[130]

Faisal MM. YOLOv8-deepsort-object-tracking 2023 [Internet]. San Francisco: GitHub; [cited 2025 Jun 30]. Available from: https://github.com/MuhammadMoinFaisal/YOLOv8-DeepSORT-Object-Tracking.

[131]

Chen Z, Wu J, Wang W, Su W, Chen G, Xing S, et al. InternVL: scaling up vision foundation models and aligning for generic visual-linguistic tasks. 2023. arXiv:2312.14238.

[132]

Yang A, Yang B, Hui B, Zheng B, Yu B, Zhou C, et al. Qwen2 technical report. 2024. arXiv:2407.10671.

[133]

Tanner H. All-MiniLM-L6-v2 2024 [Internet]. San Francisco: GitHub; [cited 2025 Jun 30]. Available from: https://github.com/henrytanner52/all-MiniLM-L6-v2.

[134]

Liu ZS, Siu WC, Wang LW, Li CT, Cani MP. Unsupervised real image super-resolution via generative variational autoencoder. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops; 2020 Jun 16-18; Seattle, WA, USA. New York City: IEEE; 2020. p. 442-3.

[135]

Wang X, Xie L, Dong C, Shan Y. Real-ESRGAN:training real-world blind super-resolution with pure synthetic data. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021 Jun 20-25; Nashville, TN, USA. New York City: IEEE; 2021. p. 1905-14.

[136]

Lin X, He J, Chen Z, Lyu Z, Dai B, Yu F, et al. DiffBIR: towards blind image restoration with generative diffusion prior. 2023. arXiv:2308.15070.

[137]

Y. Lin, L. Zheng, Z. Zheng, Y. Wu, Z. Hu, C. Yan, et al. Improving person re-identification by attribute and identity learning. Pattern Recogn, 95 (2019), pp. 151-161.

[138]

E. Sisinni, A. Saifullah, S. Han, U. Jennehag, M. Gidlund. Industrial Internet of Things: challenges, opportunities, and directions. IEEE Trans Industr Inform, 14 (11) (2018), pp. 4724-4734.

[139]

Lu J, Yang W, Xiong Z, Xing C, Tafazolli R, Quek TQ, et al. Generative AI-enhanced multi-modal semantic communication in Internet of Vehicles: system design and methodologies. 2024. arXiv:2409.15642.

[140]

Y. Lin, Z. Gao, H. Du, D. Niyato, J. Kang, A. Jamalipour, et al. A unified framework for integrating semantic communication and AI-generated content in metaverse. IEEE Netw, 38 (4) (2023), pp. 174-181.

[141]

D. Huang, M. Ge, K. Xiang, X. Zhang, H. Yang. Privacy preservation of large language models in the metaverse era: research frontiers, categorical comparisons, and future directions. Int J Netw Manag, 35 (1) (2024), p. e2292.

[142]

Kurai R, Hiraki T, Hiroi Y, Hirao Y, Perusquia-Hernandez M, Uchiyama H, et al. MagicItem: dynamic behavior design of virtual objects with large language models in a consumer metaverse platform. 2024. arXiv:2406.13242.

[143]

Xiao G, Lin J, Seznec M, Wu H, Demouth J, Han S. SmoothQuant:accurate and efficient post-training quantization for large language models. In:Proceedings of International Conference on Machine Learning; 2023 Jul 23-29; Honolulu, HI, USA. New York City: Association for Computing Machinery; 2023. p. 38087-99.

[144]

Ma X, Fang G, Wang X. LLM-Pruner: on the structural pruning of large language models. In:Proceedings of International Conference on Neural Information Processing Systems; 2023 Dec 10-16; New Orleans, LA, USA. New York City: Curran Associates Inc.; 2023. p. 21702-20.

[145]

Gu Y, Dong L, Wei F, Huang M. MiniLLM:knowledge distillation of large language models. In:Proceedings of International Conference on Learning Representations; 2024 May 7-11; Vienna, Austria. Trier: dblp: computer science bibliography; 2024.

[146]

DeepSeek AI, Liu A, Feng B, Xue B, Wang B, Wu B, et al. DeepSeek-v3 technical report. 2024. arXiv:2412.19437.

[147]

Saha R, Sagan N, Srivastava V, Goldsmith A, Pilanci M. Compressing large language models using low rank and low precision decomposition. In: Proceedings of the International Conference on Neural Information Processing Systems; 2024 Dec 10-15; Vancouver, BC, Canada. New York City: Curran Associates Inc.; 2024. p. 88981-9018.

[148]

Xue N, Sun Y, Chen Z, Tao M, Xu X, Qian L, et al. WDMoE: wireless distributed large language models with mixture of experts. 2024. arXiv:2405.03131.

[149]

Wu D, Wang X, Qiao Y, Wang Z, Jiang J, Cui S, et al. NetLLM:adapting large language models for networking. In:Proceedings of the ACM SIGCOMM Conference; 2024 Aug 4-8; Sydney, NSW, Australia. New York City: Association for Computing Machinery; 2024. p. 661-78.

[150]

L. Wang, X. Zhang, H. Su, J. Zhu. A comprehensive survey of continual learning: theory, method and application. IEEE Trans Pattern Anal Mach Intell, 46 (8) (2024), pp. 5362-5383.

[151]

Li S, Sun Y, Zhang J, Cai K, Chen H, Cui S, et al. Cooperative semantic knowledge base update policy for multiple semantic communication pairs. 2024. arXiv:2410.02405.

[152]

Wang X, Peng J, Xu K, Yao H, Chen T. Reinforcement learning-driven LLM agent for automated attacks on LLMs. In:Proceedings of the Fifth Workshop on Privacy in Natural Language Processing; 2024 Aug 15; Bangkok, Thailand. New York City: Association for Computing Machinery; 2024. p. 170-7.

[153]

Jiang F, Xu Z, Niu L, Wang B, Jia J, Li B, et al. POSTER:identifying and mitigating vulnerabilities in LLM-integrated applications. In:Proceedings of the ACM Asia Conference on Computer and Communications Security; 2024 Jul 1-5; Singapore. New York City: Association for Computing Machinery; 2024. p. 1949-51.

[154]

Yu L, Do V, Hambardzumyan K, Cancedda N. Robust LLM safeguarding via refusal feature adversarial training. 2024. arXiv:2409.20089.

[155]

Xu R, Li G, Yang Z, Chen M, Liu Y, Li J. Covert and reliable semantic communication against cross-layer privacy inference over wireless edge networks. In: Proceedings of IEEE Wireless Communications and Networking Conference; 2024 Apr 21-24; Dubai, United Arab Emirates. New York City: IEEE; 2024. p. 1-6.

[156]

Y. Lin, Z. Gao, H. Du, D. Niyato, J. Kang, Z. Xiong, et al. Blockchain-based efficient and trustworthy AIGC services in metaverse. IEEE Trans Serv Comput, 17 (5) (2024), pp. 2067-2079.
