Generative Video Communications: Concepts, Key Technologies, and Future Research Trends

Engineering ›› DOI: 10.1016/j.eng.2025.06.018

review-article


Abstract

With the rapid growth of video traffic and the evolution of video formats, traditional video communication systems are encountering many challenges, such as limited data compression capacity, high energy consumption, and a narrow range of services. These challenges stem from the constraints of current systems, which rely heavily on discriminative methods for visual content reconstruction and achieve communication gains only in the information and physical domains. To address these issues, this paper introduces generative video communication, a novel paradigm that leverages generative artificial intelligence technologies to enhance video content expression. The core objective is to improve the expressive capabilities of video communication by enabling new gains in the cognitive domain (i.e., content dimension) while complementing existing frameworks. This paper presents key technical pathways for the proposed paradigm, including elastic encoding, collaborative transmission, and trustworthy evaluation, and explores its potential applications in task-oriented and immersive communication. Through this generative approach, we aim to overcome the limitations of traditional video communication systems, offering more efficient, adaptable, and immersive video services.

Keywords

Video communications / Video compression / Video transmission / Video evaluation

Cite this article

Wenjun Zhang, Guo Lu, Zhiyong Chen, Geoffrey Ye Li. Generative Video Communications: Concepts, Key Technologies, and Future Research Trends. Engineering DOI:10.1016/j.eng.2025.06.018

