A Survey on Large Language Model-Powered Autonomous Driving

Yuxuan Zhu , Shiyi Wang , Wenqing Zhong , Nianchen Shen , Yunqi Li , Siqi Wang , Zhiheng Li , Cathy Wu , Zhengbing He , Li Li

Engineering ›› Review article ›› DOI: 10.1016/j.eng.2025.07.038

Abstract

Artificial intelligence (AI) plays a crucial role in autonomous driving (AD), advancing its development toward greater intelligence and efficiency. In response to persistent challenges in current AD algorithms, many researchers believe that large language models (LLMs), with their powerful reasoning capabilities and extensive knowledge, may offer promising solutions, enabling AD systems to achieve deeper understanding and more informed decision-making. Both industry and academia have actively explored the application of LLMs to AD tasks, showing early signs of progress on issues such as the long-tail problem. To examine whether and how LLMs can enhance AD, this paper provides a comprehensive analysis of their potential applications, including optimization strategies for both modular and end-to-end approaches, with a particular focus on how LLMs can address the problems and challenges of current solutions. Furthermore, we explore an important question: Can LLM-based artificial general intelligence (AGI) serve as the key to achieving high-level AD? We also analyze the limitations and challenges LLMs may face in advancing AD technology and extend the discussion to societal considerations, including critical safety and security concerns. This survey aims to provide a foundational reference for cross-disciplinary researchers and to help guide future research directions.

Keywords

Large language models / Autonomous driving / Artificial general intelligence / End-to-end / ChatGPT

Cite this article

Yuxuan Zhu, Shiyi Wang, Wenqing Zhong, Nianchen Shen, Yunqi Li, Siqi Wang, Zhiheng Li, Cathy Wu, Zhengbing He, Li Li. A Survey on Large Language Model-Powered Autonomous Driving. Engineering. DOI: 10.1016/j.eng.2025.07.038


