Risk Analysis and Response Strategies of Large Language Models for Security Governance
Understanding of Large Language Model (LLM) security risks remains fragmented and governance strategies lag behind, so a comprehensive framework that integrates risk mechanism analysis, quantitative assessment, and governance practice is urgently needed. Building on an analysis of the evolution of, and challenges facing, global governance practice, and of the fragmentation of existing LLM risk classification and grading frameworks, this study synthesizes and reconstructs several foundational theories, including socio-technical systems theory, social systems theory, and safety science, to reveal that LLM risks originate in a dual trigger mechanism of the model's internal complexity and its external interactions. It accordingly dissects risks into two primary dimensions, internal safety and application security, and on this basis proposes a "dual-dimension driven" framework for risk analysis and governance. Methodologically, the study introduces "risk label cards" as a standardized tool and adopts an "artificial intelligence + human expert collaboration" paradigm to structurally analyze real-world security incidents; combined with an improved DREAD (damage, reproducibility, exploitability, affected users, discoverability) risk matrix model, it establishes a complete assessment methodology spanning qualitative identification to quantitative grading. The study culminates in an LLM security risk classification system and a three-tier (high, medium, low) risk landscape covering the major risk types, and distills governance recommendations along two lines: implementing the core "dual-dimension driven" risk control strategies and building a systematic set of governance safeguards. With its theoretical compatibility and dynamic character, the proposed framework offers a theoretical tool for the precise assessment and governance of LLM security risks, effectively bridging the "theory-practice gap" in LLM risk governance, and provides a direct reference for continuously tracking and understanding the evolution of LLM security risks and for formulating security policy.
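The quantitative grading step described above can be sketched in a few lines. This is a minimal illustration only: equal dimension weights, 0-10 ratings, and tier cut-offs at 4 and 7 are hypothetical placeholders, and the incident ratings are invented; the paper's improved DREAD model and its calibrated thresholds are not reproduced here.

```python
# Illustrative DREAD-style scoring with a three-tier (high/medium/low) mapping.
# Weights, thresholds, and the sample ratings are assumptions for this sketch,
# not the paper's calibrated values.

DIMENSIONS = ("damage", "reproducibility", "exploitability",
              "affected_users", "discoverability")

def dread_score(ratings: dict) -> float:
    """Average the five DREAD dimension ratings (each on a 0-10 scale)."""
    return sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)

def risk_tier(score: float) -> str:
    """Map a 0-10 score to a three-tier grade using illustrative cut-offs."""
    if score >= 7:
        return "high"
    if score >= 4:
        return "medium"
    return "low"

# A hypothetical prompt-injection-style incident, rated per dimension.
incident = {
    "damage": 8, "reproducibility": 9, "exploitability": 7,
    "affected_users": 8, "discoverability": 6,
}
print(risk_tier(dread_score(incident)))  # → high (score 7.6)
```

A weighted variant (e.g., weights elicited via the analytic hierarchy process, as in reference materials on AHP judgment scales) would replace the plain average with a weighted sum; the tier mapping stays the same.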
large language model / security risk / security governance / risk assessment / classification and grading / risk landscape
Chinese Academy of Engineering consulting project "Research on a National-Level Regulatory 'Safety Hoop' Model for Large Models" (2025-XZ-08)
"Strategic Research on Security Compliance Regulation of Artificial Intelligence Large Language Models in Guangdong Province" (2024-GD-04)
Major Project of Philosophy and Social Sciences Research of the Ministry of Education (24JZD040)
National Natural Science Foundation of China project (72293583)
National Natural Science Foundation of China project (72293580)