Abstract
With the rapid development of Internet technology and the advent of the big data era, an increasing volume of text is available on the Internet. These texts cover not only security concepts, incidents, tools, guidelines, and policies, but also risk management approaches, best practices, assurances, technologies, and more. Identifying and classifying entities across this large-scale, heterogeneous, unstructured information can help handle security issues. Because texts in this domain are complex and diverse, security entities are difficult to identify with traditional methods. This paper describes various approaches and techniques for named entity recognition (NER) in this domain, including rule-based, dictionary-based, and machine learning-based approaches, and discusses the problems faced by NER research in this domain, such as conjunction and disjunction, non-standardized naming conventions, abbreviations, and massive nesting. Three future directions for NER in this domain are proposed: (1) application of unsupervised or semi-supervised techniques; (2) development of a more comprehensive ontology; (3) development of a more comprehensive model.