Named entity recognition for Chinese construction documents based on conditional random field

Frontiers of Engineering Management, 2023, Vol. 10, No. 2, pp. 237-249. doi: 10.1007/s42524-021-0179-8

Abstract: Named entity recognition (NER) is essential in many natural language processing (NLP) tasks such as information extraction and document classification. A construction document usually contains critical named entities, and an effective NER method can provide a solid foundation for downstream applications to improve construction management efficiency. This study presents an NER method for Chinese construction documents based on conditional random field (CRF), including a corpus design pipeline and a CRF model. The corpus design pipeline identifies typical NER tasks in construction management, enables word-based tokenization, and controls the annotation consistency with a newly designed annotating specification. The CRF model engineers nine transformation features and seven classes of state features, covering the impacts of word position, part-of-speech (POS), and word/character states within the context. The F1-measure on a labeled construction data set is 87.9%. Furthermore, as more domain knowledge features are infused, the marginal performance improvement of including POS information will decrease, leading to a promising research direction of POS customization to improve NLP performance with limited data.

Keywords: NER; NLP; Chinese language; construction document
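The abstract above mentions state features built from word position, POS, and word/character states within the context. A minimal sketch of what such per-token features might look like is given below. It is not the paper's feature set: the function names, the feature keys, the one-token context window, and the POS tags in the example are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set is an assumption


def token_features(sent: List[Token], i: int) -> Dict[str, object]:
    """Build illustrative state features for token i of a (word, POS) sequence.

    Assumes every word is a non-empty string.
    """
    word, pos = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word": word,                     # word state
        "pos": pos,                       # part-of-speech
        "first_char": word[0],            # character states
        "last_char": word[-1],
        "is_first": i == 0,               # word position in the sentence
        "is_last": i == len(sent) - 1,
    }
    # Context window of one token on each side (window size is an assumption).
    for offset in (-1, 1):
        j = i + offset
        if 0 <= j < len(sent):
            ctx_word, ctx_pos = sent[j]
            feats[f"{offset:+d}:word"] = ctx_word
            feats[f"{offset:+d}:pos"] = ctx_pos
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Feature dicts for every token, in the shape many CRF toolkits accept."""
    return [token_features(sent, i) for i in range(len(sent))]


if __name__ == "__main__":
    # Hypothetical pre-tokenized, POS-tagged sentence.
    example = [("项目经理", "n"), ("负责", "v"), ("施工", "vn"), ("进度", "n")]
    for feats in sentence_features(example):
        print(feats)
```

A list of such dictionaries per sentence is the usual input shape for CRF toolkits that accept dictionary features; the transition (label-to-label) features are learned by the CRF itself and are not shown here.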

Automatically building large-scale named entity recognition corpora from Chinese Wikipedia

Jie ZHOU,Bi-cheng LI,Gang CHEN

Frontiers of Information Technology & Electronic Engineering, 2015, Vol. 16, No. 11, pp. 940-956. doi: 10.1631/FITEE.1500067

Abstract: Named entity recognition (NER) is a core component in many natural language processing applications. Most NER systems rely on supervised machine learning methods, which depend on time-consuming and expensive annotations in different languages and domains. This paper presents a method for automatically building silver-standard NER corpora from Chinese Wikipedia. We refine novel and language-dependent features by exploiting the text and structure of Chinese Wikipedia. To reduce tagging errors caused by entity classification, we design four types of heuristic rules based on the characteristics of Chinese Wikipedia and train a supervised NE classifier, and a combined method is used to improve the precision and coverage. Then, we realize type identification of implicit mentions by using boundary information of outgoing links. By selecting the sentences related to the domains of test data, we can train better NER models. In the experiments, large-scale NER corpora containing 2.3 million sentences are built from Chinese Wikipedia. The results show the effectiveness of automatically annotated corpora, and the trained NER models achieve the best performance when combining our silver-standard corpora with gold-standard corpora.

Keywords: NER corpora; Chinese Wikipedia; Entity classification; Domain adaptation; Corpus selection
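The general idea of deriving silver-standard annotations from Wikipedia links can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes links appear in `[[Title]]` or `[[Title|anchor]]` wikitext form, that a hypothetical `article_type` mapping already assigns an entity type to each linked article (the output of an entity classification step), and that tags are emitted per character in BIO format.

```python
import re
from typing import Dict, List, Tuple

# Matches [[Title]] and [[Title|anchor]]; nested or malformed links are ignored.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")


def silver_tags(wikitext: str, article_type: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn wikitext links into character-level BIO tags.

    article_type is a hypothetical mapping from article title to entity type.
    Links whose target has no known type are tagged "O".
    """
    tagged: List[Tuple[str, str]] = []
    pos = 0
    for m in LINK.finditer(wikitext):
        # Plain text before the link lies outside any entity.
        tagged.extend((ch, "O") for ch in wikitext[pos:m.start()])
        title = m.group(1)
        anchor = m.group(2) or title  # visible text of the link
        etype = article_type.get(title)
        for k, ch in enumerate(anchor):
            if etype is None:
                tagged.append((ch, "O"))
            else:
                tagged.append((ch, ("B-" if k == 0 else "I-") + etype))
        pos = m.end()
    tagged.extend((ch, "O") for ch in wikitext[pos:])
    return tagged


if __name__ == "__main__":
    types = {"北京": "LOC", "中华人民共和国": "LOC"}  # illustrative only
    for ch, tag in silver_tags("[[北京]]是[[中华人民共和国|中国]]的首都", types):
        print(ch, tag)
```

Mentions that are not linked (the implicit mentions discussed in the abstract) stay tagged "O" in this sketch; handling them is where the heuristic rules and link boundary information described above would come in.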

A review on cyber security named entity recognition (Review Article)

Chen GAO, Xuan ZHANG, Mengting HAN, Hui LIU

Frontiers of Information Technology & Electronic Engineering, 2021, Vol. 22, No. 9, pp. 1153-1168. doi: 10.1631/FITEE.2000286

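Abstract: Because text in the cyberspace security domain is complex and diverse, traditional named entity recognition (NER) methods struggle to identify the security entities it contains. This paper surveys the methods and techniques used for NER in this domain, including rule-based, dictionary-based, and machine learning-based methods, and discusses the problems facing NER research in the field, such as the joining and splitting of entity phrases, non-standardized naming conventions, abbreviations, and heavy nesting. Finally, it proposes three research directions for NER in cyberspace security: (1) applying unsupervised or semi-supervised techniques; (2) developing a more comprehensive cyberspace security ontology; and (3) applying more effective deep learning models.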

Keywords: named entity recognition (NER); information extraction; cyberspace security; machine learning; deep learning
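To make the rule-based and dictionary-based families named in the abstract concrete, here is a minimal sketch that combines the two. Nothing in it comes from the review itself: the regular expressions, the entity labels, and the tiny gazetteer are assumptions chosen for illustration, and a real system would need far broader coverage.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str, str]  # (start, end, label, matched text)

# Rule-based part: surface patterns for entities with a regular form.
# Both patterns and labels are illustrative assumptions.
RULES = {
    "VULN_ID": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # no octet range check
}

# Dictionary-based part: a hypothetical gazetteer of known entity names.
GAZETTEER = {"wannacry": "MALWARE", "mirai": "MALWARE"}


def find_entities(text: str) -> List[Span]:
    """Return entity spans found by the rules and the gazetteer, sorted by offset."""
    spans: List[Span] = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label, m.group()))
    for term, label in GAZETTEER.items():
        # Case-insensitive whole-word lookup of each gazetteer entry.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label, m.group()))
    return sorted(spans)


if __name__ == "__main__":
    sample = "WannaCry exploited CVE-2017-0144 and contacted 192.168.1.10."
    for span in find_entities(sample):
        print(span)
```

The sketch also hints at the limits the abstract lists: fixed patterns and word lists cannot cope with non-standardized names, abbreviations, or nested entities, which is part of the motivation for the machine learning-based methods the review covers.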
