
Frontiers of Engineering Management >> 2023, Volume 10, Issue 2 doi: 10.1007/s42524-021-0179-8

Named entity recognition for Chinese construction documents based on conditional random field

Published: 2022-01-07


Abstract

Named entity recognition (NER) is essential to many natural language processing (NLP) tasks, such as information extraction and document classification. Construction documents usually contain critical named entities, and an effective NER method can provide a solid foundation for downstream applications that improve construction management efficiency. This study presents an NER method for Chinese construction documents based on a conditional random field (CRF), comprising a corpus design pipeline and a CRF model. The corpus design pipeline identifies typical NER tasks in construction management, enables word-based tokenization, and controls annotation consistency through a newly designed annotation specification. The CRF model engineers nine transformation features and seven classes of state features, covering the impacts of word position, part-of-speech (POS), and word/character states within the context. The model achieves an F1-measure of 87.9% on a labeled construction data set. Furthermore, as more domain knowledge features are infused, the marginal performance improvement from including POS information decreases, which points to a promising research direction: customizing POS tags to improve NLP performance with limited data.
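To make the idea of state features more concrete, the following minimal Python sketch shows how per-word features covering word position, POS, and word/character states within a one-word context window might be assembled for a CRF. It is an illustration under stated assumptions, not the feature set used in the study: the function names, feature keys, POS tags, and example sentence are all hypothetical, and the abstract does not specify the seven feature classes in this detail.

```python
from typing import Dict, List, Tuple

# (word, POS tag) pairs from word-based tokenization.
# The tag set ("n", "v") is an assumption made for illustration.
Token = Tuple[str, str]


def state_features(sent: List[Token], i: int) -> Dict[str, object]:
    """Build hypothetical state features for the i-th word of a sentence."""
    word, pos = sent[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word": word,              # word state
        "pos": pos,                # part-of-speech state
        "first_char": word[:1],    # character states inside the word
        "last_char": word[-1:],
        "word_len": len(word),
        "position": i,             # word position in the sentence
    }
    # Context window: previous word, or a begin-of-sentence flag.
    if i > 0:
        prev_word, prev_pos = sent[i - 1]
        feats["-1:word"] = prev_word
        feats["-1:pos"] = prev_pos
    else:
        feats["BOS"] = True
    # Context window: next word, or an end-of-sentence flag.
    if i < len(sent) - 1:
        next_word, next_pos = sent[i + 1]
        feats["+1:word"] = next_word
        feats["+1:pos"] = next_pos
    else:
        feats["EOS"] = True
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Return one feature dictionary per word."""
    return [state_features(sent, i) for i in range(len(sent))]


# Invented example sentence, not taken from the study's corpus.
example: List[Token] = [("3号楼", "n"), ("混凝土", "n"), ("浇筑", "v"), ("完成", "v")]
features = sentence_features(example)
```

Feature dictionaries of this shape are a common input format for linear-chain CRF toolkits. Transition features between adjacent labels are typically learned by the CRF itself rather than built by hand at this stage.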
