Frontiers of Information Technology & Electronic Engineering >> 2020, Volume 21, Issue 6 doi: 10.1631/FITEE.1800743

Learning to select pseudo labels: a semi-supervised method for named entity recognition

College of Computer, National University of Defense Technology, Changsha 410073, China

Received: 2018-11-22 Accepted: 2020-06-12 Available online: 2020-06-12

Abstract

Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); this performance, however, relies heavily on substantial amounts of labeled data. In some specific areas such as the medical, financial, and military domains, labeled data is very scarce, while unlabeled data is readily available. Previous studies have used unlabeled data to enrich word representations, but they neglect the large amount of entity information in unlabeled data, which may be beneficial to the NER task. In this study, we propose a semi-supervised method for NER tasks, which learns to create high-quality labeled data by applying a pre-trained module to filter out erroneous pseudo labels. Pseudo labels are automatically generated for unlabeled data and used as if they were true labels. Our semi-supervised framework includes three steps: constructing an optimal single neural model for a specific NER task, learning a module that evaluates pseudo labels, and creating new labeled data and improving the NER model iteratively. Experimental results on two English NER tasks and one Chinese clinical NER task demonstrate that our method further improves the performance of the best single neural model. Even when we use only pre-trained static word embeddings and do not rely on any external knowledge, our method achieves performance comparable to that of state-of-the-art models on the CoNLL-2003 and OntoNotes 5.0 English NER tasks.
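The iterative step described in the abstract (generate pseudo labels, filter them, retrain) can be illustrated with a minimal sketch. This is not the authors' implementation: the lookup tagger stands in for the neural NER model, the vocabulary-coverage scorer stands in for the learned evaluation module, and the names `select_and_retrain`, `train_lookup_tagger`, `make_coverage_scorer`, and the `threshold` and `rounds` parameters are all hypothetical.

```python
from collections import Counter

def train_lookup_tagger(data):
    """Toy stand-in for an NER model: tag each token with its most frequent label."""
    counts = {}
    for tokens, tags in data:
        for tok, tag in zip(tokens, tags):
            counts.setdefault(tok, Counter())[tag] += 1

    def tag(tokens):
        # Unknown tokens fall back to the outside tag "O".
        return [counts[t].most_common(1)[0][0] if t in counts else "O"
                for t in tokens]
    return tag

def make_coverage_scorer(seed_data):
    """Toy stand-in for the pseudo-label evaluator (assumption: the paper
    learns this module; here it is just the fraction of known tokens)."""
    vocab = {tok for tokens, _ in seed_data for tok in tokens}

    def score(tokens, pseudo_tags):
        return sum(t in vocab for t in tokens) / max(len(tokens), 1)
    return score

def select_and_retrain(labeled, unlabeled, train, score, threshold=0.9, rounds=3):
    """Iteratively add pseudo-labeled sentences that pass the filter, then retrain."""
    data, pool = list(labeled), list(unlabeled)
    model = train(data)
    for _ in range(rounds):
        kept, rest = [], []
        for tokens in pool:
            pseudo = model(tokens)              # generate pseudo labels
            if score(tokens, pseudo) >= threshold:
                kept.append((tokens, pseudo))   # accepted as if true labels
            else:
                rest.append(tokens)             # stays unlabeled for later rounds
        if not kept:                            # nothing new passed the filter
            break
        data.extend(kept)
        pool = rest
        model = train(data)                     # retrain on the enlarged set
    return model, data

# Usage with made-up sentences.
seed = [(["Alice", "visited", "Paris"], ["B-PER", "O", "B-LOC"])]
raw = [["Paris", "visited", "Alice"], ["Bob", "likes", "Berlin"]]
model, data = select_and_retrain(seed, raw, train_lookup_tagger,
                                 make_coverage_scorer(seed))
```

In this toy run the first unlabeled sentence is fully covered by the seed vocabulary and is added with its pseudo labels, while the second is rejected by the filter and never enters the training set. The threshold controls the trade-off between adding more data and admitting erroneous labels.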
