Frontiers of Information Technology & Electronic Engineering
2018, Volume 19, Issue 4
doi: 10.1631/FITEE.1601846
Original Article
Syntactic word embedding based on dependency syntax and polysemous analysis
School of Computer Science, Shaanxi Normal University, Xi’an 710119, China; School of Computer, Qinghai Normal University, Xining 810800, China
Available online: 2018-06-28
Abstract
Most word embedding models suffer from three problems: (1) models based on bag-of-words contexts completely neglect the structural relations within sentences; (2) each word has a single embedding, so the model cannot discriminate between the senses of polysemous words; (3) the learned embeddings tend to reflect the contextual structure similarity of sentences. To solve these problems, we propose an easy-to-use representation algorithm, syntactic word embedding (SWE). The main procedures are: (1) a polysemous tagging algorithm based on latent Dirichlet allocation (LDA) is used to represent polysemous words; (2) the symbols ‘+’ and ‘−’ are adopted to indicate the directions of dependency relations; (3) stopwords and their dependencies are deleted; (4) dependency skip is applied to connect indirect dependencies; (5) the dependency-based contexts are fed into a word2vec model. Experimental results show that our model generates desirable word embeddings in similarity evaluation tasks. In addition, semantic and syntactic features can be captured from dependency-based syntactic contexts, which yield embeddings that exhibit less topical and more syntactic similarity. We conclude that SWE outperforms single-embedding learning models.
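Procedures (2) to (4) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not settle. The toy parse, the stopword list, and the function names `resolve_head` and `dependency_contexts` are hypothetical. The choice of '+' for a dependent seen from its head and '-' for a head seen from its dependent is assumed. Dependency skip is read here as re-attaching a word to its nearest non-stopword ancestor. The LDA-based polysemous tagging of procedure (1) is omitted.

```python
# Hypothetical toy parse: (index, token, head_index); head_index -1 marks the root.
# Sentence: "the scientist discovered a star with a telescope"
PARSE = [
    (0, "the", 1),
    (1, "scientist", 2),
    (2, "discovered", -1),
    (3, "a", 4),
    (4, "star", 2),
    (5, "with", 2),
    (6, "a", 7),
    (7, "telescope", 5),
]
STOPWORDS = {"the", "a", "with"}  # illustrative list, not taken from the paper


def resolve_head(parse, idx, stopwords):
    """Return the nearest non-stopword ancestor of token idx (or -1).

    Assumed reading of "dependency skip": when the direct head is a
    stopword, keep climbing so the indirect dependency is connected.
    """
    head = parse[idx][2]
    while head != -1 and parse[head][1] in stopwords:
        head = parse[head][2]
    return head


def dependency_contexts(parse, stopwords):
    """Build (word, context) pairs with direction-marked contexts."""
    pairs = []
    for idx, token, _ in parse:
        if token in stopwords:
            continue  # procedure (3): drop stopwords and their dependencies
        head = resolve_head(parse, idx, stopwords)
        if head == -1:
            continue  # the root has no head to pair with
        head_token = parse[head][1]
        # procedure (2): direction marks (assumed convention, ASCII '-')
        pairs.append((head_token, "+" + token))
        pairs.append((token, "-" + head_token))
    return pairs


if __name__ == "__main__":
    for word, context in dependency_contexts(PARSE, STOPWORDS):
        print(word, context)
```

In this toy parse, "telescope" attaches to the stopword "with", so the skip step pairs it with "discovered" instead. Pairs of this kind would then serve as the dependency-based contexts of procedure (5). The abstract does not specify how they are passed to the word2vec model.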