Zipfian interpretation of textbook vocabulary lists: comments on Xiao et al.’s “Corpus-based research on English word recognition rates in primary school and word selection strategy” (Correspondence)

Qiong HU, Ming YUE

Frontiers of Information Technology & Electronic Engineering, 2017, Volume 18, Issue 7, Pages 863-866. doi: 10.1631/FITEE.1700418

Abstract: Based on a comparative analysis of four corpora, Xiao et al. (2017) argued that the growth in word recognition rates of sixth-grade primary school students in China is unsatisfactory, and proposed adding another 903 words to the original 726-word vocabulary of the widely used People's Education Press primary school English textbooks while deleting low-frequency words such as 'twelfth' (the ordinal number). As foreign-language teachers and linguists, we endorse their use of advanced information technology to evaluate a traditional word list, but we argue that this work should: 1. pay more attention to the soundness of corpus sampling when building the reference corpora; 2. take Zipf's law (word frequency in English is inversely proportional to word rank) into account when interpreting word frequencies, since it is natural for growth in the recognition rate to slow as vocabulary size increases; 3. consider practical constraints such as pupils' cognitive characteristics and study load, as well as the overall goals of language education, when proposing word selection strategies for textbooks, and not casually delete culturally loaded words such as 'twelfth'; 4. recognize that compiling a nationwide foreign-language textbook for school-age children is a complex, systematic undertaking that calls for joint attention from experts in many fields.

Keywords: Zipf's law; corpus; English; textbook; word list
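To make the Zipfian point above concrete, the short Python sketch below computes how much running text the top-N words cover under an idealized Zipf distribution (frequency proportional to 1/rank). The vocabulary size and exponent are illustrative assumptions, not figures from Xiao et al. (2017); only the word counts 726 and 726+903 come from the abstract above.

```python
# Illustrative sketch: under an idealized Zipf distribution (frequency ~ 1/rank),
# how much running text is covered by the N most frequent words?
# The vocabulary size and exponent below are assumptions for illustration only.

VOCAB_SIZE = 50_000   # hypothetical size of the reference vocabulary
EXPONENT = 1.0        # classic Zipf's law: frequency proportional to rank**-1

# Unnormalized Zipfian frequencies for ranks 1..VOCAB_SIZE.
freqs = [rank ** -EXPONENT for rank in range(1, VOCAB_SIZE + 1)]
total = sum(freqs)

def coverage(top_n: int) -> float:
    """Fraction of tokens accounted for by the top_n most frequent words."""
    return sum(freqs[:top_n]) / total

for n in (726, 726 + 903, 5000, 20000):
    print(f"top {n:>6} words cover {coverage(n):6.1%} of tokens")
# Coverage grows quickly at first and then flattens, which is why word
# recognition rates slow down as a textbook word list gets longer.
```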

Syntactic word embedding based on dependency syntax and polysemous analysis

Zhong-lin YE, Hai-xing ZHAO

Frontiers of Information Technology & Electronic Engineering, 2018, Volume 19, Issue 4, Pages 524-535. doi: 10.1631/FITEE.1601846

Abstract: Most word embedding models have the following problems: (1) in models based on bag-of-words contexts, the structural relations of sentences are completely neglected; (2) each word uses a single embedding, which makes the model unable to discriminate among the senses of polysemous words; (3) word embeddings easily tend toward the contextual-structure similarity of sentences. To solve these problems, we propose an easy-to-use representation algorithm for syntactic word embedding (SWE). The main procedures are: (1) a polysemy tagging algorithm based on latent Dirichlet allocation (LDA) is used to represent polysemous words; (2) the symbols '+' and '−' are adopted to indicate the directions of dependency relations; (3) stopwords and their dependencies are deleted; (4) a dependency skip is applied to connect indirect dependencies; (5) the dependency-based contexts are input to a word2vec model. Experimental results show that our model generates desirable word embeddings in similarity evaluation tasks. Moreover, semantic and syntactic features can be captured from the dependency-based syntactic contexts, exhibiting less topical and more syntactic similarity. We conclude that SWE outperforms single-embedding learning models.

Keywords: Dependency-based context; Polysemous word representation; Representation learning; Syntactic word embedding
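The abstract above enumerates the main procedures of SWE; the Python sketch below illustrates only steps (2) to (4), i.e., building direction-marked, stopword-filtered, skip-connected dependency contexts from an already parsed sentence. The toy parse, the context string format, and the skip rule are illustrative assumptions; the LDA-based polysemy tagging and the word2vec training of the actual method are not reproduced here.

```python
# Minimal sketch of dependency-based context extraction in the spirit of SWE:
# '+' marks a context that is the head of the target word, '-' marks a dependent
# (the paper's '−' symbol), stopwords are removed, and a "dependency skip"
# reconnects a word to a higher head when the intermediate head was a stopword.
# The toy parse below is hand-written for illustration; a real pipeline would
# use a dependency parser.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    text: str
    head: Optional[int]   # index of the syntactic head, None for the root
    dep: str              # dependency label to the head
    is_stop: bool = False

# Toy parse of "the cat sat on the mat"
sentence = [
    Token("the", 1, "det", is_stop=True),
    Token("cat", 2, "nsubj"),
    Token("sat", None, "root"),
    Token("on", 2, "prep", is_stop=True),
    Token("the", 5, "det", is_stop=True),
    Token("mat", 3, "pobj"),
]

def resolve_head(idx):
    """Follow stopword heads upward (the 'dependency skip' step)."""
    head = sentence[idx].head
    while head is not None and sentence[head].is_stop:
        head = sentence[head].head
    return head

contexts = {}
for i, tok in enumerate(sentence):
    if tok.is_stop:
        continue                       # step (3): drop stopwords
    ctx = []
    head = resolve_head(i)             # step (4): skip over removed stopwords
    if head is not None:
        ctx.append(f"{sentence[head].text}/{tok.dep}+")   # '+' : head direction
    for j, other in enumerate(sentence):
        if other.is_stop or other.head is None:
            continue
        if resolve_head(j) == i:
            ctx.append(f"{other.text}/{other.dep}-")      # '-' : dependent direction
    contexts[tok.text] = ctx

for word, ctx in contexts.items():
    print(word, "->", ctx)
# These (word, context) pairs would then be fed to a word2vec-style model
# that accepts arbitrary contexts (e.g. a word2vecf-like trainer).
```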
