Speech emotion recognition with unsupervised feature learning

2015, Vol. 16, No. 5

Frontiers of Information Technology & Electronic Engineering, doi: 10.1631/FITEE.1400323

Department of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China

Published: 2016-01-05

Abstract

Emotion-based features are critical for achieving high performance in a speech emotion recognition (SER) system. In general, such features are difficult to design because the ground truth for emotion is ambiguous. In this paper, we apply several unsupervised feature learning algorithms (including K-means clustering, the sparse auto-encoder, and sparse restricted Boltzmann machines), which are promising for learning task-related features from unlabeled data, to speech emotion recognition. We then evaluate the performance of the proposed approach and present a detailed analysis of two important factors in the model setup: the content window size and the number of hidden layer nodes. Experimental results show that larger content windows and more hidden nodes yield higher performance. We also show that a two-layer network does not clearly improve performance over a single-layer network.
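The abstract names K-means clustering as one of the feature learners but gives no implementation details here. The following is a minimal sketch, assuming a single-layer pipeline: frame-level features are stacked into content windows of `w` consecutive frames, each window is normalized, K-means learns `k` centroids (playing the role of the hidden nodes), and each window is encoded with a "triangle" activation before mean pooling into one utterance-level vector. The triangle encoding, per-window normalization, mean pooling, and all function names are illustrative assumptions, not the authors' exact method; random data stands in for real spectrogram frames.

```python
import numpy as np


def content_windows(frames: np.ndarray, w: int) -> np.ndarray:
    """Stack every run of w consecutive frames into one vector.

    frames: (T, d) array of per-frame features (assumed, e.g. spectrogram bins).
    Returns an array of shape (T - w + 1, w * d).
    """
    T, _ = frames.shape
    return np.stack([frames[t:t + w].reshape(-1) for t in range(T - w + 1)])


def normalize(windows: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Zero-mean, unit-variance normalization of each window vector (assumed step)."""
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    return (windows - mean) / (std + eps)


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's K-means on unlabeled windows; returns (k, dim) centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every window to every centroid.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def triangle_encode(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Triangle activation: f_j(x) = max(0, mean_k(z_k) - z_j), z_j = ||x - c_j||."""
    z = np.sqrt(((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2))
    return np.maximum(0.0, z.mean(axis=1, keepdims=True) - z)


def utterance_feature(frames: np.ndarray, centroids: np.ndarray, w: int) -> np.ndarray:
    """Encode each content window, then mean-pool into one utterance vector."""
    windows = normalize(content_windows(frames, w))
    return triangle_encode(windows, centroids).mean(axis=0)


# Usage: synthetic stand-in for unlabeled speech, 5 "utterances" of 100 frames x 40 bins.
rng = np.random.default_rng(0)
unlabeled = [rng.standard_normal((100, 40)) for _ in range(5)]
w, k = 5, 16  # hypothetical content window size and number of hidden nodes
train = np.vstack([normalize(content_windows(u, w)) for u in unlabeled])
centroids = kmeans(train, k)
feat = utterance_feature(unlabeled[0], centroids, w)
print(feat.shape)  # (16,)
```

In this sketch, the two factors the abstract analyzes map directly onto `w` (each window covers more frames as it grows) and `k` (the length of the learned feature vector). The pooled vector would then be passed to a separate emotion classifier, which is outside the scope of this example.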

Keywords

Speech emotion recognition; Unsupervised feature learning; Neural network; Affective computing