Latent source-specific generative factor learning for monaural speech separation using weighted-factor autoencoder

2020, Vol. 21, No. 11

Frontiers of Information Technology & Electronic Engineering, doi: 10.1631/FITEE.2000019

Affiliation(s): School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China; Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Zhenjiang 212013, China

Received: 2020-01-13; Accepted: 2020-11-13; Published: 2020-11-13

Abstract

Much recent progress in monaural speech separation (MSS) has been achieved through a series of architectures based on autoencoders, which use an encoder to condense the input signal into compressed features and then feed these features into a decoder to construct a specific audio source of interest. However, these approaches can neither learn the generative factors of the original input for MSS nor construct each audio source in the mixed speech. In this study, we propose a novel weighted-factor autoencoder (WFAE) model for MSS, which introduces a regularization loss into the objective function so that each separated source is isolated without containing components of the other sources. By incorporating a latent attention mechanism and a supervised source constructor in the separation layer, WFAE can learn source-specific generative factors and a set of discriminative features for each source, which improves MSS performance. Experiments on benchmark datasets show that our approach outperforms existing methods. In terms of three important metrics, WFAE performs well on a relatively challenging MSS case, i.e., speaker-independent MSS.
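The abstract describes the WFAE pipeline only at a high level. The NumPy sketch below is a toy illustration under our own assumptions, not the authors' implementation: a shared encoder maps a mixture magnitude spectrogram to latent factors, a per-source softmax weighting over those factors stands in for the latent attention mechanism, a per-source linear decoder plays the role of the source constructor, and the regularization term penalizes each estimate's overlap with the other sources' references. All names (`ToyWeightedFactorAE`, `separation_loss`, `lam`) and the specific loss form are hypothetical, and no training loop is shown.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ToyWeightedFactorAE:
    """Hypothetical toy model: shared encoder, per-source latent weighting,
    per-source linear decoder. Weights are random; no training is shown."""

    def __init__(self, n_freq, n_latent, n_sources, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=0.1, size=(n_freq, n_latent))
        # one logit vector per source over the shared latent factors
        self.att_logits = rng.normal(scale=0.1, size=(n_sources, n_latent))
        self.W_dec = rng.normal(scale=0.1, size=(n_sources, n_latent, n_freq))

    def forward(self, mix):
        # mix: (frames, n_freq) nonnegative magnitude spectrogram of the mixture
        z = relu(mix @ self.W_enc)                       # (frames, n_latent)
        weights = softmax(self.att_logits, axis=-1)      # (n_sources, n_latent)
        z_src = weights[:, None, :] * z[None, :, :]      # (n_sources, frames, n_latent)
        # per-source decoding back to magnitude spectra, kept nonnegative
        return relu(np.einsum("stl,slk->stk", z_src, self.W_dec))


def separation_loss(est, targets, lam=0.1):
    """Reconstruction MSE plus a hypothetical cross-source leakage penalty.
    Assumes at least two sources (est.shape[0] >= 2)."""
    n_src = est.shape[0]
    recon = np.mean((est - targets) ** 2)
    leak = 0.0
    for i in range(n_src):
        for j in range(n_src):
            if i != j:
                # penalize estimate i for overlapping with reference source j
                leak += np.mean(est[i] * targets[j])
    return recon + lam * leak / (n_src * (n_src - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sources = rng.random((2, 4, 8))          # two toy sources, 4 frames x 8 bins
    mixture = sources.sum(axis=0)
    model = ToyWeightedFactorAE(n_freq=8, n_latent=6, n_sources=2)
    estimates = model.forward(mixture)
    print(estimates.shape)                   # (2, 4, 8)
    print(separation_loss(estimates, sources))
```

The softmax weighting is just one simple way to let each source emphasize a different subset of shared latent factors; the paper's actual attention mechanism, source constructor, and regularizer may be formulated differently.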

Keywords

Speech separation; Generative factors; Autoencoder; Deep learning