Figures (5)

Table 1

Frontiers of Information Technology & Electronic Engineering

2018, Volume 19, Issue 1, Pages 40-63

    Past review, current progress, and challenges ahead on the cocktail party problem

    Tencent AI Lab, Tencent, Bellevue 98004, USA
    Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

    Available online: 2018-04-23
    DOI: 10.1631/FITEE.1700814
    Cite this article
    Yan-min QIAN, Chao WENG, Xuan-kai CHANG, Shuai WANG, Dong YU. Past review, current progress, and challenges ahead on the cocktail party problem [J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(1): 40-63.

    Abstract

    The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed in the last two decades for attacking this problem. We focus our discussion on the speech separation problem, given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF), and generative models; the conventional multi-channel techniques such as beamforming and multi-channel blind source separation; and the newly developed deep learning-based techniques such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting information in the microphone array, the acoustic training set, and the language itself, using more powerful models and better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
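    Of the techniques named in the abstract, permutation invariant training (PIT) is the easiest to illustrate in a few lines. It addresses the label ambiguity of multi-talker training (nothing fixes which output stream should carry which speaker) by scoring every assignment of output streams to reference speakers and training on the lowest-loss assignment. The sketch below is a minimal, framework-free illustration, assuming mean squared error over plain lists of floats (standing in for, e.g., magnitude spectra) as the per-pair loss; `mse` and `pit_loss` are hypothetical names, and this is not the authors' implementation.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates: List[Sequence[float]],
             references: List[Sequence[float]]) -> Tuple[float, Tuple[int, ...]]:
    """Utterance-level PIT loss (illustrative sketch, MSE assumed as pair loss).

    Tries every assignment of estimated streams to reference speakers and
    returns the smallest average pairwise loss together with the permutation
    that achieved it: perm[i] is the reference index matched to estimates[i].
    """
    if len(estimates) != len(references):
        raise ValueError("need one estimate per reference speaker")
    n = len(references)
    best_loss, best_perm = float("inf"), tuple(range(n))
    for perm in permutations(range(n)):
        # Average the pairwise losses under this output-to-speaker assignment.
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    refs = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
    # The model emits speaker 2 on output 0 and speaker 1 on output 1;
    # a fixed-order loss would penalize this, PIT does not.
    ests = [[0.0, 2.0, 0.0], [1.0, 0.0, 1.0]]
    print(pit_loss(ests, refs))  # (0.0, (1, 0))
```

    Exhaustive search over permutations costs S! pairwise-loss evaluations for S speakers, which stays cheap for two- or three-talker mixtures; in a real system the same minimum would be taken over network outputs inside an automatic-differentiation framework rather than over Python lists.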

    Keywords

    Cocktail party problem; Computational auditory scene analysis; Non-negative matrix factorization; Permutation invariant training; Multi-talker speech processing
    Copyright © 2015 China Engineering Science Press.
    Beijing ICP registration No. 11030251 (京ICP备11030251号)