Computer Science > Sound

arXiv:2001.00378 (cs)

[Submitted on 2 Jan 2020 (v1), last revised 24 Sep 2021 (this version, v2)]

Title:Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends

Authors:Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Junaid Qadir, Björn W. Schuller

Abstract:Research on speech processing has traditionally treated the design of hand-engineered acoustic features (feature engineering) as a problem distinct from the design of efficient machine learning (ML) models for prediction and classification. This approach has two main drawbacks: first, manual feature engineering is cumbersome and requires human knowledge; second, the designed features might not be the best for the objective at hand. These drawbacks have motivated a recent trend in the speech community towards representation learning techniques, which automatically learn an intermediate representation of the input signal that better suits the task at hand and hence leads to improved performance. The significance of representation learning has increased with advances in deep learning (DL), where the learned representations are more useful and less dependent on human knowledge, making them well suited to tasks such as classification and prediction. The main contribution of this paper is an up-to-date and comprehensive survey of speech representation learning techniques that brings together scattered research across three distinct areas: Automatic Speech Recognition (ASR), Speaker Recognition (SR), and Speech Emotion Recognition (SER). Recent reviews have been conducted for ASR, SR, and SER; however, none of them has focused on representation learning from speech, a gap that our survey aims to bridge.

Comments:	Part of this work is accepted in IEEE Transactions on Affective Computing 2021. this https URL
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2001.00378 [cs.SD]
	(or arXiv:2001.00378v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2001.00378

Submission history

From: Siddique Latif [view email]
[v1] Thu, 2 Jan 2020 10:12:23 UTC (415 KB)
[v2] Fri, 24 Sep 2021 05:09:30 UTC (415 KB)
