This talk will take you on a journey into the world of speech enhancement, a field whose goal is to separate an acoustic target speech signal from noise, interfering speech or music. While classical approaches were typically quite heavy on the mathematical side, the advent of deep learning radically changed the landscape. It allows the problem to be formulated as a simple “learning from simulations” paradigm in which data generation plays the central role. Starting from an early deep learning noise reduction approach developed at Ohio State University, we proceed through the gradual improvements that finally led to serious products over the last two years. The material will be presented in a vivid fashion and supplemented with numerous examples of how speech enhancement is deployed at Cerence, ranging from improved subjective quality in hands-free telephone calls to facilitating multi-user speech dialog systems such as MBUX (Mercedes-Benz User Experience).
Friedrich Faubel holds a PhD from Saarland University, where he was part of the International Research Training Group (IRTG) for Language Technology and Cognitive Systems. His PhD included research stays at Carnegie Mellon University (CMU) and the University of Edinburgh. In 2013, he joined the Auto Speech R&D department of Nuance. Since 2019, he has been with the Audio AI department of Cerence, where he works as a senior principal product R&D engineer. His main areas of expertise are speech enhancement for hands-free telephony, prompt and music cancellation for voice recognition applications, and emergency vehicle detection for autonomous driving.