CMU-CS-19-125
Computer Science Department
School of Computer Science, Carnegie Mellon University




Representation Learning for Voice Profiling

Daanish Ali Khan

M.S. Thesis

August 2019



Keywords: Voice profiling, representation learning, deep learning

Voice profiling is the deduction of a speaker's characteristics from their voice, a problem with many applications in audio forensics, law enforcement, security, and health care. Speaker characteristics that can be determined include the speaker's gender, age, and ethnicity, along with other physical and demographic characteristics.

Prior work on computational voice-profiling techniques modelled the production of voice as a physical system and defined multiple voice signal features that encode speaker characteristics. Recent advances in artificial neural networks have improved performance across voice-profiling tasks, but such methods are often purely data-driven: the representation and the relationships between voice and speaker characteristics are learned from a large dataset, without necessarily leveraging the knowledge-based voice features from prior work.

We identify the key challenges of modern voice profiling as: 1) learning a representation that captures the complex relationship between voice and speaker parameters, 2) designing a representation that is resilient to real-world noise, and 3) learning a representation that generalizes across recording conditions and speaker characteristics.

In this work, we combine domain-specific signal-processing features with state-of-the-art neural network techniques to learn a generalizable audio representation for voice profiling. The learned representation is evaluated on multiple voice-profiling tasks, including prediction of speaker gender, native language, and geographical origin. We show experimentally that the proposed speech representation significantly improves the real-world performance of voice profiling.

34 pages

Thesis Committee:
Bhiksha Raj (Co-Advisor)
Rita Singh (Co-Advisor)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

