Protecting gender and identity with disentangled speech representations

Dimitrios Stoidis and Andrea Cavallaro

in Proceedings of Interspeech 2021

Centre for Intelligent Sensing

Queen Mary University of London

Abstract:

Besides its linguistic content, our speech is rich in biometric information that can be inferred by classifiers. Learning privacy-preserving representations for speech signals enables downstream tasks without sharing unnecessary, private information about an individual. In this paper, we show that protecting gender information in speech is more effective than modelling speaker-identity information only when generating a non-sensitive representation of speech. Our method relies on reconstructing speech by decoding linguistic content along with gender information using a variational autoencoder. Specifically, we exploit disentangled representation learning to encode information about different attributes into separate subspaces that can be factorised independently. We present a novel way to encode gender information and disentangle two sensitive biometric identifiers, namely gender and identity, in a privacy-protecting setting. Experiments on the LibriSpeech dataset show that gender recognition and speaker verification can be reduced to a random guess, protecting against classification-based attacks, while maintaining the utility of the signal for speech recognition.

View full paper on arXiv: [here]
and ISCA Interspeech: [here]

Audio Samples

Samples are taken from the LibriSpeech train-clean-100 subset.

We convert each sample under five privacy settings, which differ in the target identity and gender:
Same Identity (SI)
Random Identity (RI)
Same Identity Random Gender (SIRG)
Random Identity Same Gender (RISG)
Random Gender (RG)
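
The five settings above can be read as rules for choosing a conversion target. The following is a minimal sketch of that reading, not the authors' implementation. It assumes that "random" means a label drawn uniformly from those that differ from the source, and that an attribute a setting's name does not mention is left unspecified (returned as None), since this page does not say how such attributes are handled. The function pick_target and the table SPEAKERS are hypothetical names introduced for illustration; the speaker IDs are the ones listed with the samples on this page, and the genders of 1553 and 7067 follow from their use as same-gender targets.

```python
import random
from typing import Dict, Optional, Tuple

# Hypothetical speaker table: LibriSpeech speaker ID -> gender label,
# restricted to the speakers that appear in the samples on this page.
SPEAKERS: Dict[int, str] = {
    103: "female",
    1553: "female",
    4397: "male",
    7067: "male",
}


def pick_target(
    setting: str,
    source_id: int,
    speakers: Dict[int, str],
    rng: random.Random,
) -> Tuple[Optional[int], Optional[str]]:
    """Return (target_identity, target_gender) for one privacy setting.

    None means the setting's name does not mention that attribute
    (an assumption of this sketch, not something the page states).
    """
    source_gender = speakers[source_id]
    # Candidate identities: every speaker except the source.
    other_ids = [s for s in speakers if s != source_id]
    # Candidate identities that share the source gender.
    same_gender_ids = [s for s in other_ids if speakers[s] == source_gender]
    # Candidate gender labels that differ from the source gender.
    other_genders = sorted(g for g in set(speakers.values()) if g != source_gender)

    if setting == "SI":    # Same Identity
        return source_id, None
    if setting == "RI":    # Random Identity
        return rng.choice(other_ids), None
    if setting == "SIRG":  # Same Identity, Random Gender
        return source_id, rng.choice(other_genders)
    if setting == "RISG":  # Random Identity, Same Gender
        return rng.choice(same_gender_ids), source_gender
    if setting == "RG":    # Random Gender
        return None, rng.choice(other_genders)
    raise ValueError(f"unknown privacy setting: {setting!r}")


if __name__ == "__main__":
    rng = random.Random(0)
    for name in ("SI", "RI", "SIRG", "RISG", "RG"):
        print(name, pick_target(name, 103, SPEAKERS, rng))
```

With this small table, RISG for source speaker 103 can only select speaker 1553, and RISG for speaker 4397 can only select speaker 7067, which matches the identities shown next to the RISG samples below.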

Identity: 103, Gender: female

Reference transcription:
"That had its source away back in the woods of the old Cuthbert place, it was reputed to be an intricate,
headlong brook in its earlier course through those woods with dark secrets of pool and cascade
but by the time it reached Lynde's hollow it was a quiet well conducted little stream."

Original
Same Identity (SI)
Random Identity (RI)
Same Identity Random Gender (SIRG)
Random Identity (1553) Same Gender (RISG)
Random Gender (RG)

Identity: 4397, Gender: male

Reference transcription:
"I was animated by a mountaineer's eagerness to get my feet into the snow once more and my head
into the clear sky, after lying dormant all winter at the level of the sea, but in every walk with
nature one receives far more than he seeks."

Original
Same Identity (SI)
Random Identity (RI)
Same Identity Random Gender (SIRG)
Random Identity (7067) Same Gender (RISG)
Random Gender (RG)

YouTube videos
Centre for Intelligent Sensing [here]
ISCA Interspeech: [here]
Cite as [bibtex]

@inproceedings{stoidis21_interspeech,
  author={Dimitrios Stoidis and Andrea Cavallaro},
  title={{Protecting Gender and Identity with Disentangled Speech Representations}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1699--1703},
  doi={10.21437/Interspeech.2021-2163}
}