CDPAM: Contrastive Learning for Perceptual Audio Similarity
Author(s): Manocha, Pranay; Jin, Zeyu; Zhang, Richard; Finkelstein, Adam
To refer to this page use:
http://arks.princeton.edu/ark:/88435/pr1z31np03
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Manocha, Pranay | - |
dc.contributor.author | Jin, Zeyu | - |
dc.contributor.author | Zhang, Richard | - |
dc.contributor.author | Finkelstein, Adam | - |
dc.date.accessioned | 2023-12-28T15:45:40Z | - |
dc.date.available | 2023-12-28T15:45:40Z | - |
dc.date.issued | 2021 | en_US |
dc.identifier.citation | Manocha, Pranay, Zeyu Jin, Richard Zhang, and Adam Finkelstein. "CDPAM: Contrastive learning for perceptual audio similarity." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 196-200. 2021. doi:10.1109/ICASSP39728.2021.9413711 | en_US |
dc.identifier.issn | 1520-6149 | - |
dc.identifier.uri | https://arxiv.org/abs/2102.05109 | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1z31np03 | - |
dc.description.abstract | Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbations on which it was trained. This paper introduces CDPAM, a metric that builds on and advances DPAM. The primary improvement is to combine contrastive learning and multi-dimensional representations to build robust models from limited data. In addition, we collect human judgments on triplet comparisons to improve generalization to a broader range of audio perturbations. CDPAM correlates well with human responses across nine varied datasets. We also show that adding this metric to existing speech synthesis and enhancement methods yields significant improvement, as measured by objective and subjective tests. | en_US |
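The abstract mentions training on human judgments over triplet comparisons (which of two perturbed recordings sounds closer to a reference). As a minimal illustrative sketch, not the authors' implementation, a margin-based triplet loss over audio embeddings captures the core idea; the function name and margin value below are assumptions for illustration:

```python
import numpy as np

def triplet_margin_loss(anchor, closer, farther, margin=0.1):
    """Illustrative triplet loss: encourages the embedding distance
    d(anchor, closer) to be smaller than d(anchor, farther) by at
    least `margin`, mirroring a human triplet judgment that `closer`
    sounds more similar to the reference than `farther` does."""
    d_close = np.linalg.norm(anchor - closer)
    d_far = np.linalg.norm(anchor - farther)
    # Hinge: zero loss once the judged ordering holds with the margin.
    return max(0.0, margin + d_close - d_far)

# Toy embeddings: the "closer" example coincides with the anchor.
ref = np.zeros(4)
loss_correct = triplet_margin_loss(ref, np.zeros(4), np.ones(4))
loss_violated = triplet_margin_loss(ref, np.ones(4), np.zeros(4))
```

In the toy case above, `loss_correct` is 0.0 (the ordering already holds with margin to spare), while `loss_violated` is positive, pushing the embedding to move the judged-closer example inward.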
dc.format.extent | 196 - 200 | en_US |
dc.language.iso | en_US | en_US |
dc.relation.ispartof | ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | en_US |
dc.rights | Author's manuscript | en_US |
dc.title | CDPAM: Contrastive Learning for Perceptual Audio Similarity | en_US |
dc.type | Conference Article | en_US |
dc.identifier.doi | 10.1109/ICASSP39728.2021.9413711 | - |
dc.identifier.eissn | 2379-190X | - |
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding | en_US |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ContrastiveLearningAudioSimilarity.pdf | | 980.91 kB | Adobe PDF |
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.