A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

Manocha, Pranay; Finkelstein, Adam; Zhang, Richard; Bryan, Nicholas J; Mysore, Gautham J; Jin, Zeyu

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1c846

Full metadata record

DC Field	Value	Language
dc.contributor.author	Manocha, Pranay	-
dc.contributor.author	Finkelstein, Adam	-
dc.contributor.author	Zhang, Richard	-
dc.contributor.author	Bryan, Nicholas J	-
dc.contributor.author	Mysore, Gautham J	-
dc.contributor.author	Jin, Zeyu	-
dc.date.accessioned	2021-10-08T19:51:07Z	-
dc.date.available	2021-10-08T19:51:07Z	-
dc.date.issued	2020	en_US
dc.identifier.citation	Manocha, Pranay, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, and Zeyu Jin. "A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences." Proc. Interspeech (2020): pp. 2852-2856. doi:10.21437/Interspeech.2020-1191	en_US
dc.identifier.uri	https://arxiv.org/pdf/2001.04460.pdf	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/pr1c846	-
dc.description.abstract	Many audio processing tasks require perceptual assessment. The “gold standard” of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a metric by fitting a deep neural network to a new large dataset of crowdsourced human judgments. Subjects are prompted to answer a straightforward, objective question: are two recordings identical or not? These pairs are algorithmically generated under a variety of perturbations, including noise, reverb, and compression artifacts; the perturbation space is probed with the goal of efficiently identifying the just-noticeable difference (JND) level of the subject. We show that the resulting learned metric is well-calibrated with human judgments, outperforming baseline methods. Since it is a deep network, the metric is differentiable, making it suitable as a loss function for other tasks. Thus, simply replacing an existing loss (e.g., deep feature loss) with our metric yields significant improvement in a denoising network, as measured by subjective pairwise comparison.	en_US
dc.format.extent	2852 - 2856	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartof	Proc. Interspeech	en_US
dc.rights	Author's manuscript	en_US
dc.title	A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences	en_US
dc.type	Conference Article	en_US
dc.identifier.doi	10.21437/Interspeech.2020-1191	-
pu.type.symplectic	http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding	en_US

Files in This Item:

File	Description	Size	Format
DifferentiablePerceptual.pdf		1.03 MB	Adobe PDF