Bandwidth Extension is All You Need

Su, Jiaqi; Wang, Yunyun; Finkelstein, Adam; Jin, Zeyu

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr12f7jr1j

Full metadata record

DC Field	Value	Language
dc.contributor.author	Su, Jiaqi	-
dc.contributor.author	Wang, Yunyun	-
dc.contributor.author	Finkelstein, Adam	-
dc.contributor.author	Jin, Zeyu	-
dc.date.accessioned	2023-12-23T22:24:32Z	-
dc.date.available	2023-12-23T22:24:32Z	-
dc.date.issued	2021	en_US
dc.identifier.citation	Su, Jiaqi, Yunyun Wang, Adam Finkelstein, and Zeyu Jin. "Bandwidth extension is all you need." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 696-700. 2021. doi:10.1109/ICASSP39728.2021.9413575	en_US
dc.identifier.issn	1520-6149	-
dc.identifier.uri	https://gfx.cs.princeton.edu/pubs/Su_2021_BEI/ICASSP2021_Su_Wang_BWE.pdf	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/pr12f7jr1j	-
dc.description.abstract	Speech generation and enhancement have seen recent breakthroughs in quality thanks to deep learning. These methods typically operate at a limited sampling rate of 16-22kHz due to computational complexity and available datasets. This limitation imposes a gap between the output of such methods and that of high-fidelity (≥44kHz) real-world audio applications. This paper proposes a new bandwidth extension (BWE) method that expands 8-16kHz speech signals to 48kHz. The method is based on a feed-forward WaveNet architecture trained with a GAN-based deep feature loss. A mean-opinion-score (MOS) experiment shows significant improvement in quality over state-of-the-art BWE methods. An AB test reveals that our 16-to-48kHz BWE is able to achieve fidelity that is typically indistinguishable from real high-fidelity recordings. We use our method to enhance the output of recent speech generation and denoising methods, and experiments demonstrate significant improvement in sound quality over these baselines. We propose this as a general approach to narrow the gap between generated speech and recorded speech, without the need to adapt such methods to higher sampling rates.	en_US
dc.format.extent	696 - 700	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartof	ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)	en_US
dc.rights	Author's manuscript	en_US
dc.title	Bandwidth Extension is All You Need	en_US
dc.type	Conference Article	en_US
dc.identifier.doi	10.1109/ICASSP39728.2021.9413575	-
dc.identifier.eissn	2379-190X	-
pu.type.symplectic	http://www.symplectic.co.uk/publications/atom-terms/1.0/conference-proceeding	en_US

Files in This Item:

File	Description	Size	Format
BandwidthExtension.pdf		1.38 MB	Adobe PDF