FFTNet: A Real-Time Speaker-Dependent Neural Vocoder
Author(s): Jin, Zeyu; Finkelstein, Adam; Mysore, Gautham J; Lu, Jingwan
To refer to this page use:
http://arks.princeton.edu/ark:/88435/pr1xn8j
Abstract: | We introduce FFTNet, a deep learning approach for synthesizing audio waveforms. Our approach builds on the recent WaveNet project, which showed that it is possible to synthesize a natural-sounding audio waveform directly from a deep convolutional neural network. FFTNet offers two improvements over WaveNet. First, it is substantially faster, allowing for real-time synthesis of audio waveforms. Second, when used as a vocoder, it produces speech that sounds more natural, as measured via a “mean opinion score” test. |
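The abstract describes the architecture only at a high level. As a rough illustration of the layer structure reported in the paper (split the input buffer into two halves, combine them with 1x1 transforms, and apply ReLU nonlinearities), the NumPy sketch below shows one such layer; the function and variable names (`fftnet_layer`, `w_left`, and so on) are illustrative assumptions, not code from the authors.

```python
import numpy as np

def fftnet_layer(x, w_left, w_right, w_out, b1, b2):
    """One FFTNet-style layer (sketch): split the input buffer in half,
    mix the two halves with 1x1 transforms, then apply two ReLUs.
    Shapes: x is (channels, n); weight matrices are (channels, channels);
    biases are (channels, 1)."""
    half = x.shape[1] // 2
    z = w_left @ x[:, :half] + w_right @ x[:, half:] + b1  # combine the halves
    z = np.maximum(z, 0.0)                                  # ReLU
    z = w_out @ z + b2                                      # 1x1 "conv"
    return np.maximum(z, 0.0)                               # ReLU

# Toy usage: each layer halves the buffer, so log2(N) layers reduce an
# N-sample context to a single output position (weights here are random).
rng = np.random.default_rng(0)
C, N = 4, 2048
x = rng.standard_normal((C, N))
for _ in range(11):                     # log2(2048) = 11 layers
    W = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
    b = [np.zeros((C, 1)) for _ in range(2)]
    x = fftnet_layer(x, W[0], W[1], W[2], b[0], b[1])
print(x.shape)                          # (4, 1): features for one output sample
```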
Publication Date: | 2018 |
Citation: | Jin, Zeyu, Adam Finkelstein, Gautham J. Mysore, and Jingwan Lu. "FFTNet: A Real-Time Speaker-Dependent Neural Vocoder." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018): pp. 2251-2255. doi:10.1109/ICASSP.2018.8462431 |
DOI: | 10.1109/ICASSP.2018.8462431 |
EISSN: | 2379-190X |
Pages: | 2251 - 2255 |
Type of Material: | Conference Article |
Journal/Proceeding Title: | 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Version: | Author's manuscript |
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.