Cute: A concatenative method for voice conversion using exemplar-based unit selection
Author(s): Jin, Zeyu; Finkelstein, Adam; DiVerdi, Stephen; Lu, Jingwan; Mysore, Gautham J.
To refer to this page use:
http://arks.princeton.edu/ark:/88435/pr1652h
Abstract: | State-of-the-art voice conversion methods re-synthesize voice from spectral representations such as MFCCs and STRAIGHT, thereby introducing muffled artifacts. We propose a method that circumvents this concern using concatenative synthesis coupled with exemplar-based unit selection. Given parallel speech from source and target speakers as well as a new query from the source, our method stitches together pieces of the target voice. It optimizes for three goals: matching the query, using long consecutive segments, and smooth transitions between the segments. To achieve these goals, we perform unit selection at the frame level and introduce triphone-based preselection that greatly reduces computation and enforces selection of long, contiguous pieces. Our experiments show that the proposed method has better quality than baseline methods, while preserving high individuality. |
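The three goals named in the abstract (query match, long consecutive segments, smooth transitions) can be illustrated with a generic dynamic-programming unit-selection sketch. This is not the paper's exact formulation; the cost weights, distance measures, and the continuity bonus below are all illustrative assumptions.

```python
import numpy as np

def unit_select(query, units, w_concat=1.0, w_cont=0.5):
    """Pick one target-corpus frame (unit) per query frame by dynamic
    programming over three illustrative costs:
      - target cost: distance between a query frame and a candidate unit
        (the "matching the query" goal)
      - concatenation cost: distance between consecutive selected units
        (the "smooth transitions" goal)
      - continuity bonus: reward for choosing the unit that immediately
        follows the previous one in the corpus (the "long consecutive
        segments" goal)
    """
    T, N = len(query), len(units)
    # pairwise distances; a real system would use spectral features
    target = np.array([[np.linalg.norm(q - u) for u in units] for q in query])
    concat = np.array([[np.linalg.norm(a - b) for b in units] for a in units])

    cost = np.full((T, N), np.inf)
    back = np.zeros((T, N), dtype=int)
    cost[0] = target[0]
    for t in range(1, T):
        for j in range(N):
            trans = cost[t - 1] + w_concat * concat[:, j]
            if j > 0:
                # unit j-1 -> j is contiguous in the corpus: cheaper to join
                trans[j - 1] -= w_cont
            k = int(np.argmin(trans))
            cost[t, j] = target[t, j] + trans[k]
            back[t, j] = k
    # backtrack the cheapest path of selected units
    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The paper's triphone-based preselection would shrink the candidate set per query frame before this search runs, which is what makes frame-level selection tractable; the full pairwise loops here are for clarity only.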
Publication Date: | 2016 |
Citation: | Jin, Zeyu, Adam Finkelstein, Stephen DiVerdi, Jingwan Lu, and Gautham J. Mysore. "Cute: A concatenative method for voice conversion using exemplar-based unit selection." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016): pp. 5660-5664. doi:10.1109/ICASSP.2016.7472761 |
DOI: | 10.1109/ICASSP.2016.7472761 |
EISSN: | 2379-190X |
Pages: | 5660 - 5664 |
Type of Material: | Conference Article |
Journal/Proceeding Title: | IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Version: | Author's manuscript |
Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.