Abstract: State-of-the-art voice conversion methods re-synthesize voice from spectral representations such as MFCCs and STRAIGHT, thereby introducing muffled artifacts. We propose a method that circumvents this problem using concatenative synthesis coupled with exemplar-based unit selection. Given parallel speech from source and target speakers, as well as a new query from the source, our method stitches together pieces of the target voice. It optimizes for three goals: matching the query, using long consecutive segments, and smooth transitions between the segments. To achieve these goals, we perform unit selection at the frame level and introduce triphone-based preselection that greatly reduces computation and enforces selection of long, contiguous pieces. Our experiments show that the proposed method achieves better quality than baseline methods while preserving high individuality.
Citation: Jin, Zeyu, Adam Finkelstein, Stephen DiVerdi, Jingwan Lu, and Gautham J. Mysore. "CUTE: A concatenative method for voice conversion using exemplar-based unit selection." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016): pp. 5660-5664. doi:10.1109/ICASSP.2016.7472761
Pages: 5660-5664
Type of Material: Conference Article
Journal/Proceeding Title: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
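The abstract above describes a frame-level unit-selection search that balances three costs: matching the query, favoring long consecutive segments, and smoothing transitions. The following is a minimal sketch of such a search as a Viterbi-style dynamic program, assuming Euclidean distances over MFCC-like frame features and a simple contiguity reward standing in for the paper's preference for long segments; the names select_units, w_match, w_concat, and contig_bonus are hypothetical, not taken from the paper.

import numpy as np

def select_units(query_feats, target_feats, w_match=1.0, w_concat=1.0,
                 contig_bonus=0.5):
    """Pick one target exemplar frame per query frame by dynamic programming.

    query_feats:  (T, D) array of source-query frame features
    target_feats: (N, D) array of target-speaker exemplar frame features
    Returns a list of T target frame indices to concatenate.
    """
    T, N = len(query_feats), len(target_feats)

    # Target cost: distance between each query frame and each exemplar frame.
    # (Euclidean distance is an assumption, not the paper's exact measure.)
    match = np.linalg.norm(
        query_feats[:, None, :] - target_feats[None, :, :], axis=-1)

    # Concatenation cost: spectral discontinuity between successive choices.
    concat = np.linalg.norm(
        target_feats[:, None, :] - target_feats[None, :, :], axis=-1)

    # Reward naturally consecutive exemplar frames (j followed by j+1),
    # which biases the search toward long, contiguous target segments.
    idx = np.arange(N - 1)
    concat[idx, idx + 1] -= contig_bonus

    # Viterbi recursion over query frames.
    cost = w_match * match[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + w_concat * concat  # total[i, j]: from i to j
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(N)] + w_match * match[t]

    # Trace back the cheapest path.
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy usage with random MFCC-like features.
rng = np.random.default_rng(0)
query = rng.normal(size=(20, 13))    # 20 query frames, 13-dim features
target = rng.normal(size=(100, 13))  # 100 exemplar frames from the target
print(select_units(query, target))

This exhaustive O(T·N²) search over all exemplar frames is the kind of computation the paper's triphone-based preselection is designed to avoid: pruning the candidate set per query frame shrinks N substantially before the dynamic program runs.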