Document name:

An RNN-based prosodic information synthesizer for Mandarin….pdf

Format: PDF   Pages: 14

Uploaded by: 薄荷牛奶   Date: 2016/6/7   File size: 0 KB

Document description

226 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 6, NO. 3, MAY 1998

An RNN-Based Prosodic Information Synthesizer for Mandarin Text-to-Speech

Sin-Horng Chen, Senior Member, IEEE, Shaw-Hwa Hwang, and Yih-Ru Wang

Abstract: A new RNN-based prosodic information synthesizer for Mandarin Chinese text-to-speech (TTS) is proposed in this paper. Its four-layer recurrent neural network (RNN) generates prosodic information such as syllable pitch contours, syllable energy levels, syllable initial and final durations, as well as intersyllable pause durations. The input layer and first hidden layer operate with a word-synchronized clock to represent current-word phonologic states within the prosodic structure of text to be synthesized. The second hidden layer and output layer operate on a syllable-synchronized clock and use outputs from the preceding layers, along with additional syllable-level inputs fed directly to the second hidden layer, to generate desired prosodic parameters. The RNN was trained on a large set of actual utterances accompanied by associated texts, and can automatically learn many human-prosody phonologic rules, including the well-known Sandhi Tone 3 F0-change rule. Experimental results show that all synthesized prosodic parameter sequences matched quite well with their original counterparts, and a pitch-synchronous-overlap-add-based (PSOLA-based) Mandarin TTS system was also used for testing of our approach. While subjective tests are difficult to perform and remain to be done in the future, we have carried out informal listening tests by a significant number of native Chinese speakers and the results confirmed that all synthesized speech sounded quite natural.

Index Terms: Mandarin, pitch contour, prosodic information synthesizer, recurrent neural network, text-to-speech.

I. INTRODUCTION

In this paper, a new data-driven method of prosodic information synthesis for Mandarin text-to-speech (TTS) is presented. The basic idea is to use a model to expl
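
The two-clock layering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TwoClockRNN`, all layer sizes, the tanh activations, the feature vectors, and the random untrained weights are assumptions chosen only to show how a word-synchronized layer can feed a syllable-synchronized layer. The lower hidden state is updated once per word, and the upper hidden state and output are updated once per syllable.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def affine_tanh(weights, vector):
    """Compute tanh(W @ vector) using plain Python lists."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in weights]


class TwoClockRNN:
    """Hypothetical two-rate recurrent net: word clock below, syllable clock above."""

    def __init__(self, word_dim, syl_dim, h1_dim, h2_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.h1_dim, self.h2_dim = h1_dim, h2_dim
        # First hidden layer sees the word-level input plus its own previous state.
        self.w1 = make_matrix(h1_dim, word_dim + h1_dim, rng)
        # Second hidden layer sees the word-level state, the syllable-level
        # input, and its own previous state.
        self.w2 = make_matrix(h2_dim, h1_dim + syl_dim + h2_dim, rng)
        self.w_out = make_matrix(out_dim, h2_dim, rng)

    def forward(self, words):
        """words: list of (word_features, [syllable_features, ...]) pairs."""
        h1 = [0.0] * self.h1_dim
        h2 = [0.0] * self.h2_dim
        outputs = []
        for word_feats, syllables in words:
            # Word-synchronized clock: one update per word.
            h1 = affine_tanh(self.w1, word_feats + h1)
            for syl_feats in syllables:
                # Syllable-synchronized clock: one update per syllable.
                h2 = affine_tanh(self.w2, h1 + syl_feats + h2)
                outputs.append(affine_tanh(self.w_out, h2))
        return outputs


# All dimensions and feature values below are arbitrary placeholders.
net = TwoClockRNN(word_dim=3, syl_dim=2, h1_dim=4, h2_dim=5, out_dim=8)
sentence = [
    ([1.0, 0.0, 0.0], [[0.2, 0.1], [0.4, 0.3]]),  # a two-syllable word
    ([0.0, 1.0, 0.0], [[0.5, 0.9]]),              # a one-syllable word
]
prosody = net.forward(sentence)
print(len(prosody), len(prosody[0]))
```

With this input the sketch yields one output vector per syllable (three in total), each standing in for a set of prosodic parameters such as pitch contour, energy, duration, and pause values. A trained system would learn the weights from utterances paired with their texts, as the abstract describes.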