Gated Recurrent Attention for Multi-Style Speech Synthesis

Sung Jun Cheon, Joun Yeop Lee, Byoung Jin Choi, Hyeonseung Lee, Nam Soo Kim

Introduction

We propose a novel attention model based on gated recurrence, which we call gated recurrent attention (GRA). GRA controls the flow of contextual information through two gates. To demonstrate GRA's alignment and style modeling performance, we provide samples synthesized by Tacotron-GST using either location-sensitive attention (LSA) or GRA. Both models were trained only on the MAILABS-US corpus and were then used to synthesize the samples below with style references drawn from MAILABS-US and VCTK.
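To make the gating idea concrete, here is a minimal PyTorch sketch of one attention step in which the context vector is updated recurrently through two gates. The gate placement, layer names, and dimensions are illustrative assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of a gated recurrent attention (GRA) step.
# Assumption (not from the paper): the attended context is mixed with the
# previous step's context via two sigmoid gates, GRU-style. All names and
# dimensions here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedRecurrentAttention(nn.Module):
    def __init__(self, query_dim, memory_dim, attn_dim):
        super().__init__()
        # Additive (Bahdanau-style) scoring over the encoder memory.
        self.query_proj = nn.Linear(query_dim, attn_dim, bias=False)
        self.memory_proj = nn.Linear(memory_dim, attn_dim, bias=False)
        self.score = nn.Linear(attn_dim, 1, bias=False)
        # Two gates controlling how much new vs. previous context is kept.
        self.update_gate = nn.Linear(query_dim + memory_dim, memory_dim)
        self.retain_gate = nn.Linear(query_dim + memory_dim, memory_dim)

    def forward(self, query, memory, prev_context):
        # query: (B, query_dim); memory: (B, T, memory_dim);
        # prev_context: (B, memory_dim)
        energies = self.score(torch.tanh(
            self.query_proj(query).unsqueeze(1) + self.memory_proj(memory)
        )).squeeze(-1)                          # (B, T)
        weights = F.softmax(energies, dim=-1)   # alignment over encoder steps
        new_context = torch.bmm(weights.unsqueeze(1), memory).squeeze(1)

        gate_in = torch.cat([query, prev_context], dim=-1)
        z = torch.sigmoid(self.update_gate(gate_in))  # weight on new context
        r = torch.sigmoid(self.retain_gate(gate_in))  # weight on old context
        context = z * new_context + r * prev_context
        return context, weights

# Illustrative shapes: batch 2, 100 encoder steps.
gra = GatedRecurrentAttention(query_dim=256, memory_dim=512, attn_dim=128)
context, align = gra(torch.randn(2, 256), torch.randn(2, 100, 512),
                     torch.zeros(2, 512))
```

Under this reading, the two gates let the decoder decide at each step how much freshly attended context to mix in and how much of the previous context to carry over, which is one plausible way the model could control contextual information.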


MAILABS style-references

Elliot

[Three style-reference samples: reference audio with corresponding LSA and GRA syntheses]

Judy

[Three style-reference samples: reference audio with corresponding LSA and GRA syntheses]

Mary

[Three style-reference samples: reference audio with corresponding LSA and GRA syntheses]


VCTK style-references

p248

[Two style-reference samples: reference audio with corresponding LSA and GRA syntheses]

p270

[Two style-reference samples: reference audio with corresponding LSA and GRA syntheses]

p295

[Two style-reference samples: reference audio with corresponding LSA and GRA syntheses]
