Gated Recurrent Attention for Multi-Style Speech Synthesis
Introduction
We propose a novel attention model based on gated recurrence, which we call gated recurrent attention (GRA). GRA controls the flow of contextual information through two gates. To demonstrate GRA's alignment and style-modeling performance, we provide samples synthesized by Tacotron-GST with either location-sensitive attention (LSA) or GRA. Both models were trained only on the MAILABS-US corpus and then used to synthesize samples conditioned on style references from MAILABS-US and VCTK.
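The exact gating equations are given in the paper; as a rough illustration only, here is a minimal sketch in which the two gates are assumed to act like GRU-style update and reset gates over the recurrent attention context. The class and parameter names (GatedRecurrentAttentionSketch, update_gate, reset_gate) are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class GatedRecurrentAttentionSketch(nn.Module):
    """Illustrative sketch of a two-gate recurrent context update.

    Assumption: GRA's two gates behave like GRU-style update/reset
    gates over the previous attention context. The paper's actual
    formulation may differ.
    """

    def __init__(self, query_dim: int, context_dim: int):
        super().__init__()
        # Each gate is conditioned on the decoder query and the previous context.
        self.update_gate = nn.Linear(query_dim + context_dim, context_dim)
        self.reset_gate = nn.Linear(query_dim + context_dim, context_dim)
        self.candidate = nn.Linear(query_dim + context_dim, context_dim)

    def forward(self, query, prev_context, attended):
        # query: (B, query_dim); prev_context, attended: (B, context_dim).
        # `attended` is the freshly attended encoder summary for this step.
        qc = torch.cat([query, prev_context], dim=-1)
        z = torch.sigmoid(self.update_gate(qc))   # how much new context to admit
        r = torch.sigmoid(self.reset_gate(qc))    # how much of the new summary to use
        cand = torch.tanh(self.candidate(torch.cat([query, r * attended], dim=-1)))
        # Convex combination: the gates control the contextual information flow.
        return (1.0 - z) * prev_context + z * cand
```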
MAILABS style-references
Elliot: Reference, LSA, and GRA audio samples (three of each)
Judy: Reference, LSA, and GRA audio samples (three of each)
Mary: Reference, LSA, and GRA audio samples (three of each)
VCTK style-references
p248: Reference, LSA, and GRA audio samples (two of each)
p270: Reference, LSA, and GRA audio samples (two of each)
p295: Reference, LSA, and GRA audio samples (two of each)