Matcha compared to Vits #97

Open
yygg678 opened this issue Sep 22, 2024 · 2 comments

Comments

@yygg678

yygg678 commented Sep 22, 2024

I replicated the results of VITS and Matcha-TTS on a single-speaker Chinese dataset and found that the timbre similarity of Matcha-TTS is lower than that of VITS, especially in the high-frequency details of the spectrum. Below are the spectrograms of VITS and Matcha-TTS. Is there any way to improve the timbre similarity of Matcha-TTS?
[Spectrogram: VITS]
[Spectrogram: Matcha-TTS]

@shivammehta25
Owner

Hi! That is a cool experiment.

Did you fine-tune the vocoder too? The reason I am asking is that VITS has a built-in vocoder, as it is an end-to-end TTS system. Matcha, on the other hand, is an acoustic model that learns to map text to a log-mel-spectrogram. Currently, we have been using an off-the-shelf neural vocoder, namely HiFi-GAN, without fine-tuning it on Matcha's log-mel-spectrogram output.

I think to fix this, you will have to fine-tune the vocoder. One way to do this would be to extract the alignments, use these extracted alignments (instead of the duration predictor's outputs) to generate and save the log-mel-spectrograms, and then fine-tune the vocoder on them.
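For concreteness, here is a minimal sketch of what that dump-and-fine-tune step could look like. The loader (`load_matcha`), the alignment-conditioned synthesis call (`synthesise_with_alignments`), and the batch keys are placeholders for whatever your local training setup exposes, not the exact Matcha-TTS API:

```python
import numpy as np
import torch
from pathlib import Path

out_dir = Path("finetune_mels")
out_dir.mkdir(exist_ok=True)

model = load_matcha("matcha.ckpt").cuda().eval()      # placeholder loader for your checkpoint

with torch.no_grad():
    for batch in train_dataloader:                    # your single-speaker training set
        # Condition on the ground-truth alignments extracted during training
        # (not the duration predictor), so each generated mel stays
        # time-aligned with its original waveform.
        mel = synthesise_with_alignments(             # placeholder for the alignment-conditioned call
            model,
            batch["text_ids"].cuda(),
            durations=batch["durations"].cuda(),
        )                                             # -> [B, 80, T] log-mels
        for fname, m in zip(batch["filenames"], mel.cpu().numpy()):
            np.save(out_dir / f"{fname}.npy", m)
```

Those `.npy` files, paired with the original waveforms, are then what you fine-tune the vocoder on (if I remember correctly, the official HiFi-GAN training script has a fine-tuning mode that reads pre-computed mels from a directory instead of recomputing them from the audio).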

One easier experiment might be to try switching the vocoder. You can switch from HiFi-GAN to BigVGAN off the shelf; they use the same STFT parameters, so you don't need to retrain Matcha with different STFT settings.
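Because both vocoders consume the same log-mel representation, the swap is just a different generator module at inference time. A minimal sketch, assuming hypothetical `load_bigvgan` / `load_hifigan` helpers for however you construct each generator locally:

```python
import torch

mel = ...  # [B, 80, T] log-mel produced by Matcha, same STFT settings as in training

vocoder = load_bigvgan("bigvgan_generator.pt").cuda().eval()  # or load_hifigan(...); placeholder loaders
with torch.no_grad():
    audio = vocoder(mel.cuda())          # [B, 1, n_samples]
audio = audio.squeeze(1).clamp(-1.0, 1.0).cpu()
```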

Hope this helps, let me know if you have more questions :)

One side note: Matcha also has a temperature parameter. The higher the temperature, the more variance there will be in the generated output. It is only used during inference/generation, so you can easily play with it. However, I still feel this is a vocoder artefact, as end-to-end models have a waveform-generation objective to optimise, while acoustic models do not.
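A quick way to hear the effect is to sweep the temperature at synthesis time. The call below mirrors the `model.synthesise(...)` signature used in the synthesis notebook as I recall it (argument names may differ slightly in your checkout), and `tokenise` / `save_wav` are hypothetical helpers:

```python
import torch

text_ids, text_lengths = tokenise("Your test sentence")   # hypothetical text front-end
with torch.no_grad():
    for temperature in (0.3, 0.667, 1.0):
        out = model.synthesise(
            text_ids,
            text_lengths,
            n_timesteps=10,             # ODE solver steps
            temperature=temperature,    # higher -> more variance in the mel
            length_scale=1.0,
        )
        wav = vocoder(out["mel"])       # reuse whichever vocoder you settled on
        save_wav(f"sample_temp_{temperature}.wav", wav)    # hypothetical helper
```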

@yygg678
Author

yygg678 commented Sep 29, 2024

ok, thanks! I will try to fine-tune the vocoder.

2 participants