
Prosody Loss #15

Open
inconnu11 opened this issue Oct 19, 2022 · 7 comments

Comments

@inconnu11

Hi, I am adding your MDN prosody modeling code to my Tacotron, but I ran into a couple of questions about the prosody modeling part. First, the prosody loss is added to the total loss only after prosody_loss_enable_steps, yet the prosody representation is already added to the text encoding before that step. Does that mean that, in the training steps before prosody_loss_enable_steps, the prosody representation is optimized without the prosody loss?
Second, during training, the backward gradient through the prosody predictor should be stopped ("stop gradient"), but there seems to be little code doing that.
Thanks!
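For reference, the step-gating I am asking about can be sketched like this (function and variable names here are my own illustration, not the repo's actual code):

```python
def total_loss(mel_loss, prosody_loss, step, prosody_loss_enable_steps=100_000):
    """Gate the prosody term by training step (names are illustrative).

    Before `prosody_loss_enable_steps`, the prosody loss is excluded from
    the total, even though the prosody representation is still added to
    the text encoding in the forward pass.
    """
    if step < prosody_loss_enable_steps:
        return mel_loss
    return mel_loss + prosody_loss
```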


@keonlee9420
Owner

Hi @inconnu11 , thanks for your attention.

My intention was to prevent the prosody encoder from learning meaningless representations during the first few training steps. But you can effectively remove prosody_loss_enable_steps (by setting it to 1, for example) if you don't care about that. Otherwise, there should be no gain from backpropagating through the prosody encoder, even though its output is still added to the text hidden states.

@inconnu11
Author

Hi, I got it, thanks for the reply. But when I run the code on the LJSpeech corpus with the default settings, except toggling the prosody modeling type to 'du2021', the prosody loss becomes nan at prosody_loss_enable_steps (100k by default).
(screenshot of the training log)

@keonlee9420
Owner

Hmm, that's weird. If you have room for it, could you please do some sanity checks on your side? For example, removing parts of the code to simplify it until the nan loss disappears would be one approach. That would definitely be helpful for others interested in this issue.
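One cheap sanity check is to fail fast on the first non-finite loss term, so the log shows exactly which term and which step breaks first (a minimal helper sketch, not part of the repo):

```python
import math

def check_finite(name, value):
    """Raise as soon as a scalar loss term becomes nan/inf, so the
    offending term is visible instead of silently propagating nan."""
    if not math.isfinite(value):
        raise ValueError(f"{name} is non-finite: {value}")
    return value
```

Wrapping each scalar loss term (mel loss, duration loss, prosody/MDN loss) with a check like this before summing them usually narrows the nan down to a single term quickly.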

@inconnu11
Author

I'd like to do so, but it takes too long to train: about 7 days on a single T4 GPU. Are there any parts of the code that could speed up training?

@cpdu

cpdu commented Oct 31, 2022

Hi,

I'm the author of this paper. My code for calculating the MDN loss, with a small numerical-stability trick, is here:
MDN_loglike

Does that help?
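The usual numerical-stability trick for a mixture log-likelihood is log-sum-exp: work with log-weights and per-component log-densities and only exponentiate after subtracting the maximum, so tiny component likelihoods never underflow to zero and produce log(0) = -inf / nan. A minimal numpy sketch of the idea (my own illustration, not the exact code linked above):

```python
import numpy as np

def mdn_log_likelihood(w_logits, mu, log_sigma, x):
    """Log-likelihood of x under a diagonal-Gaussian mixture.

    w_logits:  (K,)   unnormalized mixture weights
    mu:        (K, D) component means
    log_sigma: (K, D) component log standard deviations
    x:         (D,)   observation
    """
    # log-softmax of the mixture weights, computed in log space
    log_w = w_logits - np.logaddexp.reduce(w_logits)
    # per-component diagonal-Gaussian log density
    z = (x - mu) / np.exp(log_sigma)                                  # (K, D)
    comp_ll = -0.5 * np.sum(z**2 + 2 * log_sigma + np.log(2 * np.pi),
                            axis=-1)                                  # (K,)
    joint = log_w + comp_ll
    # log-sum-exp: subtract the max before exponentiating
    m = joint.max()
    return m + np.log(np.sum(np.exp(joint - m)))
```

A naive `log(sum(w * pdf))` can underflow whenever all component densities are tiny (e.g. far-off observations), which is one common source of nan MDN losses.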

@inconnu11
Author

Hi, I changed the MDN loss calculation from fig. 1 to fig. 2, but it doesn't seem to work.

original MDN loss:
(screenshot 20221102-215537)

newer MDN loss:
(screenshot 20221102-215548)

@cpdu

cpdu commented Nov 3, 2022

The MDN loss (i.e. the negative log-likelihood) can take negative values. However, in your log it hovers around 0 before becoming nan.
I'd suggest checking whether you are computing the likelihood correctly.
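To see why a negative NLL is normal, not a bug: a Gaussian density exceeds 1 at its mean whenever sigma < 1/sqrt(2*pi) ≈ 0.399, so the log-likelihood is positive and the NLL negative. A tiny plain-Python illustration (my own, not the repo's code):

```python
import math

def gaussian_log_pdf(x, mu, sigma):
    # log N(x; mu, sigma^2)
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2 * math.pi))

# With sigma = 0.1 (well below 1/sqrt(2*pi)), the density at the mean is
# greater than 1, so the negative log-likelihood is negative.
nll = -gaussian_log_pdf(0.0, 0.0, 0.1)
```

So a loss that drifts below zero as the components sharpen is expected; a loss pinned near 0 that then jumps to nan points to a likelihood-computation problem instead.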
