Do not use test and validation datasets whilst building the vocabulary #179

Omer · 2017-04-03T01:33:02Z

I'm rather new to the concepts of RNNs (and haven't written any Python in years!), so I hope I'm not making a fundamental mistake. 😄

Reading preprocess.py, I could not help but notice that we build the vocabulary from the entire input file (i.e. we also use the test and validation data). Was this an intentional decision? Wouldn't that mean the model is trained with vocabulary entries that never appear in the training data?
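To make the concern concrete, here is a minimal sketch of building the vocabulary from the training split only. It is not the actual code in preprocess.py: it assumes character-level tokens and a contiguous train/val/test split, and the names `split_text`, `build_vocab`, `encode`, and the `<unk>` token are hypothetical.

```python
UNK = "<unk>"  # hypothetical placeholder for characters unseen during training


def split_text(text, val_frac=0.1, test_frac=0.1):
    """Split text into contiguous train/val/test pieces (fractions are assumptions)."""
    n = len(text)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    n_train = n - n_val - n_test
    return text[:n_train], text[n_train:n_train + n_val], text[n_train + n_val:]


def build_vocab(train_text):
    """Assign an integer id to every character in the training split; id 0 is UNK."""
    vocab = {UNK: 0}
    for ch in sorted(set(train_text)):
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Encode text as ids, mapping characters outside the vocabulary to UNK."""
    return [vocab.get(ch, vocab[UNK]) for ch in text]


text = "aaaabbbbccccaaaabbbbz"
train, val, test = split_text(text, val_frac=0.2, test_frac=0.2)
vocab = build_vocab(train)        # only the training split contributes entries
test_ids = encode(test, vocab)    # "z" appears only in the test split, so it maps to UNK
```

With this approach, every vocabulary entry is seen at least once during training, and anything that appears only in the validation or test data falls back to the unknown token.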

Thank you for taking the time to implement this awesome library!

Do not use test and validation datasets whilst building the vocabulary

a06eb6b
