Synthesis Experiments | Benchmark Pt.11
11 April 2021 | BPE
Based on previous outputs, there seemed to be a problem: the output translations contained a large number of unknown tokens (see, for example, the last line of synthesis/train_model/pass-1/fs_generate*.out, where BLEU = 1.87). It was then recommended that the entire corpus be tokenized before training. The results improved significantly: on the third pass (synthesis/train_model/pass-3), I got BLEU = 10.92. Still not great, but a large improvement.
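One way to put a number on the unknown-token problem is to measure, for each pass, what fraction of hypothesis tokens are unknown, and report it alongside BLEU. The sketch below makes several assumptions not stated in the post. It treats the fs_generate*.out files as fairseq-generate logs, where each hypothesis line has the form H-<id><TAB><score><TAB><tokens> and unknown words are printed as <unk>. It also expects the summary line to contain "BLEU = ..." or "BLEU4 = ...". The summarize function and the check_unk.py script name are hypothetical.

```python
import glob
import re
import sys

# Assumed fairseq-generate per-sentence prefixes: S-0, T-0, H-0, D-0, P-0, ...
PER_SENTENCE_RE = re.compile(r"^[A-Z]-\d+\t")
# Matches "BLEU = 10.92" or "BLEU4 = 1.87" on the (assumed) summary line.
BLEU_RE = re.compile(r"BLEU4?\s*=\s*([0-9]+(?:\.[0-9]+)?)")


def summarize(lines, unk="<unk>"):
    """Return (unk_count, total_tokens, bleu) for one generation log.

    Only hypothesis lines ("H-...") are counted; bleu is None if no
    summary line with a BLEU score is found.
    """
    unk_count = 0
    total = 0
    bleu = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("H-"):
            parts = line.split("\t")
            if len(parts) < 3:
                continue  # skip malformed hypothesis lines
            tokens = parts[2].split()
            total += len(tokens)
            unk_count += tokens.count(unk)
        elif not PER_SENTENCE_RE.match(line):
            match = BLEU_RE.search(line)
            if match:
                bleu = float(match.group(1))  # keep the last score seen
    return unk_count, total, bleu


def main(pattern):
    # Print the <unk> rate and BLEU for every file matching the glob pattern.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            unks, total, bleu = summarize(f)
        rate = unks / total if total else 0.0
        print(f"{path}: <unk> rate {rate:.1%} ({unks}/{total}), BLEU = {bleu}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python check_unk.py 'synthesis/train_model/pass-1/fs_generate*.out'
    main(sys.argv[1])
```

Running it on the pass-1 and pass-3 logs should show whether the drop in unknown tokens after tokenizing the corpus tracks the BLEU improvement.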