Synthesis Experiments | Benchmark Pt.11

BPE

Previous outputs contained a lot of unknown tokens in the translations (see the last line of synthesis/train_model/pass-1/fs_generate*.out, where BLEU = 1.87, for example). It was then recommended that the whole corpus be tokenized before training. The results improved significantly: in the third pass (synthesis/train_model/pass-3), I got BLEU = 10.92. Still not great, but a big improvement.
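
To illustrate why subword tokenization cuts down on unknown tokens, here is a toy sketch of byte-pair encoding in Python. It is not the pipeline used for these passes; the function names (`learn_bpe`, `apply_bpe`) and the tiny corpus are made up for illustration. The idea is that a word never seen in training can still be split into pieces that were seen.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_symbols(symbols, pair):
    """Replace every occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_symbols(list(symbols), best))] += freq
        vocab = new_vocab
        merges.append(best)
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols


# Hypothetical toy corpus; "lowest" never appears in it.
corpus = ["low", "low", "lower", "newest", "newest", "widest"]
merges = learn_bpe(corpus, 10)
print(apply_bpe("lowest", merges))
```

Even though "lowest" is not in the toy corpus, it comes out as a sequence of known subword pieces rather than a single unknown token, which is the effect the tokenization step is meant to have on the real corpus.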