Synthesis Experiments | Benchmark Pt.1

21 October 2020

Selecting training datasets

Since we will mainly be working with medical terminology, I searched OPUS for datasets better suited to this purpose than the EN-DE toy dataset. I came across EMEA, “a parallel corpus made out of PDF documents from the European Medicines Agency”. I chose its EN-FR data (conveniently available in Moses format) to experiment with, partly because I am familiar with both languages.
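In Moses format, a parallel corpus is stored as two plain-text files, one per language, where line i of one file is the translation of line i of the other. Below is a minimal Python sketch of how such a pair could be read. The function names are my own (hypothetical), not part of any existing tool.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple


def pair_lines(src_lines: Iterable[str], tgt_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned iterables."""
    # zip_longest (rather than zip) lets us detect inputs of different lengths
    # instead of silently dropping the tail of the longer one.
    for line_no, (src, tgt) in enumerate(zip_longest(src_lines, tgt_lines), start=1):
        if src is None or tgt is None:
            raise ValueError(f"corpus is not line-aligned (mismatch at line {line_no})")
        yield src.rstrip("\n"), tgt.rstrip("\n")


def read_moses_pair(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Stream sentence pairs from a Moses-format corpus (one file per language)."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        yield from pair_lines(src, tgt)
```

For example, `read_moses_pair("EMEA.en-fr.en", "EMEA.en-fr.fr")` would stream the EN-FR pairs. Those file names assume OPUS's usual `<corpus>.<pair>.<lang>` naming, so check them against the actual download. Iterating over the generator rather than building a list keeps memory use flat on large corpora.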

TODO

Modify synthesis.py to take the two terminology files provided by Facebook as input
Run an experiment to get preliminary results
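As a rough sketch of the first TODO item, here is one way a script could accept two terminology files and turn them into a source-to-target lookup. This assumes (hypothetically) that the two files are line-aligned term lists, one term per line. The `--term-src`/`--term-tgt` flags and the helper names are illustrative only, not the actual interface of synthesis.py.

```python
import argparse
from typing import Dict, Iterable, List, Optional


def build_term_dict(src_terms: Iterable[str], tgt_terms: Iterable[str]) -> Dict[str, str]:
    """Map each source term to its target term.

    Hypothetical layout: line i of the source file translates line i of the target file.
    """
    src = [t.strip() for t in src_terms]
    tgt = [t.strip() for t in tgt_terms]
    if len(src) != len(tgt):
        raise ValueError(f"{len(src)} source terms but {len(tgt)} target terms")
    # Skip blank entries; a later duplicate source term overrides an earlier one.
    return {s: t for s, t in zip(src, tgt) if s and t}


def load_terminology(src_path: str, tgt_path: str) -> Dict[str, str]:
    """Read both terminology files and build the lookup table."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        return build_term_dict(src, tgt)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical flags, shown only to illustrate taking two terminology files.
    parser = argparse.ArgumentParser(description="Terminology-aware synthesis (sketch)")
    parser.add_argument("--term-src", required=True, help="source-language terminology file")
    parser.add_argument("--term-tgt", required=True, help="target-language terminology file")
    return parser.parse_args(argv)
```

argparse exposes `--term-src` as `args.term_src`, so the script body would then call `load_terminology(args.term_src, args.term_tgt)`.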

Lisa Z.
