Synthesis Experiments | Benchmark Pt.4

Tips for moving forward

Things to keep in mind after meeting with the mentors today:

Points of confusion

I had a misconception about how the embeddings are generated. They are generated for the terminologies (the glossary terms), not for the parallel corpus, and they are built from the monolingual corpus input. For each terminology, we find similar pairs in the monolingual data embeddings. If these similar pairs also exist in the parallel data, we compute the similarity between them and the original glossary terms.
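To make that flow concrete for myself, here is a minimal sketch of the lookup as I understand it. Everything in it is an assumption for illustration: the toy vectors, the `similar_in_parallel` helper, the `top_k` cutoff, and the use of cosine similarity are my own placeholders, not the actual pipeline or its similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_in_parallel(term, embeddings, parallel_vocab, top_k=2):
    """Hypothetical helper: find the top_k nearest neighbours of a glossary
    term in the monolingual embedding space, then keep only the neighbours
    that also occur in the parallel data, with their similarity scores."""
    query = embeddings[term]
    scored = [
        (cosine(query, vec), word)
        for word, vec in embeddings.items()
        if word != term
    ]
    scored.sort(reverse=True)
    neighbours = scored[:top_k]
    return [(word, score) for score, word in neighbours if word in parallel_vocab]


# Toy embeddings standing in for vectors derived from the monolingual corpus.
embeddings = {
    "enzyme": [1.0, 0.0, 0.0],    # glossary term
    "protein": [0.9, 0.1, 0.0],
    "catalyst": [0.8, 0.2, 0.1],
    "river": [0.0, 1.0, 0.0],
}

# Toy stand-in for the words that appear in the parallel data.
parallel_vocab = {"catalyst", "river"}

result = similar_in_parallel("enzyme", embeddings, parallel_vocab)
print(result)
```

In this toy case "protein" is the closest neighbour of "enzyme" but is dropped because it does not appear in the parallel data, so only "catalyst" is kept with its similarity score.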