Synthesis Experiments | Benchmark Pt.5

22 November 2020

Using Other Datasets

I soon ran into a problem: the CommonCrawl datasets available to use were too messy and not yet processed, while the clean data were not recent enough to contain the terms I needed. While the hunt for good datasets is still on, I tried NeuLab’s covid-datashare repository. The datasets there were not quite long enough and did not contain all the single-word terms as I had hoped, so the program crashed because certain embeddings could not be generated.
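A crash like this could in principle be caught earlier by checking which terms actually occur in a dataset before trying to generate embeddings. Below is a minimal sketch of such a check, not the code from my benchmark: the function name `find_missing_terms`, the simple regex tokenizer, and the sample sentences and terms are all hypothetical and only there for illustration.

```python
import re


def find_missing_terms(corpus_lines, terms):
    """Return the single-word terms that never appear in the corpus.

    Hypothetical helper: tokenization here is a plain lowercase regex split,
    which is an assumption and may differ from the real pipeline's tokenizer.
    """
    vocab = set()
    for line in corpus_lines:
        # Collect lowercase word tokens (letters, digits, underscores, hyphens).
        vocab.update(re.findall(r"[\w-]+", line.lower()))
    # Preserve the order of the input term list.
    return [term for term in terms if term.lower() not in vocab]


# Made-up sample data, for illustration only.
corpus = [
    "Wash your hands to slow the coronavirus outbreak.",
    "Quarantine rules changed again.",
]
terms = ["coronavirus", "quarantine", "lockdown"]

missing = find_missing_terms(corpus, terms)
print(missing)
```

Any term returned by the check could then be skipped or logged rather than passed on to the embedding step.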

Lisa Z.
