CLSP Cluster Notes | Day 5
2 September 2020

fairseq-train
To train the model, in addition to following the tutorial on fairseq's webpage, I had to specify an optimizer. I went with SGD:
It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data). Especially in high-dimensional optimization problems this reduces the computational burden, achieving faster iterations in trade for a lower convergence rate.
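For reference, the training command I ran looked roughly like this. The data directory, architecture, and learning-rate settings below are loosely modeled on fairseq's IWSLT example rather than an exact record of my invocation; the relevant parts for today are --optimizer sgd and --max-tokens.

    fairseq-train data-bin/iwslt14.tokenized.de-en \
        --arch fconv_iwslt_de_en \
        --optimizer sgd --lr 0.25 --clip-norm 0.1 --dropout 0.2 \
        --max-tokens 500 \
        --save-dir checkpoints/sgd-run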
Probably should have chosen Adam, which seems more popular now. Perhaps I will train the model with Adam as well and compare the outputs.
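If I do rerun with Adam, it should just be a matter of swapping the optimizer flags, roughly as below. The betas, learning rate, scheduler, and warmup settings are the ones fairseq's transformer example uses; I have not checked whether they are sensible for this setup.

    fairseq-train data-bin/iwslt14.tokenized.de-en \
        --arch fconv_iwslt_de_en \
        --optimizer adam --adam-betas '(0.9, 0.98)' \
        --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
        --max-tokens 500 \
        --save-dir checkpoints/adam-run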
I set --max-tokens to 500; I am not sure whether 4000 would be too much to ask of the GPU.
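One way to find out, which I have not tried yet, would be to watch GPU memory while the job runs and see how much headroom is left:

    # check memory use on the assigned GPU every few seconds while training;
    # plenty of headroom would suggest --max-tokens can go higher
    watch -n 5 nvidia-smi --query-gpu=memory.used,memory.total --format=csv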
Speaking of the GPU, I also included Guanghui's script to acquire one.
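I will not paste Guanghui's script here, but the general idea of grabbing a free GPU on the grid is something like the sketch below. This is my own guess at the mechanism, assuming nvidia-smi is available on the node; it is not his actual script.

    # pick the first GPU with (almost) no memory in use and pin the job to it
    free_gpu=$(nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits \
               | awk -F', ' '$2 < 100 {print $1; exit}')
    export CUDA_VISIBLE_DEVICES=$free_gpu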
Job stopped on the 18th epoch. I will go do my homework now; fairseq-generate is coming up next.
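For when I pick this back up, the generation step from the tutorial looks roughly like the following; the data directory and checkpoint path are placeholders matching the training sketch above.

    fairseq-generate data-bin/iwslt14.tokenized.de-en \
        --path checkpoints/sgd-run/checkpoint_best.pt \
        --batch-size 128 --beam 5 --remove-bpe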