Resuming and Fine-tuning

The --resume option to the train.sh script resumes training from the file given by --checkpoint=/path/to/checkpoint.pt, including the optimizer state. Training continues from the last step recorded in the checkpoint, and the model sees the same files it would have seen had training not been interrupted. The exception is training from tar files: in that case the files are seen in the same order as when training started from scratch, which is not the order an uninterrupted run would have continued with.

The --fine_tune option instead starts training anew from the specified checkpoint: the run begins with a new learning rate schedule and a new optimizer state rather than the ones recorded in the checkpoint.
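The difference between the two options can be pictured with a minimal sketch. This is a toy model, not the code behind train.sh: the TrainingState class, the init_from_checkpoint function and the checkpoint keys ("model", "optimizer", "step") are all hypothetical names chosen for illustration, and it assumes that --fine_tune keeps only the model weights from the checkpoint.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TrainingState:
    """Toy stand-in for what a training run is initialised with."""
    model_weights: Dict[str, Any]
    optimizer_state: Optional[Dict[str, Any]]  # None means "build a fresh optimizer"
    step: int  # position in the learning rate schedule


def init_from_checkpoint(checkpoint: Dict[str, Any], mode: str) -> TrainingState:
    """Build the starting state for a run; mode is "resume" or "fine_tune".

    The checkpoint keys used here are hypothetical, not the real file layout.
    """
    if mode == "resume":
        # Continue where the interrupted run stopped: weights, optimizer
        # state and step counter all come from the checkpoint.
        return TrainingState(
            model_weights=checkpoint["model"],
            optimizer_state=checkpoint["optimizer"],
            step=checkpoint["step"],
        )
    if mode == "fine_tune":
        # Assumption: only the weights are kept; the learning rate
        # schedule and the optimizer start over.
        return TrainingState(
            model_weights=checkpoint["model"],
            optimizer_state=None,
            step=0,
        )
    raise ValueError(f"unknown mode: {mode!r}")


checkpoint = {
    "model": {"w": [0.1, 0.2]},
    "optimizer": {"momentum": [0.0, 0.0]},
    "step": 12000,
}
resumed = init_from_checkpoint(checkpoint, "resume")
fine_tuned = init_from_checkpoint(checkpoint, "fine_tune")
print(resumed.step, fine_tuned.step)  # 12000 0
```

In this picture a resumed run picks up the schedule at step 12000 with its optimizer state intact, while a fine-tuning run begins again at step 0 with the same weights.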

To freeze the encoder weights during training, set the enc_freeze option in the config file to:

enc_freeze: true