Validation
Validation Command
Quick Start
To run validation, execute:
./scripts/val.sh
By default, a checkpoint saved at /results/RNN-T_best_checkpoint.pt, with the testing-1023sp_run.yaml model config, is evaluated on the /datasets/LibriSpeech/librispeech-dev-clean.json manifest.
Arguments
Customise validation by specifying the --checkpoint, --model_config, and --val_manifests arguments to adjust the model checkpoint, model YAML configuration, and validation manifest file(s), respectively.
To save the predictions, pass --dump_preds as described here.
See args/val.py and args/shared.py for the complete set of arguments and their respective docstrings.
Further Detail
- All references and hypotheses are normalized with the Whisper normalizer before calculating WERs, as described in the WER calculation docs. To switch off normalization, modify the respective config file entry to read standardize_wer: false.
- During validation the state resets technique is applied by default in order to increase the model's accuracy.
- The model's accuracy can be improved by using beam search and an n-gram language model.
- Validation on long utterances is calibrated so that it does not run out of memory on a single 11 GB GPU. If a smaller GPU is used, or utterances are longer than 2 hours, refer to this document.
Next Step
See the hardware export documentation for instructions on exporting a hardware checkpoint for inference on an accelerator.