Validation

Validation Command

Quick Start

To run validation, execute:

./scripts/val.sh

By default, this evaluates the checkpoint saved at /results/RNN-T_best_checkpoint.pt, using the testing-1023sp_run.yaml model config, on the /datasets/LibriSpeech/librispeech-dev-clean-flac.json manifest.

Arguments

Customise validation by specifying the --checkpoint, --model_config, and --val_manifests arguments to adjust the model checkpoint, model YAML configuration, and validation manifest file(s), respectively.
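
As a minimal sketch of the three arguments above, the following builds a validation command that spells out the documented defaults explicitly. It assumes that ./scripts/val.sh accepts these flags directly on its command line. The model config is given as a bare filename because its directory is not stated here, so prefix it with the right path for your checkout. The command is printed first and only executed if the script is present.

```shell
# Values copied from the documented defaults; substitute your own.
CHECKPOINT=/results/RNN-T_best_checkpoint.pt
# Assumption: bare filename only, add the directory that holds your configs.
MODEL_CONFIG=testing-1023sp_run.yaml
VAL_MANIFEST=/datasets/LibriSpeech/librispeech-dev-clean-flac.json

# Store the command in the positional parameters so it can be
# inspected before it is run.
set -- ./scripts/val.sh \
    --checkpoint "$CHECKPOINT" \
    --model_config "$MODEL_CONFIG" \
    --val_manifests "$VAL_MANIFEST"

# Show the command that would be run.
echo "$*"

# Run it only when the script exists and is executable.
if [ -x "$1" ]; then
    "$@"
fi
```

Changing any one of the three variables changes only that aspect of the run; the other two stay at the values shown.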

Predictions are saved as described here.

See args/val.py and args/shared.py for the complete set of arguments and their respective docstrings.

Further Detail

All references and hypotheses are normalized with the Whisper normalizer before calculating WERs, as described in the WER calculation docs. To switch off normalization, modify the respective config file entry to read standardize_wer: false.
During validation, the state resets technique is applied by default to increase the model’s accuracy.
The model’s accuracy can be improved by using beam search and an n-gram language model.
Validation on long utterances is calibrated so that it does not run out of memory on a single 11 GB GPU. If you use a smaller GPU, or your utterances are longer than 2 hours, refer to this document.
By default, validation pads every input audio with 0.96 s of silence at the end so that the model has time to output the final tokens. You can change this with the --val_final_padding_secs flag.
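
As a sketch of the padding option in the last note above, the command below overrides the amount of trailing silence. The value 2.0 is purely illustrative and not a recommendation, and the example assumes that ./scripts/val.sh accepts the flag directly on its command line.

```shell
# Illustrative value only; the documented default is 0.96 seconds.
FINAL_PADDING_SECS=2.0

# Build the command in the positional parameters.
set -- ./scripts/val.sh --val_final_padding_secs "$FINAL_PADDING_SECS"

# Show the command that would be run.
echo "$*"

# Run it only when the script exists and is executable.
if [ -x "$1" ]; then
    "$@"
fi
```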

Next Step

See the hardware export documentation for instructions on exporting a hardware checkpoint for inference on an accelerator.
