ML training flow

This document describes the flow of training the base model on LibriSpeech. This configuration is used as an example because it is quicker to train than the large model.

Environment Setup

Clone the repo, build the image, and launch the container with the appropriate volumes mounted (as described here) using the following commands:

git clone https://github.com/MyrtleSoftware/caiman-asr.git && cd caiman-asr/training
./scripts/docker/build.sh
./scripts/docker/launch.sh <DATASETS> <CHECKPOINTS> <RESULTS>
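
For example, on a hypothetical machine where the data, checkpoints, and results live under /home/user, the launch command might look like the following. The three arguments are the host directories mounted into the container; the paths below are placeholders, so substitute your own:

./scripts/docker/launch.sh /home/user/datasets /home/user/checkpoints /home/user/results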

Data Preparation

From inside the container, run the following command to download LibriSpeech, prepare the JSON manifests, create a tokenizer, and generate a populated YAML configuration file, configs/base-8703sp_run.yaml.

./scripts/prepare_librispeech.sh

More details on preparing LibriSpeech into a JSON format can be found here.
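
As an optional sanity check after the script finishes, you can confirm that the manifests and the populated config were created. This sketch assumes the manifests are written under /datasets/LibriSpeech, which is where the training command below points its --data_dir:

# Assumes the manifests were written under /datasets/LibriSpeech
ls /datasets/LibriSpeech/librispeech-*.json
ls configs/base-8703sp_run.yaml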

Training

Adjust the --num_gpus argument based on your machine and then run the following command to train a base model. A more detailed description of the training process can be found here.

./scripts/train.sh \
  --data_dir /datasets/LibriSpeech \
  --train_manifests librispeech-train-clean-100-flac.json librispeech-train-clean-360-flac.json librispeech-train-other-500-flac.json \
  --val_manifests librispeech-dev-clean-flac.json \
  --model_config configs/base-8703sp_run.yaml \
  --num_gpus 2 \
  --global_batch_size 1024 \
  --grad_accumulation_batches 8 \
  --batch_split_factor 8 \
  --val_batch_size 1 \
  --training_steps 42000

In particular, this command assumes you're using a 2 x RTX4090 (24GB) system. See here for how to adjust these numbers for your system.
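
As a rough guide (a sketch only; check the linked documentation for the exact relationship these scripts use), the batch processed per GPU per step is approximately global_batch_size / (num_gpus * grad_accumulation_batches). On a hypothetical 4-GPU machine you could therefore keep the global batch size at 1024 and halve the gradient accumulation:

# Hypothetical 4-GPU variant, assuming per-GPU batch = global_batch_size / (num_gpus * grad_accumulation_batches)
# 1024 / (4 * 4) = 64 per GPU, matching 1024 / (2 * 8) in the 2-GPU command above
./scripts/train.sh \
  --data_dir /datasets/LibriSpeech \
  --train_manifests librispeech-train-clean-100-flac.json librispeech-train-clean-360-flac.json librispeech-train-other-500-flac.json \
  --val_manifests librispeech-dev-clean-flac.json \
  --model_config configs/base-8703sp_run.yaml \
  --num_gpus 4 \
  --global_batch_size 1024 \
  --grad_accumulation_batches 4 \
  --batch_split_factor 8 \
  --val_batch_size 1 \
  --training_steps 42000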

Validation

The following command will run the validation script and calculate the WER [%]. See here for more details.

./scripts/val.sh --model_config configs/base-8703sp_run.yaml
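
The reported WER is the word error rate: the number of word substitutions, deletions, and insertions needed to turn the hypothesis transcripts into the reference transcripts, divided by the total number of reference words. For example, 3 substitutions, 1 deletion, and 2 insertions against a 100-word reference give WER = (3 + 1 + 2) / 100 = 6%.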