Supported Dataset Formats

CAIMAN-ASR supports reading data from four formats:

JSON (training + validation): All audio as wav or flac files in a single directory hierarchy, with transcripts in JSON file(s) referencing these audio files. [link]
WebDataset (training + validation): Audio <key>.{flac,wav} files stored with associated <key>.txt transcripts in tar file shards. Format described here. [link]
Directories (validation): Audio (wav or flac) files and the respective text transcripts are in two separate directories. [link]
Hugging Face (validation): Hugging Face Hub datasets; see here for more info. [link]
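
As an illustration of the WebDataset layout described above, the following is a minimal sketch that packs audio files and transcripts into a single tar shard using only the Python standard library. The helper names (write_shard, list_keys), the sample keys, and the shard file name are hypothetical and are not part of CAIMAN-ASR; the sketch only assumes what the description states, namely that each sample is stored as <key>.{flac,wav} alongside <key>.txt.

```python
import io
import tarfile
from pathlib import Path


def write_shard(shard_path, samples):
    """Write (key, audio_path, transcript) samples into one tar shard.

    Each sample becomes two tar members that share a key:
    <key>.wav or <key>.flac, and <key>.txt.
    (Hypothetical helper, not a CAIMAN-ASR API.)
    """
    with tarfile.open(shard_path, "w") as tar:
        for key, audio_path, transcript in samples:
            audio_path = Path(audio_path)
            ext = audio_path.suffix.lower()
            if ext not in {".wav", ".flac"}:
                raise ValueError(f"unsupported audio type: {audio_path}")
            # Audio member, stored under the sample key.
            tar.add(str(audio_path), arcname=f"{key}{ext}")
            # Transcript member, written from memory as UTF-8 text.
            text = transcript.encode("utf-8")
            info = tarfile.TarInfo(name=f"{key}.txt")
            info.size = len(text)
            tar.addfile(info, io.BytesIO(text))


def list_keys(shard_path):
    """Return {key: sorted extensions} as a quick sanity check of a shard."""
    members = {}
    with tarfile.open(shard_path, "r") as tar:
        for name in tar.getnames():
            key, _, ext = name.rpartition(".")
            members.setdefault(key, []).append(ext)
    return {key: sorted(exts) for key, exts in members.items()}
```

Calling write_shard("shard-000000.tar", [("utt0001", "audio/utt0001.wav", "hello world")]) would produce a shard containing utt0001.wav and utt0001.txt, and list_keys can then confirm that every key has both an audio member and a transcript. Keys are probably best kept free of dots, since WebDataset-style readers typically split the key from the extension at a dot in the file name.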

To train on your own proprietary dataset, you will need to convert it to the WebDataset or JSON format. A worked example of how to do this for the JSON format is provided in


If you have a feature request to support training/validation on a different format, please open a GitHub issue.