anilkeshwani/MLS_english_train_strat_sample_aligned_hubert
Source: Hugging Face
Updated: 2024-11-28
Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/anilkeshwani/MLS_english_train_strat_sample_aligned_hubert
Summary:
The dataset has several feature fields, including an ID, the transcript, tokenized text, normalized text, uroman romanization, speech tokens, and the start and end times of the aligned tokens. It is split into a training set of 2,702,009 samples and a development set of 3,807 samples. The download size is 6,655,821,032 bytes (about 6.7 GB) and the total size is 21,525,072,952 bytes (about 21.5 GB).
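A minimal sketch of reading the aligned timing fields with the Hugging Face `datasets` library follows. The repository ID comes from the download link above. The split name `"dev"` and the column names `aligned_token_start_time` / `aligned_token_end_time` are assumptions inferred from the prose description, not confirmed by the listing, so check the keys of a streamed example before relying on them. Streaming avoids fetching the full ~6.7 GB download just to inspect a few records.

```python
from typing import List, Tuple

# Hypothetical column names: the listing only describes these fields in prose
# ("aligned token start time", "aligned token end time"). Verify them against
# the keys of a real example before use.
START_KEY = "aligned_token_start_time"
END_KEY = "aligned_token_end_time"


def stream_dev_examples(n: int = 5) -> List[dict]:
    """Stream the first n development-split examples without a full download.

    Assumes the split is named "dev"; it may instead be called "validation".
    """
    from datasets import load_dataset  # lazy import: only needed for streaming

    ds = load_dataset(
        "anilkeshwani/MLS_english_train_strat_sample_aligned_hubert",
        split="dev",
        streaming=True,
    )
    return list(ds.take(n))


def token_spans(example: dict) -> List[Tuple[float, float, float]]:
    """Pair each aligned start time with its end time and compute the duration."""
    starts = example[START_KEY]
    ends = example[END_KEY]
    if len(starts) != len(ends):
        raise ValueError("start and end lists differ in length")
    spans = []
    for s, e in zip(starts, ends):
        if e < s:
            raise ValueError(f"end time {e} precedes start time {s}")
        spans.append((s, e, e - s))
    return spans
```

For example, `token_spans(stream_dev_examples(1)[0])` would return (start, end, duration) triples for the first development-set utterance, provided the assumed keys match. The listing does not state the time unit, so the durations are in whatever unit the dataset stores.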
Provided by:
anilkeshwani