hshankar113/Vaani-TamilNadu-Merged
Source: Hugging Face. Updated 2026-04-24. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/hshankar113/Vaani-TamilNadu-Merged
Description:
The dataset contains audio files along with associated metadata such as language, duration, speaker information (including ID, known languages, gender), geographical details (state, district, pincode), years of stay, transcription availability, transcript text, reference image, speaker image hash, utterance sequence ID, and original subset. The dataset is split into training and validation sets, with the training set containing 813,431 examples and the validation set containing 90,382 examples.
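
As a rough illustration of the split sizes quoted above, the sketch below computes each split's share of the total and shows one way the records might be streamed for inspection. It assumes the Hugging Face `datasets` package is installed and that the splits are named `train` and `validation`; the listing confirms neither, and the helper names `split_fractions` and `peek` are hypothetical. Decoding the audio field may also need extra audio dependencies.

```python
"""Sketch: check split proportions and stream a few records."""

REPO_ID = "hshankar113/Vaani-TamilNadu-Merged"  # repository name from the listing

# Example counts quoted in the description. The split names are an assumption.
SPLIT_SIZES = {"train": 813_431, "validation": 90_382}


def split_fractions(sizes):
    """Return each split's share of the total example count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def peek(split="validation", n=3):
    """Stream the first n records without downloading the whole split.

    Assumes `datasets` is installed and that `split` exists on the Hub;
    the column names in the returned records are not guaranteed here.
    """
    from datasets import load_dataset  # lazy import: optional dependency

    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(stream.take(n))


if __name__ == "__main__":
    for name, share in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {share:.1%}")
```

Streaming is used in `peek` because the training split alone has over 800,000 audio examples, so a full download is likely to be large.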
Provided by:
hshankar113



