babs/language_classification
Source: Hugging Face. Updated 2024-06-25; added to the catalog 2024-03-04.
Download link:
https://hf-mirror.com/datasets/babs/language_classification
Summary:
This dataset contains the features client ID, path, audio, sentence, age, gender, and language. The audio is sampled at 48,000 Hz (48 kHz). The language feature is a class label with 45 language classes, listed alphabetically from Arabic to Welsh. The dataset is divided into training, validation, and test splits containing 22,194, 5,888, and 5,963 examples respectively (34,045 in total). The download size is 4,661,574,412 bytes (about 4.66 GB) and the total dataset size is 4,989,805,421.245 bytes (about 4.99 GB).
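The following is a minimal sketch of how clip length relates to the 48 kHz sampling rate. It assumes a decoded example shaped like the Hugging Face `datasets` Audio feature output, which is a dict with `"path"`, `"array"`, and `"sampling_rate"`. The example record below is synthetic, with a hypothetical file name. The 16 kHz target is also an assumption about a downstream model, not something this dataset specifies. With the `datasets` library, re-decoding at another rate is normally done with `cast_column("audio", Audio(sampling_rate=16_000))`.

```python
def clip_duration_seconds(audio: dict) -> float:
    """Duration of a decoded clip: number of samples divided by sampling rate (Hz)."""
    return len(audio["array"]) / audio["sampling_rate"]


def resampled_length(num_samples: int, orig_sr: int, target_sr: int) -> int:
    """Number of samples the same clip would have after resampling to target_sr."""
    return round(num_samples * target_sr / orig_sr)


# Synthetic stand-in for one decoded example (hypothetical path, 3 s of silence).
example_audio = {
    "path": "example.mp3",
    "array": [0.0] * (48_000 * 3),
    "sampling_rate": 48_000,
}

print(clip_duration_seconds(example_audio))                            # 3.0
print(resampled_length(len(example_audio["array"]), 48_000, 16_000))  # 48000
```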
Provided by: babs
Original information summary
Dataset overview
Features
- client_id: string.
- path: string.
- audio: audio data, sampled at 48,000 Hz.
- sentence: string.
- age: string.
- gender: string.
- language: class label with the following categories:
- 0: Arabic
- 1: Basque
- 2: Breton
- 3: Catalan
- 4: Chinese_China
- 5: Chinese_Hongkong
- 6: Chinese_Taiwan
- 7: Chuvash
- 8: Czech
- 9: Dhivehi
- 10: Dutch
- 11: English
- 12: Esperanto
- 13: Estonian
- 14: French
- 15: Frisian
- 16: Georgian
- 17: German
- 18: Greek
- 19: Hakha_Chin
- 20: Indonesian
- 21: Interlingua
- 22: Italian
- 23: Japanese
- 24: Kabyle
- 25: Kinyarwanda
- 26: Kyrgyz
- 27: Latvian
- 28: Maltese
- 29: Mangolian
- 30: Persian
- 31: Polish
- 32: Portuguese
- 33: Romanian
- 34: Romansh_Sursilvan
- 35: Russian
- 36: Sakha
- 37: Slovenian
- 38: Spanish
- 39: Swedish
- 40: Tamil
- 41: Tatar
- 42: Turkish
- 43: Ukranian (Ukrainian; the label name is spelled this way in the dataset)
- 44: Welsh
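Classification pipelines usually need the mapping between integer labels and names for the `language` feature. The following is a minimal sketch that builds it from the indices listed above. The `datasets` library exposes the same mapping through the `ClassLabel` feature's `int2str` and `str2int` methods. The `id2label` / `label2id` dictionaries here have the shape that Transformers model configs accept.

```python
# Class names in index order, copied from the list above. The spellings
# "Mangolian" and "Ukranian" are kept because they are the actual label strings.
LANGUAGE_NAMES = [
    "Arabic", "Basque", "Breton", "Catalan", "Chinese_China",
    "Chinese_Hongkong", "Chinese_Taiwan", "Chuvash", "Czech", "Dhivehi",
    "Dutch", "English", "Esperanto", "Estonian", "French",
    "Frisian", "Georgian", "German", "Greek", "Hakha_Chin",
    "Indonesian", "Interlingua", "Italian", "Japanese", "Kabyle",
    "Kinyarwanda", "Kyrgyz", "Latvian", "Maltese", "Mangolian",
    "Persian", "Polish", "Portuguese", "Romanian", "Romansh_Sursilvan",
    "Russian", "Sakha", "Slovenian", "Spanish", "Swedish",
    "Tamil", "Tatar", "Turkish", "Ukranian", "Welsh",
]

# Integer label -> name, and name -> integer label.
id2label = dict(enumerate(LANGUAGE_NAMES))
label2id = {name: idx for idx, name in id2label.items()}

print(len(id2label))      # 45
print(id2label[29])       # Mangolian
print(label2id["Welsh"])  # 44
```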
Data splits
- train: 22,194 examples, 3,314,521,547.844 bytes.
- validation: 5,888 examples, 862,669,934.664 bytes.
- test: 5,963 examples, 812,613,938.737 bytes.
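As a quick check on the split sizes above, this sketch computes the total number of examples and each split's share. It uses only the counts listed in this section.

```python
# Example counts per split, as listed above.
SPLIT_EXAMPLES = {"train": 22_194, "validation": 5_888, "test": 5_963}

total = sum(SPLIT_EXAMPLES.values())
shares = {split: n / total for split, n in SPLIT_EXAMPLES.items()}

print(total)  # 34045
for split, share in shares.items():
    print(f"{split}: {share:.1%}")  # train: 65.2%, validation: 17.3%, test: 17.5%
```

The split is therefore roughly 65/17/17 rather than an exact 80/10/10 or 70/15/15 partition.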
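Dataset size
- Download size: 4,661,574,412 bytes (about 4.66 GB).
- Dataset size: 4,989,805,421.245 bytes (about 4.99 GB; equal to the sum of the three split sizes).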
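This sketch checks that the reported dataset size equals the sum of the split sizes. It also converts both totals to decimal gigabytes (GB) and binary gibibytes (GiB), since tools differ in which unit they report. The byte counts are copied from this page; the `human` helper is a hypothetical name used only for illustration.

```python
# Byte counts copied from the sections above.
SPLIT_BYTES = {
    "train": 3_314_521_547.844,
    "validation": 862_669_934.664,
    "test": 812_613_938.737,
}
DOWNLOAD_BYTES = 4_661_574_412
DATASET_BYTES = 4_989_805_421.245


def human(n_bytes: float) -> str:
    """Format a byte count in decimal GB (1e9 bytes) and binary GiB (2**30 bytes)."""
    return f"{n_bytes / 1e9:.2f} GB ({n_bytes / 2**30:.2f} GiB)"


split_sum = sum(SPLIT_BYTES.values())
print(abs(split_sum - DATASET_BYTES) < 1.0)  # True
print(human(DOWNLOAD_BYTES))
print(human(DATASET_BYTES))
```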
Configuration
- config_name: default
- data_files:
- train: data/train-*
- validation: data/validation-*
- test: data/test-*
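The `default` configuration maps each split to a glob pattern over files in the repository's `data/` directory. The following is a minimal sketch of that assignment using the standard-library `fnmatch` module. The shard file names below are hypothetical, because this page does not list the actual files. A mapping of this shape is also what the `data_files` argument of the `datasets` library's `load_dataset` accepts.

```python
from fnmatch import fnmatch

# Split -> glob pattern, as listed in the "default" config above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def assign_splits(filenames):
    """Group repository file paths by the first split whose pattern they match."""
    splits = {name: [] for name in DATA_FILES}
    for path in filenames:
        for split, pattern in DATA_FILES.items():
            if fnmatch(path, pattern):
                splits[split].append(path)
                break  # each file belongs to at most one split
    return splits


# Hypothetical shard names, for illustration only.
files = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",  # matches no pattern, so it is not assigned to any split
]
print(assign_splits(files))
```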