renumics/speech_commands_enrichment_only
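Dataset Overview
Basic Information
- Dataset name: SpeechCommands
- Language: English (en)
- License: CC-BY-4.0
- Multilinguality: monolingual
- Size categories: 10K<n<100K, 100K<n<1M
- Source datasets: extended from speech_commands
- Task categories: audio classification
- Task IDs: keyword spotting
- Config names: v0.01, v0.02
- Tags: spotlight, enriched, renumics, enhanced, audio, classification, extended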
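Dataset Structure
Configuration Details
Configuration: enrichment_only
- Features:
  - label_string: string
  - probability: float (float64)
  - probability_vector: sequence of floats (float32)
  - prediction: integer (int64)
  - prediction_string: string
  - embedding_reduced: sequence of floats (float32)
- Splits:
  - train: 8763867 bytes, 51093 examples
  - validation: 1165942 bytes, 6799 examples
  - test: 528408 bytes, 3081 examples
- Download size: 0 bytes
- Dataset size: 10458217 bytes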
Configuration: raw_and_enrichment_combined
- Features:
  - file: string
  - audio: audio (sampling rate: 16000 Hz)
  - label: class label (names: 0-30)
  - is_unknown: boolean
  - speaker_id: string
  - utterance_id: integer (int8)
  - logits: sequence of floats (float64)
  - embedding: sequence of floats (float32)
  - label_string: string
  - probability: float (float64)
  - probability_vector: sequence of floats (float32)
  - prediction: integer (int64)
  - prediction_string: string
  - embedding_reduced: sequence of floats (float32)
- Splits:
  - train: 1803565876.375 bytes, 51093 examples
  - validation: 240795605.125 bytes, 6799 examples
  - test: 109673146.875 bytes, 3081 examples
- Download size: 0 bytes
- Dataset size: 2154034628.375 bytes
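The feature names hint at how the enrichment columns might relate to one another. The following is only a sketch under assumptions the card does not confirm: that probability_vector is the softmax of logits, that prediction is the index of its largest entry, and that probability is that largest value. The logits used here are made-up numbers, not taken from the dataset.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits):
    """Derive enrichment-style fields from logits (assumed relationship)."""
    probability_vector = softmax(logits)
    # Index of the most probable class.
    prediction = max(range(len(probability_vector)),
                     key=probability_vector.__getitem__)
    return {
        "probability_vector": probability_vector,
        "prediction": prediction,
        "probability": probability_vector[prediction],
    }


example_logits = [0.5, 3.0, -1.0]  # hypothetical values for illustration
enriched = summarize(example_logits)
print(enriched["prediction"], round(enriched["probability"], 3))
```

If the assumption holds, the same relationship could be checked against a real row by comparing the recomputed values with the stored probability and prediction columns.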
Data Files
- Configuration: enrichment_only
  - train: enrichment_only/train-*
  - validation: enrichment_only/validation-*
  - test: enrichment_only/test-*
- Configuration: raw_and_enrichment_combined
  - train: raw_and_enrichment_combined/train-*
  - validation: raw_and_enrichment_combined/validation-*
  - test: raw_and_enrichment_combined/test-*
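Each split is tied to a glob pattern over the files in the corresponding configuration directory. As a rough illustration of how such patterns resolve, the sketch below matches file paths against them with Python's standard fnmatch module. The shard file name is hypothetical, and resolve_split is a helper invented for this example, not part of any library.

```python
from fnmatch import fnmatch

# Split patterns as listed in the data files section.
DATA_FILES = {
    "enrichment_only": {
        "train": "enrichment_only/train-*",
        "validation": "enrichment_only/validation-*",
        "test": "enrichment_only/test-*",
    },
    "raw_and_enrichment_combined": {
        "train": "raw_and_enrichment_combined/train-*",
        "validation": "raw_and_enrichment_combined/validation-*",
        "test": "raw_and_enrichment_combined/test-*",
    },
}


def resolve_split(config, path):
    """Return the split whose pattern matches `path`, or None."""
    for split, pattern in DATA_FILES[config].items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical shard name, used only to exercise the pattern.
shard = "enrichment_only/train-00000-of-00001.parquet"
print(resolve_split("enrichment_only", shard))
```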
Data Instances
Core word example

    {
        "file": "no/7846fd85_nohash_0.wav",
        "audio": {
            "path": "no/7846fd85_nohash_0.wav",
            "array": array([-0.00021362, -0.00027466, -0.00036621, ...,
                            0.00079346, 0.00091553, 0.00079346]),
            "sampling_rate": 16000
        },
        "label": 1,  # "no"
        "is_unknown": False,
        "speaker_id": "7846fd85",
        "utterance_id": 0
    }

Auxiliary word example

    {
        "file": "tree/8b775397_nohash_0.wav",
        "audio": {
            "path": "tree/8b775397_nohash_0.wav",
            "array": array([-0.00854492, -0.01339722, -0.02026367, ...,
                            0.00274658, 0.00335693, 0.0005188]),
            "sampling_rate": 16000
        },
        "label": 28,  # "tree"
        "is_unknown": True,
        "speaker_id": "1b88bf70",
        "utterance_id": 0
    }

Background noise example

    {
        "file": "silence/doing_the_dishes.wav",
        "audio": {
            "path": "silence/doing_the_dishes.wav",
            "array": array([0., 0., 0., ...,
                            -0.00592041, -0.00405884, -0.00253296]),
            "sampling_rate": 16000
        },
        "label": 30,  # "silence"
        "is_unknown": False,
        "speaker_id": "None",
        "utterance_id": 0  # doesn't make sense here
    }
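Because each instance carries both the decoded samples and the sampling rate, the clip duration follows directly as the number of samples divided by the rate. A minimal sketch, using a placeholder array of 16000 zeros rather than real audio (the actual clip length is not stated in the card):

```python
def clip_duration_seconds(audio):
    """Duration of a decoded clip: sample count divided by sampling rate."""
    return len(audio["array"]) / audio["sampling_rate"]


# Placeholder audio dict shaped like the instances shown in this section;
# the array contents and length are hypothetical.
audio = {
    "path": "no/7846fd85_nohash_0.wav",
    "array": [0.0] * 16000,
    "sampling_rate": 16000,
}
print(clip_duration_seconds(audio))
```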
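Data Fields
- file: relative path to the audio file
- audio: dictionary containing the audio file path, the decoded audio array, and the sampling rate
- label: the word or background-noise class of the audio sample
- is_unknown: whether the word is an auxiliary word
- speaker_id: unique ID of the speaker
- utterance_id: incremental ID of each word utterance by the same speaker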
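In the core word example, the file path "no/7846fd85_nohash_0.wav" appears to encode the word, the speaker ID, and the utterance ID. The sketch below assumes the layout word/speakerid_nohash_utteranceid.wav, inferred from that single example rather than stated by the card; parse_file is a hypothetical helper. Paths that do not follow the layout, such as the background noise files, yield None for the speaker and utterance.

```python
import re
from pathlib import PurePosixPath

# Assumed file-name layout: <speaker_id>_nohash_<utterance_id>
_NAME = re.compile(r"^(?P<speaker>[0-9a-f]+)_nohash_(?P<utt>\d+)$")


def parse_file(path):
    """Split a relative audio path into word, speaker_id, and utterance_id."""
    p = PurePosixPath(path)
    word = p.parent.name  # the directory name is the spoken word
    match = _NAME.match(p.stem)
    if match is None:
        # e.g. background noise recordings have no speaker or utterance
        return {"word": word, "speaker_id": None, "utterance_id": None}
    return {
        "word": word,
        "speaker_id": match.group("speaker"),
        "utterance_id": int(match.group("utt")),
    }


print(parse_file("no/7846fd85_nohash_0.wav"))
print(parse_file("silence/doing_the_dishes.wav"))
```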
Data Splits
- v0.01:
  - train: 51093 examples
  - validation: 6799 examples
  - test: 3081 examples
- v0.02:
  - train: 84848 examples
  - validation: 9982 examples
  - test: 4890 examples
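The v0.01 counts match the example counts listed for both enrichment configurations. To compare how the two versions divide their data, the split fractions can be computed from the counts above; a small sketch:

```python
# Example counts per split, copied from the list above.
SPLITS = {
    "v0.01": {"train": 51093, "validation": 6799, "test": 3081},
    "v0.02": {"train": 84848, "validation": 9982, "test": 4890},
}


def split_fractions(counts):
    """Fraction of all examples that falls into each split."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for version, counts in SPLITS.items():
    total = sum(counts.values())
    fractions = split_fractions(counts)
    summary = ", ".join(f"{name} {frac:.1%}" for name, frac in fractions.items())
    print(f"{version}: {total} examples ({summary})")
```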




