renumics/speech_commands_enrichment_only

Name: renumics/speech_commands_enrichment_only
Creator: renumics
Published: 2023-09-28 12:25:09
License: no description provided

Source: Hugging Face. Updated: 2023-09-28. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/renumics/speech_commands_enrichment_only

Resource summary:

The SpeechCommands dataset consists of one-second .wav audio files, each containing a single spoken English word or background noise. The words come from a small set of commands and are spoken by a variety of speakers. The dataset is designed to help train simple machine learning models, in particular for keyword spotting. It comes in two versions, v0.01 and v0.02, which contain 64,727 and 105,829 audio files respectively. Two configurations are also provided: enrichment_only, which contains only the enrichment information (model predictions, probabilities, and reduced embeddings), and raw_and_enrichment_combined, which combines the raw data with that enrichment information.
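As a minimal sketch of how one of the two configurations might be loaded, assuming the Hugging Face `datasets` library is installed and that the repository exposes the configuration and split names listed on this page (the helper name `load_split` is hypothetical and not part of any library):

```python
REPO_ID = "renumics/speech_commands_enrichment_only"
CONFIGS = ("enrichment_only", "raw_and_enrichment_combined")
SPLITS = ("train", "validation", "test")


def load_split(config: str = "enrichment_only", split: str = "train"):
    """Load one split of one configuration (hypothetical helper)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the argument checks work even without the library.
    from datasets import load_dataset

    # Assumption: the config names on this page are valid `name` arguments.
    return load_dataset(REPO_ID, config, split=split)
```

Calling `load_split("enrichment_only", "test")` needs network access. The enrichment_only configuration is the smaller download, so it is a reasonable first choice when only predictions and embeddings are needed.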

Provider:

renumics

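Original information summary

Dataset overview

Basic information

Dataset name: SpeechCommands
Language: English (en)
License: CC-BY-4.0
Multilinguality: monolingual
Size categories: 10K<n<100K, 100K<n<1M
Source datasets: extended from speech_commands
Task categories: audio classification
Task IDs: keyword spotting
Config names: v0.01, v0.02
Tags: spotlight, enriched, renumics, enhanced, audio, classification, extended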

Dataset structure

Configuration details

Config: enrichment_only

Features:
- label_string: string
- probability: float (float64)
- probability_vector: sequence of floats (float32)
- prediction: integer (int64)
- prediction_string: string
- embedding_reduced: sequence of floats (float32)
Splits:
- train: 8763867 bytes, 51093 samples
- validation: 1165942 bytes, 6799 samples
- test: 528408 bytes, 3081 samples
Download size: 0 bytes
Dataset size: 10458217 bytes
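The page does not say how the enrichment columns relate to each other. The following sketch assumes the usual convention that `prediction` is the index of the largest entry in `probability_vector`, `probability` is that entry, and `prediction_string` is the matching class name. The three class names and the probability values are placeholders chosen for illustration, not data from the dataset.

```python
def summarize_prediction(probability_vector, class_names):
    """Derive prediction columns from a per-class probability vector."""
    # Index of the most probable class (assumed to correspond to `prediction`).
    prediction = max(range(len(probability_vector)),
                     key=lambda i: probability_vector[i])
    return {
        "prediction": prediction,
        "prediction_string": class_names[prediction],
        "probability": probability_vector[prediction],
    }


def is_misclassified(row):
    """True when the predicted class name differs from the labeled one."""
    return row["label_string"] != row["prediction_string"]


# Synthetic three-class row; real rows would carry one entry per class.
class_names = ["yes", "no", "up"]  # placeholder names
row = summarize_prediction([0.1, 0.7, 0.2], class_names)
row["label_string"] = "up"  # pretend the ground-truth label disagrees
```

Filtering rows with `is_misclassified` is one way to surface samples whose label and model prediction disagree.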

Config: raw_and_enrichment_combined

Features:
- file: string
- audio: audio (sampling rate: 16000 Hz)
- label: class label (names: 0-30)
- is_unknown: boolean
- speaker_id: string
- utterance_id: integer (int8)
- logits: sequence of floats (float64)
- embedding: sequence of floats (float32)
- label_string: string
- probability: float (float64)
- probability_vector: sequence of floats (float32)
- prediction: integer (int64)
- prediction_string: string
- embedding_reduced: sequence of floats (float32)
Splits:
- train: 1803565876.375 bytes, 51093 samples
- validation: 240795605.125 bytes, 6799 samples
- test: 109673146.875 bytes, 3081 samples
Download size: 0 bytes
Dataset size: 2154034628.375 bytes
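The split sizes listed for the two configurations can be cross-checked against the stated dataset sizes. The sketch below copies the byte and sample counts from this page and computes totals and an average size per sample. The function names are illustrative only.

```python
# (bytes, samples) per split, copied from the configuration details.
SPLIT_STATS = {
    "enrichment_only": {
        "train": (8763867, 51093),
        "validation": (1165942, 6799),
        "test": (528408, 3081),
    },
    "raw_and_enrichment_combined": {
        "train": (1803565876.375, 51093),
        "validation": (240795605.125, 6799),
        "test": (109673146.875, 3081),
    },
}

LISTED_TOTALS = {
    "enrichment_only": 10458217,
    "raw_and_enrichment_combined": 2154034628.375,
}


def total_bytes(config):
    """Sum the byte sizes of all splits of one configuration."""
    return sum(size for size, _ in SPLIT_STATS[config].values())


def bytes_per_sample(config, split):
    """Average number of bytes per sample in one split."""
    size, samples = SPLIT_STATS[config][split]
    return size / samples
```

Both totals match the listed dataset sizes. The training split averages roughly 170 bytes per sample in enrichment_only and roughly 35 KB per sample in raw_and_enrichment_combined, which carries the audio and the full embeddings.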

Data files

Config: enrichment_only
- train: enrichment_only/train-*
- validation: enrichment_only/validation-*
- test: enrichment_only/test-*
Config: raw_and_enrichment_combined
- train: raw_and_enrichment_combined/train-*
- validation: raw_and_enrichment_combined/validation-*
- test: raw_and_enrichment_combined/test-*
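The data-file entries above are glob patterns. As a rough illustration of how such patterns map a file path to a configuration and split, the sketch below uses Python's standard `fnmatch` module. The shard file names in the example are hypothetical, since the page does not list the actual files.

```python
from fnmatch import fnmatch

# Glob patterns copied from the data-file listing.
DATA_FILES = {
    "enrichment_only": {
        "train": "enrichment_only/train-*",
        "validation": "enrichment_only/validation-*",
        "test": "enrichment_only/test-*",
    },
    "raw_and_enrichment_combined": {
        "train": "raw_and_enrichment_combined/train-*",
        "validation": "raw_and_enrichment_combined/validation-*",
        "test": "raw_and_enrichment_combined/test-*",
    },
}


def locate(path):
    """Return (config, split) for the first pattern matching `path`, else None."""
    for config, splits in DATA_FILES.items():
        for split, pattern in splits.items():
            if fnmatch(path, pattern):
                return config, split
    return None


# Hypothetical shard name, used only to exercise the patterns.
example = "enrichment_only/train-00000-of-00001.parquet"
```

Because each pattern starts with its configuration directory, a file under raw_and_enrichment_combined never matches an enrichment_only pattern.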

Data instances

Core word example

    {
        "file": "no/7846fd85_nohash_0.wav",
        "audio": {
            "path": "no/7846fd85_nohash_0.wav",
            "array": array([-0.00021362, -0.00027466, -0.00036621, ..., 0.00079346, 0.00091553, 0.00079346]),
            "sampling_rate": 16000
        },
        "label": 1,  # "no"
        "is_unknown": False,
        "speaker_id": "7846fd85",
        "utterance_id": 0
    }

Auxiliary word example

    {
        "file": "tree/8b775397_nohash_0.wav",
        "audio": {
            "path": "tree/8b775397_nohash_0.wav",
            "array": array([-0.00854492, -0.01339722, -0.02026367, ..., 0.00274658, 0.00335693, 0.0005188]),
            "sampling_rate": 16000
        },
        "label": 28,  # "tree"
        "is_unknown": True,
        "speaker_id": "1b88bf70",
        "utterance_id": 0
    }

Background noise example

    {
        "file": "silence/doing_the_dishes.wav",
        "audio": {
            "path": "silence/doing_the_dishes.wav",
            "array": array([0., 0., 0., ..., -0.00592041, -0.00405884, -0.00253296]),
            "sampling_rate": 16000
        },
        "label": 30,  # "silence"
        "is_unknown": False,
        "speaker_id": "None",
        "utterance_id": 0  # not meaningful for background noise
    }
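The "no" example suggests that spoken-word files follow a `<word>/<speaker>_nohash_<n>.wav` naming pattern. The sketch below parses paths under that assumption, which is inferred from the examples rather than documented here, and falls back to empty speaker and utterance values for files such as the background-noise recording. The dataset's own speaker_id and utterance_id fields should be treated as authoritative.

```python
import re

# Assumed naming pattern, inferred from "no/7846fd85_nohash_0.wav".
_WORD_FILE = re.compile(
    r"^(?P<word>[^/]+)/(?P<speaker>[0-9a-f]+)_nohash_(?P<utterance>\d+)\.wav$"
)


def parse_file(path):
    """Split a relative file path into word, speaker ID, and utterance ID."""
    match = _WORD_FILE.match(path)
    if match is None:
        # Background-noise files carry no speaker or utterance information.
        return {"word": path.split("/", 1)[0],
                "speaker_id": None,
                "utterance_id": None}
    return {
        "word": match.group("word"),
        "speaker_id": match.group("speaker"),
        "utterance_id": int(match.group("utterance")),
    }


core = parse_file("no/7846fd85_nohash_0.wav")
noise = parse_file("silence/doing_the_dishes.wav")
```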

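Data fields

file: relative path of the audio file
audio: the audio file path, the decoded audio array, and the sampling rate
label: the word or background-noise class in the audio sample
is_unknown: whether the word is an auxiliary word
speaker_id: unique ID of the speaker
utterance_id: incremental ID of the word utterance within the same speaker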
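To illustrate how the label and is_unknown fields separate the three kinds of sample, here is a small sketch. It assumes that label 30 always denotes background noise, as in the instance shown on this page, and the function name `word_kind` is made up for the example.

```python
SILENCE_LABEL = 30  # label of the background-noise instance shown on this page


def word_kind(record):
    """Classify a record as core word, auxiliary word, or background noise."""
    if record["label"] == SILENCE_LABEL:
        return "background_noise"
    return "auxiliary" if record["is_unknown"] else "core"


# Label and is_unknown values taken from the three instances on this page.
records = [
    {"label": 1, "is_unknown": False},   # "no"
    {"label": 28, "is_unknown": True},   # "tree"
    {"label": 30, "is_unknown": False},  # "silence"
]
kinds = [word_kind(r) for r in records]
```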

Data splits

v0.01:
- train: 51093 samples
- validation: 6799 samples
- test: 3081 samples
v0.02:
- train: 84848 samples
- validation: 9982 samples
- test: 4890 samples
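The share of samples in each split follows directly from the counts above. A short sketch of that arithmetic, with the counts copied from the listing:

```python
SPLIT_SIZES = {
    "v0.01": {"train": 51093, "validation": 6799, "test": 3081},
    "v0.02": {"train": 84848, "validation": 9982, "test": 4890},
}


def total_samples(version):
    """Total number of samples across all splits of one version."""
    return sum(SPLIT_SIZES[version].values())


def split_shares(version):
    """Fraction of samples that falls into each split of one version."""
    total = total_samples(version)
    return {name: count / total for name, count in SPLIT_SIZES[version].items()}


shares_v1 = split_shares("v0.01")
shares_v2 = split_shares("v0.02")
```

In both versions the training split holds a bit over 80 percent of the split samples.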

The content above was collected and summarized by 遇见数据集.
