gilkeyio/AudioMNIST

Name: gilkeyio/AudioMNIST
Creator: gilkeyio
Published: 2023-11-22 15:28:13
License: MIT

Source: Hugging Face. Updated: 2023-11-22. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/gilkeyio/AudioMNIST

Resource description:

---
language:
- en
license: mit
size_categories:
- 10K<n<100K
task_categories:
- audio-classification
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: speaker_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: digit
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
          '2': '2'
          '3': '3'
          '4': '4'
          '5': '5'
          '6': '6'
          '7': '7'
          '8': '8'
          '9': '9'
  - name: gender
    dtype:
      class_label:
        names:
          '0': male
          '1': female
  - name: accent
    dtype: string
  - name: age
    dtype: int64
  - name: native_speaker
    dtype: bool
  - name: origin
    dtype: string
  splits:
  - name: train
    num_bytes: 1493209727.0
    num_examples: 24000
  - name: test
    num_bytes: 360966680.0
    num_examples: 6000
  download_size: 1483680961
  dataset_size: 1854176407.0
---

# Dataset Card for "AudioMNIST"

The [audioMNIST](https://github.com/soerenab/AudioMNIST) dataset has 50 English recordings per digit (0-9) of 60 speakers. There are 60 participants in total, with 12 being women and 48 being men, all featuring a diverse range of accents and country of origin. Their ages vary from 22 to 61 years old. This is a great dataset to explore a simple audio classification problem: either the digit or the gender.

## Bias, Risks, and Limitations

* The genders represented in the dataset are unbalanced, with around 80% being men.
* The majority of the speakers, around 70%, have a German accent

### Citation Information

The original creators of the dataset ask you to cite [their paper](https://arxiv.org/abs/1807.03418) if you use this data:

```
@ARTICLE{becker2018interpreting,
  author = {Becker, S\"oren and Ackermann, Marcel and Lapuschkin, Sebastian and M\"uller, Klaus-Robert and Samek, Wojciech},
  title = {Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals},
  journal = {CoRR},
  volume = {abs/1807.03418},
  year = {2018},
  archivePrefix = {arXiv},
  eprint = {1807.03418},
}
```

Provider:

gilkeyio

Original Information Summary

Dataset Overview

Basic Information

Language: English
License: MIT
Data size: 10K<n<100K
Task category: audio classification

Configuration

Default configuration:
- Training data: data/train-*
- Test data: data/test-*

Dataset Information

Features

speaker_id: string
audio: audio, sampling rate 16000 Hz
digit: class label, digits 0-9
gender: class label, male (0) and female (1)
accent: string
age: integer (int64)
native_speaker: boolean
origin: string

Data Splits

Training set:
- Bytes: 1493209727.0
- Examples: 24000
Test set:
- Bytes: 360966680.0
- Examples: 6000

Data Size

Download size: 1483680961 bytes
Dataset size: 1854176407.0 bytes
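
The split and size figures above can be cross-checked with a little arithmetic. The following is a minimal sketch that only reuses the numbers listed in this section; the `summarize` helper and the constant names are illustrative and not part of the dataset or any library. It confirms that the two splits add up to 30,000 examples (which matches 60 speakers × 10 digits × 50 recordings from the dataset card) and that the per-split byte counts sum to the reported dataset size.

```python
# Figures copied from the split and size listing above.
SPLITS = {
    "train": {"num_bytes": 1_493_209_727, "num_examples": 24_000},
    "test": {"num_bytes": 360_966_680, "num_examples": 6_000},
}
DOWNLOAD_SIZE = 1_483_680_961  # bytes, compressed download
DATASET_SIZE = 1_854_176_407   # bytes, reported dataset size


def summarize(splits):
    """Return totals plus per-split share, average example size, and size in GiB."""
    total_examples = sum(s["num_examples"] for s in splits.values())
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    summary = {"total_examples": total_examples, "total_bytes": total_bytes}
    for name, s in splits.items():
        summary[name] = {
            "fraction": s["num_examples"] / total_examples,
            "avg_bytes_per_example": s["num_bytes"] / s["num_examples"],
            "gib": s["num_bytes"] / 2**30,
        }
    return summary


summary = summarize(SPLITS)
print(summary["total_examples"])                # 30000
print(summary["total_bytes"] == DATASET_SIZE)   # True
for name in SPLITS:
    info = summary[name]
    print(f"{name}: {info['fraction']:.0%} of examples, "
          f"{info['avg_bytes_per_example'] / 1024:.1f} KiB per example, "
          f"{info['gib']:.2f} GiB")
```

The 24,000 / 6,000 division corresponds to an 80/20 train/test split.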

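Collected Summary

Dataset Introduction

Construction

AudioMNIST is built from English recordings of the digits 0-9: each of the 60 speakers recorded every digit 50 times. Of the 60 speakers, 12 are women and 48 are men. The speakers have a range of accents and countries of origin, and their ages run from 22 to 61. The recordings are divided into a training set and a test set, which gives audio classification tasks a ready-made data structure.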

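Usage

The AudioMNIST dataset can be downloaded and loaded directly through the interface provided by Hugging Face, which exposes the training and test sets. Each example carries the speaker ID, the audio data, the digit label, gender, accent, age, native-speaker status, and origin; these features can be used to build and train audio classification models. Users should keep the dataset's inherent biases and limitations in mind when applying it.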
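
As a rough illustration of the loading step described above, the sketch below uses `load_dataset` from the Hugging Face `datasets` library with the repository name `gilkeyio/AudioMNIST`. The helper names `load_audiomnist` and `describe_row` are hypothetical and introduced only for this example, and the gender mapping is taken from the `class_label` definition in the dataset card. Loading requires network access and downloads roughly 1.5 GB.

```python
GENDER_NAMES = {0: "male", 1: "female"}  # class_label mapping from the dataset card


def load_audiomnist(split="train"):
    """Load one split ("train" or "test") from the Hugging Face Hub."""
    # Imported lazily so that describe_row can be used without the library installed.
    from datasets import load_dataset

    return load_dataset("gilkeyio/AudioMNIST", split=split)


def describe_row(row):
    """Format the metadata fields of one example as a readable string."""
    gender = GENDER_NAMES[row["gender"]]
    native = "native" if row["native_speaker"] else "non-native"
    return (
        f"speaker {row['speaker_id']}: digit {row['digit']}, {gender}, "
        f"age {row['age']}, {native}, accent {row['accent']}, "
        f"origin {row['origin']}"
    )


def show_first_example(split="test"):
    """Print the metadata of the first example in a split."""
    ds = load_audiomnist(split)
    # Dropping the audio column avoids needing an audio decoder for metadata only.
    meta = ds.remove_columns("audio")
    print(describe_row(meta[0]))
```

Calling `show_first_example()` prints the metadata of the first test example. Accessing the `audio` column itself additionally requires an audio decoding backend to be installed, which this sketch deliberately avoids.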

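Background and Challenges

Background Overview

The AudioMNIST dataset was created in 2018 by Becker et al. as a basic resource for research on audio signal classification. It gathers recordings of the digits 0-9 from 60 speakers who differ in gender, age, accent, and country of origin, for a total of 30,000 audio samples (60 speakers × 10 digits × 50 recordings). Its core research problem is the classification of simple audio signals, in particular digit recognition and gender recognition, and it has been used in research on audio signal processing and machine learning.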

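Current Challenges

Although AudioMNIST is a valuable resource for this line of research, it poses several challenges in practice. First, the gender ratio is imbalanced: about 80% of the speakers are men, which can bias models trained for gender classification. Second, about 70% of the speakers have a German accent, which may limit how well a model generalizes to other accents or languages. Finally, how to balance diversity and representativeness when constructing such a dataset, so that models are fair and robust, remains an open problem.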
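
One common way to soften the gender imbalance described above is to weight each class inversely to its frequency when training a classifier. The sketch below is an illustration under stated assumptions, not a method prescribed by the dataset authors: it uses the 48/12 speaker counts from the dataset card and the "balanced" heuristic `n_samples / (n_classes * class_count)`, which is the same formula scikit-learn uses for `class_weight="balanced"`. The function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Speaker-level gender counts from the dataset card: 48 men, 12 women.
speaker_genders = ["male"] * 48 + ["female"] * 12
weights = balanced_class_weights(speaker_genders)
print(weights)  # {'male': 0.625, 'female': 2.5}
```

Each female example then counts four times as much as a male example in a weighted loss. In practice the weights should be recomputed from the labels of the actual training split, since its gender ratio may differ slightly from the speaker-level ratio used here. Weighting does not address the accent imbalance; evaluating on speakers held out from training is a separate safeguard.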

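Common Scenarios

Classic Use Cases

In academic audio signal processing, the gilkeyio/AudioMNIST dataset is used to explore basic sound classification problems. It contains English recordings of the digits 0 to 9, with 50 recordings of each digit from each of 60 male and female speakers with different accents and regions of origin. The classic use is to build and train deep neural network models that identify the digit spoken in a recording or the gender of the speaker.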
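
Neural network classifiers of the kind mentioned above usually expect inputs of a fixed size, while spoken-digit recordings vary in length. The following minimal sketch shows one simple preprocessing step, zero-padding or truncating a waveform to a fixed number of samples. The 16,000 Hz sampling rate comes from the dataset metadata; the one-second target length and the `pad_or_trim` helper are assumptions made for illustration only.

```python
SAMPLING_RATE = 16_000   # Hz, from the dataset metadata
TARGET_SECONDS = 1.0     # assumed target duration, chosen for illustration
TARGET_LEN = int(SAMPLING_RATE * TARGET_SECONDS)


def pad_or_trim(samples, target_len=TARGET_LEN):
    """Return a list of exactly target_len samples.

    Shorter inputs are right-padded with zeros (silence);
    longer inputs are truncated at the end.
    """
    samples = list(samples)
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))


# Synthetic waveforms standing in for real recordings.
short_clip = [0.1] * 9_000    # about 0.56 s at 16 kHz
long_clip = [0.1] * 20_000    # 1.25 s at 16 kHz

print(len(pad_or_trim(short_clip)))  # 16000
print(len(pad_or_trim(long_clip)))   # 16000
```

A fixed-length waveform like this can be fed to a 1-D convolutional model directly or converted to a spectrogram first. Whether one second is long enough to avoid cutting off speech should be checked against the actual recording lengths.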

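Academic Problems Addressed

The dataset supports the gender and digit recognition problems in audio classification by giving researchers a compact, consistently labeled collection of audio samples, although its gender ratio and accent distribution are imbalanced. It helps researchers study how deep neural networks are applied to audio signal classification and how their decisions can be interpreted, and it offers an empirical basis for studying bias, risks, and limitations in audio datasets.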

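Practical Applications

On the practical side, AudioMNIST can be used when developing speech recognition systems, such as automatic speech recognition (ASR) and voice assistants, for example to test how well a component handles speakers of different genders and accents. It also serves as a teaching resource that helps students and researchers understand audio signal processing and deep learning techniques.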

Recent Research on the Dataset