b111111/Silver-Audio-Dataset

Name: b111111/Silver-Audio-Dataset
Creator: b111111
Published: 2026-04-09 08:49:13
License: No description available

Hugging Face | Updated: 2026-04-09 | Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/b111111/Silver-Audio-Dataset


Resource description:

---
task_categories:
- feature-extraction
language:
- ko
tags:
- audio
- homecam
- numpy
viewer: false
size_categories:
- 100M<n<1B
---

## Dataset Overview

- The dataset is a curated collection of `.npy` files containing MFCC features extracted from raw audio recordings.
- It has been specifically designed for training and evaluating machine learning models in the context of real-world emergency sound detection and classification tasks.
- The dataset captures diverse audio scenarios, making it a robust resource for developing safety-focused AI systems, such as the `SilverAssistant` project.

## Dataset Descriptions

- The dataset used for this audio model consists of `.npy` files containing MFCC features extracted from raw audio recordings. These recordings include various real-world scenarios, such as:
  - `violent_crime`: Violence / Criminal activities (폭력/범죄)
  - `fall`: Fall down (낙상)
  - `help_request`: Cries for help (도움 요청)
  - `daily-1`, `daily-2`: Normal indoor sounds (일상)
- Feature Extraction Process
  1. Audio Collection:
     - Audio samples were sourced from datasets, such as AI Hub, to ensure coverage of diverse scenarios.
     - These include emergency and non-emergency sounds to train the model for accurate classification.
  2. MFCC Extraction:
     - The raw audio signals were processed to extract Mel-Frequency Cepstral Coefficients (MFCC).
     - The MFCC features effectively capture the frequency characteristics of the audio, making them suitable for sound classification tasks.

     ![MFCC Output](./pics/mfcc-output.png)
  3. Output Format:
     - The extracted MFCC features are saved as `13 x n` numpy arrays, where:
       - 13: Represents the number of MFCC coefficients (features).
       - n: Corresponds to the number of frames in the audio segment.
  4. Saved Dataset:
     - The processed `13 x n` MFCC arrays are stored as `.npy` files, which serve as the direct input to the model.
- Adaptation in `SilverAssistant` project: [HuggingFace SilverAudio Model](https://huggingface.co/SilverAvocado/Silver-Audio)

## Data Source

- Source: [AI Hub 위급상황 음성/음향](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=&topMenu=&aihubDataSe=data&dataSetSn=170)
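
The `13 x n` output format described in the feature extraction steps can be read back with plain NumPy. The following is a minimal sketch, not the `SilverAssistant` project's own loading code: the helper names `load_mfcc` and `fix_length` and the `target_frames` parameter are hypothetical, and the dataset card does not say whether or how clips are padded to a common length before being fed to the model.

```python
import numpy as np

N_MFCC = 13  # number of MFCC coefficients per frame, as stated in the dataset card


def load_mfcc(path):
    """Load one .npy file and check that it has the documented 13 x n layout."""
    features = np.load(path)
    if features.ndim != 2 or features.shape[0] != N_MFCC:
        raise ValueError(f"expected shape (13, n), got {features.shape}")
    return features


def fix_length(features, target_frames):
    """Zero-pad or truncate along the frame axis so every clip has the same width.

    target_frames is a hypothetical choice; the dataset card does not specify one.
    """
    n_frames = features.shape[1]
    if n_frames >= target_frames:
        return features[:, :target_frames]
    pad = target_frames - n_frames
    # Pad only the frame axis (axis 1); the coefficient axis stays at 13.
    return np.pad(features, ((0, 0), (0, pad)), mode="constant")
```

Because `n` varies with the length of each audio segment, some form of length normalization like `fix_length` is typically needed before batching; zero-padding is only one option and is assumed here for illustration.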


Provider:

b111111
