noisy-alpaca-test/MUSAN-noise

Name: noisy-alpaca-test/MUSAN-noise
Creator: noisy-alpaca-test
Published: 2024-05-18 08:52:26
License: No description available

Hugging Face | Updated: 2024-05-18 | Indexed: 2024-06-26

Download link:

https://hf-mirror.com/datasets/noisy-alpaca-test/MUSAN-noise

Resource description:

---
dataset_info:
  features:
  - name: speech_input
    dtype: string
  - name: clean_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: noisy_10dB
    dtype:
      audio:
        sampling_rate: 16000
  - name: noisy_5dB
    dtype:
      audio:
        sampling_rate: 16000
  - name: noisy_0dB
    dtype:
      audio:
        sampling_rate: 16000
  - name: noisy_-5dB
    dtype:
      audio:
        sampling_rate: 16000
  - name: noisy_-10dB
    dtype:
      audio:
        sampling_rate: 16000
  - name: noisy_-20dB
    dtype: audio
  - name: noisy_10dB_transcription_whisper-small.en
    dtype: string
  - name: noisy_5dB_transcription_whisper-small.en
    dtype: string
  - name: noisy_0dB_transcription_whisper-small.en
    dtype: string
  - name: noisy_-5dB_transcription_whisper-small.en
    dtype: string
  - name: noisy_-10dB_transcription_whisper-small.en
    dtype: string
  - name: noisy_10dB_transcription_whisper-medium.en
    dtype: string
  - name: noisy_5dB_transcription_whisper-medium.en
    dtype: string
  - name: noisy_0dB_transcription_whisper-medium.en
    dtype: string
  - name: noisy_-5dB_transcription_whisper-medium.en
    dtype: string
  - name: noisy_-10dB_transcription_whisper-medium.en
    dtype: string
  - name: noisy_10dB_transcription_whisper-large-v3
    dtype: string
  - name: noisy_5dB_transcription_whisper-large-v3
    dtype: string
  - name: noisy_0dB_transcription_whisper-large-v3
    dtype: string
  - name: noisy_-5dB_transcription_whisper-large-v3
    dtype: string
  - name: noisy_-10dB_transcription_whisper-large-v3
    dtype: string
  - name: output
    dtype: string
  - name: clean_audio_transcription_whisper-small.en
    dtype: string
  - name: clean_audio_transcription_whisper-medium.en
    dtype: string
  - name: clean_audio_transcription_whisper-large-v3
    dtype: string
  splits:
  - name: test
    num_bytes: 6807301326.1
    num_examples: 5135
  download_size: 6722106139
  dataset_size: 6807301326.1
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---
# Dataset Card for "noise"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provider:

noisy-alpaca-test

Summary of original information

Dataset overview

Features

speech_input: string.
clean_audio: audio, sampling rate 16000 Hz.
noisy_10dB: audio, sampling rate 16000 Hz.
noisy_5dB: audio, sampling rate 16000 Hz.
noisy_0dB: audio, sampling rate 16000 Hz.
noisy_-5dB: audio, sampling rate 16000 Hz.
noisy_-10dB: audio, sampling rate 16000 Hz.
noisy_-20dB: audio (no sampling rate specified).
noisy_10dB_transcription_whisper-small.en: string.
noisy_5dB_transcription_whisper-small.en: string.
noisy_0dB_transcription_whisper-small.en: string.
noisy_-5dB_transcription_whisper-small.en: string.
noisy_-10dB_transcription_whisper-small.en: string.
noisy_10dB_transcription_whisper-medium.en: string.
noisy_5dB_transcription_whisper-medium.en: string.
noisy_0dB_transcription_whisper-medium.en: string.
noisy_-5dB_transcription_whisper-medium.en: string.
noisy_-10dB_transcription_whisper-medium.en: string.
noisy_10dB_transcription_whisper-large-v3: string.
noisy_5dB_transcription_whisper-large-v3: string.
noisy_0dB_transcription_whisper-large-v3: string.
noisy_-5dB_transcription_whisper-large-v3: string.
noisy_-10dB_transcription_whisper-large-v3: string.
output: string.
clean_audio_transcription_whisper-small.en: string.
clean_audio_transcription_whisper-medium.en: string.
clean_audio_transcription_whisper-large-v3: string.
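
The column names in this list follow a regular pattern, so they can be generated rather than typed by hand. The sketch below rebuilds them from the noise levels and Whisper model names that appear in the schema. The helper names (`audio_columns`, `transcription_columns`) and the constants are hypothetical and exist only for illustration; the dB values are presumably signal-to-noise ratios, which the card does not state.

```python
# Noise levels (in dB) that have an audio column in the schema.
AUDIO_LEVELS_DB = [10, 5, 0, -5, -10, -20]
# Levels that also have transcription columns (the schema lists none for -20 dB).
TRANSCRIBED_LEVELS_DB = [10, 5, 0, -5, -10]
# Whisper checkpoints named in the transcription column suffixes.
WHISPER_MODELS = ["whisper-small.en", "whisper-medium.en", "whisper-large-v3"]


def audio_columns():
    """Return the clean and noisy audio column names."""
    return ["clean_audio"] + [f"noisy_{level}dB" for level in AUDIO_LEVELS_DB]


def transcription_columns():
    """Return every transcription column name, noisy conditions first."""
    noisy = [
        f"noisy_{level}dB_transcription_{model}"
        for model in WHISPER_MODELS
        for level in TRANSCRIBED_LEVELS_DB
    ]
    clean = [f"clean_audio_transcription_{model}" for model in WHISPER_MODELS]
    return noisy + clean


if __name__ == "__main__":
    print(len(audio_columns()), "audio columns")
    print(len(transcription_columns()), "transcription columns")
```

Together with `speech_input` and `output`, these two helpers account for all 27 features listed in the card.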

Data splits

test: 5135 examples, 6807301326.1 bytes.

Dataset size

Download size: 6722106139 bytes.
Dataset size: 6807301326.1 bytes.
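
For a sense of scale, the byte counts can be converted to gigabytes and to an average per-example size. This is a small sketch that assumes decimal units (1 GB = 10**9 bytes); the function names are made up for illustration.

```python
# Figures copied from the dataset card.
DOWNLOAD_SIZE_BYTES = 6722106139
DATASET_SIZE_BYTES = 6807301326.1
NUM_EXAMPLES = 5135


def to_gb(num_bytes):
    """Convert a byte count to decimal gigabytes (assumes 1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


def mean_example_size_mb():
    """Average size of one example in decimal megabytes."""
    return DATASET_SIZE_BYTES / NUM_EXAMPLES / 1e6


if __name__ == "__main__":
    print(f"download: {to_gb(DOWNLOAD_SIZE_BYTES):.2f} GB")
    print(f"dataset:  {to_gb(DATASET_SIZE_BYTES):.2f} GB")
    print(f"mean example: {mean_example_size_mb():.2f} MB")
```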

Configuration

config_name: default
data_files:
  - split: test
    path: data/test-*
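
With a single `default` configuration and a single `test` split, loading the data is straightforward. The following is a minimal sketch, assuming the Hugging Face `datasets` package is installed and the repository is reachable; the wrapper `load_test_split` is a hypothetical helper, not part of any library.

```python
REPO_ID = "noisy-alpaca-test/MUSAN-noise"
SPLIT = "test"  # the only split declared in the configuration


def load_test_split(streaming=True):
    """Load the test split of the default configuration.

    Assumes the `datasets` package is available; the import is kept inside
    the function so that defining it does not require the dependency.
    """
    from datasets import load_dataset

    # streaming=True iterates over the parquet shards without first
    # downloading the whole archive.
    return load_dataset(REPO_ID, split=SPLIT, streaming=streaming)
```

Calling `next(iter(load_test_split()))` should yield one row as a dictionary keyed by the feature names. Decoding the audio columns may require an additional audio backend, depending on the installed `datasets` version.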

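Collected summary

Dataset introduction

Background and challenges

Background overview

This dataset pairs audio with Whisper transcriptions under several noise conditions and is mainly intended for studying how noise affects speech recognition. It contains 5,135 rows in parquet format, with a download size of about 6.72 GB. Audio is provided clean and at noise levels of 10 dB, 5 dB, 0 dB, -5 dB, -10 dB and -20 dB. Transcriptions are provided for the clean audio and for every noisy level except -20 dB, which has no transcription column in the schema.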
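
One common way to quantify the effect of noise on recognition is word error rate (WER) per condition. The sketch below is not taken from the dataset card: it assumes that `speech_input` holds the reference text (the card does not say so), applies only lowercasing as normalization, and uses hypothetical helper names.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def wer_by_condition(row, model="whisper-small.en", reference_column="speech_input"):
    """Compute WER for each transcribed condition of one row (a dict).

    Assumption: `reference_column` contains the ground-truth text.
    """
    conditions = ["clean_audio"] + [f"noisy_{level}dB" for level in (10, 5, 0, -5, -10)]
    return {
        cond: word_error_rate(row[reference_column], row[f"{cond}_transcription_{model}"])
        for cond in conditions
    }
```

Averaging the values returned by `wer_by_condition` over all rows would give one WER figure per noise level and model. Punctuation and number formatting are not normalized here, so absolute values may be higher than those from a dedicated evaluation toolkit.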

The content above was collected and summarized by 遇见数据集.
