RAWDysPeech: A Preprocessed Raw Audio Dataset For Speech Dysarthria

Name: RAWDysPeech: A Preprocessed Raw Audio Dataset For Speech Dysarthria
Creator: Mendeley Data
License: No description available

Indexed from doi.org on 2025-01-21

Download link:

http://doi.org/10.17632/3mhnr7frht.1

Resource description:

RAWDysPeech: A Preprocessed Raw Audio Dataset For Speech Dysarthria is a dysarthric speech dataset for audio classification, speech detection, and related avenues of research in ASR. RAWDysPeech consists of raw audio files split into two classes, 1 and 0, where 1 denotes speech involving dysarthria and 0 denotes normal speech. We combine and preprocess several of the most popular open-source speech datasets, including TORGO, UASPEECH, Ultrax, and EasyCall. A brief description of the preprocessing and combination steps follows. We encourage you to cite the original authors if this dataset helps your research.

--------------------------------------------------------

This dataset provides preprocessed speech recordings from the UASPEECH database, enhanced for machine learning applications using noise reduction and signal processing techniques.

Dataset Description

The dataset contains audio recordings that have been processed using:

I. FFT-based noise reduction
   - Hanning window application for better frequency analysis
   - 16-bit audio depth processing
   - 44.1 kHz sampling rate [1]
   - Stereo channel support with dual MEMS microphone configuration [2]

II. Preprocessing Steps (Signal Processing)
   - Background noise subtraction using ambient noise sampling
   - Frequency spectrum analysis with FFT
   - Amplitude scaling and normalization
   - Single-sided FFT amplitude doubling for accurate frequency representation [1]

III. Audio Parameters
   - Bit Depth: 16-bit (pyaudio.paInt16)
   - Sample Rate: 44.1 kHz
   - Buffer Size: 44100 frames
   - Channel Configuration: supports both mono and stereo recording [2]

IV. File Format
   - Audio files are saved in .WAV format
   - Timestamps are included in filenames (YYYY_MM_DD_HH_MM_SS_pyaudio)
   - Data is organized in dedicated data folders with automated directory creation [1]

V. Applications
   - Speech Recognition Systems
   - Dysarthric Speech Analysis
   - Audio Classification Tasks
   - Speech Pattern Recognition
   - Acoustic Model Training

Technical Implementation

The preprocessing pipeline includes real-time audio capture, noise profiling, FFT analysis, and spectrogram generation, making it suitable for both research and practical applications.

---------------------------------------------------

Citations:

[1] Heejin Kim, Mark Hasegawa Johnson, Jonathan Gunderson, Adrienne Perlman, Thomas Huang, Kenneth Watkin, Simone Frame, Harsh Vardhan Sharma, Xi Zhou, March 17, 2023, "UASpeech", IEEE Dataport, doi: https://dx.doi.org/10.21227/f9tc-ab45.

[2] Rudzicz, F., Namasivayam, A.K., Wolff, T. (2012) The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation, 46(4), pages 523--541.

[3] Shah, Arya; Qureshi, Aymen; Polprasert, Chantri (2024), “ADAPTIVE: A Novel Dataset For Acoustic DysArthria deTection through temPoral Inference and Voice Engineering”, Mendeley Data, V1, doi: 10.17632/j5bgddf6rp.1
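
The description above mentions Hanning windowing, single-sided FFT amplitude doubling, and background noise subtraction without showing how they fit together. The following is a minimal sketch of those three steps using NumPy, not the dataset authors' actual pipeline. The function names `single_sided_amplitude` and `subtract_noise`, the normalization by the window sum, and the clipping of negative values to zero are assumptions made for illustration only.

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz, the sampling rate stated in the description


def single_sided_amplitude(signal, sample_rate=SAMPLE_RATE):
    """Return (freqs, amplitudes) of a Hann-windowed, single-sided FFT.

    Hypothetical helper: the name and the normalization are assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n = signal.size
    window = np.hanning(n)

    # Normalizing by the window sum compensates for the amplitude the
    # window removes, so a pure tone reads back at its true amplitude.
    spectrum = np.fft.rfft(signal * window) / window.sum()
    amplitudes = np.abs(spectrum)

    # Keeping only non-negative frequencies discards half of each
    # component, so double every bin except DC (and Nyquist if n is even).
    if n % 2 == 0:
        amplitudes[1:-1] *= 2.0
    else:
        amplitudes[1:] *= 2.0

    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    return freqs, amplitudes


def subtract_noise(amplitudes, noise_amplitudes):
    """Subtract an ambient-noise amplitude profile, clipping at zero.

    Assumed form of "background noise subtraction"; the source does not
    specify the exact method.
    """
    return np.maximum(np.asarray(amplitudes) - np.asarray(noise_amplitudes), 0.0)


if __name__ == "__main__":
    # One second of a synthetic 1 kHz tone with amplitude 0.5.
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
    tone = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
    freqs, amps = single_sided_amplitude(tone)
    print(freqs[np.argmax(amps)], amps.max())
```

For 16-bit recordings (pyaudio.paInt16), samples would typically be divided by 32768 before this step so that amplitudes fall in the range [-1, 1]; that scaling is likewise an assumption rather than something the description states.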

Provider:

Mendeley Data
