Hani89/medical_asr_recording_dataset

Name: Hani89/medical_asr_recording_dataset
Creator: Hani89
Published: 2023-10-10 05:41:22
License: Apache 2.0

Source: Hugging Face. Updated 2023-10-10; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/Hani89/medical_asr_recording_dataset

Description:

This dataset contains thousands of audio clips describing common medical symptoms, such as "knee pain" or "headache", with a total duration of over 8 hours. Each clip was recorded by an individual contributor based on a given symptom. The clips can be used to train conversational agents in the medical domain. Each clip is paired with a text transcription. Audio is loaded as a 1-dimensional array with a sampling rate of 16 kHz, and each record carries the audio path, the waveform array, the sampling rate, and the transcription.
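Since each clip is described as a 1-dimensional array sampled at 16 kHz, its duration follows directly from the array length. The sketch below uses a synthetic record rather than real data; the key names `audio`, `array`, `path`, `sampling_rate`, and `sentence` are assumptions based on the structure described above, and `example.wav` is a hypothetical file name.

```python
def clip_duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Return the clip length in seconds for a 1-D waveform."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return num_samples / sampling_rate


# Synthetic record shaped like the structure described above (assumed key names).
record = {
    "audio": {
        "path": "example.wav",       # hypothetical file name
        "array": [0.0] * 32000,      # 32,000 silent samples
        "sampling_rate": 16000,      # 16 kHz, as stated in the description
    },
    "sentence": "knee pain",
}

duration = clip_duration_seconds(
    len(record["audio"]["array"]),
    record["audio"]["sampling_rate"],
)
print(duration)  # 2.0
```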

Provider:

Hani89

Original information summary

Dataset overview

Configuration

Default configuration:
- Train split: path data/train-*
- Test split: path data/test-*
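The default configuration maps each split to a glob pattern. A minimal sketch of how those patterns pick out files, and how a split might be loaded with the Hugging Face `datasets` library, is shown below. The shard file name in the example is hypothetical, and `load_split` assumes `datasets` is installed and the repository is reachable; it is not called here.

```python
from fnmatch import fnmatch
from typing import Optional

DATASET_ID = "Hani89/medical_asr_recording_dataset"

# Split-to-pattern mapping taken from the default configuration above.
SPLIT_PATTERNS = {"train": "data/train-*", "test": "data/test-*"}


def split_for_file(filename: str) -> Optional[str]:
    """Return the split whose glob pattern matches the file, or None."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatch(filename, pattern):
            return split
    return None


def load_split(split: str):
    """Load one split from the Hub (requires network access and `datasets`)."""
    if split not in SPLIT_PATTERNS:
        raise ValueError(f"unknown split: {split!r}")
    from datasets import load_dataset  # imported lazily so the sketch runs without it
    return load_dataset(DATASET_ID, split=split)


# Hypothetical shard name, used only to illustrate the pattern match.
print(split_for_file("data/train-00000-of-00007.parquet"))  # train
```

Calling `load_split("train")` would download the train files, so it is best run where several gigabytes of disk space are available.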

Data features

Audio:
- Array: sequence type, data type float32
- Path: data type string
- Sampling rate: data type int64
Sentence: data type string
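The feature list above can be read as a small schema. The following sketch checks a record against it using only the standard library; the key names (`audio`, `array`, `path`, `sampling_rate`, `sentence`) are assumed from the listing, and `array("f")` stands in for a float32 sequence. Records loaded through an actual library would likely use a different array type, so this is illustrative only.

```python
from array import array


def validate_record(record: dict) -> None:
    """Raise TypeError if the record does not match the listed feature types."""
    audio = record["audio"]
    waveform = audio["array"]
    # typecode "f" is a single-precision (32-bit) float in the array module.
    if not (isinstance(waveform, array) and waveform.typecode == "f"):
        raise TypeError("audio.array must be a float32 sequence")
    if not isinstance(audio["path"], str):
        raise TypeError("audio.path must be a string")
    if not isinstance(audio["sampling_rate"], int):
        raise TypeError("audio.sampling_rate must be an integer")
    if not isinstance(record["sentence"], str):
        raise TypeError("sentence must be a string")


# Synthetic example; the path is a hypothetical file name.
sample = {
    "audio": {
        "array": array("f", [0.0, 0.5, -0.5]),
        "path": "example.wav",
        "sampling_rate": 16000,
    },
    "sentence": "headache",
}
validate_record(sample)  # passes silently
```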

Data splits

Train split:
- Bytes: 3128740048
- Examples: 5328
Test split:
- Bytes: 776455056
- Examples: 1333
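As a quick arithmetic check on the split figures above, the sketch below computes the train/test proportions. It also derives a lower bound on the mean clip length, assuming the "over 8 hours" total from the description covers both splits; that assumption is not stated in the source.

```python
TRAIN_EXAMPLES = 5328
TEST_EXAMPLES = 1333
MIN_TOTAL_HOURS = 8  # "over 8 hours", assumed to span both splits

total_examples = TRAIN_EXAMPLES + TEST_EXAMPLES
train_share = TRAIN_EXAMPLES / total_examples
test_share = TEST_EXAMPLES / total_examples

# Lower bound on the mean clip length implied by the stated total duration.
min_mean_clip_seconds = MIN_TOTAL_HOURS * 3600 / total_examples

print(total_examples)                  # 6661
print(f"{train_share:.1%}")            # 80.0%
print(f"{test_share:.1%}")             # 20.0%
print(f"{min_mean_clip_seconds:.1f}")  # 4.3
```

The split is therefore roughly 80/20, and the total of 6,661 examples falls within the 1K<n<10K size category.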

Data size

Download size: 3882364624 bytes
Dataset size: 3905195104 bytes
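The byte counts are easier to read in larger units, and the per-split sizes can be checked against the reported dataset size. A minimal sketch using only the figures listed on this page:

```python
SPLIT_BYTES = {"train": 3128740048, "test": 776455056}
DATASET_SIZE_BYTES = 3905195104
DOWNLOAD_SIZE_BYTES = 3882364624


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


# The two split sizes should add up to the reported dataset size.
splits_total = sum(SPLIT_BYTES.values())
print(splits_total == DATASET_SIZE_BYTES)  # True

print(f"download: {to_gib(DOWNLOAD_SIZE_BYTES):.2f} GiB")
print(f"dataset:  {to_gib(DATASET_SIZE_BYTES):.2f} GiB")
```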

License

License: Apache 2.0

Task category

Automatic speech recognition

Language

English

Dataset size category

1K<n<10K


The above content was collected and summarized by 遇见数据集.
