Speech dataset of thousands of utterances on common medical symptoms, over 8.5 hours of recordings

Name: Speech dataset of thousands of utterances on common medical symptoms, over 8.5 hours of recordings
Creator: 帕依提提
License: No description available

Indexed by 帕依提提 on 2024-03-04

Download link:

https://www.payititi.com/opendatasets/show-26335.html

Resource description:

This dataset contains thousands of spoken utterances about common medical symptoms (for example, "knee pain" or "headache"), totaling more than 8 hours of audio. Each utterance was created by an individual human contributor in response to a given symptom. The audio clips can be used to train conversational agents in the medical domain. This Appen dataset was built through a multi-task workflow. First, contributors write text phrases that describe a given symptom. For "headache", for example, a contributor might write "I need treatment for my migraine". Later tasks then capture speech for the text strings that were accepted. The dataset includes both the audio utterances and their corresponding transcripts. The input data consists of symptom prompts. Human annotators write their own text phrases from these prompts, and those phrases are then used in later stages of the workflow to collect speech recordings. The "Data" tab above contains more information, along with the final recordings produced from these prompts.
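The prompt, phrase, and recording stages described above can be sketched as a simple record type. This is only an illustrative sketch: the listing does not document a file layout or schema, so the `Utterance` class, its field names, the helper functions, and the clip paths are all hypothetical, and every sample row except the "migraine" phrase quoted above is made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    # All field names are hypothetical; the listing does not document a schema.
    prompt: str        # symptom prompt shown to the contributor, e.g. "headache"
    phrase: str        # accepted text phrase written for that prompt
    audio_path: str    # recording of the phrase (placeholder path)
    duration_s: float  # clip length in seconds


def total_hours(utterances):
    """Sum clip durations and convert seconds to hours."""
    return sum(u.duration_s for u in utterances) / 3600.0


def phrases_by_prompt(utterances):
    """Group transcribed phrases under the symptom prompt that produced them."""
    grouped = defaultdict(list)
    for u in utterances:
        grouped[u.prompt].append(u.phrase)
    return dict(grouped)


# Made-up sample rows for illustration only; not taken from the dataset.
SAMPLE = [
    Utterance("headache", "I need treatment for my migraine", "clip_0001.wav", 3.0),
    Utterance("headache", "My head has been pounding all day", "clip_0002.wav", 4.5),
    Utterance("knee pain", "My knee hurts when I climb stairs", "clip_0003.wav", 4.5),
]

if __name__ == "__main__":
    print(phrases_by_prompt(SAMPLE))
    print(f"{total_hours(SAMPLE):.4f} hours")
```

Grouping phrases by prompt yields (symptom label, utterance text) pairs, which is one plausible way to prepare such data for a symptom classifier in a conversational agent; the actual files may be organized differently.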

Provided by:

帕依提提

Collected summary

Dataset introduction

Background and challenges

Background overview

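This dataset is a collection of thousands of speech recordings about common medical symptoms, with a total duration of more than 8.5 hours and accompanying transcripts, suitable for training medical conversational agents.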

The content above was collected and summarized by 遇见数据集.