Dataset for Automated Medical Transcription

NIAID Data Ecosystem2026-03-14 收录

下载链接：

https://zenodo.org/record/4279040

下载链接

链接失效反馈

官方服务：

资源简介：

We generated this dataset to train a machine learning model for automatically generating psychiatric case notes from doctor-patient conversations. Since, we didn't have access to real doctor-patient conversations, we used transcripts from two different sources to generate audio recordings of enacted conversations between a doctor and a patient. We employed eight students who worked in pairs to generate these recordings. Six of the transcripts that we used to produce this recordings were hand-written by Cheryl Bristow and rest of the transcripts were adapted from Alexander Street which were generated from real doctor-patient conversations. Our study requires recording the doctor and the patient(s) in seperate channels which is the primary reason behind generating our own audio recordings of the conversations. We used Google Cloud Speech-To-Text API to transcribe the enacted recordings. These newly generated transcripts are auto-generated entirely using AI powered automatic speech recognition whereas the source transcripts are either hand-written or fine-tuned by human transcribers (transcripts from Alexander Street). We provided the generated transcripts back to the students and asked them to write case notes. The students worked independently using a software that we developed earlier for this purpose. The students had past experience of writing case notes and we let the students write case notes as they practiced without any training or instructions from us. NOTE: Audio recordings are not included in Zenodo due to large file size but they are available in the GitHub repository.

创建时间：

2023-01-15

5,000+

优质数据集

54 个

任务类型

进入经典数据集