
Transcribing audio data: raw transcripts from several transcription tools

NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/6469895
Description:
This record contains a Dutch audio fragment and the raw transcripts from several automatic audio transcription tools. The audio fragment was recorded specifically for the purpose of testing these tools and was run through: Amberscript, HappyScribe, Kaldi, NVIVO transcription, Sonix, SpokenOnline, Transcribe, Trint, and Microsoft Word 365 Online. The raw transcripts were downloaded as .docx or .txt files, and the .docx files were then saved as .odt. No edits were made to the transcripts before saving them, except for the occasional removal of a personal email address or hyperlink. For comparison purposes, the audio file itself ("Test_interview_20220203.mp3") and a clean transcript ("Test_interview_cleaned_transcript.odt") are also made available. By the time you download these files, the respective suppliers may have improved the quality of their (Dutch) speech-to-text conversion. For maximum clarity, each filename therefore includes the date on which the transcription was run (YYYY-MM-DD format).
Created:
2022-10-06