
Podcast annotation dataset for the paper "Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts"

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5762441
Description:
Dataset for the paper "Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts". Please refer to the paper for details. Compared to the dataset used in the paper, 20 out of the 445 episodes have been removed due to copyright issues.

Each data file contains the following fields:
- "episode_intro_start": the timestamp for the episode introduction start (in milliseconds)
- "episode_intro_end": the timestamp for the episode introduction end (in milliseconds)
- "program_intro_start": the timestamp for the program introduction start (in milliseconds)
- "program_intro_end": the timestamp for the program introduction end (in milliseconds)
- "program_name": name of the podcast program
- "episode_name": name of the podcast episode
- "transcription": JSON string containing the transcription, including the timestamps
- "annotator": anonymized annotator ID
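
As a rough illustration of how the fields listed above might be used, the following Python sketch converts the millisecond timestamps into durations in seconds and decodes the "transcription" JSON string. It assumes a record has already been loaded into a dictionary keyed by the field names. The on-disk file format and the internal structure of the transcription are not described here, so the example record, its values, and the helper names `intro_durations_seconds` and `parse_transcription` are hypothetical.

```python
import json


def intro_durations_seconds(record):
    """Return (episode_intro, program_intro) durations in seconds.

    Timestamps are documented as milliseconds. Treating a missing or
    None timestamp as "no annotation" is an assumption of this sketch.
    """
    def span(start_key, end_key):
        start = record.get(start_key)
        end = record.get(end_key)
        if start is None or end is None:
            return None
        return (float(end) - float(start)) / 1000.0

    return (
        span("episode_intro_start", "episode_intro_end"),
        span("program_intro_start", "program_intro_end"),
    )


def parse_transcription(record):
    """Decode the 'transcription' field, which is stored as a JSON string."""
    return json.loads(record["transcription"])


# Hypothetical record: the values and the transcription payload are
# placeholders, not taken from the dataset.
example = {
    "episode_intro_start": 1500,
    "episode_intro_end": 31500,
    "program_intro_start": 0,
    "program_intro_end": 1500,
    "program_name": "Example Program",
    "episode_name": "Example Episode",
    "transcription": json.dumps({"placeholder": "structure not documented here"}),
    "annotator": "A1",
}

episode_secs, program_secs = intro_durations_seconds(example)
transcript = parse_transcription(example)
```

The transcription is decoded with `json.loads` and left as-is, because its schema should be checked against the actual files or the paper before relying on any particular keys.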
Created:
2021-12-08