common-pile/youtube_filtered

Source: Hugging Face. Updated 2025-06-06. Indexed 2025-07-05.
Download link:
https://hf-mirror.com/datasets/common-pile/youtube_filtered
Description:
The Creative Commons YouTube dataset is a large-scale collection drawn from over 2,000 YouTube channels that consistently publish original, openly licensed content. The content spans lectures, tutorials, reviews, video essays, speeches, and vlogs. The dataset comprises over 1.1 million openly licensed videos totaling more than 470,000 hours, and every video has been transcribed to text with the Whisper speech recognition model. It is suited to research and tasks that need high-quality, speech-derived text.
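As a rough sanity check on the scale figures quoted in the description, the short Python sketch below estimates the mean video length. It assumes the two quoted figures (1.1 million videos, 470,000 hours) can be treated as point values, although the description states both only as lower bounds, so the result is indicative at best. The function and constant names are illustrative and not part of the dataset.

```python
# Figures quoted in the dataset description; both are stated as
# lower bounds ("over"), so treat the result as a rough estimate.
NUM_VIDEOS = 1_100_000
TOTAL_HOURS = 470_000


def average_minutes_per_video(total_hours: float, num_videos: int) -> float:
    """Return the mean video length in minutes."""
    return total_hours * 60 / num_videos


avg_minutes = average_minutes_per_video(TOTAL_HOURS, NUM_VIDEOS)
print(f"Roughly {avg_minutes:.1f} minutes per video")
```

Under these assumptions the average comes out at about 25 to 26 minutes per video, which is consistent with a collection weighted toward long-form material such as lectures and tutorials.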
Provider:
common-pile