YouTube-Commons
Source: huggingface.co · Indexed 2025-01-08
Download link:
https://huggingface.co/datasets/PleIAs/YouTube-Commons
Resource summary:
📺 YouTube-Commons 📺
YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-BY license.
Content
The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels).
In total, this represents nearly 45 billion words (44,811,518,375).
All the videos were shared on YouTube under a CC-BY license: the dataset provides all the necessary provenance… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.
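The totals above can be turned into rough per-unit averages. The sketch below is a minimal illustration in Python that uses only the figures reported in this description. It takes the 3,156,703-video count from the Content totals rather than the 2,063,066 figure in the summary line. These are simple means, so they say nothing about how transcript lengths are distributed. The per-video word count also includes the automatically translated transcripts, not just the originals.

```python
# Totals as reported in the Content section of the YouTube-Commons description.
TOTAL_WORDS = 44_811_518_375
TOTAL_TRANSCRIPTS = 22_709_724   # original + automatically translated transcripts
TOTAL_VIDEOS = 3_156_703         # figure from the Content totals (not the 2,063,066 summary figure)
TOTAL_CHANNELS = 721_136


def corpus_averages() -> dict[str, float]:
    """Return simple means derived from the reported corpus totals."""
    return {
        # Mean length of one transcript (original or translated), in words.
        "words_per_transcript": TOTAL_WORDS / TOTAL_TRANSCRIPTS,
        # Words per video, counting every transcript (including translations).
        "words_per_video": TOTAL_WORDS / TOTAL_VIDEOS,
        # How many transcripts (languages) each video has on average.
        "transcripts_per_video": TOTAL_TRANSCRIPTS / TOTAL_VIDEOS,
        # How many CC-BY videos each channel contributes on average.
        "videos_per_channel": TOTAL_VIDEOS / TOTAL_CHANNELS,
    }


if __name__ == "__main__":
    print(f"total words: {TOTAL_WORDS / 1e9:.1f} billion")
    for name, value in corpus_averages().items():
        print(f"{name}: {value:,.2f}")
```

Run as a script, it shows that the "nearly 45 billion words" correspond to roughly 2,000 words per transcript and about seven transcripts per video.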
Provider: huggingface.co