hridaydutta123/YT-100K
Source: Hugging Face | Updated: 2025-11-29 | Added: 2024-12-14
Download link:
https://hf-mirror.com/datasets/hridaydutta123/YT-100K
Description:
This work introduces two large-scale multilingual comment datasets collected from YouTube: YT-30M and YT-100K. YT-30M contains 32M comments, and YT-100K is a random sample of 100K comments drawn from YT-30M. Each comment is associated with its videoID, commentID, commenterName, commenterChannelID, comment text, votes (like count), originalChannelID, and the category of the YouTube channel (e.g., News & Politics, Science & Technology). The dataset is anonymized by removing all personally identifiable information (PII). It covers multiple languages, including English, Russian, Hindi, Chinese, Bengali, Spanish, Portuguese, Malayalam, Telugu, and Japanese.
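To make the per-comment fields above concrete, here is a minimal Python sketch of a record type plus a helper that tallies comments per channel category. It assumes, hypothetically, that each raw row is a dict keyed by the field names listed above, with the comment text under a `comment` key and `votes` stored as an integer-like value; the actual column names, types, and file layout of the released data are not specified here and may differ. `Comment`, `from_row`, and `category_counts` are illustrative names, not part of the dataset.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


# Hypothetical record type: fields mirror the description above, but the
# real column names and types in the released files may differ.
@dataclass(frozen=True)
class Comment:
    video_id: str
    comment_id: str
    commenter_name: str        # anonymized in the released data
    commenter_channel_id: str  # anonymized in the released data
    text: str
    votes: int                 # like count
    original_channel_id: str
    category: str              # e.g. "News & Politics"


def from_row(row: dict) -> Comment:
    """Map a raw row to a Comment.

    Assumes camelCase keys as listed in the description and a
    hypothetical "comment" key for the text; votes must be integer-like.
    """
    return Comment(
        video_id=row["videoID"],
        comment_id=row["commentID"],
        commenter_name=row["commenterName"],
        commenter_channel_id=row["commenterChannelID"],
        text=row["comment"],
        votes=int(row["votes"]),
        original_channel_id=row["originalChannelID"],
        category=row["category"],
    )


def category_counts(comments: Iterable[Comment]) -> Counter:
    """Count how many comments fall into each channel category."""
    return Counter(c.category for c in comments)
```

Rows loaded with the Hugging Face `datasets` library (for example via `load_dataset("hridaydutta123/YT-100K")`) could be passed to `from_row` once the real column names have been checked against the dataset's schema.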
Provided by:
hridaydutta123



