Dataset for: "Disturbed YouTube for Kids: Characterizing and Detecting Inappropriate Videos Targeting Young Children"
Source: Mendeley Data | Updated: 2024-03-27 | Indexed: 2024-06-28
Download link:
https://zenodo.org/record/3632781
Description:
Dataset for the paper "Disturbed YouTube for Kids: Characterizing and Detecting Inappropriate Videos Targeting Young Children". The dataset consists of five files:
1. groundtruth_videos.json: The ground-truth dataset of 4797 manually annotated videos (1513 suitable, 929 disturbing, 419 restricted, and 1936 irrelevant). Each video's label is stored in the 'classification_label' field.
2. elsagate_related_videos.json: Data for 233K Elsagate-related YouTube videos (1K seed and 232K recommended), collected as described in the paper.
3. other_child_related_videos.json: Data for 155K other child-related YouTube videos (2K seed and 153K recommended), collected as described in the paper.
4. random_videos.json: Data for 482K random YouTube videos (8K seed and 474K recommended), collected as described in the paper.
5. popular_videos.json: Data for 11K popular YouTube videos (500 seed and 10.5K recommended), collected between November 18 and November 21, 2018, as described in the paper.
For every video in all five files, the label predicted by our classifier is stored in the 'prediction' field.
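Only the file names and the 'classification_label' and 'prediction' fields come from the description above; the rest of this sketch is an assumption. The on-disk layout of the JSON files is not stated, so this minimal Python sketch accepts a JSON array of records, JSON Lines (one record per line), or a dict keyed by video ID, and then tallies the values of a chosen field. The helper names `load_videos` and `label_counts` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_videos(path):
    """Load video records from one of the dataset's JSON files.

    Assumption: the file is a JSON array of objects, JSON Lines,
    or a dict mapping video IDs to record objects. The actual layout
    is not documented in the dataset description.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: parse each non-empty line as one record.
        data = [json.loads(line) for line in text.splitlines() if line.strip()]
    if isinstance(data, dict):
        values = list(data.values())
        # Assumption: a dict whose values are all objects maps IDs to records;
        # any other dict is treated as a single record.
        if values and all(isinstance(v, dict) for v in values):
            data = values
        else:
            data = [data]
    return data


def label_counts(videos, field):
    """Count the values of `field`, e.g. 'classification_label' or 'prediction'.

    Records missing the field are counted under None.
    """
    return Counter(v.get(field) for v in videos)
```

If the files match one of the assumed layouts, `label_counts(load_videos("groundtruth_videos.json"), "classification_label")` should report four label values whose counts sum to 4797; the exact label strings are not given in the description, so inspect the output rather than hard-coding them. The same call with `"prediction"` tallies the classifier's output for any of the five files.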
Created: 2023-06-28