BaitBuster-Bangla: A Comprehensive Dataset for Clickbait Detection in Bangla with Multi-Feature and Multi-Modal Analysis
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/3c6ztw5nft
Description:
This is a multi-feature, multi-modal dataset for Bangla clickbait detection on video-sharing platforms. It was collected from YouTube through the official public YouTube API, with the goal of classifying clickbait content in Bangla. The dataset contains 253,070 entries with 18 columns, drawn from a curated list of 54 Bangla YouTube channels: 28 not-clickbait channels and 26 clickbait channels. Alongside the labels, it includes video metadata and user-engagement statistics, which makes it useful for studying clickbait content. Entries are labeled using three strategies: i) predefined automatic labels, ii) labels from human annotators, and iii) labels from a fine-tuned AI model. Human labels, however, are available for only 10,000 entries. The dataset is distributed in three formats: xlsx, csv, and parquet.
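Because human labels cover only 10,000 of the 253,070 entries, one natural first step is to restrict analysis to that subset and measure how often the automatic labels agree with the human ones. The sketch below does this with Python's standard `csv` module over the CSV release, streaming rows so the full file never has to be held in memory. The column names `auto_label` and `human_label`, the label values in the sample, and the helper names are hypothetical placeholders, not the dataset's documented schema; check the actual header and label encoding before use.

```python
import csv
import io
from collections import Counter

# HYPOTHETICAL column names: placeholders only. Replace them with the real
# header names from the CSV release before running on the actual data.
AUTO_COL = "auto_label"
HUMAN_COL = "human_label"


def human_labeled_rows(csv_file):
    """Yield only rows whose human-label cell is non-empty."""
    for row in csv.DictReader(csv_file):
        # Missing trailing cells come back as None from DictReader.
        if (row.get(HUMAN_COL) or "").strip():
            yield row


def label_agreement(csv_file):
    """Compare auto labels with human labels on the human-labeled subset.

    Returns (n_rows, agreement_rate, confusion), where confusion maps
    (auto_label, human_label) pairs to counts.
    """
    confusion = Counter()
    for row in human_labeled_rows(csv_file):
        auto = (row.get(AUTO_COL) or "").strip()
        human = row[HUMAN_COL].strip()
        confusion[(auto, human)] += 1
    n = sum(confusion.values())
    agree = sum(c for (a, h), c in confusion.items() if a == h)
    rate = agree / n if n else 0.0
    return n, rate, confusion


if __name__ == "__main__":
    # Tiny made-up sample standing in for the real file.
    sample = io.StringIO(
        "title,auto_label,human_label\n"
        "t1,clickbait,clickbait\n"
        "t2,clickbait,not clickbait\n"
        "t3,not clickbait,\n"
        "t4,not clickbait,not clickbait\n"
    )
    n, rate, confusion = label_agreement(sample)
    print(n, round(rate, 3))  # prints: 3 0.667 (row t3 has no human label)
```

For the real file, assuming the CSV is UTF-8 encoded (Bangla titles are non-ASCII), open it with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the download was saved, and pass the file object to `label_agreement`. The same comparison can be run against the model-generated labels by pointing `AUTO_COL` at that column's (again, unverified) name.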
Created:
2024-03-01



