Bangla Clickbait Corpus

Name: Bangla Clickbait Corpus
Creator: BRAC University
Published: 2023-11-11 01:38:46
License: Not specified

Source: arXiv (updated 2023-11-11; indexed 2024-06-21)

Download link:

https://github.com/mdmotaharmahtab/BanglaBait

Description:

Bangla Clickbait Corpus is the first dataset for clickbait detection in Bengali, created by BRAC University. It contains 15,056 labeled and 65,406 unlabeled news articles, collected from clickbait-heavy news websites. Each labeled article was annotated by three expert linguists and includes the article title, content, and other metadata. The dataset is intended to provide a foundation for future research on clickbait in Bengali news articles and to support the application of techniques such as Semi-Supervised Generative Adversarial Networks (SS-GANs) to text classification tasks.

Provider:

BRAC University

Created:

2023-11-11
