A Curated Bengali News Dataset for Fake News Detection Across Sports and Politics Domains

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/6syvfm736v
Description:
This repository contains a curated Bengali fake news detection dataset comprising 10,205 full-text news articles collected from Bangladeshi online sources and annotated with binary labels (Real and Fake) across two major domains: Politics and Sports. The dataset is designed to support research in low-resource Natural Language Processing (NLP), misinformation detection, and cross-domain text classification.

The dataset is provided in a single clean CSV file with three columns:
- Category: Domain of the article (Politics or Sports)
- Label: Authenticity label (Real or Fake)
- News_Article: Full Bengali Unicode news text

The corpus includes:
- 6,165 Real articles (60.4%)
- 4,040 Fake articles (39.6%)
- 5,962 Sports articles (58.4%)
- 4,243 Politics articles (41.6%)

All articles were collected through web scraping using BeautifulSoup and Selenium. Real news came from reputable Bengali news portals; fake news came from unreliable or satirical sources and public Facebook pages. Labels were assigned through source-based verification and cross-checking with fact-checking platforms.

Text length statistics show a strong contrast between real and fake news:
- Average length of real articles: 2,027 characters
- Average length of fake articles: 920 characters
- Total corpus size: around 16.2 million characters

This dataset is particularly valuable for:
- Binary fake news classification
- Cross-domain learning (Politics ↔ Sports)
- Low-resource language NLP research
- Transformer model evaluation
- Linguistic analysis of misinformation

The dataset focuses on Bangladesh-centric Bengali news content and does not include personal user data or private information. All content was collected from publicly accessible sources in compliance with platform redistribution policies.
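
The three-column layout described above lends itself to a quick sanity check of the published counts and average lengths. The following is a minimal sketch using only the Python standard library. It assumes the CSV header matches the column names given in the description (Category, Label, News_Article), and it runs on a tiny in-memory stand-in rather than the real file. The helper name `summarize` and the sample rows are illustrative and not part of the dataset.

```python
import csv
import io
from collections import Counter


def summarize(rows):
    """Count labels and categories and compute mean article length per label."""
    label_counts = Counter()
    category_counts = Counter()
    char_totals = Counter()
    for row in rows:
        label = row["Label"].strip()
        label_counts[label] += 1
        category_counts[row["Category"].strip()] += 1
        # Length is measured in Unicode code points, which is an assumption
        # about how the description's "characters" were counted.
        char_totals[label] += len(row["News_Article"])
    mean_length = {k: char_totals[k] / n for k, n in label_counts.items()}
    return label_counts, category_counts, mean_length


# Illustrative stand-in for the real CSV (placeholder text, not dataset content).
sample = io.StringIO(
    "Category,Label,News_Article\n"
    "Sports,Real,abcdef\n"
    "Politics,Fake,abc\n"
    "Sports,Fake,a\n"
)

labels, categories, mean_length = summarize(csv.DictReader(sample))
print(labels, categories, mean_length)
```

To run this against the downloaded file, replace `sample` with a file handle opened with `encoding="utf-8"` and `newline=""`; the actual file name is not stated in the description. The published figures are internally consistent: 6,165 × 2,027 + 4,040 × 920 = 16,213,255, which matches the stated total of around 16.2 million characters.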
Created:
2026-03-03