"COMBINED DATASET"

Name: "COMBINED DATASET"
Creator: IEEE DataPort
Published: 2026-01-06 19:38:56
License: No description available

Source: DataCite Commons (updated 2026-01-06, indexed 2026-05-03)

Download link:

https://ieee-dataport.org/documents/combined-dataset

Description:

The dataset used in this study comprises 82,946 labeled news statements collected from multiple publicly available fact-checking and news verification sources focusing on Indian and Indic-language content. Each instance is annotated for binary classification, with labels Real (0) and Fake (1). The dataset exhibits a natural class imbalance, consisting of 53,713 Fake samples (64.76%) and 29,233 Real samples (35.24%).

The corpus spans a diverse set of languages, including English, Hindi, Tamil, Gujarati, Malayalam, Punjabi, Bengali, Telugu, Marathi, Nepali, and other low-resource languages. It also contains romanized and code-mixed text, reflecting realistic social media usage patterns in multilingual Indian settings. Language identifiers were retained to support language-wise evaluation.

Data from different sources were merged into a unified format, retaining only semantically meaningful fields: news text, label, and language. The dataset's scale, linguistic diversity, and presence of code-mixing make it suitable for evaluating multilingual transformer models for Indic fake news detection.
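The class counts quoted in the description can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the dataset or its documentation: it recomputes the reported percentages from the two counts and, as one common way to compensate for the imbalance, derives inverse-frequency class weights using the formula total / (num_classes * class_count). The variable names and the choice of weighting scheme are assumptions made here.

```python
# Class counts as reported in the dataset description.
counts = {"Real": 29_233, "Fake": 53_713}
# Label encoding as reported: Real = 0, Fake = 1.
label_ids = {"Real": 0, "Fake": 1}

total = sum(counts.values())

# Percentage share of each class, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

# Assumed weighting scheme (not from the source):
# weight = total / (num_classes * class_count), so the minority class gets
# a weight above 1 and the majority class a weight below 1.
weights = {
    label_ids[name]: total / (len(counts) * n) for name, n in counts.items()
}

print("total:", total)
print("shares (%):", shares)
print("class weights by label id:", weights)
```

Weights of this kind could be passed to a weighted loss function when fine-tuning a classifier, so that the less frequent Real class is not underrepresented in the training signal.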

Provider:

IEEE DataPort

Created:

2026-01-06
