AlexanderHolmes0/true-fake-news
Source: Hugging Face. Updated 2024-04-12; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/AlexanderHolmes0/true-fake-news
Resource overview:
---
language:
- en
license: mit
size_categories:
- 10K<n<100K
task_categories:
- text-classification
- question-answering
- text-generation
dataset_info:
  features:
  - name: label
    dtype:
      class_label:
        names:
          '0': 'true'
          '1': fake
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 82978144
    num_examples: 33672
  - name: test
    num_bytes: 28512596
    num_examples: 11224
  download_size: 67949019
  dataset_size: 111490740
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
tags:
- news
---
# True-Fake-News
This dataset collects news articles from various sources, each with a curated label of `true` or `fake`.
### Dataset Description
The dataset contains two types of articles: fake news and real news. It was collected from real-world sources. The truthful articles were obtained by crawling Reuters.com, a news website. The fake news articles were collected from various unreliable websites that were flagged by Politifact (a fact-checking organization in the USA) and Wikipedia. The articles cover a range of topics, but the majority focus on political and world news.
### Dataset Sources
- **Repository:** [Kaggle Repo](https://www.kaggle.com/datasets/emineyetm/fake-news-detection-datasets/data)
## Uses
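This dataset is suited to text classification (labeling an article as `true` or `fake`). The card metadata also lists question answering and text generation as task categories.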
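As a starting point for the classification use case, the following minimal sketch decodes the integer labels using the mapping from the card metadata (`0` is `true`, `1` is `fake`) and wraps the download in a helper. It assumes the Hugging Face `datasets` library is installed and the Hub is reachable. The helper names `label_name` and `load_splits` are illustrative and not part of the dataset.

```python
# Label ids as declared in the dataset card metadata.
LABEL_NAMES = {0: "true", 1: "fake"}


def label_name(label_id: int) -> str:
    """Map an integer class id to its name from the card metadata."""
    return LABEL_NAMES[label_id]


def load_splits(repo_id: str = "AlexanderHolmes0/true-fake-news"):
    """Download the train and test splits (needs network access)."""
    # Imported lazily so the label helper works without the third-party package.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)
    return dataset["train"], dataset["test"]
```

Calling `train, test = load_splits()` should yield two splits whose rows expose a `label` and a `text` field, so `label_name(train[0]["label"])` gives the readable class of the first training article.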
## Dataset Structure
| Classification | Total Number of Articles | Article Type | Article Count |
|----------------|--------------------------|--------------|---------------|
| Real-News | 21,417 | World | 10,145 |
| | | Political | 11,272 |
| Fake-News | 23,481 | Government | 1,570 |
| | | Middle East | 778 |
| | | US | 783 |
| | | Left-Leaning | 4,459 |
| | | Political | 6,841 |
| | | General | 9,050 |
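
The arithmetic behind the table can be checked with a short script. This sketch copies the counts from the table and the split sizes from the card metadata, then derives the class totals, the class balance, and the test fraction. The variable names are illustrative.

```python
# Article counts copied from the table above.
COUNTS = {
    "Real-News": {"World": 10_145, "Political": 11_272},
    "Fake-News": {
        "Government": 1_570,
        "Middle East": 778,
        "US": 783,
        "Left-Leaning": 4_459,
        "Political": 6_841,
        "General": 9_050,
    },
}

# Split sizes from the card metadata (num_examples).
SPLITS = {"train": 33_672, "test": 11_224}

# Sum the per-type counts to recover each class total.
class_totals = {name: sum(types.values()) for name, types in COUNTS.items()}
table_total = sum(class_totals.values())
split_total = sum(SPLITS.values())

# Share of fake articles in the table and share of examples held out for test.
fake_share = class_totals["Fake-News"] / table_total
test_share = SPLITS["test"] / split_total

print(class_totals)
print(table_total, split_total)
print(f"fake share: {fake_share:.3f}, test share: {test_share:.3f}")
```

The per-type counts sum to the stated class totals (21,417 real and 23,481 fake), so the classes are roughly balanced, and the test split holds one quarter of the examples. The table total (44,898) is two higher than the combined split size (44,896); the card does not explain the difference.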
Provider:
AlexanderHolmes0
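Original Information Summary
Dataset Overview
Basic Information
- Language: English
- License: MIT
- Size category: 10K<n<100K
- Task categories:
  - Text classification
  - Question answering
  - Text generation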
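Dataset Details
- Features:
  - label: class label with two classes, true and fake
  - text: text content, stored as a string
- Splits:
  - train: training set, 33,672 examples, 82,978,144 bytes
  - test: test set, 11,224 examples, 28,512,596 bytes
- Download size: 67,949,019 bytes
- Dataset size: 111,490,740 bytes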
Configuration
- Default configuration:
  - Train split path: data/train-*
  - Test split path: data/test-*
Tags
- News
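Dataset Description
The dataset contains news articles from various sources, labeled as either true or fake. The real news articles come from Reuters.com, while the fake news articles come from websites flagged as unreliable by Politifact and Wikipedia. The dataset mainly covers political and world news topics.
Use Cases
The dataset is suited to text classification and question answering tasks.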
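Dataset Structure
| Classification | Total Articles | Article Type | Article Count |
|---|---|---|---|
| Real news | 21,417 | World | 10,145 |
| | | Political | 11,272 |
| Fake news | 23,481 | Government | 1,570 |
| | | Middle East | 778 |
| | | US | 783 |
| | | Left-leaning | 4,459 |
| | | Political | 6,841 |
| | | General | 9,050 |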



