
AlexanderHolmes0/true-fake-news

Hugging Face · Updated 2024-04-12 · Indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/AlexanderHolmes0/true-fake-news
Resource description:
---
language:
- en
license: mit
size_categories:
- 10K<n<100K
task_categories:
- text-classification
- question-answering
- text-generation
dataset_info:
  features:
  - name: label
    dtype:
      class_label:
        names:
          '0': 'true'
          '1': fake
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 82978144
    num_examples: 33672
  - name: test
    num_bytes: 28512596
    num_examples: 11224
  download_size: 67949019
  dataset_size: 111490740
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
tags:
- news
---

# True-Fake-News

These are news articles collected from various sources, with curated labels classifying each one as `true` or `fake`.

### Dataset Description

The dataset contains two types of articles: fake news and real news. It was collected from real-world sources. The truthful articles were obtained by crawling Reuters.com (a news website). The fake news articles were collected from unreliable websites that were flagged by Politifact (a fact-checking organization in the USA) and Wikipedia. The dataset covers a range of topics; however, the majority of articles focus on political and world news.

### Dataset Sources

- **Repository:** [Kaggle Repo](https://www.kaggle.com/datasets/emineyetm/fake-news-detection-datasets/data)

## Uses

Text classification or question answering would be ways to use this dataset.

## Dataset Structure

| Classification | Total Number of Articles | Article Type | Article Count |
|----------------|--------------------------|--------------|---------------|
| Real-News      | 21,417                   | World        | 10,145        |
|                |                          | Political    | 11,272        |
| Fake-News      | 23,481                   | Government   | 1,570         |
|                |                          | Middle East  | 778           |
|                |                          | US           | 783           |
|                |                          | Left-Leaning | 4,459         |
|                |                          | Political    | 6,841         |
|                |                          | General      | 9,050         |
Provider:
AlexanderHolmes0
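Summary of Original Information

Dataset Overview

Basic Information

  • Language: English
  • License: MIT
  • Size category: 10K<n<100K
  • Task categories:
    • Text classification
    • Question answering
    • Text generation

Dataset Details

  • Features:
    • label: class label with two classes, true (0) and fake (1)
    • text: article text, stored as a string
  • Splits:
    • train: training set, 33,672 examples, 82,978,144 bytes
    • test: test set, 11,224 examples, 28,512,596 bytes
  • Download size: 67,949,019 bytes
  • Dataset size: 111,490,740 bytes

Configuration

  • Default configuration:
    • Training set path: data/train-*
    • Test set path: data/test-*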
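The features, splits, and default configuration listed above can be exercised with a short Python sketch. It assumes the Hugging Face `datasets` package is installed and that the Hub repository id is `AlexanderHolmes0/true-fake-news` (inferred from the download link, not stated separately in the card). The helper names `label_name` and `load_true_fake_news` are hypothetical, introduced only for illustration.

```python
# Label ids as declared in the card's class_label metadata: '0' -> true, '1' -> fake.
ID2LABEL = {0: "true", 1: "fake"}


def label_name(label_id: int) -> str:
    """Map an integer label from the dataset to its class name."""
    return ID2LABEL[label_id]


def load_true_fake_news():
    """Load the default configuration and return its (train, test) splits.

    Assumption: the `datasets` package is installed and the Hub is reachable.
    """
    # Imported lazily so the label helper above works without the package.
    from datasets import load_dataset

    ds = load_dataset("AlexanderHolmes0/true-fake-news")
    return ds["train"], ds["test"]
```

Calling `train, test = load_true_fake_news()` should return two splits whose records have an integer `label` and a string `text`, so `label_name(train[0]["label"])` gives the class name of the first training article.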

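Tags

  • News

Dataset Description

The dataset contains news articles from various sources, each labeled as true or fake. The real news articles come from Reuters.com, while the fake news articles come from websites flagged as unreliable by Politifact and Wikipedia. The dataset mainly covers political and world news topics.

Use Cases

The dataset is suitable for text classification and question answering tasks.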

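Dataset Structure

| Classification | Total Number of Articles | Article Type | Article Count |
|----------------|--------------------------|--------------|---------------|
| Real news      | 21,417                   | World        | 10,145        |
|                |                          | Political    | 11,272        |
| Fake news      | 23,481                   | Government   | 1,570         |
|                |                          | Middle East  | 778           |
|                |                          | US           | 783           |
|                |                          | Left-leaning | 4,459         |
|                |                          | Political    | 6,841         |
|                |                          | General      | 9,050         |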
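As a quick arithmetic check on the figures above, the following Python sketch sums the per-type counts from the structure table and compares them with the split sizes in the metadata. All numbers are copied from this page; the variable names are illustrative only.

```python
# Per-type article counts copied from the structure table.
real_news = {"World": 10_145, "Political": 11_272}
fake_news = {
    "Government": 1_570,
    "Middle East": 778,
    "US": 783,
    "Left-Leaning": 4_459,
    "Political": 6_841,
    "General": 9_050,
}
# Split sizes copied from the dataset metadata.
splits = {"train": 33_672, "test": 11_224}

real_total = sum(real_news.values())      # matches the table's 21,417
fake_total = sum(fake_news.values())      # matches the table's 23,481
table_total = real_total + fake_total     # articles according to the table
split_total = sum(splits.values())        # examples according to the splits

train_fraction = splits["train"] / split_total  # share of examples in train
fake_share = fake_total / table_total           # share of fake articles

print(f"table total: {table_total}, split total: {split_total}")
print(f"train fraction: {train_fraction:.2f}, fake share: {fake_share:.3f}")
```

The per-type counts add up to the stated class totals, and the splits work out to a 75/25 train/test division. The table total (44,898) is two articles larger than the split total (44,896); the card does not explain the difference. The classes are close to balanced, with fake news making up about 52% of the table total.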