aihpi/20_newsgroups_demo
Source: Hugging Face | Updated: 2024-07-11 | Indexed: 2024-07-13
Download link:
https://hf-mirror.com/datasets/aihpi/20_newsgroups_demo
Resource description:
The 20 newsgroups Demo dataset is a filtered version of the 20 newsgroups dataset that contains only two topics, alt.atheism and soc.religion.christian. It is based on the SetFit/20_newsgroups dataset and was prepared for workshops of the AI Maker Community. The original dataset, as provided by Scikit-learn, contains around 18,000 newsgroup posts on 20 topics, split into training and test sets according to whether a message was posted before or after a specific date. Following the recommended practice, headers, signature blocks, and quotations were removed from each post to make training more realistic.
The dataset has three features: text (the text content), label (integer), and label_text (string). It provides a training split with 1766 samples.
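The topic filtering described above can be illustrated with a minimal sketch. It does not reproduce how the dataset was actually built. The rows, their texts, and the label ids are made up for illustration; only the field names (text, label, label_text) come from the description. The helper names `filter_topics` and `topic_counts` are hypothetical, and the sketch uses the standard newsgroup name alt.atheism.

```python
from collections import Counter

# Hypothetical in-memory rows that mimic the three documented features.
# Texts and label ids are invented for this sketch, not taken from the data.
SAMPLE_ROWS = [
    {"text": "A post about belief and evidence.", "label": 0, "label_text": "alt.atheism"},
    {"text": "A post about graphics cards.", "label": 1, "label_text": "comp.graphics"},
    {"text": "A post about scripture.", "label": 15, "label_text": "soc.religion.christian"},
    {"text": "A post about church history.", "label": 15, "label_text": "soc.religion.christian"},
    {"text": "A post about hockey scores.", "label": 10, "label_text": "rec.sport.hockey"},
]

# The two topics the demo dataset keeps.
KEEP_TOPICS = {"alt.atheism", "soc.religion.christian"}


def filter_topics(rows, topics):
    """Return only the rows whose label_text is one of the given topics."""
    return [row for row in rows if row["label_text"] in topics]


def topic_counts(rows):
    """Count how many rows belong to each topic name."""
    return Counter(row["label_text"] for row in rows)


filtered = filter_topics(SAMPLE_ROWS, KEEP_TOPICS)
print(topic_counts(filtered))
```

With the Hugging Face `datasets` library installed and network access available, the published data could presumably be loaded with `load_dataset("aihpi/20_newsgroups_demo")`, where the repository id is taken from the download link above.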
Provider:
aihpi



