TouTiaoCNews
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/aceimnorstuvwxz/toutiao-text-classfication-dataset
Description:
This is a short news text classification dataset consisting of 382,688 samples labeled with 15 distinct categories. The full dataset is partitioned into training, validation, and test sets at a ratio of 0.8:0.1:0.1 for model training and evaluation.
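
The description gives only the split ratio, not how the split was produced. As a rough illustration, a 0.8:0.1:0.1 partition could be sketched as below. The function name `split_dataset`, the random shuffle, the fixed seed, and the rounding rule (floor for train and validation, remainder to test) are all assumptions made for this example, not details taken from the dataset's authors.

```python
import random


def split_dataset(samples, seed=0):
    """Shuffle samples and split them into train/validation/test at roughly 8:1:1.

    Hypothetical helper: the shuffle, the seed, and the rounding rule are
    assumptions for illustration, not the dataset's documented procedure.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # deterministic shuffle for a given seed

    n = len(items)
    n_train = n * 8 // 10  # floor of 80%
    n_val = n // 10        # floor of 10%

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test


if __name__ == "__main__":
    # Integer stand-ins for the 382,688 samples mentioned in the description.
    train, val, test = split_dataset(range(382_688))
    print(len(train), len(val), len(test))
```

Because 382,688 is not divisible by 10, the three subset sizes cannot match the ratio exactly. Under the rounding rule assumed here, the test set absorbs the leftover samples.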



