Alibaba Cloud NLP News Classification Competition
Source: Alibaba Cloud Tianchi. Updated: 2026-06-09. Indexed: 2025-02-22.
Download link:
https://tianchi.aliyun.com/dataset/196620
Resource description:
Alibaba Cloud NLP News Classification Competition.
The competition data consists of news texts, which become visible and downloadable after registration. The texts are anonymized at the character level and are divided into 14 candidate categories: Finance, Lottery, Real Estate, Stocks, Home Furnishing, Education, Technology, Society, Fashion, Politics and Current Affairs, Sports, Horoscope, Games, and Entertainment.
The data has three parts: a training set of 200,000 samples, Test Set A with 50,000 samples, and Test Set B with 50,000 samples. The character-level anonymization is intended to prevent participants from manually annotating the test sets.
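As a rough illustration of the description above, the following Python sketch records the stated split sizes and the 14 categories, and shows one way character-level anonymization could work on a toy corpus. Everything beyond the sizes and category names is an assumption: the integer label ids simply follow listing order, and `build_char_vocab` and `anonymize` are hypothetical helpers, not the scheme the organizers actually used.

```python
from collections import Counter

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 200_000, "test_a": 50_000, "test_b": 50_000}

# The 14 categories in the order the description lists them.
# ASSUMPTION: ids follow listing order only; the competition's
# official label ids may differ.
CATEGORIES = [
    "Finance", "Lottery", "Real Estate", "Stocks", "Home Furnishing",
    "Education", "Technology", "Society", "Fashion",
    "Politics and Current Affairs", "Sports",
    "Horoscope",  # 星座, sometimes rendered as "Constellation"
    "Games", "Entertainment",
]
LABEL_TO_ID = {name: i for i, name in enumerate(CATEGORIES)}


def build_char_vocab(texts):
    """Give each distinct character an integer id, most frequent first.

    Hypothetical scheme for illustration; ties keep first-seen order.
    """
    counts = Counter(ch for text in texts for ch in text)
    return {ch: i for i, (ch, _) in enumerate(counts.most_common())}


def anonymize(text, vocab):
    """Replace every character with its id, separated by spaces."""
    return " ".join(str(vocab[ch]) for ch in text)


if __name__ == "__main__":
    toy_corpus = ["abca", "cab"]
    vocab = build_char_vocab(toy_corpus)
    print(anonymize("cab", vocab))
    print(sum(SPLIT_SIZES.values()))
```

Once characters are replaced by opaque ids in this way, a human reader cannot recover the article content directly, which is why manual labeling of the test sets becomes impractical, while a model can still learn from co-occurrence patterns of the ids.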
Provider:
Alibaba Cloud Tianchi
Created:
2025-02-18
Collected summary
Dataset introduction

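Background and challenges
Background overview
The Alibaba Cloud NLP News Classification Competition dataset contains 200,000 training samples and 100,000 test samples across 14 news categories. All text has been anonymized at the character level.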
The content above was collected and summarized by 遇见数据集.



