Global News 60K

IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/global-news-60k
Description:
Text classification systems have become increasingly important in recent years due to the explosion of online documents and the need to sort them for specific services. One of the most critical issues in text classification is the limited availability and diversity of datasets, which can lead to overfitting and poor generalization. In this context, we present a new dataset named Global News 60K (GN60K), which consists of 60,000 news articles from sources in different parts of the world, covering 10 topics. The dataset provides a rich vocabulary, which helps mitigate overfitting and supports better-generalized models.

The topics included in the dataset are Politics, Sports, Entertainment, Science and Technology, Business, Health, Environment, Education, Arts and Culture, and Crime. We selected these topics because they cover a wide range of interests and are commonly used in text classification applications. To further increase the dataset's diversity, we considered articles from different parts of the world, including North America, Europe, Asia, Africa, and South America. The articles were selected based on their publication dates, which range from 2022 to 2023.

We believe that our dataset will be valuable for researchers and practitioners working on text and topic classification tasks. The GN60K dataset provides a diverse and well-labelled set of documents that can be used for training and testing various machine learning models. Additionally, the dataset can be used to develop new algorithms for topic classification and related tasks. We hope that our dataset will contribute to the advancement of the text classification field and foster new research ideas.
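
As a rough illustration of how the ten topics could be prepared for training and testing, the sketch below maps topic names to integer labels and performs a per-topic (stratified) train/test split. It is only a minimal sketch under stated assumptions: the listing does not describe the file layout of GN60K, so the records are hypothetical `(text, topic)` pairs, and the integer ids, the `stratified_split` helper, and the 80/20 ratio are choices made here, not part of the dataset.

```python
import random
from collections import defaultdict

# Topic names are taken from the dataset description;
# the integer ids are an arbitrary assignment made for this sketch.
TOPICS = [
    "Politics", "Sports", "Entertainment", "Science and Technology",
    "Business", "Health", "Environment", "Education",
    "Arts and Culture", "Crime",
]
LABEL_TO_ID = {name: i for i, name in enumerate(TOPICS)}


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split (text, topic) pairs so each topic keeps about the same share in both parts."""
    by_topic = defaultdict(list)
    for text, topic in records:
        by_topic[topic].append((text, topic))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for topic in sorted(by_topic):
        items = by_topic[topic]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical stand-in records (10 per topic); replace with the real articles.
demo = [(f"article {i} about {t}", t) for t in TOPICS for i in range(10)]
train, test = stratified_split(demo)
```

Splitting per topic rather than over the whole pool keeps every class represented in the test set, which matters if the topic counts in the real data turn out to be uneven.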
Provided by:
Nitti, Michele; Serreli, Luigi; Marche, Claudio