NYtimes_train_test_set.hdf5

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/12760692

Description:

The NYtimes dataset, part of the Bags of Words dataset from the UCI repository, comprises a collection of New York Times news articles represented as a bag of words. Each document in the dataset is associated with a set of word occurrences, where the dimensions represent unique words extracted from the articles. The dataset is organised as a document–word matrix, where each row corresponds to a document and each column corresponds to a word. The values in the matrix indicate the frequency of each word occurring in the respective document. Preprocessing steps include tokenization, removal of stopwords, and vocabulary truncation, with only words occurring more than ten times retained.
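The document–word layout described above can be illustrated with a minimal sketch. This is not the dataset's actual preprocessing pipeline: the toy corpus, the stopword list, the whitespace tokenizer, and the function name `build_doc_word_matrix` are all hypothetical, and the sketch assumes the frequency threshold is applied to corpus-wide counts. The default threshold mirrors the "more than ten times" rule; the usage below lowers it so the toy corpus keeps some words.

```python
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "of", "in", "and"}


def build_doc_word_matrix(docs, min_count=10):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i]."""
    # Tokenize: lowercase, split on whitespace, drop stopwords.
    tokenized = [
        [w for w in doc.lower().split() if w not in STOPWORDS] for doc in docs
    ]
    # Corpus-wide counts drive vocabulary truncation (an assumption).
    totals = Counter(w for doc in tokenized for w in doc)
    # Keep only words occurring more than min_count times.
    vocab = sorted(w for w, c in totals.items() if c > min_count)
    index = {w: j for j, w in enumerate(vocab)}

    matrix = []
    for doc in tokenized:
        row = [0] * len(vocab)  # one column per retained word
        for w in doc:
            j = index.get(w)
            if j is not None:
                row[j] += 1
        matrix.append(row)  # one row per document
    return vocab, matrix


# Toy corpus; the threshold is lowered to 1 so that some words survive.
docs = [
    "the mayor of the city spoke in the city hall",
    "a city budget vote and a budget debate",
    "the mayor and the budget",
]
vocab, matrix = build_doc_word_matrix(docs, min_count=1)
print(vocab)
for row in matrix:
    print(row)
```

Each row of `matrix` corresponds to one document and each column to one retained word, matching the row and column convention in the description. A real corpus of this size would normally be stored as a sparse matrix rather than nested lists.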

Created:

2024-07-17
