
NYtimes_train_test_set.hdf5

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12760692
Description:
The NYtimes dataset, part of the Bags of Words dataset from the UCI repository, comprises a collection of New York Times news articles represented as a bag of words. Each document in the dataset is associated with a set of word occurrences, where the dimensions represent unique words extracted from the articles. The dataset is organised as a document–word matrix, where each row corresponds to a document and each column corresponds to a word. The values in the matrix indicate the frequency of each word occurring in the respective document. Preprocessing steps include tokenization, removal of stopwords, and vocabulary truncation, with only words occurring more than ten times retained.
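The document–word matrix described above can be illustrated with a minimal sketch. It is not the dataset authors' pipeline: the whitespace tokenizer, the tiny `STOPWORDS` set, and the function name `build_doc_word_matrix` are all assumptions made for illustration, and the layout of the HDF5 file itself is not stated in the description. Only the "retain words occurring more than ten times" rule (the `min_count=10` default) comes from the text.

```python
from collections import Counter

# Hypothetical toy stopword list; the list actually used for the dataset is not given.
STOPWORDS = {"the", "a", "of", "and", "in"}


def build_doc_word_matrix(docs, min_count=10):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i].

    Only words whose corpus-wide count is greater than min_count are kept.
    """
    # Tokenize by lowercasing and splitting on whitespace (a simplifying
    # assumption), then drop stopwords.
    tokenized = [
        [tok for tok in doc.lower().split() if tok not in STOPWORDS]
        for doc in docs
    ]

    # Corpus-wide word counts drive the vocabulary truncation.
    totals = Counter(tok for toks in tokenized for tok in toks)
    vocab = sorted(word for word, count in totals.items() if count > min_count)
    column = {word: j for j, word in enumerate(vocab)}

    # One row per document, one column per retained word.
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            j = column.get(tok)
            if j is not None:
                row[j] += 1
        matrix.append(row)
    return vocab, matrix


# Toy corpus; min_count=1 is used so the truncation is visible on three documents.
docs = [
    "The markets rallied and markets closed",
    "Election results in the markets",
    "election coverage of the election",
]
vocab, matrix = build_doc_word_matrix(docs, min_count=1)
print(vocab)   # ['election', 'markets']
print(matrix)  # [[0, 2], [1, 1], [2, 0]]
```

Each row is a document and each column a retained word, so `matrix[0][1] == 2` records that "markets" appears twice in the first document. At the scale of a real news corpus a sparse representation would normally replace the nested lists.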
Created:
2024-07-17