20 Newsgroups (20NG)
Source: Mendeley Data. Updated: 2024-05-10. Indexed: 2024-06-30.
Download link:
https://zenodo.org/records/7555237
Description:
20 Newsgroups (20NG) is a classic and popular dataset for experiments in text applications of machine learning techniques. It contains 18,846 newsgroup documents, partitioned almost evenly across 20 different newsgroup categories.

http://qwone.com/~jason/20Newsgroups/

The files:
texts.txt: Document set (text), one document per line.
score.txt: Document class labels; each entry's index corresponds to the matching line in texts.txt.
split_<k>.pkl: pandas DataFrame with the k-fold cross-validation partition.

The .zip contains all of the aforementioned files plus the TF-IDF representation in CSR matrix format.

Label definition (score file):
0 atheist resources
1 computer graphics
2 computer os ms windows misc
3 computer system ibm pc hardware
4 computer system mac hardware
5 computer windows x
6 misc miscellaneous for sale
7 rec autos
8 rec motorcycles
9 rec sport baseball
10 rec sport hockey
11 science crypt
12 science electronics
13 science med
14 science space
15 society religion christian
16 talk politics guns
17 talk politics mideast
18 talk politics misc miscellaneous
19 talk religion misc miscellaneous
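The line-by-line pairing of texts.txt and score.txt can be sketched in Python as follows. This is a minimal sketch, not the dataset authors' loader. It assumes both files are UTF-8 encoded, that score.txt holds one integer label per line, and that line i of score.txt labels line i of texts.txt. The function name load_20ng and the constant LABEL_NAMES are hypothetical names introduced for illustration; the label strings are copied from the label definition above.

```python
# Label names in index order, copied from the dataset description.
LABEL_NAMES = [
    "atheist resources",
    "computer graphics",
    "computer os ms windows misc",
    "computer system ibm pc hardware",
    "computer system mac hardware",
    "computer windows x",
    "misc miscellaneous for sale",
    "rec autos",
    "rec motorcycles",
    "rec sport baseball",
    "rec sport hockey",
    "science crypt",
    "science electronics",
    "science med",
    "science space",
    "society religion christian",
    "talk politics guns",
    "talk politics mideast",
    "talk politics misc miscellaneous",
    "talk religion misc miscellaneous",
]


def _read_lines(path):
    # Split only on "\n" so unusual characters inside a document
    # cannot shift the alignment between the two files.
    # Assumption: UTF-8 encoding; undecodable bytes are replaced.
    with open(path, encoding="utf-8", errors="replace", newline="\n") as f:
        return [line.rstrip("\r\n") for line in f]


def load_20ng(texts_path, score_path):
    """Return (texts, labels), paired by line position."""
    texts = _read_lines(texts_path)
    # Assumption: each line of score.txt is a single integer in 0..19.
    labels = [int(line.strip()) for line in _read_lines(score_path)]
    if len(texts) != len(labels):
        raise ValueError(
            f"line count mismatch: {len(texts)} texts vs {len(labels)} labels"
        )
    return texts, labels


# Example usage (paths are placeholders for wherever the files were extracted):
# texts, labels = load_20ng("texts.txt", "score.txt")
# print(LABEL_NAMES[labels[0]], texts[0][:80])
```

The split_<k>.pkl files could presumably be opened with pandas.read_pickle, but their column layout is not described here, so it is worth inspecting the resulting DataFrame before relying on any particular column name.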
Created:
2023-06-28



