Sogou News Corpus (SOGOU)
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/5259055
Description:
Sogou News Corpus (SOGOU) is a Chinese dataset built by combining the SogouCA and SogouCS news corpora. It contains 500K news articles from various topic channels. Each article was labeled by manually classifying its domain name into one of five categories: "sports", "finance", "entertainment", "automobile", and "technology". The text is romanized to Pinyin, so models built for English can be applied to this dataset without change. The fields used are title and content (Zhang et al., 2016).
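As a rough illustration of the label and field structure described above, here is a minimal Python sketch. It assumes each record is a CSV row of the form label, title, content, with a 1-based integer label whose order follows the category list in the description. The file layout, the category order, the helper name `parse_rows`, and the sample rows are all assumptions made for illustration; check them against the files actually distributed at the download link.

```python
import csv
import io

# Assumption: label indices follow the order in which the description lists the categories.
CATEGORIES = ["sports", "finance", "entertainment", "automobile", "technology"]


def parse_rows(handle):
    """Yield (category, title, content) from rows of the assumed form label,title,content."""
    for label, title, content in csv.reader(handle):
        # Labels are assumed to be 1-based integers.
        yield CATEGORIES[int(label) - 1], title, content


# Hypothetical two-row sample with made-up placeholder text, not taken from the real corpus.
sample = io.StringIO(
    '"1","guo zu bi sai","qiu dui zuo wan huo sheng"\n'
    '"5","xin shou ji fa bu","gai gong si fa bu le xin chan pin"\n'
)

rows = list(parse_rows(sample))
for category, title, _content in rows:
    print(category, "|", title)
```

To read a real file, the same generator could be given an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the archive was extracted.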
Created:
2021-08-26



