
Data for: An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit

Source: Mendeley Data. Updated: 2024-06-25. Indexed: 2024-06-27.
Download link:
https://data.mendeley.com/datasets/85njyhj45m
Description:
Topic-labelled online social network (OSN) data sets are useful for evaluating topic modelling and document clustering tasks. We provide three data sets with topic labels from two online social networks: Twitter and Reddit. To comply with Twitter’s terms and conditions, we publish only the tweet identifiers along with the topic label. The Reddit data is supplied with the full text and the topic label. The first Twitter data set was collected from the Twitter API by filtering for the hashtag #Auspol, which is used to tag political discussion tweets in Australia. The second Twitter data set was originally used in the RepLab 2013 competition and contains expert-annotated topics. The Reddit data set consists of 40,000 Reddit parent comments from May 2015 belonging to 5 subreddit pages, which are used as topic labels.
Created:
2024-01-23