en.Size=150.TagCap=300.SEL.10tags
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/ipipan-barstar/ICCS25.MfHNSiEGSCoTD
Resource description:
This dataset contains randomly sampled tweets posted on Twitter between 2019 and 2023 that are associated with specific hashtags; every tweet is longer than 150 characters. The data are represented with GloVe word embeddings, with several adjustments applied to handle negative similarities. It is a large-scale dataset, and the target task is to predict hashtags using graph spectral clustering.
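The description above mentions negative similarities, which arise because cosine similarity between embedding vectors can fall below zero, while spectral clustering expects non-negative edge weights. The following is a minimal sketch of that issue, not the dataset authors' method: the toy 2-D vectors stand in for GloVe-based document vectors, and the function names (`cosine_similarity_matrix`, `clip_negative`, `shift_to_unit_interval`, `spectral_bipartition`) are hypothetical. Clipping and shifting are two generic adjustments assumed here for illustration only.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarities between rows of X, with a zeroed diagonal."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)
    return S

def clip_negative(S):
    """Assumed adjustment 1: replace negative similarities with zero."""
    return np.maximum(S, 0.0)

def shift_to_unit_interval(S):
    """Assumed adjustment 2: map similarities from [-1, 1] to [0, 1]."""
    A = (S + 1.0) / 2.0
    np.fill_diagonal(A, 0.0)
    return A

def spectral_bipartition(A):
    """Split nodes into two groups by the sign of the second eigenvector
    of the symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.clip(d, 1e-12, None))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    return (eigvecs[:, 1] > 0).astype(int)

# Toy stand-in for document embeddings: two groups of directions in the plane.
angles = np.deg2rad([0, 15, 30, 95, 110, 125])
X = np.column_stack([np.cos(angles), np.sin(angles)])

S = cosine_similarity_matrix(X)      # contains negative entries
labels = spectral_bipartition(clip_negative(S))
print(labels)                        # which group is 0 or 1 depends on eigenvector sign
```

Clipping discards the information carried by negative similarities, whereas shifting keeps the ordering but makes the graph dense; which trade-off suits this dataset is not stated in the description.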
Provider:
Twitter



