Chinese Weibo Hashtag Generation (WHG) Dataset
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/OpenSUM/HashtagGen
Resource summary:
This dataset was built to improve the performance of Chinese Weibo hashtag generation. It contains 312,762 post-hashtag pairs, each matching a Weibo post with its corresponding hashtag. On average, each post carries approximately 1 hashtag, and the average total sequence length is about 4.2. The target task is hashtag generation.



