Chinese Social Media Persona's Stance Dataset

Source: arXiv. Updated 2026-01-28; indexed 2026-01-29.
Download link:
https://github.com/17Huaaa/DGTA-stance-detection/
Description:
This dataset, built by a research team at Minzu University of China, is described as the first high-quality Chinese multi-domain corpus for social media stance detection. It contains 70,931 rigorously cross-validated Weibo posts, drawn from 125,176 raw posts by 240 users from diverse domains. After denoising and anonymization, the posts were annotated collaboratively by three mainstream LLMs and then manually verified; the retained samples cover both single-target and multi-target scenarios. The dataset follows a dynamic target generation paradigm, with target types including people, events, and abstract concepts, and its three stance labels (support, oppose, neutral) are evenly distributed. The resource is intended to advance open-domain zero-shot stance detection by addressing challenges found in real social media settings, such as dynamically changing targets and the coexistence of multiple targets.
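
The description mentions collaborative annotation by three LLMs followed by cross-validation, but does not state the agreement rule. The sketch below shows one plausible way such a filter could work: keep a sample only when enough annotators assign the same stance. The function names, the `votes` field, and the default of unanimous agreement are all assumptions made for illustration, not the authors' documented procedure.

```python
from collections import Counter

# The three stance labels named in the dataset description.
LABELS = {"support", "oppose", "neutral"}


def consensus_label(votes, min_agree=3):
    """Return the label chosen by at least `min_agree` annotators, else None.

    `min_agree=3` (unanimity among three annotators) is an assumed default;
    the source does not specify the threshold.
    """
    for vote in votes:
        if vote not in LABELS:
            raise ValueError(f"unknown stance label: {vote!r}")
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


def filter_by_agreement(samples, min_agree=3):
    """Keep samples whose annotator votes reach the agreement threshold.

    Each sample is a dict with hypothetical keys "text", "target", "votes".
    """
    kept = []
    for sample in samples:
        label = consensus_label(sample["votes"], min_agree)
        if label is not None:
            kept.append(
                {"text": sample["text"], "target": sample["target"], "stance": label}
            )
    return kept


# Hypothetical toy records, not taken from the dataset.
samples = [
    {"text": "post A", "target": "policy X", "votes": ["support", "support", "support"]},
    {"text": "post B", "target": "event Y", "votes": ["support", "oppose", "neutral"]},
    {"text": "post C", "target": "person Z", "votes": ["oppose", "oppose", "neutral"]},
]

unanimous = filter_by_agreement(samples)               # keeps only post A
majority = filter_by_agreement(samples, min_agree=2)   # keeps posts A and C
```

Under a stricter threshold fewer samples survive but the labels are more reliable; disagreements (such as post B here) are the natural candidates for the manual verification step the description refers to.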
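Provided by:
Minzu University of China: School of Information Engineering; Center for Minority Language Resource Monitoring and Research; Institute of National Security; Department of Minority Languages and Literatures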
Created:
2026-01-28