Sockpuppet Corpus
Source: arXiv | Updated: 2013-10-25 | Indexed: 2024-06-21
Download link:
http://docsig.cis.uab.edu
Description:
Sockpuppet Corpus is a dataset created by researchers at the University of Alabama at Birmingham. It contains 623 real sockpuppet cases, in which actual malicious users created multiple identities, drawn from Wikipedia's edit discussion pages. Each case averages approximately 180 comments. The researchers collected the data from specific Wikipedia URLs with a semi-automated crawler, then manually screened and validated it. The dataset is primarily intended for developing tools that automatically detect sockpuppets on Wikipedia. It also supports research on authorship identification in social media, which bears on questions of online anonymity and privacy protection.
Provider:
University of Alabama at Birmingham
Created:
2013-10-25



