White Supremacist Language Dataset
Source: arXiv · Updated: 2023-06-28 · Indexed: 2024-06-21
Download link:
https://osf.io/274z3/
Resource description:
This study addresses the detection of white supremacist extremist language and presents a dataset of 118,842 texts. The dataset was created by a research team at Carnegie Mellon University and was collected mainly from online spaces that explicitly express white supremacist views, such as the forums Stormfront and Iron March. During construction, the researchers used weak supervision to select, from a large volume of text, content on topics associated with white supremacy. The dataset is intended primarily for identifying and analyzing white supremacist speech online, with the aim of helping researchers and policymakers better understand and respond to this growing social problem.
Provider:
Software and Societal Systems Department, Carnegie Mellon University
Created:
2023-06-28



