
What Social Media Platforms Miss About White Supremacist Speech

DataCite Commons | Updated 2025-04-12 | Indexed 2025-04-16
Download link:
https://www.openicpsr.org/openicpsr/project/156161/version/V2/view?path=/openicpsr/156161/fcr:versions/V2/reddit_posts.txt&type=file
Description:
Data includes 274,668 posts scraped from Stormfront and 509,982 comments collected from the Reddit API. The following files are included:

stormfront_posts.txt: one post per line, no post metadata
reddit_posts.txt: one comment per line, no comment metadata
stormfront_post_data_processed.json.gz: preprocessed posts from Stormfront, includes post metadata
reddit_sample.csv.gz: preprocessed comments from Reddit, includes comment metadata

Twitter data used in the report is not available for public reuse because of Twitter's terms of service and our data use agreement with VOX-Pol.

The following Python modules were used for analysis:

Gensim's Lda Sequence model (https://radimrehurek.com/gensim/models/ldaseqmodel.html)
Shifterator (https://shifterator.readthedocs.io/en/latest/)
pyLDAvis (https://pyldavis.readthedocs.io/en/latest/readme.html)
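
The file layouts described here (one text per line, and a gzip-compressed CSV with metadata) can be read with the Python standard library alone. The following is a minimal sketch, not the authors' own loading code: the helper names `read_posts` and `read_csv_gz` are hypothetical, UTF-8 encoding is an assumption, and the CSV column names are not documented in this record, so the sketch only reports whatever header the file carries.

```python
import csv
import gzip
from pathlib import Path


def read_posts(path):
    """Return one text per line from a plain-text file such as reddit_posts.txt.

    Assumes UTF-8; undecodable bytes are replaced rather than raising.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle]


def read_csv_gz(path, limit=None):
    """Return rows of a gzip-compressed CSV as dicts keyed by the header row.

    `limit` caps the number of rows read, which helps when previewing a large file.
    """
    rows = []
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            if limit is not None and index >= limit:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Paths assume the files were downloaded into the working directory.
    if Path("reddit_posts.txt").exists():
        comments = read_posts("reddit_posts.txt")
        print(len(comments), "comments loaded")
    if Path("reddit_sample.csv.gz").exists():
        sample = read_csv_gz("reddit_sample.csv.gz", limit=5)
        # Column names are unknown in advance, so inspect them first.
        print(list(sample[0].keys()) if sample else "no rows")
```

Previewing a few rows with `limit` before loading everything is a cheap way to discover the metadata columns, since they are not listed in the description.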

Provider:
ICPSR - Interuniversity Consortium for Political and Social Research
Created:
2025-04-12