
Dataset for "Mentions of Political Extremism in English Wikipedia"

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14328837
Description:
This is a data set accompanying the article "Mentions of Political Extremism in English Wikipedia" (https://davidrozado.substack.com/p/mentions-of-political-extremism-in-wikipedia). To ensure the integrity of the frequency counts reported in the article, I provide a link to a file containing the analyzed terms, the corresponding Wikipedia paragraphs where the terms appear, and the URLs of the corresponding Wikipedia articles. Two important points to note:

First, the data source is the Wikipedia dump file "enwiki-20221001-pages-articles-multistream.xml," retrieved on October 15, 2022. The raw content was parsed to extract only the textual data from articles, excluding metadata and other non-article information. The final text-only article dataset is 23.6 GB in size. While the URLs point to the article where each target term appeared as of October 2022, some of the article content may have changed since then.

Second, the analysis did not disambiguate terms with multiple meanings. For instance, a small fraction of the paragraphs analyzed use far-right to describe the position of an entity in a photograph, not political extremism. This does not substantially limit the results, as there is no reason to expect such mentions to disproportionately outnumber equivalent uses of far-left. Even when far-right and far-left are excluded from the analysis, the remaining references to right-wing political extremism (4,870) still outnumber the remaining references to left-wing political extremism (3,019).
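The counting approach described above, where a term is matched wherever it appears with no disambiguation of meaning, can be illustrated with a minimal sketch. This is not the author's code: the term list, the sample paragraphs, and the function name `count_paragraph_mentions` are hypothetical, and the actual analyzed terms are those listed in the Zenodo file.

```python
import re
from collections import Counter

# Hypothetical term list for illustration only; the real analyzed terms
# are in the file published with the dataset.
TERMS = ["far-right", "far-left"]


def count_paragraph_mentions(paragraphs, terms=TERMS):
    """Count case-insensitive occurrences of each term across paragraphs.

    No disambiguation is attempted, so "far-right" in a photo caption
    is counted the same as "far-right" in a political sense.
    """
    counts = Counter({term: 0 for term in terms})
    # Match the term only when it is not embedded in a longer word.
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in terms
    }
    for paragraph in paragraphs:
        for term, pattern in patterns.items():
            counts[term] += len(pattern.findall(paragraph))
    return counts


# Made-up sample paragraphs (not taken from the dataset).
sample = [
    "The far-right party gained seats in the election.",
    "Pictured, far-right: the team captain.",
    "A far-left group organised the protest.",
]
counts = count_paragraph_mentions(sample)
print(counts)
```

The second sample paragraph shows the ambiguity noted in the description: a positional use of far-right is counted alongside the political use.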
Created:
2024-12-14