Measuring Online Hate on 4chan Using Deep Learning
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14219047
Description:
This dataset was released with the paper "Measuring Online Hate on 4chan Using Deep Learning".
This dataset contains a collection of 500,000 posts extracted from the /pol/ board (Politically Incorrect) of 4chan using the 4chan API. The dataset is structured as a single CSV file with one column, com, which includes the raw content of the posts.
The dataset does not preserve the structure of threads or replies; instead, it consists of a flat collection of individual posts extracted from /pol/. This format is intended to support applications such as text analysis, natural language processing, and computational social science research by providing a straightforward dataset of raw post content.
Dataset Format
File Format: CSV (Comma-Separated Values)
Columns:
com: The raw content of the post.
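Given the single-column layout described above, a minimal loading sketch using only the Python standard library might look like the following. The helper name `iter_posts` is hypothetical, and the in-memory sample stands in for the real file, whose name is not given in this description.

```python
import csv
import io
from typing import Iterator, TextIO


def iter_posts(handle: TextIO) -> Iterator[str]:
    """Yield the `com` field of each CSV row, skipping empty posts."""
    reader = csv.DictReader(handle)
    for row in reader:
        text = (row.get("com") or "").strip()
        if text:
            yield text


# Stand-in for the real file, built from the sample rows in this description.
sample = io.StringIO(
    "com\n"
    '"Why does no one talk about this?"\n'
    '"The government is hiding the truth!"\n'
)
posts = list(iter_posts(sample))
print(len(posts))
```

For the downloaded file, the same function can be passed a handle opened with `open(path, newline="", encoding="utf-8")`; the `newline=""` argument lets the `csv` module handle quoted fields that span several lines. The UTF-8 encoding is an assumption.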
Source
The posts were extracted from 4chan’s /pol/ board using the official 4chan API. This board is known for hosting discussions on various topics, often with a focus on political content. Due to the nature of the /pol/ board, the content may include offensive language, hate speech, or otherwise sensitive material. Users should exercise caution and consider ethical implications when analysing this dataset.
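Because the `com` column holds raw post content, it may still carry the HTML markup and escaped entities that the 4chan API typically returns in its comment field. This is an assumption about the file, not something the description confirms. Under that assumption, a rough cleaning sketch (the helper name `clean_com` is hypothetical) could be:

```python
import html
import re

_BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def clean_com(raw: str) -> str:
    """Convert a raw, HTML-flavoured post into plain text (assumed input format)."""
    text = _BR.sub("\n", raw)   # line-break tags become newlines
    text = _TAG.sub("", text)   # drop remaining tags such as <a> or <span>
    # Decode entities last, so a decoded ">" or "<" is never mistaken for a tag.
    return html.unescape(text).strip()


print(clean_com("It&#039;s a test<br>second line"))
```

If the posts turn out to be plain text already, the function should leave most of them unchanged, although any literal text that looks like a tag would be stripped.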
Potential Use Cases
Text analysis and natural language processing (NLP).
Studies on online discourse, extremism, or political polarization.
Research on language usage and sentiment in online forums.
Development and testing of machine learning models for text classification or moderation.
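As a small illustration of the text-analysis use case, the sketch below counts token frequencies across posts. The regular-expression tokenizer and the name `token_counts` are illustrative choices, not part of the dataset or the paper, and the input is the sample rows from this description rather than real data.

```python
import re
from collections import Counter
from typing import Iterable

# Deliberately simple tokenizer: lowercase runs of letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")


def token_counts(posts: Iterable[str]) -> Counter:
    """Count how often each token appears across all posts."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(TOKEN.findall(post.lower()))
    return counts


sample_posts = [
    "Why does no one talk about this?",
    "The government is hiding the truth!",
    "We need to take action against this injustice.",
]
counts = token_counts(sample_posts)
print(counts.most_common(3))
```

A real analysis would likely swap in a proper tokenizer and stop-word handling; this version only shows the shape of the computation.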
Example Data
Here’s an example of what a few rows of the dataset look like:
com
"Why does no one talk about this?"
"The government is hiding the truth!"
"We need to take action against this injustice."
If you find our dataset useful, please cite our paper:
@article{
}
Created:
2025-02-10



