/ourguy/ mentions and associated names on 4chan/pol/
Indexed in NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/5013746
Description:
Datasets related to the meme '/ourguy/' on 4chan/pol/, used in this paper.
The data are all derived from 4CAT, a tool that archives 4chan, and consist of the following:
ourguy_mentions_4chanpol_no_referrals.csv: Posts on 4chan/pol/ where either the body or subject text mentions one of the following: "our guy", "ourguy", "/our guy/", or "/ourguy/".
ourguy_mentions_4chanpol.csv: The same dataset as the first, with one addition. It also includes posts that received a reply containing an implicit /ourguy/ reference in the original dataset, i.e. a reply that names no one (e.g. 'Yes, he's definitely /ourguy/'). To find these, I marked which posts were implicit referrals. For time reasons, I only did so for sanitised texts that appeared twice or more. For each implicit referral, I checked whether it started with two greater-than signs and an integer, which marks a reply on 4chan (e.g. '>>12345678'). I extracted the post numbers from these replies, queried them in 4CAT, and added the results to the above dataset. Finally, I deleted any duplicates.
ourguy_counts.xlsx: Post counts per month and per day from the first dataset, with area graphs (hence the .xlsx format).
top_ourguys.csv: The top names of public figures associated with /ourguy/. I extracted these in two ways. First, I used spaCy's language model for named-entity recognition to extract tokens recognised as PERSON entities (in either the body or the subject). Second, I used word collocations to extract names surrounding '/ourguy/' (window size of six). I then merged both results and filtered them down to valid names. I excluded posts in which three or more PERSON entities were recognised, to prevent spam from dominating. I also did not account for polysemy. I separated surnames from given names so that each name is a single token.
top_ourguys_month.csv: The same as top_ourguys.csv, but separated per month.
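The keyword filter behind ourguy_mentions_4chanpol_no_referrals.csv can be sketched as follows. This is not the author's code. It assumes each post is a dict with "body" and "subject" fields, modelled on 4CAT's CSV export columns. Matching is case-insensitive, which is also an assumption. The leading word boundary stops "your guy" from counting as a match.

```python
import re

# Matches "our guy", "ourguy", "/our guy/" and "/ourguy/" (the slashed forms contain
# the bare ones). \b keeps "your guy" out; case-insensitivity is an assumption.
OURGUY_RE = re.compile(r"\bour ?guy", re.IGNORECASE)


def mentions_ourguy(post):
    """True if the post's body or subject mentions one of the /ourguy/ variants."""
    return any(OURGUY_RE.search(post.get(field) or "") for field in ("body", "subject"))


def filter_mentions(posts):
    """Keep only posts that mention /ourguy/ in the body or subject."""
    return [post for post in posts if mentions_ourguy(post)]
```

For example, `filter_mentions(csv.DictReader(f))` keeps only the matching rows of an exported CSV.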
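The reply-following step used for ourguy_mentions_4chanpol.csv can be sketched as follows. It is a simplified illustration, not the author's procedure. The `sanitise` function is hypothetical: it drops reply markers, lowercases the text and collapses whitespace. Two steps are not shown: the manual marking of implicit referrals, and the 4CAT query that fetches the referenced posts. Deduplication assumes each post carries a unique "id".

```python
import re
from collections import Counter

# A 4chan reply starts with two greater-than signs and a post number, e.g. ">>12345678".
REPLY_RE = re.compile(r"^>>(\d+)")


def sanitise(text):
    """Hypothetical sanitising: drop reply markers, lowercase, collapse whitespace."""
    text = re.sub(r">>\d+", " ", text)
    return " ".join(text.lower().split())


def frequent_texts(bodies, min_count=2):
    """Sanitised texts that occur at least `min_count` times (candidates for marking)."""
    counts = Counter(sanitise(body) for body in bodies)
    return {text for text, n in counts.items() if n >= min_count}


def reply_target(body):
    """Post number a reply points to, or None if the body does not start with >>N."""
    match = REPLY_RE.match(body)
    return int(match.group(1)) if match else None


def merge_unique(posts, fetched_posts):
    """Add the fetched posts to the dataset, keeping the first copy of each post id."""
    by_id = {}
    for post in list(posts) + list(fetched_posts):
        by_id.setdefault(post["id"], post)
    return list(by_id.values())
```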
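The monthly and daily counts in ourguy_counts.xlsx amount to a simple aggregation, sketched below. It assumes post timestamps are Unix epoch seconds and buckets them in UTC; both are assumptions. Months or days with no posts are simply absent from the counters. A gap-free area graph would need those periods filled in with zeros.

```python
from collections import Counter
from datetime import datetime, timezone


def post_counts(timestamps):
    """Count posts per month ("YYYY-MM") and per day ("YYYY-MM-DD"), in UTC."""
    per_month, per_day = Counter(), Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        per_month[dt.strftime("%Y-%m")] += 1
        per_day[dt.strftime("%Y-%m-%d")] += 1
    return per_month, per_day
```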
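The name extraction for top_ourguys.csv can be sketched roughly as follows, under several assumptions. Each post is a dict with a "text" field (subject plus body) and a precomputed "persons" list of PERSON entities. With spaCy, that list could be built from `doc.ents`, keeping spans whose `label_` is "PERSON". The collocation window is read as six tokens on either side of each /ourguy/ mention. `valid_names` is a hypothetical stand-in for the author's filtering step. Each name is counted once per post. The per-month file would apply the same count to posts grouped by month.

```python
import re
from collections import Counter

# Normalises "our guy", "ourguy", "/our guy/" and "/ourguy/" to one marker token.
OURGUY_RE = re.compile(r"/?\bour ?guy\b/?", re.IGNORECASE)
MARKER = "__ourguy__"


def window_tokens(text, window=6):
    """Tokens within `window` positions on either side of each /ourguy/ mention."""
    tokens = re.findall(r"\w+", OURGUY_RE.sub(f" {MARKER} ", text))
    found = []
    for i, tok in enumerate(tokens):
        if tok == MARKER:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            found.extend(t for t in neighbours if t != MARKER)
    return found


def count_names(posts, valid_names, window=6, max_persons=2):
    """Merge entity tokens and collocates, keep valid names, skip spammy posts."""
    counts = Counter()
    for post in posts:
        persons = post["persons"]
        if len(persons) > max_persons:  # three or more PERSON entities: skip as spam
            continue
        # Split full names so given names and surnames count as separate tokens.
        candidates = {part for name in persons for part in name.split()}
        candidates.update(window_tokens(post["text"], window))
        counts.update(c for c in candidates if c in valid_names)
    return counts
```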
All data cover the period from late November 2013 to 28 May 2020.
Created:
2022-03-02



