Reddit blackout announcements: 2023 API protest
Source: DataONE. Updated 2024-02-07; indexed 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:35b3f1b218c36a7a0d87430d756e6920c318016c2756b0b91a0148f286bd2b9a
Description:
Starting June 12, 2023, many Reddit communities (subreddits) "went dark" by switching to private mode, in protest against Reddit's plans to change its API access policies and fee structure. Supporters of the protest criticize the planned changes as prohibitively expensive for third-party apps. Beyond third-party apps, there is significant concern that the API changes are a move by the platform to increase monetization, degrade the user experience, and eventually kill off other custom features such as the old.reddit.com interface, the Reddit Enhancement Suite browser extension, and more. There are also concerns that the API changes will impede the ability of subreddit moderators (who are all unpaid users) to access the tools they use to keep their communities on-topic and free of spam.
This dataset includes the "stickied" posts that appeared on 5,351 subreddits on June 11 and June 12, 2023, including many subreddits announcing their plans to pa...
The list of subreddits was created from the list of participating subreddits that had been collated in the /r/ModCoord subreddit. An initial Python script looks at three Reddit posts and grabs the list of participating subreddits:
https://www.reddit.com/r/ModCoord/comments/1401qw5/incomplete_and_growing_list_of_participating/
https://www.reddit.com/r/ModCoord/comments/143fzf6/incomplete_and_growing_list_of_participating/
https://www.reddit.com/r/ModCoord/comments/146ffpb/incomplete_and_growing_list_of_participating/
It uses the requests library to get the HTTP response body, then uses re to search for links that look like `<a href="/r/iphone/">r/iphone</a>`, i.e. the form the list takes in the post. After that, it is just a bit of string cleanup and writing the result to an output file.
This script does not use the Reddit API at all. It's just basic HTTP requests.
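The approach described above can be sketched roughly as follows. This is not the dataset's actual list-subreddits.py: the function names, the exact regular expression, the User-Agent string, and the case-insensitive de-duplication are all assumptions made for illustration.

```python
import re

# Assumed pattern for anchors shaped like <a href="/r/iphone/">r/iphone</a>.
SUBREDDIT_LINK = re.compile(r'<a href="/r/([A-Za-z0-9_]+)/?">')


def extract_subreddits(html: str) -> list[str]:
    """Return subreddit names found in the HTML, de-duplicated, in first-seen order."""
    seen = set()
    names = []
    for name in SUBREDDIT_LINK.findall(html):
        key = name.lower()  # assumption: treat names as case-insensitive
        if key not in seen:
            seen.add(key)
            names.append(name)
    return names


def fetch_html(url: str) -> str:
    """Fetch a page body with a plain HTTP request (no Reddit API involved)."""
    import requests  # imported here so the parsing helper works without it

    response = requests.get(
        url,
        headers={"User-Agent": "subreddit-list-sketch/0.1"},  # hypothetical value
        timeout=30,
    )
    response.raise_for_status()
    return response.text
```

Under these assumptions, calling `extract_subreddits(fetch_html(url))` for each of the three post URLs and merging the results would give a list to write out one name per line.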
A second Python script then reads that list and uses the Reddit API to request information about current posts in each subr...
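The description of the second script is cut off, so the client library, endpoint, and output columns it uses are not known here. As a hedged illustration only, the sketch below filters stickied posts out of an already-parsed Reddit listing response, assuming the usual listing shape (`data.children[].data`) and the post fields `stickied`, `subreddit`, `title`, `selftext`, and `created_utc`. The function name and the choice of fields to keep are hypothetical.

```python
def stickied_posts(listing: dict) -> list[dict]:
    """Pick the stickied posts out of a parsed Reddit listing response."""
    children = listing.get("data", {}).get("children", [])
    rows = []
    for child in children:
        post = child.get("data", {})
        if post.get("stickied"):
            # Illustrative subset of fields; the dataset's real columns may differ.
            rows.append(
                {
                    "subreddit": post.get("subreddit"),
                    "title": post.get("title"),
                    "selftext": post.get("selftext"),
                    "created_utc": post.get("created_utc"),
                }
            )
    return rows
```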
# Reddit Blackout Announcements - 2023 API Protest
This dataset includes the list of scraped subreddits, a single CSV file for each subreddit, and a copy of the Python scripts used to scrape the data.
## Description of the data and file structure
The dataset is uploaded as a single .zip file. Once downloaded and decompressed, it contains several files and directories, organized as follows:
.
├── subreddit-list.txt
├── CSVs
│   ├── [subreddit-name].csv
│   └── [...]
├── code
│   └── [...]
└── parsed TXTs
    ├── API.txt
    ├── blackout.txt
    ├── community.txt
    ├── mod-team.txt
    ├── moderator.txt
    ├── platform.txt
    └── protest.txt
### Subreddit List
The subreddit-list.txt file contains a list of 5,351 subreddit names. Each appears on its own line. This list was generated using the list-subreddits.py script, as described below.
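Since the file holds one name per line, it can be loaded with a few lines of Python. The sketch below is illustrative: the function name is hypothetical, and UTF-8 encoding and the skipping of blank lines are assumptions.

```python
from pathlib import Path


def load_subreddit_list(path: str) -> list[str]:
    """Read one subreddit name per line, ignoring blank lines."""
    text = Path(path).read_text(encoding="utf-8")  # assumption: UTF-8 text
    return [line.strip() for line in text.splitlines() if line.strip()]
```

If the file matches the description, `len(load_subreddit_list("subreddit-list.txt"))` should come out to 5,351.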
#### Stickied Posts - CSVs
The "CSVs" directory contains 5,351 CSV (Comma Separated Value) files, each named ...
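The column layout of these files is not given in this excerpt, so the sketch below makes no assumption about it and simply reads each file with whatever header row it has. The function name is hypothetical, UTF-8 encoding is an assumption, and keying by file stem relies on the `[subreddit-name].csv` pattern shown in the file tree.

```python
import csv
from pathlib import Path


def read_subreddit_csvs(csv_dir: str) -> dict[str, list[dict]]:
    """Map each CSV file's stem to its rows, keyed by the file's own header."""
    tables = {}
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            tables[csv_path.stem] = list(csv.DictReader(handle))
    return tables
```

Calling `read_subreddit_csvs("CSVs")` from the unzipped directory would then return one entry per file.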
Created: 2025-07-27



