Comprehensive dataset of over 4000 subreddits across 13 categories
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13343577
Description:
This dataset encompasses a rich collection of 4000 subreddits organized into 13 distinct categories, providing a valuable resource for researchers and data scientists in the fields of social media analysis, natural language processing, and community dynamics. The subreddits and the respective categories were obtained here.
Each subreddit contains an average of over 400 posts and 11 million unique users.
The dataset is distributed in JSON format.
Each post record has the following fields:
id: the post's unique identifier
post_user: the post's author (anonymized)
post_time: the time at which the post was created, in unix time
post_body: the post's body
comments: a list of comments on the post, where each comment is a dictionary with the following keys:
    id: the comment's unique identifier
    user: the comment's author (anonymized)
    time: the time at which the comment was created, in unix time
    body: the comment's body
    replies: a list of replies to the comment, where each reply is a dictionary with the same information as a comment
Comments and replies are threaded: each reply is nested inside the "replies" list of the comment it responds to.
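The nested structure described above can be walked recursively. The following is a minimal Python sketch, not code shipped with the dataset: the helper names iter_comments and summarize_post and the sample record are hypothetical. It assumes that replies may themselves carry a "replies" list and that timestamps are unix seconds in UTC.

```python
import json
from datetime import datetime, timezone

# Hypothetical record that follows the documented field names; not real data.
SAMPLE_JSON = """
{
  "id": "p1", "post_user": "u_a", "post_time": 1700000000, "post_body": "Hello",
  "comments": [
    {"id": "c1", "user": "u_b", "time": 1700000100, "body": "Hi",
     "replies": [
       {"id": "c2", "user": "u_a", "time": 1700000200, "body": "Welcome",
        "replies": [
          {"id": "c3", "user": "u_b", "time": 1700000300, "body": "Thanks",
           "replies": []}
        ]}
     ]},
    {"id": "c4", "user": "u_c", "time": 1700000400, "body": "Nice", "replies": []}
  ]
}
"""


def iter_comments(comments, depth=0):
    """Yield (depth, comment) pairs depth-first over comments and replies."""
    for comment in comments:
        yield depth, comment
        # Assumption: a missing or empty "replies" key means no replies.
        yield from iter_comments(comment.get("replies") or [], depth + 1)


def summarize_post(post):
    """Return the post id, creation time (UTC, ISO 8601), and thread size."""
    thread = list(iter_comments(post.get("comments") or []))
    created = datetime.fromtimestamp(float(post["post_time"]), tz=timezone.utc)
    return {
        "id": post["id"],
        "created": created.isoformat(),
        "n_comments": len(thread),  # top-level comments plus all nested replies
        "max_depth": max((d for d, _ in thread), default=-1),  # -1: no comments
    }


sample_post = json.loads(SAMPLE_JSON)
summary = summarize_post(sample_post)
print(summary)
```

How posts are grouped into files (for example, one JSON file per subreddit) is not stated in the description, so loading real records with json.load would need to be adapted to the actual layout of the Zenodo archive.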
Created:
2024-10-01



