five

Comprehensive dataset of over 4000 subreddits across 13 categories

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13343577
Description:
This dataset is a collection of 4,000 subreddits organized into 13 distinct categories. It is intended as a resource for researchers and data scientists working on social media analysis, natural language processing, and community dynamics. The subreddits and their respective categories were obtained here. Each subreddit contains, on average, over 400 posts and 11 million unique users. The dataset is in JSON format, and each post has the following fields:
- id: the post's unique identifier
- post_user: the post's author (anonymized)
- post_time: the time the post was created, in Unix time
- post_body: the post's body
- comments: a list of comments on the post, where each comment is a dictionary with the following keys:
  - id: the comment's unique identifier
  - user: the comment's author (anonymized)
  - time: the time the comment was created, in Unix time
  - body: the comment's body
  - replies: a list of replies to the comment, where each reply is a dictionary with the same keys as a comment
Comments and replies are nested within one another, so the full thread structure is preserved.
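Because replies can themselves contain replies, a post's comment tree has to be walked recursively. The following is a minimal sketch, assuming each post has already been parsed into a Python dictionary with the field names listed above. The description does not say how posts are laid out on disk (one JSON object per line, a single JSON array, one file per subreddit), so file loading is left out. The helper names `iter_comments` and `summarize_post`, and the sample record with its IDs, users, and timestamps, are hypothetical and exist only for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Any, Iterator

# Assumption: posts use the keys from the dataset description
# (id, post_user, post_time, post_body, comments), and every comment or
# reply uses (id, user, time, body, replies).


def iter_comments(
    comments: list[dict[str, Any]], depth: int = 0
) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield (depth, comment) for every comment and nested reply, depth-first.

    Depth 0 is a top-level comment, depth 1 a reply to it, and so on.
    """
    for comment in comments:
        yield depth, comment
        # .get() guards against a missing "replies" key in malformed records.
        yield from iter_comments(comment.get("replies", []), depth + 1)


def summarize_post(post: dict[str, Any]) -> dict[str, Any]:
    """Count comments/replies, distinct authors, and maximum thread depth."""
    users = {post["post_user"]}
    n_comments = 0
    max_depth = -1  # stays -1 when the post has no comments at all
    for depth, comment in iter_comments(post.get("comments", [])):
        n_comments += 1
        users.add(comment["user"])
        max_depth = max(max_depth, depth)
    return {
        "id": post["id"],
        # post_time is Unix time (seconds), so convert it to an aware UTC datetime.
        "created": datetime.fromtimestamp(post["post_time"], tz=timezone.utc),
        "n_comments": n_comments,
        "n_users": len(users),
        "max_depth": max_depth,
    }


# Hypothetical sample record shaped like the described schema.
example = json.loads("""
{
  "id": "p1", "post_user": "u_a", "post_time": 1700000000, "post_body": "Hello",
  "comments": [
    {"id": "c1", "user": "u_b", "time": 1700000060, "body": "Hi",
     "replies": [
       {"id": "r1", "user": "u_a", "time": 1700000120, "body": "Thanks", "replies": []}
     ]},
    {"id": "c2", "user": "u_c", "time": 1700000180, "body": "Welcome", "replies": []}
  ]
}
""")

summary = summarize_post(example)
print(summary)
```

For the sample record this reports 3 comments and replies, 3 distinct authors, and a maximum depth of 1. A generator is used so that a very large thread can be streamed without first building a flattened list. For extremely deep threads, an explicit stack would avoid Python's recursion limit.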
Created:
2024-10-01