Reddit and StackOverflow dataset (Programming languages)
Source: NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/7685061
Description:
This data set contains anonymized data collected from Reddit (via the Pushshift API) and StackOverflow (from Kaggle's dataset).
Each folder contains the data split by trimester. The schemas of the StackOverflow and Reddit files are as follows:
Fields from StackOverflow
question_id
answer_id
creation_date - time the answer was created
score - score of the question/answer
tags - all tags flagged for a question
answer_count - number of answers for a question
start_question - question's time of creation
last_activity_date - last update on the question
new_id - hashed id of the answerer
q_new_id - hashed id of the questioner
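The StackOverflow fields above can be mapped to a typed record. The following is a minimal sketch, not the dataset's official loader. It assumes the files are CSV with a header row matching the field names, that timestamps are ISO 8601 strings, and that tags are separated by "|"; none of this is stated in the description, so adjust after inspecting the actual files. The names StackOverflowAnswer, parse_answer and SAMPLE_CSV are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StackOverflowAnswer:
    question_id: str
    answer_id: str
    creation_date: datetime        # answer creation time
    score: int
    tags: list[str]                # all tags flagged for the question
    answer_count: int
    start_question: datetime       # question creation time
    last_activity_date: datetime   # last update on the question
    new_id: str                    # hashed id of the answerer
    q_new_id: str                  # hashed id of the questioner


def parse_answer(row: dict[str, str], tag_sep: str = "|") -> StackOverflowAnswer:
    """Convert one CSV row (all strings) into a typed record.

    Assumption: ISO 8601 timestamps and `tag_sep`-separated tags.
    """
    return StackOverflowAnswer(
        question_id=row["question_id"],
        answer_id=row["answer_id"],
        creation_date=datetime.fromisoformat(row["creation_date"]),
        score=int(row["score"]),
        tags=[t for t in row["tags"].split(tag_sep) if t],
        answer_count=int(row["answer_count"]),
        start_question=datetime.fromisoformat(row["start_question"]),
        last_activity_date=datetime.fromisoformat(row["last_activity_date"]),
        new_id=row["new_id"],
        q_new_id=row["q_new_id"],
    )


# Made-up sample row, for illustration only (not taken from the dataset).
SAMPLE_CSV = (
    "question_id,answer_id,creation_date,score,tags,answer_count,"
    "start_question,last_activity_date,new_id,q_new_id\n"
    "q1,a1,2020-01-15 10:30:00,3,python|pandas,2,"
    "2020-01-15 09:00:00,2020-02-01 12:00:00,u_abc,u_xyz\n"
)

answers = [parse_answer(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
# Time between the question and this answer, as a timedelta.
print(answers[0].tags, answers[0].creation_date - answers[0].start_question)
```

Keeping start_question and creation_date as datetime values makes it straightforward to compute answer delays or to re-bucket records by period.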
Fields from Reddit
comment_id
submission_id
score - score of the question/submission
subreddit
created_utc - time of creation (unrelated to last modified comments)
new_id - hashed id
The .txt files represent the structure of the corresponding hypergraphs.
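The description does not say how the hypergraphs are defined or how the .txt files are laid out. As one plausible construction, offered only as an illustrative sketch, each Reddit submission can be treated as a hyperedge whose nodes are the hashed ids of the users who commented on it. This assumes new_id identifies the comment author; build_hyperedges and min_size are hypothetical names.

```python
from collections import defaultdict


def build_hyperedges(comments, min_size=2):
    """Return {submission_id: set of hashed user ids}.

    Assumption: one hyperedge per submission, keeping only submissions
    with at least `min_size` distinct commenters.
    """
    members = defaultdict(set)
    for c in comments:
        members[c["submission_id"]].add(c["new_id"])
    return {sid: users for sid, users in members.items() if len(users) >= min_size}


# Made-up records, for illustration only (not taken from the dataset).
comments = [
    {"comment_id": "c1", "submission_id": "s1", "new_id": "u1"},
    {"comment_id": "c2", "submission_id": "s1", "new_id": "u2"},
    {"comment_id": "c3", "submission_id": "s1", "new_id": "u1"},
    {"comment_id": "c4", "submission_id": "s2", "new_id": "u3"},
]
hyperedges = build_hyperedges(comments)
```

Using sets means a user who comments several times on the same submission appears once in that hyperedge; submissions with a single commenter are dropped by the default min_size.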
Created:
2023-03-07



