Social Q&A Community Dataset

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2024-03-05
Download link:
https://www.nbsdc.cn/general/dataDetail?id=64ef851fbb16e0591d025685&type=1
Resource description:
The research team used Zhihu, a Chinese social question-and-answer (Q&A) platform similar to Quora, as the data source for this dataset. Founded in 2010, Zhihu is a leading social Q&A community that had attracted more than 220 million registered users by the end of 2018. On the platform, participants can post questions, provide answers, and exchange knowledge. The community exposes publicly visible user profiles, question logs, and answer data on its interactive pages. The basic unit of analysis in this dataset is the week. To build the sample, 1,500 newly posted questions were collected. These questions were tracked and observed from July 2019 to October 2019, and all answers to each question, together with the user data of the answerers, were crawled. Answers were matched to users, and anonymous user data was filtered out, yielding panel data for the initial sample. Each observation contains three layers of data: the question layer, the response layer, and the contribution layer. A total of 39,382 answers were obtained over successive crawl cycles. The sample is an unbalanced panel, and the total data size is 157 MB.
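
The description above (weekly unit of analysis, anonymous answers filtered out, unbalanced panel) can be illustrated with a minimal sketch. The record layout, the field names, and the sample rows below are hypothetical, since the dataset's actual schema is not given here; the sketch only shows how answers might be grouped into question-week observations, and why the resulting panel is unbalanced.

```python
from collections import defaultdict
from datetime import date

# Hypothetical answer records: (question_id, user_id, answer_date).
# user_id is None for an anonymous answer. These fields and values are
# illustrative assumptions, not the dataset's real schema or contents.
answers = [
    ("q1", "u1", date(2019, 7, 1)),
    ("q1", None, date(2019, 7, 2)),   # anonymous answer, will be dropped
    ("q1", "u2", date(2019, 7, 9)),
    ("q2", "u3", date(2019, 7, 16)),
]

def weekly_panel(records):
    """Count non-anonymous answers per (question, ISO year, ISO week)."""
    panel = defaultdict(int)
    for question_id, user_id, answered_on in records:
        if user_id is None:
            continue  # filter out anonymous users
        iso_year, iso_week, _ = answered_on.isocalendar()
        panel[(question_id, iso_year, iso_week)] += 1
    return dict(panel)

panel = weekly_panel(answers)
for key in sorted(panel):
    print(key, panel[key])
```

In this toy input, question q1 is observed in two weeks and q2 in only one, which is what makes the panel unbalanced: questions do not all contribute the same number of weekly observations.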
Provider:
Zhejiang University
Collected summary
Dataset introduction
Background and challenges
Background overview
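This dataset is based on Zhihu, a Chinese social Q&A platform. It covers 1,500 newly posted questions and their 39,382 answers, collected between July and October 2019, and forms an unbalanced panel with three layers (question, response, and contribution) and a total size of 157 MB. The dataset focuses on key factors in social Q&A communities such as popularity, reputation, and temporal distance, and is suited to research in disciplines such as management information systems.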
The above content was collected and summarized by 遇见数据集.