habedi/stack-exchange-dataset
Source: Hugging Face. Updated 2023-11-29. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/habedi/stack-exchange-dataset
Resource description:
---
license: cc
task_categories:
- text-classification
- question-answering
language:
- en
size_categories:
- 10K<n<100K
pretty_name: Stack Exchange -- Question Dataset
---
This dataset consists of three CSV files, namely: 'cs.csv', 'ds.csv', and 'p.csv'.
Each CSV file includes the data for the questions asked on a Stack Exchange (SE) question-answering community, from the creation of the community until May 2021.
- 'cs.csv' --> [Computer Science SE](https://cs.stackexchange.com/)
- 'ds.csv' --> [Data Science SE](https://datascience.stackexchange.com/)
- 'p.csv' --> [Political Science SE](https://politics.stackexchange.com/)
Each CSV file has the following columns:
- `id`: the question id
- `title`: the title of the question
- `body`: the body or text of the question
- `tags`: the list of tags assigned to the question
- `label`: a label indicating whether the question is resolved or not (0: not resolved; 1: resolved)
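The column layout above can be checked with a short loading routine. The following is a minimal sketch, not part of the original dataset card: the helper names `load_questions` and `label_counts` are hypothetical, and it assumes each file is UTF-8 encoded, has a header row with the five column names listed above, and stores `label` as the text `0` or `1`. The `tags` column is kept as a raw string because the card does not specify how the list is serialized.

```python
import csv
from collections import Counter

# Column names taken from the dataset card.
EXPECTED_COLUMNS = ["id", "title", "body", "tags", "label"]


def load_questions(path):
    """Read one CSV file (e.g. 'cs.csv') into a list of row dicts.

    Assumption: the file is UTF-8 and its header row contains the
    five documented columns.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        rows = []
        for row in reader:
            # 0: not resolved, 1: resolved (assumed to be stored as an integer)
            row["label"] = int(row["label"])
            rows.append(row)
        return rows


def label_counts(rows):
    """Count how many questions carry each label value."""
    return Counter(row["label"] for row in rows)
```

With a local copy of a file, `label_counts(load_questions("cs.csv"))` would give the number of resolved and unresolved questions for that community.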
The dataset was used in the following research papers:
- [A deep learning-based approach for identifying unresolved questions on Stack Exchange Q&A communities through graph-based communication modelling](https://doi.org/10.1007/s41060-023-00454-0)
- [Survival analysis for user disengagement prediction: question-and-answering communities’ case](https://doi.org/10.1007/s13278-022-00914-8)
Provided by:
habedi
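Summary of original information
Dataset overview
Basic information
- License: cc
- Task categories:
  - Text classification
  - Question answering
- Language: English
- Dataset size: 10K<n<100K
- Dataset name: Stack Exchange -- Question Dataset
Data files
- File list:
  - cs.csv
  - ds.csv
  - p.csv
Data content
- Source: question data from Stack Exchange question-answering communities, up to May 2021.
  - cs.csv: from Computer Science SE
  - ds.csv: from Data Science SE
  - p.csv: from Political Science SE
Data structure
- Columns:
  - id: the question id
  - title: the question title
  - body: the question body
  - tags: the list of question tags
  - label: whether the question is resolved (0: not resolved; 1: resolved)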



