IssueReports
Download link: https://zenodo.org/record/7141423
# IssueReports
-*-*-*-*-* Abstract *-*-*-*-*-
The IssueReports dataset contains issues collected from popular GitHub projects, covering the period from 2017 to June 2022.
Popular projects are defined here as the 36 projects with more than 10,000 issues closed since 2017.
The data was collected with PyGithub and the GitHub REST API v3.
We use this dataset in our TOSEM project (details can be found in the paper).
-*-*-*-*-* Versions *-*-*-*-*-
In version 1, we released the original dataset.
In version 2, we corrected typos and other minor details identified in version 1.
In version 4, we added the manual-coding results for our TOSEM project (as an Excel file named `manualCoding.xlsx`).
In version 5, we added the replication scripts for our TOSEM project.
-*-*- replicationForTosem -*-*-
This package includes the scripts needed to replicate each result (i.e., the graphs and tables).
The results of the manual coding are also stored in the replicationForTosem/output directory.
- How to execute
Run the `exe.sh` file; the tables and figures are then generated in the replicationForTosem/output directory.
- Requirements
Install py-gfm, which is required for the TF-IDF analysis.
-*-*-*-*- Components -*-*-*-*-
IssueReports/
├─ PROJECT(FIRST)/
│  ├─ _index.csv
│  ├─ _repo_raw_data.json
│  ├─ issue#(FIRST)/
│  │  ├─ issue_raw_data.json
│  │  ├─ comments_raw_data.json
│  │  ├─ img_urls.csv
│  │  └─ vid_urls.csv
│  .
│  .
│  .
│  └─ issue#(LAST)/
.
.
.
└─ PROJECT(LAST)/
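
The layout above can be traversed with a short script. The following is a minimal sketch, not part of the dataset or its replication package. It assumes the archive has been extracted to a local directory, that every project directory contains `_index.csv`, and that issue directories are the subdirectories containing `issue_raw_data.json`. The function name `iter_issue_dirs` is hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_issue_dirs(root: Path) -> Iterator[Tuple[str, Path]]:
    """Yield (project_name, issue_dir) pairs found under the dataset root."""
    for project_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumption: a project directory is one that holds _index.csv.
        if not (project_dir / "_index.csv").is_file():
            continue
        for issue_dir in sorted(p for p in project_dir.iterdir() if p.is_dir()):
            # Assumption: an issue directory is one that holds issue_raw_data.json.
            if (issue_dir / "issue_raw_data.json").is_file():
                yield project_dir.name, issue_dir
```

Calling `iter_issue_dirs(Path("IssueReports"))` would then yield one pair per issue directory, which avoids having to guess how the issue directories are named.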
-*-*-*-*-*- Details -*-*-*-*-*-
## _index.csv
This file contains a subset of the information about each issue in the project.
- repo_org
This refers to the name of the project owner.
- issue_title
This refers to the title of the issue.
- issue_number
This refers to the identification number of the issue.
- issue_state
This refers to the state (i.e., open/closed) of the issue.
- issue_created_at
This refers to the time when the issue was created.
- issue_created_by
This refers to the username of the user who created the issue.
- issue_closed_at
This refers to the time when the issue was closed.
- issue_closed_by
This refers to the username of the user who closed the issue.
- issue_labels
This refers to the labels attached to the issue.
- num_of_vid
This refers to the number of videos pasted into the issue description.
Note that only videos uploaded to GitHub's own storage (i.e., URLs under https://user-images.githubusercontent.com/) are counted.
- num_of_img
This refers to the number of images pasted into the issue description.
Note that only images uploaded to GitHub's own storage (i.e., URLs under https://user-images.githubusercontent.com/) are counted.
- First_comment_time
This refers to the time when the first comment on the issue was received.
- num_of_comments
This refers to the number of comments on the issue.
In very rare cases, the number of comments in comments_raw_data.json and the number recorded in _index.csv may differ.
This is because the two files were collected at slightly different times.
When they differ, comments_raw_data.json should be given priority.
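
The priority rule for comment counts can be applied when reading the data. The sketch below is illustrative only. It assumes that `_index.csv` has a header row using the field names listed above, and that `comments_raw_data.json` holds a JSON array with one element per comment; neither is confirmed by this description. The function names `load_index` and `resolve_comment_count` are hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List, Optional


def load_index(index_path: Path) -> List[Dict[str, str]]:
    """Read _index.csv into a list of row dictionaries keyed by column name."""
    # Assumption: the first row is a header using the field names listed above.
    with index_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def resolve_comment_count(row: Dict[str, str], comments_path: Optional[Path]) -> int:
    """Return the comment count, preferring comments_raw_data.json when available."""
    if comments_path is not None and comments_path.is_file():
        with comments_path.open(encoding="utf-8") as handle:
            comments = json.load(handle)
        # Assumption: the file holds a JSON array with one element per comment.
        if isinstance(comments, list):
            return len(comments)
    # Fall back to the value recorded in _index.csv.
    return int(row["num_of_comments"])
```

The path to `comments_raw_data.json` is passed in explicitly rather than built from `issue_number`, because the exact naming of the issue directories is not specified here.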
## _repo_raw_data.json
This file contains the information about the project that is available through the GitHub REST API v3.
Details can be found in the PyGithub documentation.
## issue_raw_data.json
This file contains the information about the issue that is available through the GitHub REST API v3.
Details can be found in the PyGithub documentation.
## comments_raw_data.json
This file contains the information about all comments attached to the issue that is available through the GitHub REST API v3.
Details can be found in the PyGithub documentation.
## img_urls.csv
This file contains links to images pasted into the issue description.
## vid_urls.csv
This file contains links to videos pasted into the issue description.
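
The URL files can be read with the standard `csv` module. The following is a minimal sketch under stated assumptions: the column layout of `img_urls.csv` and `vid_urls.csv` is not documented here, so the code scans every cell and keeps the ones that start with the GitHub media prefix mentioned in the `_index.csv` notes. The function name `read_github_media_urls` is hypothetical.

```python
import csv
from pathlib import Path
from typing import List

# Prefix taken from the notes on num_of_img and num_of_vid.
GITHUB_MEDIA_PREFIX = "https://user-images.githubusercontent.com/"


def read_github_media_urls(csv_path: Path) -> List[str]:
    """Collect every cell in the CSV file that is a GitHub-hosted media URL."""
    urls: List[str] = []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            for cell in row:
                value = cell.strip()
                # Scanning all cells avoids assuming a header row or column order.
                if value.startswith(GITHUB_MEDIA_PREFIX):
                    urls.append(value)
    return urls
```

If the files turn out to have a fixed header and a single URL column, reading that column directly would be simpler.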
-*-*-*-*-* Reference *-*-*-*-*-
[PyGithub](https://github.com/PyGithub/PyGithub)
[PyGithub documentation](https://pygithub.readthedocs.io/en/latest/introduction.html)
[GitHub REST API](https://docs.github.com/en/rest)
Created: 2024-03-31



