ICPC2022 ERA: Dataset used for research
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6071587
Resource description:
## Abstract
ICPC2022 ERA
This repository contains our dataset.
## Description of submissions
- _data_Comments.csv
This file contains the interquartile range, min, max, mean, and standard deviation of the Comments attribute for each issue report category (i.e., Img, Vid, and None).
- _data_DescriptionLength.csv
This file contains the interquartile range, min, max, mean, and standard deviation of the DescriptionLength attribute for each issue report category (i.e., Img, Vid, and None).
- _data_FisrtCommentTime.csv
This file contains the interquartile range, min, max, mean, and standard deviation of the FirstCommentTime attribute for each issue report category (i.e., Img, Vid, and None).
- _data_ResolutionTime.csv
This file contains the interquartile range, min, max, mean, and standard deviation of the ResolutionTime attribute for each issue report category (i.e., Img, Vid, and None).
- _data_IssueCreatedYear.csv
This file contains the proportion of issue report categories for each year.
- _high_tfidf.csv
This file contains the top 200 characteristic words for each issue report category, ranked by TF-IDF in descending order.
- _downloaded_data.csv
This file contains all downloaded issue reports (approximately 770,000). Each row corresponds to one issue report and lists all attributes, the tags (issue_labels), the issue category (issue_type), and the TF-IDF values of its words (words). Note that this data includes pull requests, because the GitHub API returns pull requests together with issues.
- our_dataset.csv
This file contains the studied issue reports (approximately 230,000) with the same information as _downloaded_data.csv. Unlike that file, it excludes pull requests and issue reports that contain specific tags or invalid values.
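The per-category statistics stored in the _data_*.csv files (interquartile range, min, max, mean, standard deviation) can be illustrated with a minimal sketch. It assumes pandas, and it uses a small made-up table rather than the real data. The helper name `summarize_by_category` is hypothetical, and the authors' actual computation (for example, which quantile method or which standard deviation estimator they used) is not stated here.

```python
import pandas as pd


def summarize_by_category(df, value_col, category_col="issue_type"):
    """Return min, max, mean, std, and IQR of value_col per category.

    Hypothetical helper for illustration only; not the authors' script.
    """
    grouped = df.groupby(category_col)[value_col]
    summary = grouped.agg(["min", "max", "mean", "std"])
    # Interquartile range = 75th percentile minus 25th percentile
    # (pandas' default linear interpolation between data points).
    summary["iqr"] = grouped.quantile(0.75) - grouped.quantile(0.25)
    return summary


# Made-up toy values, NOT taken from the dataset.
toy = pd.DataFrame(
    {
        "issue_type": ["Img", "Img", "Img", "Vid", "Vid", "None", "None", "None"],
        "num_of_comments": [1, 3, 5, 2, 4, 0, 0, 6],
    }
)

summary = summarize_by_category(toy, "num_of_comments")
print(summary)
```

When applying this to the real CSV, note that in recent pandas versions `pandas.read_csv` treats the string `None` as a missing value by default. If the None category is stored literally as that string, passing `keep_default_na=False` may be needed. How the file actually encodes the category is not stated here, so this is an assumption to verify.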
## Attributes of our_dataset.csv
- issue_created_at_year
This refers to IssueCreatedYear in the paper.
- issue_resolved_time
This refers to ResolutionTime in the paper.
- num_of_img
This refers to Images in the paper.
- num_of_mov
This refers to Videos in the paper.
- num_of_comments
This refers to Comments in the paper.
- first_comment_time
This refers to FirstCommentTime in the paper.
- num_of_words
This refers to DescriptionLength in the paper.
- issue_labels
This refers to the list of tags attached to the issue report.
- issue_type
This refers to the category in the paper.
- words
This refers to the list of TF-IDF values for the words in each issue report's description.
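The column-to-paper mapping above can be written down as a lookup table, which makes it easier to relate results to the paper's terminology. The sketch below assumes pandas and uses made-up rows. The helper names `to_paper_names` and `category_share_by_year` are hypothetical, and normalizing the category proportions within each year is an assumption about how _data_IssueCreatedYear.csv was produced.

```python
import pandas as pd

# Dataset column -> attribute name used in the paper (from the list above).
PAPER_NAMES = {
    "issue_created_at_year": "IssueCreatedYear",
    "issue_resolved_time": "ResolutionTime",
    "num_of_img": "Images",
    "num_of_mov": "Videos",
    "num_of_comments": "Comments",
    "first_comment_time": "FirstCommentTime",
    "num_of_words": "DescriptionLength",
}


def to_paper_names(df):
    """Rename dataset columns to the paper's attribute names (hypothetical helper)."""
    return df.rename(columns=PAPER_NAMES)


def category_share_by_year(df):
    """Proportion of each issue_type within each year (rows sum to 1).

    Assumption: proportions are normalized per year, not over all years.
    """
    return pd.crosstab(
        df["issue_created_at_year"], df["issue_type"], normalize="index"
    )


# Made-up toy rows, NOT taken from the dataset.
toy = pd.DataFrame(
    {
        "issue_created_at_year": [2019, 2019, 2020, 2020, 2020, 2020],
        "issue_type": ["Img", "None", "Img", "Vid", "None", "None"],
        "num_of_comments": [2, 0, 1, 4, 3, 0],
    }
)

renamed = to_paper_names(toy)
share = category_share_by_year(toy)
print(share)
```

Columns that are not in the mapping (such as issue_labels, issue_type, and words) are left unchanged by `to_paper_names`.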
Created:
2022-02-16



