
ICPC2022 ERA: Dataset used for research

Indexed from NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/6071587
Resource description:
## Abstract

ICPC2022 ERA

This repository contains our dataset.

## Description of submissions

- _data_Comments.csv: This file contains the interquartile range, min, max, mean, and standard deviation of the Comments attribute for each issue report category (i.e., Img, Vid, and None).
- _data_DescriptionLength.csv: This file contains the interquartile range, min, max, mean, and standard deviation of the DescriptionLength attribute for each issue report category (i.e., Img, Vid, and None).
- _data_FisrtCommentTime.csv: This file contains the interquartile range, min, max, mean, and standard deviation of the FirstCommentTime attribute for each issue report category (i.e., Img, Vid, and None).
- _data_ResolutionTime.csv: This file contains the interquartile range, min, max, mean, and standard deviation of the ResolutionTime attribute for each issue report category (i.e., Img, Vid, and None).
- _data_IssueCreatedYear.csv: This file contains the proportion of issue report categories for each year.
- _high_tfidf.csv: This file contains the top-200 characteristic words in terms of TF-IDF for each issue report category, in descending order.
- _downloaded_data.csv: This file contains all downloaded issue reports (approximately 770,000). Each row corresponds to one issue report and shows all attributes, the tags (issue_labels), the issue category (issue_type), and the TF-IDF values for the words (words). Note that this data includes pull requests because of the specification of the GitHub API.
- our_dataset.csv: This file contains the studied issue reports (approximately 230,000) with the same information as _downloaded_data.csv. It excludes pull requests and issue reports containing specific tags or invalid values.

## Attributes of our_dataset.csv

- issue_created_at_year: IssueCreatedYear in the paper.
- issue_resolved_time: ResolutionTime in the paper.
- num_of_img: Images in the paper.
- num_of_mov: Videos in the paper.
- num_of_comments: Comments in the paper.
- first_comment_time: FirstCommentTime in the paper.
- num_of_words: DescriptionLength in the paper.
- issue_labels: The list of tags attached to the issue.
- issue_type: The category in the paper.
- words: The list of TF-IDF values for the words in each issue's description.
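The per-category summary files described above can be approximated from our_dataset.csv with a short script. The following is a minimal sketch, not the authors' actual procedure: the helper name `summarize_by_category`, the inline sample data, the choice of the "inclusive" quartile method, and the use of the sample standard deviation are all assumptions made for illustration. It also assumes that issue_type holds the literal strings Img, Vid, and None.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_by_category(rows, attribute, category_column="issue_type"):
    """Group numeric values of `attribute` by category and summarize them.

    Hypothetical helper: `rows` is any iterable of dicts, such as csv.DictReader.
    """
    groups = defaultdict(list)
    for row in rows:
        raw = row.get(attribute)
        if raw is None or raw == "":
            continue  # skip rows where the attribute is missing
        groups[row[category_column]].append(float(raw))

    summary = {}
    for category, values in groups.items():
        if len(values) >= 2:
            # Quartiles via the "inclusive" method (an assumed choice).
            q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
            iqr = q3 - q1
            std = statistics.stdev(values)  # sample standard deviation (assumed)
        else:
            # quantiles() and stdev() need at least two data points
            iqr = std = 0.0
        summary[category] = {
            "iqr": iqr,
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "std": std,
        }
    return summary


# Made-up rows standing in for our_dataset.csv; not real data from the dataset.
SAMPLE_CSV = """issue_type,num_of_comments
Img,1
Img,2
Img,3
Img,4
Img,5
Vid,2
Vid,4
None,0
None,10
"""

sample_summary = summarize_by_category(
    csv.DictReader(io.StringIO(SAMPLE_CSV)), "num_of_comments"
)
for category, stats in sample_summary.items():
    print(category, stats)
```

To run this against the real file, pass `csv.DictReader(open("our_dataset.csv", newline="", encoding="utf-8"))` in place of the sample reader. If the words column contains very long fields, the limit may need to be raised with `csv.field_size_limit` first. The results may differ slightly from the published summary files if the authors used a different quartile or standard deviation convention.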
Created:
2022-02-16