
Replication Package for ASE 2023 Paper "Personalized First Issue Recommender for Newcomers in Open Source Projects"

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/7915840
Description:
This is the replication package for the ASE 2023 paper "Personalized First Issue Recommender for Newcomers in Open Source Projects." It includes a dataset of 68,858 issues from 100 GitHub projects, records of 123 manually labeled issue samples, and Python scripts for analyzing the data and evaluating models. The package is also stored in the GitHub repository https://github.com/mcxwx123/PFIRec.

Required Environment

We recommend setting up the required environment on a commodity Linux machine with at least 1 CPU core, 8GB of memory, and 100GB of free storage space. Our experiments were conducted on an Ubuntu 20.04 server with two Intel Xeon Gold CPUs, 320GB of memory, and 36TB of RAID 5 storage.

Files and Replicating Results

We used the GFI-bot database and the GitHub GraphQL API to collect features of 68,858 candidate issues and to restore the historical states of the resolvers of 11,615 first issues (FIs). The files and the results they reproduce are as follows:

Dataset: The raw features of newcomer-issue pairs are stored in ReplicationPackage/data/dataset_{bertmodel}_{num}.pkl. Here {bertmodel} is one of the four BERT-based language models (SIMCSE, RoBERTa, CodeBERT, and BERTOverflow) and identifies the dataset whose textual features were extracted by that model; {num} ranges from 0 to 19 and identifies one of the 20 chronological folds. The training sets of the GFI-Bot approach are stored in ReplicationPackage/data/training_set_recgfi_simcse_{num}.pkl. ReplicationPackage/data/newcomerdata.json contains the titles and descriptions of first issues, together with their resolvers' total number of commits and number of commits in the latest month. ReplicationPackage/data/processeddata.pkl contains the 37 developers' features for the empirical study. ReplicationPackage/data/isstexts.json contains issue titles and descriptions for Stanik et al.'s approach.

Python scripts: ReplicationPackage/empirical.py reproduces all the results in Section III of the paper. ReplicationPackage/model.py reproduces all the results in Section IV of the paper.

Records: ReplicationPackage/PFIs.csv records the manually labeled issues for the empirical study.

Figures: Running ReplicationPackage/empirical.py and ReplicationPackage/model.py writes all the figures to the folder ReplicationPackage/figures/. Besides the figures in the paper, ReplicationPackage/figures/ also contains typedis_{num}.png and domaindis_{num}.png, where {num} ranges from 1 to 4; these show additional results on newcomer features for Figure 4 in the paper.
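The file-naming scheme described above can be checked before running the scripts. The following is a minimal sketch, not part of the package: the helper names (dataset_paths, extra_figure_paths, missing) are hypothetical, and the lowercase spelling of {bertmodel} in file names is an assumption inferred from "training_set_recgfi_simcse_{num}.pkl", so adjust it to whatever the files on disk actually use.

```python
from pathlib import Path

# Assumption: the exact spelling/casing of {bertmodel} in file names is not
# stated in the description; lowercase is guessed from
# "training_set_recgfi_simcse_{num}.pkl". Adjust to match the files on disk.
BERT_MODELS = ["simcse", "roberta", "codebert", "bertoverflow"]
NUM_FOLDS = 20  # chronological folds are numbered 0 to 19


def dataset_paths(root):
    """Return the expected dataset_{bertmodel}_{num}.pkl paths under root/data."""
    data_dir = Path(root) / "data"
    return [
        data_dir / f"dataset_{model}_{fold}.pkl"
        for model in BERT_MODELS
        for fold in range(NUM_FOLDS)
    ]


def extra_figure_paths(root):
    """Return the expected typedis_{num}.png and domaindis_{num}.png paths (num 1 to 4)."""
    fig_dir = Path(root) / "figures"
    return [
        fig_dir / f"{prefix}_{num}.png"
        for prefix in ("typedis", "domaindis")
        for num in range(1, 5)
    ]


def missing(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not p.exists()]


if __name__ == "__main__":
    root = "ReplicationPackage"
    for p in missing(dataset_paths(root)):
        print("missing dataset file:", p)
    for p in missing(extra_figure_paths(root)):
        print("missing figure (run the scripts first):", p)
```

Run from the directory that contains ReplicationPackage/, this prints any dataset files that are absent and any of the additional figures that have not yet been generated.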
Created:
2023-07-25