DocWarn Replication Package
Source: DataCite Commons. Updated 2023-02-19; indexed 2024-08-18.
Download link:
https://figshare.com/articles/dataset/DocWarn_Replication_Package/16823143/1
Description:
A replication package for "Towards Reliable Agile Iterative Planning via Predicting Documentation Changes of Work Items", published at the 19th International Conference on Mining Software Repositories, Technical Track (MSR 2022).

Replication code:
- DocWarn-T: `/code/model/DocWarn_T.py`
- DocWarn-H: `/code/model/DocWarn_H.py`
- DocWarn-C: `/code/Rscript/DocWarn_C.R`

Analysis code for RQ1 is in `/code/Rscript/`. Run `RQ1_performance_measure.R` first to measure the performance of the model, then run `RQ1_performance_stattest.R` to perform statistical tests on the measured performance. `RQ3_rank_features.R` finds a statistically distinct rank for each feature in DocWarn-C.

Results and datasets are in `/data` (only available in the Figshare version: https://figshare.com/s/88547b3c197b21b60f7c):
- `/data/data_reverted_cleaned` stores the dataset in which the work items were reverted to their state at sprint assignment time.
- `/data/trainingData` stores the dataset for each cross-validation round.
- `/data/features` stores the metrics extracted from each work item in the dataset.
- `/data/modelResult` stores the DocWarn-C R models (`/models/...`), the performance of each DocWarn variation, and the feature ranking results.

Manual classification:
- `/rq2_manual_validation.csv` is the result of RQ2's manual classification, used to validate DocWarn-C.
- `/rq2_manual_validation_external.csv` is the manual classification done by the external coder, used to measure inter-rater agreement.
- Codes: 0 = others, 1 = changing scope, 2 = defining scope, 3 = adding additional detail, 4 = adding implementation detail.

The DistilRoberta model for DocWarn-T and DocWarn-H is in `/distilroberta-base-jira` (only available in the Figshare version: https://figshare.com/s/88547b3c197b21b60f7c). It is a version of distilroberta-base fine-tuned on 110k JIRA issues.
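The run order described for the RQ1 scripts can be sketched as a small Python driver. This is only an illustration under assumptions: it supposes that `Rscript` is on the PATH, that the scripts are invoked from the package root, and that they take no command-line arguments. The helper names `build_commands` and `run_rq1` are hypothetical and are not part of the package.

```python
import subprocess
from pathlib import Path

# Assumed location of the R scripts relative to the package root.
SCRIPT_DIR = Path("code/Rscript")

# Order matters: the statistical test reads the performance measured first.
RQ1_SCRIPTS = [
    "RQ1_performance_measure.R",
    "RQ1_performance_stattest.R",
]


def build_commands(script_dir=SCRIPT_DIR):
    """Return the Rscript invocations for RQ1 in the required order."""
    return [["Rscript", (script_dir / name).as_posix()] for name in RQ1_SCRIPTS]


def run_rq1(script_dir=SCRIPT_DIR):
    """Run each script in turn; stop at the first one that fails."""
    for cmd in build_commands(script_dir):
        subprocess.run(cmd, check=True)


# Print the planned commands without executing them.
for cmd in build_commands():
    print(" ".join(cmd))
```

Calling `run_rq1()` would execute the two scripts in sequence. `check=True` makes a failure in the measurement step abort the run before the statistical test starts.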
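The two manual classification files exist so that inter-rater agreement can be measured. The description does not say which agreement statistic the authors used, so the sketch below assumes Cohen's kappa, one common choice for two coders. The label lists are made-up values for illustration and are not taken from the CSV files, whose column names are not documented here.

```python
from collections import Counter

# Label codes as listed in the package description.
LABELS = {
    0: "others",
    1: "changing scope",
    2: "defining scope",
    3: "adding additional detail",
    4: "adding implementation detail",
}


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of label codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for six work items, for illustration only.
internal = [1, 2, 3, 4, 0, 3]
external = [1, 2, 3, 4, 0, 4]
kappa = cohen_kappa(internal, external)
print(f"kappa = {kappa:.3f}")
```

To apply this to the real files, the two label columns would first need to be read (for example with the standard `csv` module) and aligned by work item.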
Provider:
figshare
Created:
2023-02-19



