
DocWarn

Source: DataCite Commons. Updated: 2023-02-19. Indexed: 2024-07-29.
Download link:
https://figshare.com/articles/dataset/DocWarn_Replication_Package/16823143
Description:
A replication package for "Towards Reliable Agile Iterative Planning via Predicting Documentation Changes of Work Items", published in the Technical Track of the 19th International Conference on Mining Software Repositories (MSR 2022).

Replication code:
- DocWarn-T: /code/model/DocWarn_T.py
- DocWarn-H: /code/model/DocWarn_H.py
- DocWarn-C: /code/Rscript/DocWarn_C.R

Analysis code for RQ1 can be found at /code/Rscript/. RQ1_performance_measure.R must be run first to measure the performance of the models. Then run RQ1_performance_stattest.R to perform statistical tests on the measured performance. RQ3_rank_features.R is used to find statistically distinct ranks for the features in DocWarn-C.

Results and dataset can be found at /data (only available in the Figshare version: https://figshare.com/s/88547b3c197b21b60f7c):
- /data/data_reverted_cleaned stores the dataset in which the work items were reverted to their state at sprint assignment time.
- /data/trainingData stores the dataset for each cross-validation round.
- /data/features stores the metrics extracted from each work item in the dataset.
- /data/modelResult stores the DocWarn-C R models (/models/...), the performance of each DocWarn variation, and the results of the feature ranking.

Manual classification:
- /rq2_manual_validation.csv is the result of RQ2's manual classification, used to validate DocWarn-C.
- /rq2_manual_validation_external.csv is the manual classification done by the external coder (to measure the inter-rater agreement).
- Codes: 0 = others, 1 = changing scope, 2 = defining scope, 3 = adding additional detail, 4 = adding implementation detail.

DistilRoberta for DocWarn-T and DocWarn-H can be found at /distilroberta-base-jira (only available in the Figshare version: https://figshare.com/s/88547b3c197b21b60f7c). This is a version of distilroberta-base fine-tuned on 110k JIRA issues.
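As an illustration of how the two manual classifications could be compared, the sketch below computes Cohen's kappa over two sequences of label codes. This is only an assumption about a reasonable agreement statistic: the description does not say which measure the authors used, and it does not specify the column layout of the CSV files, so the function takes plain lists of codes rather than reading the files. The names `CODE_LABELS` and `cohen_kappa` are hypothetical and not part of the package.

```python
from collections import Counter

# Label codes as listed in the package description.
CODE_LABELS = {
    0: "others",
    1: "changing scope",
    2: "defining scope",
    3: "adding additional detail",
    4: "adding implementation detail",
}


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of label codes.

    Hypothetical helper: the package description does not state which
    agreement statistic was used, so kappa is an assumption here.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    for code in list(rater_a) + list(rater_b):
        if code not in CODE_LABELS:
            raise ValueError(f"unknown label code: {code!r}")

    n = len(rater_a)
    # Proportion of items on which the two raters chose the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal code counts.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical code for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

To apply it, one would first extract the code column from each CSV file for the same work items, in the same order, and pass the two lists to `cohen_kappa`.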

Provider:
figshare
Created:
2022-01-20