
IRIS-database

Source: Zenodo. Updated 2026-03-31. Indexed 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19314849
Description:
IRIS Replication Package

This repository contains the data, annotation artifacts, prompts, model outputs, and evaluation scripts used in the study on IRIS, a four-dimensional framework for evaluating the quality of security code review comments.

Overview

Security code review comments vary widely in usefulness and technical depth. To support reproducible evaluation, this package organizes the materials used to derive, annotate, and automatically score the four IRIS dimensions:

- Identification: whether the comment clearly identifies a valid defect
- Reason: whether the comment explains why the issue is a defect
- Impact: whether the comment states the consequence of the defect
- Solution: whether the comment provides actionable repair guidance

The package includes:

- the source datasets used in the study,
- the manually annotated sample used for taxonomy construction and scoring,
- dimension-specific prompts and model outputs,
- scripts for computing agreement and evaluation metrics.

Package Contents

    IRIS/
    ├── README.md
    ├── requirements.txt
    ├── data/
    │   ├── literature_survey.xlsx
    │   └── posts_list.xlsx
    ├── taxonomy/
    │   ├── taxonomy_definition.md
    │   ├── manual_1.xlsx
    │   ├── manual_2.xlsx
    │   └── final_taxonomy_labels.xlsx
    ├── annotation/
    │   ├── score_rubric.md
    │   └── manual_scoring/
    │       ├── batch_1.xlsx
    │       ├── batch_2.xlsx
    │       ├── batch_3.xlsx
    │       ├── final_manual_scores.xlsx
    │       └── calc_qwk.py
    ├── llm_evaluation/
    │   ├── score_rubric.md
    │   ├── identification/
    │   │   ├── instances.xlsx
    │   │   ├── base_prompt_identify.txt
    │   │   ├── expert_prompt_identify.txt
    │   │   ├── score_identification.py
    │   │   ├── score_eva.py
    │   │   ├── human_score_identification.xlsx
    │   │   ├── gpt-5.1_identification.xlsx
    │   │   ├── deepseek-v3.2_identification.xlsx
    │   │   ├── qwen3-max_identification.xlsx
    │   │   ├── multi-model_identification.xlsx
    │   │   └── metrics_identify.xlsx
    │   ├── reason/
    │   ├── impact/
    │   └── solution/
    └── docs/
        └── package_notes.md

Main Files

1. data/

- literature_survey.xlsx: literature extraction sheet used in the triangulation stage (Section 3.1).
- posts_list.xlsx: source list for industry guidelines and developer-community materials (Section 3.1).

2. taxonomy/

- taxonomy_definition.md: definitions of the IRIS taxonomy categories and the decision order used for labeling.
- manual_1.xlsx and manual_2.xlsx: independent taxonomy annotations from two annotators.
- final_taxonomy_labels.xlsx: final reconciled taxonomy labels for the sampled comments.

3. annotation/manual_scoring/

- batch_1.xlsx, batch_2.xlsx, and batch_3.xlsx: three batches of human annotation for the four IRIS dimensions.
- final_manual_scores.xlsx: final ground-truth four-dimensional scores determined after discussion.
- calc_qwk.py: script for calculating quadratic weighted Cohen's kappa between the two annotators.

4. llm_evaluation/

This directory contains the automated scoring setup for each dimension. For each of the four dimensions (identification, reason, impact, and solution), the directory includes:

- instances.xlsx: evaluation instances, each containing ID, Code_Diff, and Comment.
- base prompt and expert prompt text files.
- a dimension-specific scoring script.
- a metric script for agreement computation.
- model outputs from GPT-5.1, DeepSeek-V3.2, Qwen3-Max, and the multi-model setting.
- a final metrics spreadsheet summarizing Exact Match, Quadratic Weighted Kappa, Spearman's rho, Gwet's AC2, and Krippendorff's alpha.

5. score_rubric.md

Contains the 4-point scoring criteria for all four IRIS dimensions.

Environment

The scripts were developed in Python. A minimal environment should include:

    python>=3.10
    pandas
    openpyxl
    scikit-learn
    scipy
    tqdm
    openai
    irrCAC
    krippendorff
    numpy

Running the Evaluation

Run scripts from the corresponding dimension folder. Example:

    cd llm_evaluation/identification
    python score_identification.py
    python score_eva.py

Repeat the same process for reason, impact, and solution.
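As an illustration of the agreement metrics named above, the following is a minimal pure-Python sketch of Exact Match and quadratic weighted Cohen's kappa for two lists of ordinal scores. It is not the package's calc_qwk.py or score_eva.py. The function names are hypothetical, and the label set 1 to 4 is an assumption based on the description of a 4-point rubric (the actual coding in score_rubric.md may differ).

```python
def exact_match(scores_a, scores_b):
    """Fraction of items on which the two raters give the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(scores_a, scores_b)) / len(scores_a)


def quadratic_weighted_kappa(scores_a, scores_b, labels=(1, 2, 3, 4)):
    """Cohen's kappa with quadratic disagreement weights.

    `labels` is the ordered rating scale; 1..4 is assumed here.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    n = len(scores_a)

    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(scores_a, scores_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        # Both raters used one identical category; kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


human = [1, 2, 3, 4, 4, 2]
model = [1, 2, 3, 3, 4, 2]
print("exact match:", exact_match(human, model))
print("QWK:", quadratic_weighted_kappa(human, model))
```

With scikit-learn installed, `sklearn.metrics.cohen_kappa_score(human, model, weights="quadratic")` computes the same statistic and is likely the more convenient choice for real data.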

API Configuration

The scoring scripts expect API credentials through environment variables. Before running the scripts, configure the required endpoints and keys. Example variables referenced in the scripts include:

    DS_BASE_URL
    DS_API_KEY
    QWEN_BASE_URL
    QWEN_API_KEY
    GPT_BASE_URL
    GPT_API_KEY
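A small pre-flight check can confirm that these variables are set before a scoring run. The sketch below uses only the standard library. The variable names come from the list above, but the provider grouping (for example, reading the DS_ prefix as DeepSeek) and the helper name `load_provider_config` are assumptions for illustration, not part of the package.

```python
import os

# Variable names as listed in the package description. Grouping them by
# provider is an assumption made for this sketch.
PROVIDERS = {
    "deepseek": ("DS_BASE_URL", "DS_API_KEY"),
    "qwen": ("QWEN_BASE_URL", "QWEN_API_KEY"),
    "gpt": ("GPT_BASE_URL", "GPT_API_KEY"),
}


def load_provider_config(name, env=None):
    """Return the endpoint and key for one provider, or fail with a clear message."""
    env = os.environ if env is None else env
    base_var, key_var = PROVIDERS[name]
    missing = [var for var in (base_var, key_var) if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"base_url": env[base_var], "api_key": env[key_var]}


if __name__ == "__main__":
    # Report which providers are fully configured in the current shell.
    for provider in PROVIDERS:
        try:
            load_provider_config(provider)
            print(provider, "configured")
        except RuntimeError as error:
            print(provider, "not configured:", error)
```

The returned dictionary matches the `base_url` and `api_key` keyword arguments of the `openai.OpenAI` client constructor. Whether the package's scripts build their clients this way is not stated in the description.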
Provider:
Zenodo
Created:
2026-03-31