IRIS-database
Source: Zenodo · Updated: 2026-03-31 · Indexed: 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.19314849
Resource summary:
IRIS Replication Package
This repository contains the data, annotation artifacts, prompts, model outputs, and evaluation scripts used in the study on IRIS, a four-dimensional framework for evaluating the quality of security code review comments.
Overview
Security code review comments vary widely in usefulness and technical depth. To support reproducible evaluation, this package organizes the materials used to derive, annotate, and automatically score the four IRIS dimensions:
Identification: whether the comment clearly identifies a valid defect
Reason: whether the comment explains why the issue is a defect
Impact: whether the comment states the consequence of the defect
Solution: whether the comment provides actionable repair guidance
The package includes:
the source datasets used in the study,
the manually annotated sample used for taxonomy construction and scoring,
dimension-specific prompts and model outputs,
scripts for computing agreement and evaluation metrics.
Package Contents
IRIS/
├── README.md
├── requirements.txt
├── data/
│   ├── literature_survey.xlsx
│   └── posts_list.xlsx
├── taxonomy/
│   ├── taxonomy_definition.md
│   ├── manual_1.xlsx
│   ├── manual_2.xlsx
│   └── final_taxonomy_labels.xlsx
├── annotation/
│   ├── score_rubric.md
│   └── manual_scoring/
│       ├── batch_1.xlsx
│       ├── batch_2.xlsx
│       ├── batch_3.xlsx
│       ├── final_manual_scores.xlsx
│       └── calc_qwk.py
├── llm_evaluation/
│   ├── score_rubric.md
│   ├── identification/
│   │   ├── instances.xlsx
│   │   ├── base_prompt_identify.txt
│   │   ├── expert_prompt_identify.txt
│   │   ├── score_identification.py
│   │   ├── score_eva.py
│   │   ├── human_score_identification.xlsx
│   │   ├── gpt-5.1_identification.xlsx
│   │   ├── deepseek-v3.2_identification.xlsx
│   │   ├── qwen3-max_identification.xlsx
│   │   ├── multi-model_identification.xlsx
│   │   └── metrics_identify.xlsx
│   ├── reason/
│   ├── impact/
│   └── solution/
└── docs/
    └── package_notes.md
Main Files
1. data/
literature_survey.xlsx: literature extraction sheet used in the triangulation stage (Section 3.1).
posts_list.xlsx: source list for industry guidelines and developer-community materials (Section 3.1).
2. taxonomy/
taxonomy_definition.md: definitions of the IRIS taxonomy categories and decision order used for labeling.
manual_1.xlsx and manual_2.xlsx: independent taxonomy annotations from two annotators.
final_taxonomy_labels.xlsx: final reconciled taxonomy labels for the sampled comments.
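Reconciling two independent annotations starts from the comments on which the annotators disagree. The sketch below is a minimal, hypothetical illustration of that step and is not a script from the package. It assumes each annotator's labels have already been loaded into a dict that maps a comment ID to a taxonomy label. The function names are illustrative only.

```python
# Hypothetical sketch: list the comments whose taxonomy labels differ between
# two annotators, so they can be discussed and reconciled.
# Assumption: each annotator's labels are a dict {comment_id: label}.


def find_disagreements(labels_a: dict[str, str], labels_b: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (comment_id, label_a, label_b) for every shared ID whose labels differ."""
    shared = sorted(labels_a.keys() & labels_b.keys())
    return [(cid, labels_a[cid], labels_b[cid]) for cid in shared if labels_a[cid] != labels_b[cid]]


def percent_agreement(labels_a: dict[str, str], labels_b: dict[str, str]) -> float:
    """Share of commonly labeled comments on which both annotators chose the same label."""
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        raise ValueError("no comment IDs in common")
    agree = sum(labels_a[cid] == labels_b[cid] for cid in shared)
    return agree / len(shared)


if __name__ == "__main__":
    a = {"c1": "Injection", "c2": "Auth", "c3": "Crypto"}
    b = {"c1": "Injection", "c2": "Crypto", "c3": "Crypto"}
    for cid, la, lb in find_disagreements(a, b):
        print(f"{cid}: annotator 1 = {la!r}, annotator 2 = {lb!r}")
    print(f"agreement: {percent_agreement(a, b):.2f}")
```

With pandas, each dict could be built with `dict(zip(df["ID"], df["Label"]))` after `pd.read_excel("manual_1.xlsx")`. The column names `ID` and `Label` are assumptions, so check the sheets for the real headers.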
3. annotation/manual_scoring/
batch_1.xlsx, batch_2.xlsx, and batch_3.xlsx: three batches of human annotation for the four IRIS dimensions.
final_manual_scores.xlsx: final ground-truth four-dimensional scores determined after discussion.
calc_qwk.py: script for calculating quadratic weighted Cohen's kappa between the two annotators.
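Quadratic weighted kappa penalizes each disagreement by the squared distance between the two scores. This makes it suitable for ordinal rubric scores. The sketch below only illustrates the statistic and is not calc_qwk.py. It assumes integer scores on a fixed scale, defaulting to 0..3 as a stand-in for the 4-point rubric. The listed scikit-learn dependency offers the same statistic via `sklearn.metrics.cohen_kappa_score(a, b, labels=[0, 1, 2, 3], weights="quadratic")`.

```python
# Minimal sketch of quadratic weighted Cohen's kappa for two annotators.
# Assumption: scores are integers in [min_score, max_score] (0..3 by default).


def quadratic_weighted_kappa(rater_a, rater_b, min_score=0, max_score=3):
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = max_score - min_score + 1
    n = len(rater_a)
    # Observed matrix: observed[i][j] counts items where A gave i and B gave j.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        if not (min_score <= a <= max_score and min_score <= b <= max_score):
            raise ValueError(f"score out of range: {a}, {b}")
        observed[a - min_score][b - min_score] += 1
    # Marginal histograms (row sums for A, column sums for B).
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            expected = hist_a[i] * hist_b[j] / n  # chance co-occurrence count
            num += weight * observed[i][j]
            den += weight * expected
    # Degenerate case: both raters used one identical category throughout.
    # Treating it as perfect agreement is a convention chosen for this sketch.
    return 1.0 - num / den if den else 1.0


if __name__ == "__main__":
    annotator_1 = [3, 2, 2, 0, 1, 3]
    annotator_2 = [3, 2, 1, 0, 1, 2]
    print(f"QWK = {quadratic_weighted_kappa(annotator_1, annotator_2):.3f}")
```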
4. llm_evaluation/
This directory contains the automated scoring setup for each dimension.
For each of the four dimensions (identification, reason, impact, and solution), the directory includes:
instances.xlsx: evaluation instances, each containing ID, Code_Diff, and Comment.
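base and expert prompt text files (base_prompt_identify.txt and expert_prompt_identify.txt in identification/).
a dimension-specific scoring script (score_identification.py in identification/).
a metric script for agreement computation (score_eva.py in identification/).
human reference scores (human_score_identification.xlsx in identification/).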
model outputs from GPT-5.1, DeepSeek-V3.2, Qwen3-Max, and the multi-model setting.
a final metrics spreadsheet summarizing Exact Match, Quadratic Weighted Kappa, Spearman's rho, Gwet's AC2, and Krippendorff's alpha.
5. score_rubric.md
Contains the 4-point scoring criteria for all four IRIS dimensions. A copy of this file appears in both annotation/ and llm_evaluation/.
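Each annotated comment carries one score per IRIS dimension. A small validated record keeps those four values together and rejects anything off the scale. This is a hypothetical sketch. The class name is illustrative, and the 0..3 range is an assumption standing in for the 4-point scale, since the rubric might equally use 1..4.

```python
# Hypothetical record for one comment's four IRIS scores.
# ASSUMPTION: the 4-point scale is encoded as integers 0..3.
from dataclasses import dataclass, fields

MIN_SCORE, MAX_SCORE = 0, 3  # assumed bounds of the 4-point scale


@dataclass(frozen=True)
class IrisScore:
    identification: int
    reason: int
    impact: int
    solution: int

    def __post_init__(self):
        # Reject any dimension score outside the assumed 4-point range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or not MIN_SCORE <= value <= MAX_SCORE:
                raise ValueError(f"{f.name} must be an int in [{MIN_SCORE}, {MAX_SCORE}], got {value!r}")

    def as_dict(self) -> dict[str, int]:
        """Dimension name -> score, e.g. for writing one spreadsheet row."""
        return {f.name: getattr(self, f.name) for f in fields(self)}


if __name__ == "__main__":
    print(IrisScore(identification=3, reason=2, impact=1, solution=0).as_dict())
```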
Environment
The scripts were developed in Python. A minimal environment should include:
python>=3.10
pandas
openpyxl
scikit-learn
scipy
tqdm
openai
irrCAC
krippendorff
numpy
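The package ships a requirements.txt, so `pip install -r requirements.txt` is the usual way to install these dependencies. Before running anything, a quick check can confirm that the interpreter and the listed distributions are present. This sketch uses the package names exactly as listed above and assumes no version pins beyond python>=3.10.

```python
# Sketch: report problems with the Python version or missing packages
# from the list above (distribution names, not import names).
import sys
from importlib import metadata

REQUIRED = ["pandas", "openpyxl", "scikit-learn", "scipy", "tqdm",
            "openai", "irrCAC", "krippendorff", "numpy"]


def check_environment(required=REQUIRED):
    """Return a list of problems; an empty list means the environment looks usable."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    for dist in required:
        try:
            metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"missing package: {dist}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment OK")
```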
Running the Evaluation
Run scripts from the corresponding dimension folder.
Example:
cd llm_evaluation/identification
python score_identification.py
python score_eva.py
Repeat the same process for reason, impact, and solution.
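Repeating the per-folder steps for all four dimensions can be scripted. Only the identification/ file names are listed, so this sketch finds scoring scripts with the pattern score_*.py rather than guessing their names. It assumes that every folder follows the same layout as identification/, with score_eva.py run last. Each script runs with its dimension folder as the working directory, as the instructions above require.

```python
# Sketch of a driver that runs every dimension folder in turn.
# ASSUMPTIONS: scoring scripts match score_*.py, and score_eva.py (if present)
# computes the metrics after scoring, as in identification/.
import subprocess
import sys
from pathlib import Path

DIMENSIONS = ["identification", "reason", "impact", "solution"]
METRIC_SCRIPT = "score_eva.py"


def plan_runs(root="llm_evaluation"):
    """Return (folder, script) pairs in execution order: scoring first, metrics last."""
    plan = []
    for dim in DIMENSIONS:
        folder = Path(root) / dim
        if not folder.is_dir():
            continue
        scorers = sorted(p.name for p in folder.glob("score_*.py") if p.name != METRIC_SCRIPT)
        plan += [(folder, name) for name in scorers]
        if (folder / METRIC_SCRIPT).exists():
            plan.append((folder, METRIC_SCRIPT))
    return plan


def run_all(root="llm_evaluation"):
    for folder, script in plan_runs(root):
        print(f"[{folder.name}] running {script}")
        # cwd=folder mirrors "run scripts from the dimension folder";
        # check=True stops at the first failing script.
        subprocess.run([sys.executable, script], cwd=folder, check=True)


if __name__ == "__main__":
    run_all()
```

Separating `plan_runs` from `run_all` lets you inspect the execution order without calling any model APIs.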
API Configuration
The scoring scripts expect API credentials through environment variables. Before running the scripts, configure the required endpoints and keys.
Example variables referenced in the scripts include:
DS_BASE_URL
DS_API_KEY
QWEN_BASE_URL
QWEN_API_KEY
GPT_BASE_URL
GPT_API_KEY
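The variable pairs map naturally to one client per provider. This sketch assumes each endpoint is OpenAI-compatible, so the listed `openai` package (version 1.0 or later) can reach it through `base_url`. The provider labels `deepseek`, `qwen` and `gpt` are illustrative and are not names taken from the scripts. The sketch fails early, listing every unset variable, instead of letting a request fail later.

```python
# Sketch: load endpoint/key pairs from the environment and build clients.
# ASSUMPTION: all three endpoints speak the OpenAI-compatible API.
import os

PROVIDERS = {
    "deepseek": ("DS_BASE_URL", "DS_API_KEY"),
    "qwen": ("QWEN_BASE_URL", "QWEN_API_KEY"),
    "gpt": ("GPT_BASE_URL", "GPT_API_KEY"),
}


def load_api_config(environ=os.environ):
    """Return {provider: {"base_url": ..., "api_key": ...}}; raise if anything is unset."""
    missing = [var for pair in PROVIDERS.values() for var in pair if not environ.get(var)]
    if missing:
        raise RuntimeError("unset environment variables: " + ", ".join(missing))
    return {
        name: {"base_url": environ[url_var], "api_key": environ[key_var]}
        for name, (url_var, key_var) in PROVIDERS.items()
    }


def make_clients(config):
    # Imported lazily so load_api_config can be used without openai installed.
    from openai import OpenAI
    return {name: OpenAI(base_url=c["base_url"], api_key=c["api_key"]) for name, c in config.items()}


if __name__ == "__main__":
    clients = make_clients(load_api_config())
    print("configured providers:", ", ".join(clients))
```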
Provided by:
Zenodo
Created:
2026-03-31



