Replication Package for "When AI Writes Code: Investigating Security Issues in Agentic Software Changes"
Source: Figshare. Updated: 2026-03-03. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Replication_Package_for_When_AI_Writes_Code_Investigating_Security_Issues_in_Agentic_Software_Changes_/30937364
Resource description:
Initially based on the AIDev Dataset proposed at the MSR Mining Challenge, this project explores the prevalence and nature of security smells in code generated by AI agents compared with code written by human developers.

Our work provides insights into:
- The most common security vulnerabilities (categorized by CWE).
- A comparison of security posture between human and AI-generated code.
- A ranking of AI agents based on their tendency to introduce security issues.

All findings and methodologies are detailed in the paper.

Repository Structure

This replication package is organized into data processing pipelines and specific analysis modules.

📂 Data & Pre-processing

The Data/ directory contains the datasets used for analysis and the scripts required to clean and enrich them.
- Data/: Core data storage.
- SAST_results/: Raw output from static analysis tools (Bandit, Semgrep, SonarQube, CodeQL and, optionally, YARA).
- Dataset_completion/: Scripts to retrieve missing data, such as get_patch_for_HPR (GitHub API fetcher for human PR patches).
- aggregate_data.py: The main script. It merges results into a single file (full_sast_results.csv), calculates metrics, and filters out test files to ensure accurate analysis.

📊 Analysis Modules

The following folders contain scripts for statistical analysis, visualization, and metric calculation used in the paper.

1. Category Analysis (Category-analysis/)
Focuses on the classification and frequency of identified issues.
- bandit_rules.json: Configuration for mapping Bandit IDs to broader security categories.
- frequency_categorization.py: Categorizes issues based on CWE and OWASP classifications.
- histogram_severity.py: Generates visualizations of the severity distribution of the findings.

2. Human vs. Agent Comparison (HumanAgent-comparaison/)
Scripts dedicated to comparing the distribution of security smells between human developers and AI agents.
- Percentage-Severity-comparaison.py: Generates stacked bar charts to visualize the proportion of severity levels.
- Man_Withney_Comparaison.py: Performs statistical tests to validate the significance of the differences observed between the two groups.

3. Agent Ranking (AgentRanking/)
Evaluates and ranks specific AI agents based on their security performance.
- agents_ranking.py: Calculates the "Smells Tendency Score" and ranks agents.
- Output: Generates visualizations such as ranking_global_Weighted_Net_Score.png, with results normalized by code volume.

Usage

Install the dependencies (pip install -r requirements.txt) and run the scripts that you need.
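
As an illustration of the aggregation step described for aggregate_data.py (merging per-tool findings into one table and filtering out test files), here is a minimal sketch using only the Python standard library. It is not the package's actual implementation: the column names (file, rule_id, severity), the test-file heuristic in TEST_PATH, the helper names, and the sample findings are all assumptions made for this example.

```python
import csv
import io
import re
from collections import Counter

# Assumed heuristic for spotting test files; the real script may use other rules.
TEST_PATH = re.compile(
    r"(^|/)(tests?|testing)/|(^|/)test_[^/]*\.py$|_test\.py$|(^|/)conftest\.py$"
)


def is_test_file(path: str) -> bool:
    """Return True when a path looks like a test file (hypothetical rule)."""
    return bool(TEST_PATH.search(path.replace("\\", "/")))


def aggregate(results_by_tool):
    """Merge per-tool findings into one list, tag each row with its tool,
    and drop findings located in test files."""
    merged = []
    for tool, rows in results_by_tool.items():
        for row in rows:
            if is_test_file(row["file"]):
                continue
            merged.append({"tool": tool, **row})
    return merged


def write_csv(rows, handle):
    """Write the merged rows with a fixed (assumed) column order."""
    writer = csv.DictWriter(handle, fieldnames=["tool", "file", "rule_id", "severity"])
    writer.writeheader()
    writer.writerows(rows)


# Toy input standing in for the contents of SAST_results/ (invented data).
SAMPLE = {
    "bandit": [
        {"file": "app/db.py", "rule_id": "B608", "severity": "MEDIUM"},
        {"file": "tests/test_db.py", "rule_id": "B101", "severity": "LOW"},
    ],
    "semgrep": [
        {"file": "app/views.py", "rule_id": "example-rule", "severity": "HIGH"},
    ],
}

merged = aggregate(SAMPLE)
buffer = io.StringIO()  # stands in for an output file such as full_sast_results.csv
write_csv(merged, buffer)
severity_counts = Counter(row["severity"] for row in merged)
```

In this sketch the finding in tests/test_db.py is discarded before any counting, so downstream severity statistics reflect only non-test code. Writing to a real file instead of the in-memory buffer only requires passing an opened file handle (with newline="") to write_csv.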
Created:
2026-03-03



