LLM-Generated Python Fuzzing Seeds

Name: LLM-Generated Python Fuzzing Seeds
Creator: IEEE DataPort
Published: 2024-06-19 17:53:56
License: No description available

DataCite Commons, updated 2024-06-19, indexed 2024-07-13

Download link:

https://ieee-dataport.org/documents/llm-generated-python-fuzzing-seeds

Description:

This dataset comprises more than 38,000 seed inputs for fuzzing Python functions, generated by a range of Large Language Models (LLMs), including ChatGPT-3.5, ChatGPT-4, Claude-Opus, Claude-Instant, and Gemini Pro 1.0. The seeds were produced as part of a study evaluating how well LLMs can automate the creation of effective fuzzing inputs, a technique that is crucial for uncovering software defects in Python, where traditional methods show limitations. The dataset targets 50 commonly used Python functions across various libraries, illustrating the diversity of LLM-generated inputs and their potential to improve software testing. Each seed input has been evaluated for its effect on code coverage and instruction count, and these measurements underpin a framework for determining which LLMs are most efficient for fuzzing tasks. The study's results indicate that LLM-generated seeds can match or even exceed the outcomes of conventional fuzzing campaigns, supporting the development of automated and scalable fuzzing.
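To make the coverage-based evaluation concrete, the sketch below shows one way a seed's contribution to line coverage could be measured with the standard-library `sys.settrace` hook. It is an illustration only, not the study's harness: the target function `classify`, the helper names `lines_hit` and `rank_seeds`, and the seed values are all hypothetical, and instruction counting is not covered.

```python
import sys
from typing import Any, Callable, List, Set, Tuple

Seed = Tuple[Any, ...]


def lines_hit(func: Callable[..., Any], seed: Seed) -> Set[int]:
    """Return the line numbers of `func` executed when called with `seed`."""
    code = func.__code__
    hit: Set[int] = set()

    def tracer(frame, event, arg):
        # Only trace frames that belong to the target function itself.
        if frame.f_code is not code:
            return None
        if event == "line":
            hit.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*seed)
    except Exception:
        # A crashing seed is still of interest to a fuzzer; keep its coverage.
        pass
    finally:
        sys.settrace(previous)
    return hit


def rank_seeds(func: Callable[..., Any], seeds: List[Seed]):
    """Record how many previously unseen lines each seed adds, in order."""
    covered: Set[int] = set()
    gains: List[Tuple[Seed, int]] = []
    for seed in seeds:
        new = lines_hit(func, seed) - covered
        covered |= new
        gains.append((seed, len(new)))
    return gains, covered


# Hypothetical target function standing in for one of the 50 library functions.
def classify(text: str) -> str:
    if not text:
        return "empty"
    if text.isdigit():
        return "number"
    return "other"


# Hypothetical seeds; real seeds would be loaded from the dataset files.
seeds: List[Seed] = [("",), ("123",), ("456",), ("abc",)]
gains, covered = rank_seeds(classify, seeds)
for seed, new_lines in gains:
    print(seed, new_lines)
```

In this toy run the seed `("456",)` follows the same path as `("123",)` and so adds no new lines, which is the kind of redundancy a coverage-based evaluation is meant to expose.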

Provider:

IEEE DataPort

Created:

2024-06-19
