formai-v2-full
Dataset overview
Dataset information
Features
- category: string
- file_name: string
- verification_finished: string
- vulnerable_line: int64
- column: int64
- function: string
- violated_property: string
- error_type: string
- code_snippet: string
- source_code: string
- num_lines: int64
- cyclomatic_complexity: float32
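As an illustration of the schema listed above, the following minimal sketch checks a single record against the declared column types. It is not part of the dataset tooling: the helper name `schema_errors`, the mapping of int64/float32 to Python's `int`/`float`, the assumption that no field is null, and every value in the sample record are hypothetical.

```python
# Expected Python types per column, transcribed from the feature list.
# Assumption: int64 maps to int, float32 maps to float, and no field is null.
FEATURES = {
    "category": str,
    "file_name": str,
    "verification_finished": str,
    "vulnerable_line": int,
    "column": int,
    "function": str,
    "violated_property": str,
    "error_type": str,
    "code_snippet": str,
    "source_code": str,
    "num_lines": int,
    "cyclomatic_complexity": float,
}


def schema_errors(record):
    """Return a list of problems; an empty list means the record fits the schema."""
    errors = []
    for name, expected in FEATURES.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


# Hypothetical record: all values are placeholders invented for illustration.
example = {
    "category": "placeholder-category",
    "file_name": "placeholder.c",
    "verification_finished": "placeholder",
    "vulnerable_line": 12,
    "column": 5,
    "function": "main",
    "violated_property": "placeholder property text",
    "error_type": "placeholder error type",
    "code_snippet": "int x = buf[i];",
    "source_code": "int main(void) { return 0; }",
    "num_lines": 1,
    "cyclomatic_complexity": 1.0,
}

print(schema_errors(example))
```

A record with a string where an integer is declared (for example `vulnerable_line="12"`) would be reported as a single type error rather than raising an exception, which makes the helper usable as a filter over many rows.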
Splits
- train: 331000 examples, 960684901 bytes in total
Dataset size
- Download size: 133615536 bytes
- Dataset size: 960684901 bytes
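To put the reported sizes in perspective, the short calculation below derives the average size of one example and the ratio between the dataset size and the download size. The three constants are copied from the figures in this card; the variable names are illustrative, and reading the ratio as a compression factor is an assumption, since the card does not say how the download is encoded.

```python
# Figures copied from the card: train split size and example count, download size.
NUM_EXAMPLES = 331_000
DATASET_BYTES = 960_684_901
DOWNLOAD_BYTES = 133_615_536

# Average size of one example in bytes.
avg_bytes = DATASET_BYTES / NUM_EXAMPLES

# How many times larger the dataset is than the download.
ratio = DATASET_BYTES / DOWNLOAD_BYTES

print(f"average bytes per example: {avg_bytes:.0f}")
print(f"dataset size / download size: {ratio:.2f}")
```

This works out to roughly 2.9 KB per example and a dataset about 7.2 times larger than the download.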
Configurations
- default: the train split is read from data files matching the path data/train-*
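The path in the default configuration is a glob pattern rather than a single file name. The sketch below shows how such a pattern selects files, using Python's standard `fnmatch` module. The file names in `candidates` are hypothetical, made up to illustrate the matching; the actual shard names and file format are not stated in this card.

```python
from fnmatch import fnmatchcase

# Pattern taken from the default configuration.
PATTERN = "data/train-*"

# Hypothetical repository file listing, invented for illustration.
candidates = [
    "README.md",
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
]

# Keep only the paths that the pattern selects (case-sensitive match).
matching = [path for path in candidates if fnmatchcase(path, PATTERN)]

for path in matching:
    print(path)
```

Only the two `data/train-` entries are selected; the README and the hypothetical test shard are ignored.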
Dataset sources
- Repository: https://github.com/FormAI-Dataset/FormAI-dataset/?tab=readme-ov-file
- Paper: https://dl.acm.org/doi/10.1145/3617555.3617874
Citation
@inproceedings{10.1145/3617555.3617874,
  author = {Tihanyi, Norbert and Bisztray, Tamas and Jain, Ridhi and Ferrag, Mohamed Amine and Cordeiro, Lucas C. and Mavroeidis, Vasileios},
  title = {The FormAI Dataset: Generative AI in Software Security through the Lens of Formal Verification},
  year = {2023},
  isbn = {9798400703751},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3617555.3617874},
  doi = {10.1145/3617555.3617874},
  booktitle = {Proceedings of the 19th International Conference on Predictive Models and Data Analytics in Software Engineering},
  pages = {33–43},
  numpages = {11},
  keywords = {Artificial Intelligence, Dataset, Formal Verification, Large Language Models, Software Security, Vulnerability Classification},
  location = {San Francisco, CA, USA},
  series = {PROMISE 2023}
}




