
msc-smart-contract-auditing/vulnerability-severity-classification

Hugging Face | Updated 2024-05-04 | Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/msc-smart-contract-auditing/vulnerability-severity-classification
Resource description:

---
dataset_info:
  features:
  - name: function
    dtype: string
  - name: severity
    dtype: string
  splits:
  - name: train
    num_bytes: 1327573
    num_examples: 2473
  - name: test
    num_bytes: 237962
    num_examples: 437
  download_size: 670552
  dataset_size: 1565535
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
license: mit
task_categories:
- text-classification
language:
- en
tags:
- finance
pretty_name: Severity of Vulnerable Solidity Functions
size_categories:
- 1K<n<10K
---

This dataset combines vulnerable functions (scraped from 5 auditing companies: [Codehawks](https://www.codehawks.com/), [ConsenSys](https://consensys.io/), [Cyfrin](https://www.cyfrin.io/), [Sherlock](https://www.sherlock.xyz/), [Trust Security](https://www.trust-security.xyz/)) and audited functions with no vulnerabilities (scraped from [Etherscan](https://etherscan.io)).

The purpose of the dataset is to enable training of classification models that discriminate between the 4 classes: `none`, `low`, `medium` and `high`.

| Field | Description |
|-|-|
| 1. `function` | Raw Solidity code |
| 2. `severity` | Severity of vulnerability (`none`, `low`, `medium`, `high`) |

# Data Analysis

<img src="https://huggingface.co/datasets/msc-smart-contract-audition/vulnerability-severity-classification/resolve/main/figures/severity-distribution.png">

<img src="https://huggingface.co/datasets/msc-smart-contract-audition/vulnerability-severity-classification/resolve/main/figures/length-severity-distribution.png">

# Additional Info

- The newline characters are escaped (i.e. `\\n`)
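
Because the `function` field stores newlines in escaped form and `severity` is a plain string, a little preprocessing is usually needed before training a classifier. The following is a minimal sketch, not part of the dataset card: it assumes the stored text contains the literal two-character sequence backslash + `n`, and the label order `none < low < medium < high` and the helper names (`unescape_newlines`, `prepare`, `load_prepared`) are illustrative choices. `load_prepared` relies on the Hugging Face `datasets` package and network access, so it is defined but not called here.

```python
# Assumed label order (not specified by the dataset card).
SEVERITIES = ["none", "low", "medium", "high"]
LABEL2ID = {name: idx for idx, name in enumerate(SEVERITIES)}


def unescape_newlines(code: str) -> str:
    """Replace the literal sequence backslash + 'n' with a real newline.

    Note: this naive replacement also rewrites "\\n" escapes that appear
    inside Solidity string literals; that is accepted here for simplicity.
    """
    return code.replace("\\n", "\n")


def prepare(example: dict) -> dict:
    """Return the restored source code plus an integer class label."""
    return {
        "function": unescape_newlines(example["function"]),
        "label": LABEL2ID[example["severity"]],
    }


def load_prepared(
    repo_id: str = "msc-smart-contract-auditing/vulnerability-severity-classification",
):
    """Load the train/test splits and apply `prepare` to every row."""
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`

    dataset = load_dataset(repo_id)
    return dataset.map(prepare)


# Offline usage example with a hand-written row (not taken from the dataset).
row = {"function": "function f() public {\\n    x = 1;\\n}", "severity": "medium"}
print(prepare(row)["function"])
```

Keeping `severity` as the original string column and adding a separate `label` column leaves the raw annotation available for inspection after mapping.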

Provider:
msc-smart-contract-auditing