msc-smart-contract-auditing/vulnerability-severity-classification
Source: Hugging Face | Updated: 2024-05-04 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/msc-smart-contract-auditing/vulnerability-severity-classification
Description:
---
dataset_info:
  features:
  - name: function
    dtype: string
  - name: severity
    dtype: string
  splits:
  - name: train
    num_bytes: 1327573
    num_examples: 2473
  - name: test
    num_bytes: 237962
    num_examples: 437
  download_size: 670552
  dataset_size: 1565535
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
license: mit
task_categories:
- text-classification
language:
- en
tags:
- finance
pretty_name: Severity of Vulnerable Solidity Functions
size_categories:
- 1K<n<10K
---
This dataset combines vulnerable functions scraped from five auditing companies ([Codehawks](https://www.codehawks.com/), [ConsenSys](https://consensys.io/), [Cyfrin](https://www.cyfrin.io/), [Sherlock](https://www.sherlock.xyz/), [Trust Security](https://www.trust-security.xyz/)) with audited functions that have no vulnerabilities, scraped from [Etherscan](https://etherscan.io).
The dataset is intended for training classification models that distinguish between four severity classes: `none`, `low`, `medium` and `high`.
| Field | Description |
|-|-|
| 1. `function` | Raw Solidity code |
| 2. `severity` | Severity of the vulnerability (`none`, `low`, `medium`, `high`) |
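
Since `severity` is stored as a string, most classifiers will need it mapped to integer class ids first. The following is a minimal sketch, not part of the dataset itself: the names `SEVERITY_LEVELS`, `encode_severity` and `load_splits` are hypothetical, and the ordering `none < low < medium < high` is an assumption chosen for illustration. `load_splits` relies on the third-party `datasets` library and needs network access, so it is defined here but not called.

```python
# Assumed ordinal ordering of the four classes (an illustration, not defined by the dataset).
SEVERITY_LEVELS = ("none", "low", "medium", "high")
SEVERITY_TO_ID = {name: idx for idx, name in enumerate(SEVERITY_LEVELS)}


def encode_severity(severity: str) -> int:
    """Map a severity string from the `severity` field to an integer class id."""
    try:
        return SEVERITY_TO_ID[severity]
    except KeyError:
        raise ValueError(f"unknown severity label: {severity!r}") from None


def load_splits():
    """Download the train and test splits from the Hugging Face Hub.

    Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # third-party, imported lazily

    dataset = load_dataset(
        "msc-smart-contract-auditing/vulnerability-severity-classification"
    )
    return dataset["train"], dataset["test"]


# Encoding works without downloading anything.
encoded = [encode_severity(label) for label in ("none", "high", "low")]
print(encoded)
```

Raising on unknown labels, rather than silently assigning a default id, makes it easier to notice if a row carries an unexpected value.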
# Data Analysis
<img src="https://huggingface.co/datasets/msc-smart-contract-audition/vulnerability-severity-classification/resolve/main/figures/severity-distribution.png">
<img src="https://huggingface.co/datasets/msc-smart-contract-audition/vulnerability-severity-classification/resolve/main/figures/length-severity-distribution.png">
# Additional Info
- The newline characters are escaped (i.e. `\\n`)
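
To view or tokenize a function as formatted source, the escaped newlines can be converted back into real line breaks. This is a minimal sketch under the assumption that each line break is stored as the two characters backslash and `n`; the helper name `unescape_newlines` and the sample string are hypothetical.

```python
def unescape_newlines(function_source: str) -> str:
    """Replace literal backslash-n sequences with real newline characters."""
    return function_source.replace("\\n", "\n")


# Hypothetical sample in the escaped form described above.
sample = "function f() public {\\n    x = 1;\\n}"
restored = unescape_newlines(sample)
print(restored)
```

Note that this simple replacement would also rewrite a `\n` escape that appears inside a Solidity string literal, so it is best treated as a display or preprocessing aid rather than an exact inverse of the original escaping.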
Provided by:
msc-smart-contract-auditing



