Shellcode_IA32
Source: OpenDataLab. Updated 2026-05-17; indexed 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/Shellcode_IA32
Description:
Shellcode_IA32 is a dataset of shellcode collected from a variety of sources over a 20-year span, and it is the largest collection of shellcode in assembly available to date.
The dataset contains 3,200 examples of assembly language instructions for IA-32 (the 32-bit version of the x86 Intel architecture), taken from publicly available security exploits. We collected the assembly programs used to generate shellcode from exploit-db and shell-storm. We enriched the dataset with examples of IA-32 assembly programs from popular tutorials and books. This let us see how different authors and assembly experts comment their code, and therefore how to handle the ambiguity of natural language in this specific context. Ten percent of the instructions in the dataset come from books and guides; the rest come from real-world shellcode.
Our focus is on Linux, the most common operating system for security-critical network services. Accordingly, we added assembly instructions written for Linux with the Netwide Assembler (NASM).
Each row of the Shellcode_IA32 dataset represents a snippet-intent pair. The snippet is one line of assembly code, or a combination of several lines, written in NASM syntax. The intent is a comment in English that describes the snippet.
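As a minimal sketch of the snippet-intent structure described above, a row can be modeled as a small record in Python. The class name `SnippetIntentPair`, the helper `make_pair`, and both example rows are assumptions made for illustration; they are not taken from the dataset or its tooling, and the on-disk file format is not specified here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SnippetIntentPair:
    """One row: NASM assembly lines plus an English description (hypothetical type)."""
    snippet: Tuple[str, ...]  # one or more lines of NASM-syntax assembly
    intent: str               # English-language comment describing the snippet


def make_pair(snippet_text: str, intent: str) -> SnippetIntentPair:
    """Build a pair from raw text, dropping blank lines and outer whitespace."""
    lines = tuple(
        line.strip() for line in snippet_text.splitlines() if line.strip()
    )
    if not lines:
        raise ValueError("a snippet needs at least one line of assembly")
    return SnippetIntentPair(snippet=lines, intent=intent.strip())


# Illustrative rows written for this sketch, not copied from the dataset.
single = make_pair("xor eax, eax", "zero out the eax register")
multi = make_pair("push 0x0b\npop eax", "load the value 0x0b into eax via the stack")

print(len(single.snippet), len(multi.snippet))
```

Storing the snippet as a tuple of lines keeps single-line and multi-line snippets in the same shape, which matches the "one or more lines" wording of the description.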
Further statistics on the dataset and a set of preliminary experiments with neural machine translation (NMT) models are described in the paper *Shellcode_IA32: A Dataset for Automatic Shellcode Generation*.
Provider:
OpenDataLab
Created:
2022-05-24
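Background overview
Shellcode_IA32 is a dataset of 3,200 IA-32 assembly instruction examples, intended mainly for research on automatic shellcode generation, with a particular focus on the Linux operating system. It combines assembly code from publicly available security exploits, tutorials, and books, organized as snippet-intent pairs.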



