
AISE-TUDelft/memtune-data_attack

Source: Hugging Face. Updated: 2025-01-17. Indexed: 2025-09-13.
Download link:
https://hf-mirror.com/datasets/AISE-TUDelft/memtune-data_attack
Description:
This dataset contains the attack samples used in the paper "How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning". It has two parts: the fine-tuning attack samples, drawn from the fine-tuning set, and the pre-training attack samples, drawn from the Java portion of TheStack-v2. Each part is further divided into subsets according to how often a sample is duplicated: d1 (unique samples), d2 (samples appearing twice), d3 (samples appearing three times), and dg3 (samples appearing more than three times).
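
The d1/d2/d3/dg3 naming can be illustrated with a minimal sketch. It assumes duplicates are counted by exact string match, which is an assumption made here for illustration; the paper may define duplication differently. The function names `duplication_subset` and `split_by_duplication` are hypothetical and are not part of the dataset or its tooling.

```python
from collections import Counter


def duplication_subset(count: int) -> str:
    """Map an occurrence count to a subset name (d1, d2, d3, or dg3)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Counts of 1, 2, and 3 map to d1, d2, and d3; anything higher is dg3.
    return f"d{count}" if count <= 3 else "dg3"


def split_by_duplication(samples: list[str]) -> dict[str, list[str]]:
    """Group distinct samples by how many times each one occurs.

    Assumption: two samples are duplicates only if their text is identical.
    """
    counts = Counter(samples)
    subsets: dict[str, list[str]] = {"d1": [], "d2": [], "d3": [], "dg3": []}
    for sample, count in counts.items():
        subsets[duplication_subset(count)].append(sample)
    return subsets


if __name__ == "__main__":
    # Toy corpus: "a" occurs once, "b" twice, "c" three times, "d" five times.
    corpus = ["a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "d"]
    print(split_by_duplication(corpus))
```

Each distinct sample lands in exactly one subset, so the four subsets partition the set of distinct samples.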
Provided by:
AISE-TUDelft