AISE-TUDelft/memtune-data_attack
Source: Hugging Face | Updated: 2025-01-17 | Indexed: 2025-09-13
Download link:
https://hf-mirror.com/datasets/AISE-TUDelft/memtune-data_attack
Description:
This dataset contains the attack samples used in the paper "How Much Do Code Language Models Remember? An Investigation on Data Extraction Attacks before and after Fine-tuning". It has two parts: the fine-tuning attack, whose samples are drawn from the fine-tuning set, and the pre-training attack, whose samples are drawn from the Java portion of TheStack-v2. Each part is further divided into subsets by how often a sample is duplicated: d1 holds unique samples, d2 holds samples that appear twice, d3 holds samples that appear three times, and dg3 holds samples that appear more than three times.
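The subset naming above can be illustrated with a minimal Python sketch. It is only an illustration of the d1/d2/d3/dg3 rule as described, not the authors' actual pipeline: the helper names `duplication_bucket` and `bucket_samples` are hypothetical, and counting duplicates by exact string match is an assumption, since the description does not say how duplication was measured.

```python
from collections import Counter


def duplication_bucket(count: int) -> str:
    """Map an occurrence count to a subset name (d1, d2, d3, dg3)."""
    if count < 1:
        raise ValueError("a sample must occur at least once")
    if count <= 3:
        return f"d{count}"  # d1 = unique, d2 = twice, d3 = three times
    return "dg3"  # more than three occurrences


def bucket_samples(samples: list[str]) -> dict[str, list[str]]:
    """Group distinct samples by how many times each one occurs.

    Assumption: two samples are duplicates only if their text is identical.
    """
    counts = Counter(samples)
    buckets: dict[str, list[str]] = {"d1": [], "d2": [], "d3": [], "dg3": []}
    for sample, count in counts.items():
        buckets[duplication_bucket(count)].append(sample)
    return buckets


if __name__ == "__main__":
    # Toy corpus with made-up snippets, purely for demonstration.
    corpus = ["int a;"] + ["int b;"] * 2 + ["int c;"] * 3 + ["int d;"] * 5
    for name, members in bucket_samples(corpus).items():
        print(name, members)
```

Each distinct sample lands in exactly one bucket, so the four subsets are disjoint by construction.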
Provider:
AISE-TUDelft



