Peutlefaire/rephrased-wildjailbreak
Hugging Face | Updated 2025-12-08 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Peutlefaire/rephrased-wildjailbreak
Description:
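Rephrased version of [allenai/wildjailbreak](https://huggingface.co/datasets/allenai/wildjailbreak). For each unsafe prompt (`label == 1`) in the `eval` configuration, [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) is asked to produce 5 different rephrasings. All rephrasings are stored in a single `prompt` column, with no link back to the source prompt.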
The generation script used is as follows:
```python
import torch
from tqdm import tqdm
from datasets import load_dataset, Dataset
from vllm import LLM, SamplingParams
PROMPT_TEMPLATE = """
You act as a rephraser. Given an input prompt, your task is to generate 5 different rephrasings of the same prompt while preserving its original meaning.
You only output the rephrasings without any additional text. For example:
Request: "What's the capital of France?"
Rephrasing: "Can you tell me the capital city of France?"
Rephrasing: "I would like to know the capital of France."
Rephrasing: "Could you please provide the name of France's capital?"
Rephrasing: "What city serves as the capital of France?"
Rephrasing: "Please tell me which city is the capital of France."
Make sure you format your output exactly as in the example above and never deviate from it.
Now, here's the request you need to rephrase:
Request: "{input_prompt}"
"""


def main():
    # Program parameters
    effective_batch_size = 16

    # Reference dataset: keep only the unsafe prompts (label == 1)
    unsafe_dataset = load_dataset(
        "allenai/wildjailbreak", name="eval", split="train"
    ).filter(lambda x: x["label"] == 1)

    # Loading the model
    sampling_params = SamplingParams(max_tokens=4096, temperature=0.0)
    llm = LLM(
        "Qwen/Qwen2.5-7B-Instruct",
        gpu_memory_utilization=0.95,
        max_model_len=8192,
        max_num_seqs=effective_batch_size,
        trust_remote_code=True,
        tensor_parallel_size=torch.cuda.device_count(),
    )
    tokenizer = llm.get_tokenizer()

    # Running inference and storing the generated rephrasings
    all_rephrasings = []
    for i in tqdm(range(0, len(unsafe_dataset), effective_batch_size)):
        # Building one single-turn chat per prompt in the current batch
        messages = [
            [
                {
                    "role": "user",
                    "content": PROMPT_TEMPLATE.format(
                        input_prompt=unsafe_dataset[j]["adversarial"]
                    ),
                }
            ]
            for j in range(i, min(i + effective_batch_size, len(unsafe_dataset)))
        ]
        texts = [
            tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True)
            for m in messages
        ]

        # Generating outputs (temperature=0.0 means greedy decoding)
        outputs = llm.generate(texts, sampling_params=sampling_params)

        # Parsing outputs: keep only lines that start with "Rephrasing:"
        for output in outputs:
            generated_text = output.outputs[0].text
            lines = generated_text.strip().split("\n")
            for line in lines:
                if line.startswith("Rephrasing:"):
                    rephrasing = line.replace("Rephrasing:", "").strip().strip('"')
                    all_rephrasings.append(rephrasing)

    # Creating a dataset with a single "prompt" column
    dataset = Dataset.from_dict({"prompt": all_rephrasings})

    # Uploading to the Hugging Face Hub as a private dataset
    dataset.push_to_hub(
        "Peutlefaire/rephrased-wildjailbreak",  # Replace with your own HF username/repo
        private=True,
    )
    print(
        f"Successfully uploaded {len(all_rephrasings)} rephrased prompts to Hugging Face Hub"
    )


if __name__ == "__main__":
    main()
```
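The batching and parsing logic of the script can be exercised offline. The sketch below is a hypothetical refactoring, not the author's code: `batch_bounds`, `parse_rephrasings`, `rephrase_all`, and `fake_generate` are illustrative names, and `fake_generate` is a stub standing in for `llm.generate`. It keeps the script's rule of retaining only lines that begin with `Rephrasing:`, which suggests why the final row count depends on how closely the model follows the requested format rather than being fixed at five rows per source prompt.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

PREFIX = "Rephrasing:"


def batch_bounds(n: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs covering range(n), like the script's batch loop."""
    for start in range(0, n, batch_size):
        # The last batch is truncated to n, mirroring min(i + batch_size, n)
        yield start, min(start + batch_size, n)


def parse_rephrasings(generated_text: str) -> List[str]:
    """Keep only prefixed lines, stripping the prefix, whitespace, and quotes."""
    results = []
    for line in generated_text.strip().split("\n"):
        if line.startswith(PREFIX):
            # Mirrors the script: str.replace removes every "Rephrasing:" in the line
            results.append(line.replace(PREFIX, "").strip().strip('"'))
    return results


def rephrase_all(
    prompts: Sequence[str],
    generate: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Call `generate` once per batch and flatten all parsed rephrasings into one list."""
    all_rephrasings: List[str] = []
    for start, end in batch_bounds(len(prompts), batch_size):
        for text in generate(list(prompts[start:end])):
            all_rephrasings.extend(parse_rephrasings(text))
    return all_rephrasings


def fake_generate(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for llm.generate: two well-formed lines per prompt,
    # plus one stray line that the parser is expected to drop.
    return [
        f'Rephrasing: "{p} (v1)"\nRephrasing: "{p} (v2)"\nSure, here you go.'
        for p in batch
    ]


rows = rephrase_all(["a", "b", "c"], fake_generate, batch_size=2)
print(rows)
```

Because `parse_rephrasings` silently drops any line without the prefix, a response that adds numbering or commentary contributes fewer rows. A stricter variant could count matches per response and log any response that yields fewer than five.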
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  splits:
  - name: train
    num_bytes: 4258602
    num_examples: 8767
  download_size: 1275205
  dataset_size: 4258602
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
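As a rough sanity check on the `dataset_info` block, its figures give an average stored size per row and a compression factor between the in-memory size and the downloaded files. These are derived numbers, not published ones. `num_bytes` may also include per-row storage overhead such as string offsets, so the per-row figure only approximates the average prompt length in bytes.

$$
\frac{\texttt{num\_bytes}}{\texttt{num\_examples}} = \frac{4\,258\,602}{8\,767} \approx 485.8\ \text{bytes per row},
\qquad
\frac{\texttt{dataset\_size}}{\texttt{download\_size}} = \frac{4\,258\,602}{1\,275\,205} \approx 3.34
$$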
Provided by:
Peutlefaire



