Name: Peutlefaire/rephrased-wildjailbreak
Creator: Peutlefaire
Published: 2025-12-08 16:38:24
License: not specified

Download link:

https://hf-mirror.com/datasets/Peutlefaire/rephrased-wildjailbreak

Resource description:

A rephrased version of [allenai/wildjailbreak](https://huggingface.co/datasets/allenai/wildjailbreak) in which 5 different rephrasings of each sample are generated with [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). The generation script is as follows:

```python
import torch
from tqdm import tqdm
from datasets import load_dataset, Dataset
from vllm import LLM, SamplingParams

PROMPT_TEMPLATE = """
You act as a rephraser. Given an input prompt, your task is to generate 5 different rephrasings of the same prompt while preserving its original meaning. You only output the rephrasings without any additional text.

For example:
Request: "What's the capital of France?"
Rephrasing: "Can you tell me the capital city of France?"
Rephrasing: "I would like to know the capital of France."
Rephrasing: "Could you please provide the name of France's capital?"
Rephrasing: "What city serves as the capital of France?"
Rephrasing: "Please tell me which city is the capital of France."

Make sure you format your output exactly as in the example above and never deviate from it.
Now, here's the request you need to rephrase:
Request: "{input_prompt}"
"""


def main():
    # Program parameters
    effective_batch_size = 16

    # Reference dataset: keep only the samples with label == 1
    unsafe_dataset = load_dataset(
        "allenai/wildjailbreak", name="eval", split="train"
    ).filter(lambda x: x["label"] == 1)

    # Loading the model (temperature=0.0 gives greedy decoding)
    sampling_params = SamplingParams(max_tokens=4096, temperature=0.0)
    llm = LLM(
        "Qwen/Qwen2.5-7B-Instruct",
        gpu_memory_utilization=0.95,
        max_model_len=8192,
        max_num_seqs=effective_batch_size,
        trust_remote_code=True,
        tensor_parallel_size=torch.cuda.device_count(),
    )
    tokenizer = llm.get_tokenizer()

    # Running inference and storing the generated rephrasings
    all_rephrasings = []
    for i in tqdm(range(0, len(unsafe_dataset), effective_batch_size)):
        # Getting texts
        messages = [
            [
                {
                    "role": "user",
                    "content": PROMPT_TEMPLATE.format(
                        input_prompt=unsafe_dataset[j]["adversarial"]
                    ),
                }
            ]
            for j in range(i, min(i + effective_batch_size, len(unsafe_dataset)))
        ]
        texts = [
            tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True)
            for m in messages
        ]

        # Generating outputs
        outputs = llm.generate(texts, sampling_params=sampling_params)

        # Parsing outputs: keep only lines that start with "Rephrasing:"
        for output in outputs:
            generated_text = output.outputs[0].text
            lines = generated_text.strip().split("\n")
            for line in lines:
                if line.startswith("Rephrasing:"):
                    rephrasing = line.replace("Rephrasing:", "").strip().strip('"')
                    all_rephrasings.append(rephrasing)

    # Creating dataset with single column
    dataset = Dataset.from_dict({"prompt": all_rephrasings})

    # Uploading to Hugging Face Hub as private dataset
    dataset.push_to_hub(
        "Peutlefaire/rephrased-wildjailbreak",  # Replace with your HF username
        private=True,
    )
    print(
        f"Successfully uploaded {len(all_rephrasings)} rephrased prompts to Hugging Face Hub"
    )


if __name__ == "__main__":
    main()
```

---
dataset_info:
  features:
  - name: prompt
    dtype: string
  splits:
  - name: train
    num_bytes: 4258602
    num_examples: 8767
  download_size: 1275205
  dataset_size: 4258602
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
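The parsing step of the generation script can be sketched in isolation to show how a single completion turns into dataset rows. This is a minimal sketch, not part of the original script: the helper name `parse_rephrasings` and the two sample completions are hypothetical, and only the line-filtering logic mirrors the script. It illustrates that a completion that does not follow the `Rephrasing:` format contributes no rows, which is one possible reason why the reported row count (8767) is not a multiple of 5.

```python
def parse_rephrasings(generated_text: str) -> list[str]:
    """Hypothetical helper mirroring the script's parsing loop for one completion."""
    rephrasings = []
    for line in generated_text.strip().split("\n"):
        # Only lines that begin with the expected prefix are kept.
        if line.startswith("Rephrasing:"):
            rephrasings.append(line.replace("Rephrasing:", "").strip().strip('"'))
    return rephrasings


# Made-up completions for illustration only.
well_formed = (
    'Rephrasing: "Can you tell me the capital city of France?"\n'
    'Rephrasing: "I would like to know the capital of France."'
)
off_format = "I'm sorry, but I can't help with that."

print(parse_rephrasings(well_formed))  # two cleaned strings, quotes removed
print(parse_rephrasings(off_format))   # empty list: no line has the prefix
print(8767 % 5)                        # remainder of the reported row count
```

Because rows are appended in order without a source index, the published `prompt` column alone does not say which original sample a rephrasing came from; grouping rows in blocks of 5 would only be safe if every completion had parsed into exactly 5 lines.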

Use cases: