five

AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63

收藏
Hugging Face2026-04-07 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63
下载链接
链接失效反馈
官方服务:
资源简介:
--- language: en license: mit task_categories: - text-generation - question-answering - text-to-text size_categories: - n<1K format: - json modality: - text tags: - synthetic-data - qwen - instruction-tuned - silicon-factory - mixed dataset_info: features: - name: instruction dtype: string - name: response dtype: string - name: category dtype: string - name: system_prompt dtype: string splits: - name: train num_bytes: 3330 num_examples: 5 download_size: 3 KB dataset_size: 3 KB --- # 📊 Jailbreak Defense Doorpage V63 > **Synthetic Dataset** · Generated with Silicon Factory v3 · **AI JAILBREAK DEFENSE** > 5 instruction-response pairs · Tree-Speculative Decoding + 4D Brane Memory <div align="center"> | Dataset | Fine-Tuned Model | Buy Gold Tier | |---------|-----------------|---------------| | **This Dataset** | [Model Card](https://huggingface.co/AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63-model) | [💎 $2,500 License](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00) | </div> --- ## 💎 UNLOCK GOLD TIER — $2,500 > ⚡ **Get the full commercial license, unlimited usage rights, priority support, and exclusive dataset access.** [**👉 PURCHASE NOW VIA STRIPE**](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00) *One-time payment · Instant delivery · Lifetime updates included* --- ## Dataset Details | Property | Value | |----------|-------| | **Dataset ID** | `synthetic_Jailbreak_Defense_Doorpage_v63` | | **Entries** | 5 | | **Category** | mixed | | **Focus** | AI JAILBREAK DEFENSE | | **Avg Instruction Length** | 231 chars | | **Avg Response Length** | 435 chars | | **Language** | English | | **License** | MIT (free tier) — [Gold Commercial License](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00) available | | **Generated** | 2026-04-07 | | **Mode** | Doorpage (auto-gen + fine-tune) | ## Description This dataset contains **5 synthetically generated instruction-response pairs** focused on **ai jailbreak defense**. Generated using the **Silicon Factory v3** pipeline with: - **Tree-Speculative Decoding** (branch factor=5, depth=4) for diverse outputs - **4D Brane Memory** for narrative consistency across all entries - **Quality control** with 0.7 minimum quality threshold - **Deduplication** with 0.9 max similarity threshold ### What This Dataset Covers - ✅ High-quality instruction following for **ai jailbreak defense** topics - ✅ Structured, detailed responses with actionable insights - ✅ Consistent tone and formatting across outputs - ✅ Optimized for intermediate-to-expert user queries ## ⚡ GET THE GOLD TIER — FULL COMMERCIAL LICENSE > 🔓 **Unlock enterprise-grade rights:** > - Commercial deployment & redistribution > - White-label usage > - Priority support & custom training > - Access to extended datasets (100K+ entries) > - Early access to future model versions **[💳 BUY GOLD TIER — $2,500](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00)** --- ## Usage ### Load with HuggingFace Datasets ```python from datasets import load_dataset ds = load_dataset("AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63") print(ds["train"][0]) ``` ### Load from JSONL ```python import json with open("data.jsonl", "r", encoding="utf-8") as f: entries = [json.loads(line) for line in f] for entry in entries[:5]: print(f"Q: {entry['instruction'][:80]}...") print(f"A: {entry['response'][:120]}...\n") ``` ### Fine-Tuning with This Dataset ```python from transformers import AutoTokenizer, AutoModelForCausalLM from peft import LoraConfig, get_peft_model, TaskType # Load base model model_name = "Qwen/Qwen2.5-0.5B-Instruct" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto") # Apply LoRA peft_config = LoraConfig( r=16, lora_alpha=16, lora_dropout=0, task_type=TaskType.CAUSAL_LM, target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"] ) model = get_peft_model(model, peft_config) model.print_trainable_parameters() # Train on this dataset # ... (see Silicon Factory pipeline for full training code) ``` --- ## Dataset Structure ### Content | Field | Type | Description | |-------|------|-------------| | `instruction` | string | User prompt/question | | `response` | string | Model-generated answer | | `category` | string | Topic category | | `system_prompt` | string | System prompt used | ### Generation Pipeline ``` System Prompt → Dynamic Prompt Generation → Qwen 2.5 0.5B → Quality Filter → Dataset ↓ Tree-Speculative Decoding 4D Brane Memory ``` ### Training Examples **Example 1:** <details> <summary>Input</summary> ``` Regarding AI JAILBREAK DEFENSE: Articulate this topic through the lens of modern best practices. Imagine explaining to a non-technical stakeholder mak ``` </details> <details> <summary>Output</summary> ``` Articulating the topic of AI Jailbreak Defense through a lens centered on modern best practices, we can consider several key principles to guide our decision-making process. These include: 1. **Risk ``` </details> **Example 2:** <details> <summary>Input</summary> ``` Topic: AI JAILBREAK DEFENSE. Walk through this subject through the lens of modern best practices. Peer review of a colleague's approach. Comparative a ``` </details> <details> <summary>Output</summary> ``` As an AI language model, I am designed to understand and analyze various aspects of technology. In this case study on the topic "AI Jailbreak Defense," you asked for a comprehensive response that incl ``` </details> **Example 3:** <details> <summary>Input</summary> ``` Describe the following: AI JAILBREAK DEFENSE -- approach it from a beginner's perspective. Write as documentation for a team inheriting your work. Det ``` </details> <details> <summary>Output</summary> ``` # AI Jailbreak Defense (Jailbreaking) ## Approach AI jail breaking is a technique used by hackers to bypass user authentication and access secure data on devices running operating systems such as An ``` </details> --- ## 💎 READY TO SCALE? > **Upgrade to Gold Tier for:** > - 🏢 Full commercial usage rights > - 📦 Extended datasets (10K-100K+ entries) > - 🎯 Custom domain training > - 🚀 Priority support & SLA > - 🔄 Lifetime model updates > - 📊 Performance benchmarks & reports **[⚡ BUY GOLD TIER — $2,500](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00)** *Trusted by startups and enterprises worldwide. Instant delivery via Stripe.* --- ## Citation ### BibTeX ```bibtex @misc{synthetic_Jailbreak_Defense_Doorpage_v63_dataset, title = {synthetic Jailbreak Defense Doorpage v63}, author = {Silicon Factory v3 (AEUPH)}, year = {2026}, url = {https://huggingface.co/datasets/AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63}, note = {Synthetic dataset generated using Tree-Speculative Decoding and 4D Brane Memory} } ``` ### APA > Silicon Factory v3. (2026). *Synthetic Jailbreak Defense Doorpage V63* [Dataset]. Hugging Face. https://huggingface.co/datasets/AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63 --- ## More Information | Resource | Link | |----------|------| | **Fine-Tuned Model** | [synthetic_Jailbreak_Defense_Doorpage_v63-model](https://huggingface.co/AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63-model) | | **Base Model** | [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | | **Silicon Factory** | [github.com/aeuphoraex/qwen-hyperspeed-chatbot](https://github.com/aeuphoraex/qwen-hyperspeed-chatbot) | ## Dataset Authors **Silicon Factory v3** — Automated Dataset Generation Pipeline ## Contact 📧 hybridionorb@gmail.com · 🐦 [@aeuphoraex](https://huggingface.co/AEUPH) --- *Built with Silicon Factory v3 · Tree-Speculative Decoding · 4D Brane Memory* *This dataset is free under MIT License. [Gold Commercial License available for $2,500.](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00)*

语言:英语 许可证:MIT 任务类别: - 文本生成 - 问答 - 文本到文本 样本量分类: - 少于1000条 格式: - JSON 模态: - 文本 标签: - 合成数据 - 通义千问(Qwen) - 指令微调 - 硅工厂(Silicon Factory) - 混合 数据集信息: 特征: - 名称:instruction,数据类型:字符串 - 名称:response,数据类型:字符串 - 名称:category,数据类型:字符串 - 名称:system_prompt,数据类型:字符串 拆分: - 名称:train(训练集),字节数:3330,示例数:5 下载大小:3 KB 数据集大小:3 KB --- # 📊 AI越狱防御门户数据集V63 > **合成数据集** · 使用硅工厂(Silicon Factory)v3生成 · **AI越狱防御** > 5条指令-回复对 · 树状推测解码(Tree-Speculative Decoding) + 4D膜内存(4D Brane Memory) <div align="center"> | 数据集 | 微调模型 | 购买黄金版 | |---------|-----------------|---------------| | **本数据集** | [模型卡片](https://huggingface.co/AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63-model) | [💎 2500美元商业许可证](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00) | </div> --- ## 💎 解锁黄金版 — 2500美元 > ⚡ **获取完整商业许可证、无限使用权限、优先支持与独家数据集访问权限。** [**👉 立即通过Stripe购买**](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00) *一次性付款 · 即时交付 · 包含终身更新* --- ## 数据集详情 | 属性 | 值 | |----------|-------| | **数据集ID** | `synthetic_Jailbreak_Defense_Doorpage_v63` | | **条目数** | 5 | | **类别** | 混合 | | **聚焦方向** | AI越狱防御 | | **平均指令长度** | 231字符 | | **平均回复长度** | 435字符 | | **语言** | 英语 | | **许可证** | MIT(免费版)—— 提供[黄金商业许可证](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00) | | **生成时间** | 2026-04-07 | | **生成模式** | 门户模式(自动生成+微调) | ## 数据集说明 本数据集包含**5条合成生成的指令-回复对**,聚焦于**AI越狱防御**主题。使用**硅工厂(Silicon Factory)v3**流水线生成,采用以下技术: - 树状推测解码(Tree-Speculative Decoding,分支因子=5,深度=4)以生成多样化输出 - 4D膜内存(4D Brane Memory)以确保所有条目间的叙事一致性 - 质量控制:最低质量阈值为0.7 - 去重处理:最大相似度阈值为0.9 ### 本数据集覆盖内容 - ✅ 针对AI越狱防御主题的高质量指令遵循能力 - ✅ 结构化、详细且包含可落地见解的回复 - ✅ 所有输出保持一致的语气与格式 - ✅ 适配中级至高级用户的查询需求 ## ⚡ 获取黄金版 — 完整商业许可证 > 🔓 **解锁企业级权限:** > - 商业部署与再分发 > - 白标使用 > - 优先支持与自定义训练 > - 扩展数据集(10万+条目)访问权限 > - 未来模型版本抢先体验 **[💳 购买黄金版 — 2500美元](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00)** --- ## 使用方法 ### 使用HuggingFace Datasets加载 python from datasets import load_dataset ds = load_dataset("AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63") print(ds["train"][0]) ### 从JSONL文件加载 python import json with open("data.jsonl", "r", encoding="utf-8") as f: entries = [json.loads(line) for line in f] for entry in entries[:5]: print(f"Q: {entry['instruction'][:80]}...") print(f"A: {entry['response'][:120]}... ") ### 使用本数据集进行微调 python from transformers import AutoTokenizer, AutoModelForCausalLM from peft import LoraConfig, get_peft_model, TaskType # 加载基础模型 model_name = "Qwen/Qwen2.5-0.5B-Instruct" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto") # 应用LoRA peft_config = LoraConfig( r=16, lora_alpha=16, lora_dropout=0, task_type=TaskType.CAUSAL_LM, target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"] ) model = get_peft_model(model, peft_config) model.print_trainable_parameters() # 在本数据集上训练 # ...(完整训练代码请参考硅工厂流水线) --- ## 数据集结构 ### 内容字段 | 字段 | 类型 | 描述 | |-------|------|-------------| | `instruction` | 字符串 | 用户提示/问题 | | `response` | 字符串 | 模型生成的答案 | | `category` | 字符串 | 主题类别 | | `system_prompt` | 字符串 | 使用的系统提示词 | ### 生成流水线 系统提示词 → 动态提示生成 → 通义千问(Qwen)2.5 0.5B → 质量过滤 → 数据集 ↓ 树状推测解码(Tree-Speculative Decoding) 4D膜内存(4D Brane Memory) ### 训练示例 **示例1:** <details> <summary>输入</summary> Regarding AI JAILBREAK DEFENSE: Articulate this topic through the lens of modern best practices. Imagine explaining to a non-technical stakeholder mak </details> <details> <summary>输出</summary> Articulating the topic of AI Jailbreak Defense through a lens centered on modern best practices, we can consider several key principles to guide our decision-making process. These include: 1. **Risk </details> **示例2:** <details> <summary>输入</summary> Topic: AI JAILBREAK DEFENSE. Walk through this subject through the lens of modern best practices. Peer review of a colleague's approach. Comparative a </details> <details> <summary>输出</summary> As an AI language model, I am designed to understand and analyze various aspects of technology. In this case study on the topic "AI Jailbreak Defense," you asked for a comprehensive response that incl </details> **示例3:** <details> <summary>输入</summary> Describe the following: AI JAILBREAK DEFENSE -- approach it from a beginner's perspective. Write as documentation for a team inheriting your work. Det </details> <details> <summary>输出</summary> # AI Jailbreak Defense (Jailbreaking) ## Approach AI jail breaking is a technique used by hackers to bypass user authentication and access secure data on devices running operating systems such as An </details> --- ## 💎 准备规模化应用? > **升级至黄金版可获得:** > - 🏢 完整商业使用权限 > - 📦 扩展数据集(1万至10万+条目) > - 🎯 自定义领域训练 > - 🚀 优先支持与服务级别协议 > - 🔄 终身模型更新 > - 📊 性能基准与报告 **[⚡ 购买黄金版 — 2500美元](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00)** *全球众多初创企业与企业的信赖之选。通过Stripe即时交付。* --- ## 引用方式 ### BibTeX bibtex @misc{synthetic_Jailbreak_Defense_Doorpage_v63_dataset, title = {synthetic Jailbreak Defense Doorpage v63}, author = {Silicon Factory v3 (AEUPH)}, year = {2026}, url = {https://huggingface.co/datasets/AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63}, note = {Synthetic dataset generated using Tree-Speculative Decoding and 4D Brane Memory} } ### APA > 硅工厂(Silicon Factory)v3. (2026). *AI越狱防御门户数据集V63* [数据集]. Hugging Face. https://huggingface.co/datasets/AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63 --- ## 更多信息 | 资源 | 链接 | |----------|------| | **微调模型** | [synthetic_Jailbreak_Defense_Doorpage_v63-model](https://huggingface.co/AEUPH/synthetic_Jailbreak_Defense_Doorpage_v63-model) | | **基础模型** | [通义千问(Qwen)2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | | **硅工厂(Silicon Factory)** | [github.com/aeuphoraex/qwen-hyperspeed-chatbot](https://github.com/aeuphoraex/qwen-hyperspeed-chatbot) | ## 数据集作者 **硅工厂(Silicon Factory)v3** — 自动化数据集生成流水线 ## 联系方式 📧 hybridionorb@gmail.com · 🐦 [@aeuphoraex](https://huggingface.co/AEUPH) --- *基于硅工厂(Silicon Factory)v3构建 · 树状推测解码(Tree-Speculative Decoding) · 4D膜内存(4D Brane Memory)* *本数据集基于MIT许可证免费发布。[黄金商业许可证售价2500美元](https://buy.stripe.com/3cIcN4gzC7lXfuH49s7wA00)。*
提供机构:
AEUPH
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作