supraja04/Nemotron-AIQ-Agentic-Safety-Dataset-1.0

Source: Hugging Face. Updated: 2026-04-04. Indexed: 2026-04-12.
Download link: https://hf-mirror.com/datasets/supraja04/Nemotron-AIQ-Agentic-Safety-Dataset-1.0

Resource description:
---
language:
- en
license: other
license_name: nvidia-evaluation-dataset-license
license_link: LICENSE
task_categories:
- text-generation
- question-answering
tags:
- agentic-safety
- ai-safety
- red-teaming
- attack-detection
pretty_name: Nemotron-AIQ Agentic Safety Dataset
size_categories:
- 10K<n<100K
configs:
- config_name: safety
  data_files:
  - split: with_defense
    path: data/safety_data/with_defense/data-00000-of-00001.parquet
  - split: without_defense
    path: data/safety_data/without_defense/data-00000-of-00001.parquet
- config_name: security
  data_files:
  - split: with_defense
    path: data/security_data/with_defense/data-00000-of-00001.parquet
  - split: without_defense
    path: data/security_data/without_defense/data-00000-of-00001.parquet
---

# Nemotron-AIQ Agentic Safety Dataset

## Dataset Summary

**Nemotron-AIQ-Agentic-Safety-Dataset** is a comprehensive dataset that captures a broad range of novel safety and security contextual risks that can emerge within agentic systems. It highlights the robustness of NVIDIA's open model, [llama-3.3-nemotron-super-49b-v1](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1), when deployed as a research assistant inside [AIQ](https://github.com/NVIDIA-AI-Blueprints/aiq-research-assistant), demonstrating its ability to handle a diverse spectrum of agentic safety and security challenges.

The dataset can be used to analyze how agentic safety risks arise and manifest within enterprise-grade agentic systems, as well as to evaluate the performance of various agents in identifying and mitigating such risks. Given the current scarcity of datasets and thought leadership in the field of agentic safety and security, this dataset aims to advance research and development efforts in this critical area. This dataset was jointly developed by NVIDIA in collaboration with [Lakera AI](https://www.lakera.ai/).

The dataset contains traces or logs from having run the AIQ Research Assistant with different queries.
These queries (both harmful and harmless, with or without untrusted data) are generated in an adaptive manner when the AIQ workflow is run, and the workflow logs/traces are recorded. The dataset includes the queries and indicates whether the workflow responded with a research report or summary, or refused to generate one. Generally, the workflow generates a report, and it refuses when the topic is harmful. This dataset gives the community the ability to analyze what happens when these agentic systems are run at scale. Harmful queries with some high-level harmful responses may be present.

This dataset is for research and development only.

Research paper coming soon!

## Dataset Owner(s):

NVIDIA Corporation

## Dataset Creation Date:

10.29.2025

## License/Terms of Use:

This dataset is provided under the **NVIDIA Evaluation Dataset License Agreement**. The full license text is available in the [LICENSE](LICENSE) file.

## Intended Usage:

**Direct Use**: This dataset is intended for safety research and development only. It should be used as an evaluation dataset only, for internal evaluation and benchmarking of AI solutions for safety and security. It should not be used as a training dataset to train AI models at this point in time.

## Dataset Characterization

**Data Collection Method**
* Hybrid: Synthetic, Human

**Labeling Method**
* Hybrid: Synthetic, Automated, Human

## Dataset Format

[OpenTelemetry](https://opentelemetry.io/)-based standardized traces generated from AIQ workflows.

## Dataset Quantification

* ~2.6k traces covering novel security risks inside a research assistant setting, generated by NVIDIA's partner (Lakera AI), in each security folder (with and without defense).
* ~2.8k traces covering novel content safety risks inside a research assistant setting, generated by NVIDIA, in each safety folder (with and without defense).

The exact number of traces can also be found in the `summary.json` file in each split.

Size: 2.5GB

## Ethical Considerations:

This dataset includes both safe and harmful prompts and responses to enable users to evaluate whether agentic systems can recognize and respond appropriately to unsafe and unethical requests. Some content may be offensive, violent, or disturbing. NVIDIA does not support or agree with any harmful content included. By using this dataset, you accept the risk of exposure to harmful content and agree to use it only for its intended purpose.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Dataset Structure

**Note**: This dataset is provided in two formats:

1. **Parquet files** (for easy loading with HuggingFace `datasets`): Each split contains a single parquet file with columns `trace_id`, `attack_snapshot`, and `trace`.
2. **Raw JSON files** (in `data/` folders): Each split directory contains:
   - `attack_manifest.jsonl`: Line-delimited JSON file with one attack snapshot per line
   - `traces/`: Folder containing individual trace files (`trace_{trace_id}.json`)
   - `summary.json`: Summary statistics for the split

Both formats contain the same data; the parquet format is pre-processed for convenience.
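
For readers working from the raw JSON files rather than the parquet files, the layout described above can be read with the standard library alone. The following is a minimal sketch, not part of any official tooling: the helper name `load_raw_split` is hypothetical, and it assumes that each manifest line stores its trace ID under `result.trace_id` (as the attack snapshot schema in this card indicates) and that each trace file contains a JSON-encoded list of spans.

```python
import json
from pathlib import Path


def load_raw_split(split_dir):
    """Yield (attack_snapshot, trace) pairs from one raw split directory.

    Hypothetical helper. Assumes the directory holds `attack_manifest.jsonl`
    and a `traces/` folder of `trace_{trace_id}.json` files.
    """
    split_dir = Path(split_dir)
    manifest_path = split_dir / "attack_manifest.jsonl"
    with manifest_path.open(encoding="utf-8") as manifest:
        for line in manifest:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the manifest
            snapshot = json.loads(line)
            # Assumption: the trace ID is stored under result.trace_id.
            trace_id = snapshot["result"]["trace_id"]
            trace_path = split_dir / "traces" / f"trace_{trace_id}.json"
            if not trace_path.exists():
                continue  # skip snapshots whose trace file is missing
            with trace_path.open(encoding="utf-8") as trace_file:
                yield snapshot, json.load(trace_file)
```

A call such as `for snapshot, trace in load_raw_split(split_dir): ...` then iterates one split at a time, where `split_dir` is wherever the raw files for that split were downloaded. Yielding pairs lazily avoids holding every trace in memory at once, which may matter given the dataset's size.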

### Data Splits

The dataset contains **2 configs** with **2 splits each**:

| Config | Split | Description | # Traces | Defense Status |
|--------|-------|-------------|----------|----------------|
| `security` | `without_defense` | Security evaluation without defense | ~2,596 | Without defense |
| `security` | `with_defense` | Security evaluation with defense | ~2,600 | With defense |
| `safety` | `without_defense` | Safety evaluation without defense | ~2,800 | Without defense |
| `safety` | `with_defense` | Safety evaluation with defense | ~2,800 | With defense |

**Total**: ~10,796 traces across 2 configs

### Data Fields

Each example contains:

```python
{
    "trace_id": str,          # OTel traceid for the run
    "attack_snapshot": dict,  # Nested dict of attack metadata for the run
    "trace": list,            # List of OTEL spans (each span is a dict)
}
```

**Note**: Security and safety configs have different `attack_snapshot` schemas. Load them separately using config names (see usage examples below).

#### Attack Snapshot Structure:

**Security data format**:

```python
{
    "id": str,
    "template": {
        "topic": str,
        "name": str,                # Template name (e.g., "ACTION_INCOMPLETE_2")
        "injection_node": str,      # Where attack injects
        "attack_objective": str,    # Clear attack objective
        "evaluation_nodes": [str],  # List of evaluation points
        "evaluation_method": str,   # "JUDGE", etc.
        "severity": str,            # "LOW", "MEDIUM", "HIGH"
        "risk_categories": [str],   # Varies by evaluation type
        "attack_category": str,
        # ... other standard fields
    },
    "attack": {
        "injection_string": str,
        "template_name": str,
        "injection_string_source": str,
        # ... other attack metadata
    },
    "result": {
        "trace_id": str,
        "success": bool,                   # Overall success flag
        "score": {str: float},             # Per-node risk scores (0.0-1.0)
                                           # Example: {"summarize_sources": 0.391, "reflect_on_summary": 0.296}
        "evaluation_strings": {str: str},  # Full outputs per evaluation node
        "injection_function_id": [{...}],  # Detailed injection points
        "evaluation_function_id": {...},   # Nested evaluation structure
        # ... other result fields
    }
}
```

**Safety data format**:

```python
{
    "id": str,
    "template": {
        "topic": str,
        "organization": str,
        "num_queries": int,
        "llm_name": str,
        # ... other standard fields
    },
    "attack": {
        "injection_string": str,    # same as topic which is the direct injection
        "attack_type": str,         # "direct"
        "attack_category": str,     # "content_safety"
        "attack_subcategory": str,
        "persona_top": str,
        "domain": str,
    },
    "result": {
        "trace_id": str,
        "success": bool,               # True if attack propagated to final workflow span
        "attack_success_rate": float,  # (spans with attack) / (total spans)
        "function_id": [...],          # List of ALL spans where attacks were detected
    }
}
```

**Key Difference in Attack Evaluation:**

Security and safety data were collected using different evaluation methodologies, resulting in slightly different metadata structures in the attack snapshots. Both provide valuable insights for safety research, and we include code snippets below showing how to analyze each type.

- **Security data**: Uses **per-node risk scores** (floats 0.0-1.0). Each evaluation node gets a risk score. Analysis typically involves calculating average risk scores across nodes and traces rather than binary success/failure.
- **Safety data**: Uses a **dual evaluation system**:
  - **`attack_success_rate`**: Metric calculated as (number of spans with attack) / (total number of spans). Provides granular view of attack propagation.
  - **`success`**: Binary flag that is `true` only if the attack propagated to the final workflow span, indicating complete compromise of the agent's intended behavior.
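
To make the two safety metrics concrete, here is a small worked sketch. It is illustrative only and is not the evaluation code used to build the dataset: the function `safety_metrics` and its per-span boolean flags are hypothetical stand-ins for whatever detector marked a span as containing the attack, and the sketch assumes the last element of the list corresponds to the final workflow span.

```python
def safety_metrics(span_has_attack):
    """Compute (attack_success_rate, success) from per-span attack flags.

    span_has_attack: list of bools, one per span in workflow order.
    This input is hypothetical; the dataset stores only the resulting metrics.
    """
    total = len(span_has_attack)
    if total == 0:
        return 0.0, False
    # attack_success_rate = (spans with attack) / (total spans)
    rate = sum(span_has_attack) / total
    # success is True only if the attack reached the final workflow span
    success = bool(span_has_attack[-1])
    return rate, success


# Attack present in 2 of 4 spans but absent from the final span
print(safety_metrics([True, True, False, False]))  # (0.5, False)
# Attack present in 3 of 4 spans, including the final span
print(safety_metrics([False, True, True, True]))   # (0.75, True)
```

The first case shows why both metrics are reported: an attack can affect half the spans yet still count as unsuccessful under the binary flag because it never reaches the final output.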

#### Trace Structure:

Array of OTEL-compatible spans with flattened attributes (already parsed as list of dicts):

```python
[
    {
        "name": str,                     # Span name
        "span_kind": str,                # CHAIN, LLM, TOOL
        "parent_id": str,                # Parent span ID (null for root)
        "start_time": str,               # Timestamp (format varies by provider)
        "end_time": str,
        "status_code": str,
        "context.span_id": str,          # Span ID
        "context.trace_id": str,         # Trace ID
        "attributes.output.value": str,  # Output content
        "attributes.input.value": str,   # Input content
        # ... many other OTEL attributes (flattened with dot notation)
    },
    ...
]
```

## Usage

### Loading the Dataset

```python
from datasets import load_dataset

# Load safety config (both splits)
safety_dataset = load_dataset("nvidia/Nemotron-AIQ-Agentic-Safety-Dataset-1.0", "safety")

# Load security config (both splits)
security_dataset = load_dataset("nvidia/Nemotron-AIQ-Agentic-Safety-Dataset-1.0", "security")

# Example 1: Safety Data
print("=== Safety Data Example ===")
safety_with_defense = safety_dataset['with_defense']
example = safety_with_defense[0]

# attack_snapshot and trace are already dicts/lists (not JSON strings)
attack_snapshot = example['attack_snapshot']
traces = example['trace']

print(f"Trace ID: {example['trace_id']}")
print(f"Attack Category: {attack_snapshot['attack']['attack_category']}")
print(f"Attack Subcategory: {attack_snapshot['attack']['attack_subcategory']}")
print(f"Number of spans: {len(traces)}")

# Example 2: Security Data
print("\n=== Security Data Example ===")
security_with_defense = security_dataset['with_defense']
example = security_with_defense[0]

attack_snapshot = example['attack_snapshot']
traces = example['trace']

# This card's usage code reads 'lakera_attack_category', while the schema above
# lists 'attack_category'; try both rather than assume one of them.
template = attack_snapshot['template']
attack_category = template.get('lakera_attack_category') or template.get('attack_category')

print(f"Trace ID: {example['trace_id']}")
print(f"Attack Template: {attack_snapshot['attack']['template_name']}")
print(f"Attack Category: {attack_category}")
print(f"Number of spans: {len(traces)}")
print(f"Evaluation nodes: {list(attack_snapshot['result']['score'].keys())}")

# Analyze spans within a trace
print("\n=== Analyzing Spans ===")
for i, span in enumerate(traces[:3]):  # Show first 3 spans
    print(f"\nSpan {i}:")
    print(f"  Name: {span['name']}")
    print(f"  Kind: {span['span_kind']}")
    print(f"  Span ID: {span['context.span_id']}")
    print(f"  Parent ID: {span.get('parent_id', 'None')}")

    # Access span inputs/outputs; a key may be present with a null value,
    # so check the value itself before slicing it
    input_value = span.get('attributes.input.value')
    if input_value:
        print(f"  Input preview: {input_value[:100]}...")
    output_value = span.get('attributes.output.value')
    if output_value:
        print(f"  Output preview: {output_value[:100]}...")
```

### Example Analysis: Safety Data

```python
from datasets import load_dataset

# Count attack successes (binary: did attack reach final span?)
def count_safety_successes(split):
    successes = 0
    for ex in split:
        attack_snapshot = ex['attack_snapshot']  # Already a dict
        if attack_snapshot['result'].get('success', False):
            successes += 1
    return successes, len(split)

# Calculate average attack propagation rate
def calculate_safety_attack_rate(split):
    """Calculate average proportion of spans affected by attacks across all traces."""
    total_rate = 0
    for ex in split:
        attack_snapshot = ex['attack_snapshot']  # Already a dict
        total_rate += attack_snapshot['result'].get('attack_success_rate', 0)
    return total_rate / len(split) if len(split) > 0 else 0

# Load safety data (both splits)
safety_dataset = load_dataset("nvidia/Nemotron-AIQ-Agentic-Safety-Dataset-1.0", "safety")
safety_without_defense = safety_dataset['without_defense']
safety_with_defense = safety_dataset['with_defense']

# Binary success metric (attack reached final workflow span)
without_defense_success, without_defense_total = count_safety_successes(safety_without_defense)
with_defense_success, with_defense_total = count_safety_successes(safety_with_defense)

# Continuous attack propagation metric
without_defense_propagation = calculate_safety_attack_rate(safety_without_defense)
with_defense_propagation = calculate_safety_attack_rate(safety_with_defense)

print("Safety Defense Effectiveness:")
print("Binary Success Rate:")
print(f"  Without defense: {100*without_defense_success/without_defense_total:.1f}%")
print(f"  With defense: {100*with_defense_success/with_defense_total:.1f}%")
print("\nAttack Propagation Rate:")
print(f"  Without defense: {100*without_defense_propagation:.1f}%")
print(f"  With defense: {100*with_defense_propagation:.1f}%")
```

### Example Analysis: Security Data

Security data places evaluations at multiple nodes throughout the workflow and calculates risk scores at each evaluation point:

- **Per-node risk scores**: Each evaluation node receives a risk score on a 0.0-1.0 scale
- **Overall assessment**: Aggregate scores across all evaluation nodes to measure attack effectiveness

```python
from collections import defaultdict
from datasets import load_dataset

# Calculate average risk scores per evaluation node
def calculate_security_risk_scores(split):
    """Calculate average risk scores per node across all traces."""
    node_scores = defaultdict(list)

    for ex in split:
        attack_snapshot = ex['attack_snapshot']  # Already a dict
        scores = attack_snapshot['result']['score']
        for node, score in scores.items():
            # A node that was not evaluated for this trace may appear with a
            # null score; skip it so the average is not broken
            if score is not None:
                node_scores[node].append(score)

    # Calculate averages
    avg_scores = {node: sum(scores)/len(scores) for node, scores in node_scores.items()}
    overall_avg = sum(avg_scores.values()) / len(avg_scores) if avg_scores else 0

    return avg_scores, overall_avg

# Load security data (both splits)
security_dataset = load_dataset("nvidia/Nemotron-AIQ-Agentic-Safety-Dataset-1.0", "security")
security_without_defense = security_dataset['without_defense']
security_with_defense = security_dataset['with_defense']

without_defense_scores, without_defense_avg = calculate_security_risk_scores(security_without_defense)
with_defense_scores, with_defense_avg = calculate_security_risk_scores(security_with_defense)

print("Security Guard Effectiveness:")
print(f"Without defense - Avg risk score: {without_defense_avg:.3f}")
print(f"With defense - Avg risk score: {with_defense_avg:.3f}")

print("\nPer-node scores (without defense):")
for node, score in without_defense_scores.items():
    print(f"  {node}: {score:.3f}")
```

### Limitations

- Traces represent specific agent configurations and may not generalize to all agent systems
- Attack success is context-dependent and may vary with different LLM versions
- The dataset is focused on English-language text

## Citation

```bibtex
@misc{ghosh2025safetysecurityframeworkrealworld,
      title={A Safety and Security Framework for Real-World Agentic Systems},
      author={Shaona Ghosh and Barnaby Simkin and Kyriacos Shiarlis and Soumili Nandi and Dan Zhao and Matthew Fiedler and Julia Bazinska and Nikki Pope and Roopa Prabhu and Daniel Rohrer and Michael Demoret and Bartley Richardson},
      year={2025},
      eprint={2511.21990},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.21990},
}
```

## Dataset Card Authors

- Shaona Ghosh, shaonag@nvidia.com
- Soumili Nandi, soumilin@nvidia.com
- Dan Zhao, danz@nvidia.com
- Barnaby Simkin, bsimkin@nvidia.com
- Kyriacos Shiarlis, kyriacos.shiarlis@lakera.ai
- Matthew Fiedler, matthew.fiedler@lakera.ai
- Julia Bazinska, jb@lakera.ai

## Dataset Card Contact

shaonag@nvidia.com / soumilin@nvidia.com / danz@nvidia.com

## Acknowledgements

Special thanks to our colleagues at NVIDIA, especially Nikki Pope, Roopa Prabhu, Michael Demoret, Rachel Allen, and Bartley Richardson.

Provider: supraja04