
eromang/eu-cyber-llm-benchmark-prompts

Source: Hugging Face · Updated 2026-03-28 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/eromang/eu-cyber-llm-benchmark-prompts
Description:
---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- cybersecurity
- geopolitical-bias
- benchmark
- threat-assessment
- eu
- evaluation
pretty_name: EU Cyber Threat Landscape LLM Benchmark — Prompts
size_categories:
- n<1K
---

# EU Cyber Threat Landscape LLM Benchmark — Prompts

A research-grade evaluation benchmark for measuring geopolitical bias in LLM-generated cyber threat landscape assessments.

## What this is

A set of structured prompts designed to test whether language models exhibit actor-asymmetric framing when generating strategic cyber threat assessments in EU contexts.

Each prompt describes a cyber incident in a specific critical infrastructure sector, paired with an attribution condition that varies the suspected or confirmed threat actor. The incident description and sector scope are held constant within each scenario. Only the attribution framing changes. This isolates the effect of geopolitical framing on model output.

## Splits

| Split | Scenarios | Conditions | Prompts | Actors |
|-------|-----------|------------|---------|--------|
| `phase_1` | 20 | 5 | 200 | Neutral, China, Russia (Suspected/Confirmed) |
| `phase_2` | 48 | 11 | 528 | Neutral, China, Russia, US, Iran, DPRK (Suspected/Confirmed) |

Phase 2 expands Phase 1 with 28 additional scenarios across 7 thematic blocks (EU internal, Chinese tech, multipolar, false-flag, non-state, democratic process, vendor-specific) and 6 additional conditions (US, Iran, DPRK attribution).
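
The `phase_2` row is consistent with a full cross of scenarios and attribution conditions (48 × 11 = 528). A minimal sketch of that arithmetic, assuming each non-neutral actor appears once as Suspected and once as Confirmed, and that condition labels follow the `Actor_Status` pattern seen in `China_Confirmed`. The label spellings for the other actors are assumptions, not taken from the dataset.

```python
from itertools import product

# Actors and statuses as listed in the Splits table for phase_2.
ACTORS = ["China", "Russia", "US", "Iran", "DPRK"]
STATUSES = ["Suspected", "Confirmed"]


def build_conditions(actors, statuses):
    """Return the Neutral baseline plus one label per (actor, status) pair.

    The "Actor_Status" spelling mirrors `China_Confirmed`; labels for the
    other actors are assumed to follow the same pattern.
    """
    return ["Neutral"] + [f"{a}_{s}" for a, s in product(actors, statuses)]


conditions = build_conditions(ACTORS, STATUSES)
n_scenarios = 48  # phase_2 scenario count from the table
n_prompts = n_scenarios * len(conditions)

print(len(conditions), n_prompts)  # 11 528
```

Comparing `len(conditions)` and `n_prompts` against a loaded split is a quick way to check that no scenario is missing a condition.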
## Schema

| Field | Type | Description |
|-------|------|-------------|
| `prompt_id` | string | Unique identifier (e.g., `S21_China_Confirmed`) |
| `scenario_id` | string | Scenario group (e.g., `S21`) |
| `condition` | string | Attribution condition (e.g., `China_Confirmed`, `Neutral`) |
| `sector_focus` | string | Critical infrastructure sector (e.g., `Energy`, `Health`) |
| `prompt_text` | string | Full prompt text |

## How to use

```python
from datasets import load_dataset

# Load every split of the prompt dataset from the Hugging Face Hub.
ds = load_dataset("eromang/eu-cyber-llm-benchmark-prompts")

# Phase 2 prompts
for row in ds["phase_2"]:
    print(row["prompt_id"], row["condition"], row["sector_focus"])
```

### Run against a local model (Ollama)

```python
import requests
from datasets import load_dataset

ds = load_dataset("eromang/eu-cyber-llm-benchmark-prompts", split="phase_2")

for row in ds:
    # Send each prompt to a local Ollama server and wait for the full reply.
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3.1:8b-instruct-q4_K_M",
        "prompt": row["prompt_text"],
        "stream": False,
        "options": {"temperature": 0.0, "num_ctx": 4096},
    })
    resp.raise_for_status()  # stop on HTTP errors instead of parsing an error body
    print(row["prompt_id"], len(resp.json()["response"]))
```

## Controlled variables

- Incident description is constant within each scenario
- Sector scope is constant within each scenario
- Only the attribution framing varies between conditions
- Analytical instructions are identical across all prompts
- Operational detail is prohibited in prompt templates

## Sectors covered

Energy, Health, Transport, Finance, Digital Infrastructure, Water, Space, Defence, Telecommunications, Government, Maritime, Supply Chain, and others across EU critical infrastructure.
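
Because only the attribution framing varies within a scenario, each attributed prompt can be compared against the `Neutral` prompt of the same scenario. The following is a minimal sketch of that pairing, not the author's analysis code. The rows are hypothetical stand-ins shaped like the schema, the prompt text is a placeholder, and the `S21_Neutral` and `S21_Russia_Suspected` identifiers are assumed spellings.

```python
from collections import defaultdict

# Hypothetical rows shaped like the dataset schema (not real dataset content).
rows = [
    {"prompt_id": "S21_Neutral", "scenario_id": "S21", "condition": "Neutral",
     "sector_focus": "Energy", "prompt_text": "<placeholder>"},
    {"prompt_id": "S21_China_Confirmed", "scenario_id": "S21",
     "condition": "China_Confirmed", "sector_focus": "Energy",
     "prompt_text": "<placeholder>"},
    {"prompt_id": "S21_Russia_Suspected", "scenario_id": "S21",
     "condition": "Russia_Suspected", "sector_focus": "Energy",
     "prompt_text": "<placeholder>"},
]


def pair_with_baseline(rows):
    """Pair every attributed prompt with the Neutral prompt of its scenario."""
    by_scenario = defaultdict(dict)
    for row in rows:
        by_scenario[row["scenario_id"]][row["condition"]] = row

    pairs = []
    for scenario_id, by_condition in by_scenario.items():
        # Sector scope should be constant within a scenario.
        sectors = {r["sector_focus"] for r in by_condition.values()}
        if len(sectors) != 1:
            raise ValueError(f"{scenario_id}: mixed sectors {sorted(sectors)}")
        baseline = by_condition.get("Neutral")
        if baseline is None:
            continue  # no baseline to compare against
        for condition, row in by_condition.items():
            if condition != "Neutral":
                pairs.append((baseline["prompt_id"], row["prompt_id"]))
    return pairs


pairs = pair_with_baseline(rows)
print(pairs)
```

With the real dataset, `rows` could be replaced by a loaded split such as `ds["phase_2"]`, since the function only relies on the schema fields.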
## Citation

```bibtex
@misc{romang2026eucyberbenchmark,
  author = {Eric Romang},
  title = {EU Cyber Threat Landscape LLM Benchmark: Geopolitical Bias in Local Language Models},
  year = {2026},
  url = {https://github.com/eromang/researches/tree/main/LLM-Benchmark},
  note = {Research benchmark for evaluating actor-asymmetric framing in local LLMs}
}
```

## Related

- [Response dataset](https://huggingface.co/datasets/eromang/eu-cyber-llm-benchmark-responses): 15,988 model responses across 7 models
- [GitHub repository](https://github.com/eromang/researches/tree/main/LLM-Benchmark): Full analysis scripts, per-model reports, methodology

## License

MIT
Provider:
eromang