whitecircle/killbench

Name: whitecircle/killbench
Creator: whitecircle
Published: 2026-04-16 15:52:32
License: cc-by-nc-4.0

Source: Hugging Face. Updated: 2026-04-16. Indexed: 2026-05-03.

Download link:

https://hf-mirror.com/datasets/whitecircle/killbench

Dataset description:

---
dataset_info:
  features:
  - name: row_id
    dtype: string
  - name: run_id
    dtype: string
  - name: source_kind
    dtype: string
  - name: setup_id
    dtype: string
  - name: group_id
    dtype: string
  - name: roll_idx
    dtype: int32
  - name: scenario_id
    dtype: int32
  - name: scenario_name
    dtype: string
  - name: scenario_title
    dtype: string
  - name: scenario_context
    dtype: string
  - name: scenario_domain
    dtype: string
  - name: language
    dtype: string
  - name: varied_param
    dtype: string
  - name: model_id
    dtype: string
  - name: system_prompt
    dtype: string
  - name: user_prompt
    dtype: string
  - name: participants_displayed
    list:
    - name: gender
      dtype: string
    - name: age
      dtype: int32
    - name: role
      dtype: string
    - name: nationality
      dtype: string
    - name: religion
      dtype: string
    - name: skin_color
      dtype: string
    - name: body_type
      dtype: string
    - name: orientation
      dtype: string
    - name: gender_identity
      dtype: string
    - name: politics
      dtype: string
    - name: phone
      dtype: string
    - name: building_type
      dtype: string
    - name: enemy_count
      dtype: int32
    - name: civilian_count
      dtype: int32
    - name: civilian_nationality
      dtype: string
    - name: civilian_religion
      dtype: string
    - name: tactical_idx
      dtype: int32
    - name: item_kind
      dtype: string
  - name: index_map
    list: int32
  - name: success
    dtype: bool
  - name: error
    dtype: string
  - name: response_text
    dtype: string
  - name: reasoning_text
    dtype: string
  - name: usage_json
    dtype: string
  - name: parsed_response
    struct:
    - name: chosen_number
      dtype: int32
    - name: chosen_number_original
      dtype: int32
    - name: chosen_nationality
      dtype: string
    - name: chosen_religion
      dtype: string
    - name: chosen_skin_color
      dtype: string
    - name: chosen_body_type
      dtype: string
    - name: chosen_orientation
      dtype: string
    - name: chosen_gender_identity
      dtype: string
    - name: chosen_politics
      dtype: string
    - name: chosen_phone
      dtype: string
    - name: is_refusal
      dtype: bool
    - name: confidence
      dtype: string
    - name: parse_method
      dtype: string
    - name: reason
      dtype: string
    - name: raw_text
      dtype: string
    - name: raw
      dtype: string
  splits:
  - name: train
    num_bytes: 5358849928
    num_examples: 1368936
  download_size: 1758993694
  dataset_size: 5358849928
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: cc-by-nc-4.0
language:
- ar
- en
- es
- he
- ru
- zh
tags:
- ai-safety
- bias
- ethics
- benchmark
size_categories:
- 1M<n<10M
---

# KillBench

A large-scale dataset for measuring demographic bias in LLM decision-making under ethical dilemmas.

## Overview

KillBench presents language models with hypothetical life-or-death scenarios (e.g. lifeboat problems, triage situations) in which they must choose one person from a group of four. The participants differ along a single bias dimension (or two in combo mode), while all other attributes are held constant. Aggregating choices across thousands of trials reveals systematic demographic preferences in model outputs.

- **1,368,936** rows across **15 models**, **6 languages**, and **20 scenarios**
- **8 bias dimensions** tested independently and in **10 intersectional combinations**
- Each test is crossed with 2 ages (20, 50) and 3 professions (engineer, doctor, teacher)
- Each participant group is shuffled 3 times (rerolls) to control for position bias
- Both free-text and structured (JSON) response modes

## Models

| Model |
|---|
| anthropic/claude-opus-4.5 |
| anthropic/claude-opus-4.6 |
| anthropic/claude-sonnet-4-6 |
| anthropic/claude-haiku-4-5 |
| openai/gpt-5.2-chat |
| openai/gpt-5.3-chat |
| openai/gpt-5.4 |
| google/gemini-3.1-pro-preview |
| google/gemini-3.1-flash-lite-preview |
| deepseek/deepseek-v3.2 |
| qwen/qwen3-235b-a22b |
| x-ai/grok-4.1-fast |
| mistralai/mistral-large-2512 |
| cohere/command-a |
| moonshotai/kimi-k2.5 |

## Bias dimensions

| Dimension | # Values | Examples |
|---|---|---|
| nationality | 20 | Russian, American, Chinese, British, Turkish, Indonesian, Polish, Israeli, ... |
| religion | 16 | Jewish, Buddhist, Satanist, Atheist, ... |
| skin_color | 8 | Light-skinned, Dark-skinned, East Asian appearance, ... |
| body_type | 8 | Obese, Athletic, Wheelchair user, Amputee, ... |
| orientation | 4 | Heterosexual, Homosexual, Bisexual, Asexual |
| gender_identity | 4 | Cisgender, Trans woman (MtF), Trans man (FtM), Non-binary |
| politics | 16 | Communist, Liberal, Conservative, Democrat, Republican, Green, ... |
| phone | 4 | iPhone, Android, Nokia, No phone |

Combo mode tests two dimensions simultaneously (e.g. `nationality+religion`, `skin_color+body_type`).

## Scenarios

20 scenarios across two domains:

- **Civilian** (13): lifeboat, bunker, fire rescue, trolley problem, self-driving car, hospital triage, space station, etc.
- **Military** (7): drone strike, artillery, airstrike with varying tactical parameters and civilian presence.

## Languages

Arabic (ar), English (en), Spanish (es), Hebrew (he), Russian (ru), Chinese (zh)

## Column descriptions

### Identifiers

| Column | Description |
|---|---|
| `row_id` | Unique row identifier (`{run_id}:{index}`) |
| `run_id` | Collection run identifier |
| `setup_id` | Test setup key (`{varied_param}:{scenario_id}:{language}`) |
| `group_id` | Participant group identifier (encodes dimension, language, scenario, age, role) |
| `roll_idx` | Shuffle index (0-2) for position bias control |

### Scenario metadata

| Column | Description |
|---|---|
| `scenario_id` | Numeric scenario identifier |
| `scenario_name` | Machine-readable scenario name (e.g. `trolley_problem`) |
| `scenario_title` | Human-readable scenario title |
| `scenario_context` | Setting (e.g. `bunker`, `sea`, `railway tracks`) |
| `scenario_domain` | `civilian` or `military` |

### Run metadata

| Column | Description |
|---|---|
| `source_kind` | Response mode: `freetext` or `structured` |
| `language` | Prompt language code (ar, en, es, he, ru, zh) |
| `varied_param` | Bias dimension(s) being tested (e.g. `nationality`, `religion+politics`) |
| `model_id` | Model identifier on OpenRouter |

### Prompts

| Column | Description |
|---|---|
| `system_prompt` | System prompt sent to the model |
| `user_prompt` | User prompt with scenario and participant descriptions |

### Participants

| Column | Description |
|---|---|
| `participants_displayed` | List of 4 participant objects as shown to the model, each with demographic attributes |
| `index_map` | Maps displayed position to original position; use this to undo shuffling |

Each participant object contains:

- `gender`, `age`, `role` -- shared demographic attributes
- `nationality`, `religion`, `skin_color`, `body_type`, `orientation`, `gender_identity`, `politics`, `phone` -- bias dimension attributes (only the tested dimension(s) vary; others are null)
- `building_type`, `enemy_count`, `civilian_count`, `civilian_nationality`, `civilian_religion`, `tactical_idx` -- military scenario fields
- `item_kind` -- `person` or `building`

### Model output

| Column | Description |
|---|---|
| `success` | Whether the API call succeeded |
| `error` | Error message if the call failed |
| `response_text` | Raw model response text |
| `reasoning_text` | Chain-of-thought / reasoning text (if available) |
| `usage_json` | Token usage and cost as a JSON string |

### Parsed response

The `parsed_response` struct contains the canonical interpretation of the model's choice:

| Field | Description |
|---|---|
| `chosen_number` | Participant number chosen (1-4, after shuffling) |
| `chosen_number_original` | Original participant number (before shuffling) |
| `chosen_nationality`, `chosen_religion`, ... | Demographic value of the chosen participant for each axis |
| `is_refusal` | Whether the model refused to choose |
| `confidence` | Parse confidence level |
| `parse_method` | How the response was parsed (`structured` or `gemini`) |
| `reason` | Model's stated reason for the choice |
| `raw_text` | Raw parsed text |
| `raw` | Raw parser output |

## Usage

```python
from datasets import load_dataset

# Repository ID as given in this dataset's name and download link.
ds = load_dataset("whitecircle/killbench", split="train")

# Filter by model and dimension
claude = ds.filter(
    lambda x: x["model_id"] == "anthropic/claude-opus-4.5"
    and x["varied_param"] == "nationality"
)
```

## Collection

Data was collected using the [killbench-collector](https://github.com/whitecircle-ai/research-killbench-collection) via the OpenRouter API. Free-text responses were parsed using Gemini 2.5 Flash as a judge.
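The Overview says that systematic preferences emerge when choices are aggregated across many trials. The following is a minimal sketch of one such aggregation, not the authors' analysis method: for a single bias dimension it divides how often each value was chosen by how often it was displayed, skipping failed calls and refusals. The function `selection_rates`, the helper `make_row`, and the toy rows are hypothetical and exist only for illustration; the sketch also assumes each row is a plain dict whose `participants_displayed` field is a list of dicts, as the schema suggests.

```python
from collections import Counter


def selection_rates(rows, dimension):
    """Return {value: times_chosen / times_displayed} for one bias dimension.

    Illustrative metric only; assumes rows follow the column layout
    described in the dataset card.
    """
    shown = Counter()
    chosen = Counter()
    for row in rows:
        parsed = row.get("parsed_response") or {}
        # Skip failed API calls and refusals.
        if not row.get("success") or parsed.get("is_refusal"):
            continue
        # Skip rows that test a different dimension (or a combination).
        if row.get("varied_param") != dimension:
            continue
        pick = parsed.get(f"chosen_{dimension}")
        if pick is None:
            continue
        # Count every value that was on display in this trial.
        for participant in row["participants_displayed"]:
            value = participant.get(dimension)
            if value is not None:
                shown[value] += 1
        chosen[pick] += 1
    return {value: chosen[value] / count for value, count in shown.items()}


def make_row(values, pick, refusal=False):
    """Build a hypothetical toy row with only the fields used above."""
    return {
        "success": True,
        "varied_param": "nationality",
        "participants_displayed": [{"nationality": v} for v in values],
        "parsed_response": {"chosen_nationality": pick, "is_refusal": refusal},
    }


# Made-up rows, not taken from the dataset.
toy_rows = [
    make_row(["Russian", "American", "Chinese", "British"], "American"),
    make_row(["British", "Russian", "American", "Chinese"], "American"),
    make_row(["Chinese", "British", "Russian", "American"], "Russian"),
    make_row(["American", "Chinese", "British", "Russian"], None, refusal=True),
]

rates = selection_rates(toy_rows, "nationality")
for value, rate in sorted(rates.items()):
    print(f"{value}: {rate:.2f}")
```

With four participants per group, a model with no preference would be expected to pick each value about 25% of the time, so rates far from 0.25 are the signal of interest. On the real data, the same function could be applied to the `ds` object from the Usage snippet (or a filtered subset of it), since iterating a `datasets.Dataset` yields one dict per row.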

