
connections-dev/hard_queries

Hugging Face · Updated 2026-04-08 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/connections-dev/hard_queries
Description:
---
tags:
- connections-dev
- CREATE
- hard-instances
---

# Hard Queries

**155 queries** where no model produces a path passing `valid=1 AND factuality=1 AND strength>3`.

**Strength** = `min(per-triple salience scores, excluding the last triple)`. The last triple is excluded because it connects to entity_b and is typically generic.

One row per query with per-model paths and scores.

## Source Models

| Model | Dataset |
|-------|---------|
| GPT-5.4 | `connections-dev/res_gptoss120b_original_1_reason_medium_0.7_4096_gpt_54` |
| Gemini-3-Pro | `connections-dev/res_gptoss120b_original_1_low_0.7_16384_gemini-3-pro-preview` |
| Gemini-3.1-Pro | `connections-dev/res_gptoss120b_original_1_medium_0.7_16384_gemini-3_1-pro-preview` |
| Claude-Sonnet-4.6 | `connections-dev/res_gptoss120b_original_1_medium_0.7_4096_claude-sonnet-4-6` |

## Columns

| Column | Description |
|--------|-------------|
| `index` | Original dataset index |
| `query` | The CREATE query |
| `entity_a` / `entity_b` / `rel_b` | Source entity, target entity, target relation |
| `{model}_paths` | JSON list of path strings |
| `{model}_factuality_scores` | Per-path factuality (1.0 = non-hallucinated) |
| `{model}_strength_scores` | Per-path strength = min(per-triple salience, excluding last triple) |
| `{model}_validity_scores` | Per-path validity (1.0 = structurally valid) |
| `{model}_num_paths` | Total paths generated |
| `{model}_num_factual` | Paths with factuality = 1.0 |
| `{model}_num_good` | Paths passing all three checks (always 0) |
| `{model}_avg_strength` | Mean strength |

## Statistics

| Metric | GPT-5.4 | Gemini-3-Pro | Gemini-3.1-Pro | Claude-Sonnet-4.6 |
|--------|---------|--------------|----------------|-------------------|
| Avg paths | 30.4 | 10.5 | 6.8 | 16.3 |
| Avg factual | 8.8 | 1.6 | 1.7 | 4.0 |
| Avg strength | 1.73 | 2.36 | 2.10 | 2.02 |
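
## Example: strength and the "good path" filter

The strength definition and the `valid=1 AND factuality=1 AND strength>3` check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring code. The function names, the `model_x` column prefix, and the sample values are hypothetical. The card does not say whether the per-path score columns are stored as JSON strings or as native lists, so the `as_list` helper accepts both, and the handling of single-triple paths (raising an error) is an assumption.

```python
import json
from typing import List, Sequence, Union


def as_list(value: Union[str, Sequence[float]]) -> List[float]:
    """Decode a score column that may be a JSON string or already a list.

    Assumption: the card only states that `{model}_paths` is a JSON list;
    the encoding of the score columns is not documented.
    """
    return json.loads(value) if isinstance(value, str) else list(value)


def path_strength(triple_salience: Sequence[float]) -> float:
    """Strength = min of per-triple salience, excluding the last triple."""
    if len(triple_salience) < 2:
        # Assumption: the card does not define strength for one-triple paths.
        raise ValueError("need at least two triples to exclude the last one")
    return min(triple_salience[:-1])


def count_good(
    validity: Sequence[float],
    factuality: Sequence[float],
    strength: Sequence[float],
    threshold: float = 3.0,
) -> int:
    """Count paths passing valid=1 AND factuality=1 AND strength>threshold."""
    return sum(
        1
        for v, f, s in zip(validity, factuality, strength)
        if v == 1.0 and f == 1.0 and s > threshold
    )


# Hypothetical row; "model_x" stands in for an actual {model} prefix.
row = {
    "model_x_validity_scores": "[1.0, 1.0, 0.0]",
    "model_x_factuality_scores": "[1.0, 0.0, 1.0]",
    "model_x_strength_scores": "[2.5, 4.0, 5.0]",
}

good = count_good(
    as_list(row["model_x_validity_scores"]),
    as_list(row["model_x_factuality_scores"]),
    as_list(row["model_x_strength_scores"]),
)
print(good)
```

In the sample row each path fails exactly one of the three checks, so none counts as good, which mirrors why `{model}_num_good` is always 0 in this dataset.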

Provider:
connections-dev