MoyYuan/Asymmetricity-2.0
Source: Hugging Face. Updated 2025-12-31; indexed 2025-11-15.
Download link: https://hf-mirror.com/datasets/MoyYuan/Asymmetricity-2.0
Resource description:
---
license: mit
size_categories:
- 10M<n<100M
language:
- en
---
# Asymmetricity v2: A Benchmark for Evaluating LLMs on Symmetric and Asymmetric Relation Understanding
**Asymmetricity v2** is a massive upgrade to the original benchmark dataset, designed to evaluate large language models (LLMs) on their ability to distinguish and reason over symmetric (e.g., *borders*) and antisymmetric (e.g., *parent of*) relations in natural language. Now expanded to **over 70 million entries**, the dataset is derived from Wikidata triples and cast into a natural language inference (NLI) format, enabling fine-grained, large-scale analysis of relational understanding.
The dataset includes multiple textual forms, covering both natural language and a delexicalized version in which entities are replaced by Wikidata IDs (e.g., `Q7024230`). This allows models to be evaluated both on surface-level text and on abstract relational structure.
This is the second version of the dataset. The original version can be found here: [Asymmetricity v1](https://huggingface.co/datasets/MoyYuan/Asymmetricity).
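To make the delexicalized form described above concrete, here is a minimal sketch of how entity names might be swapped for Wikidata IDs. The `delexicalize` helper and the name-to-ID mapping are illustrative assumptions, not the procedure used to build the dataset.

```python
import re


def delexicalize(sentence: str, name_to_id: dict[str, str]) -> str:
    """Replace each known entity name in `sentence` with its Wikidata ID.

    Hypothetical helper for illustration; the dataset's own pipeline may differ.
    """
    if not name_to_id:
        return sentence
    # Try longer names first so "New York City" wins over "New York".
    names = sorted(name_to_id, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: name_to_id[match.group(0)], sentence)


# Q142 (France) and Q183 (Germany) are Wikidata IDs; the sentence is illustrative.
print(delexicalize("France borders Germany.", {"France": "Q142", "Germany": "Q183"}))
# prints: Q142 borders Q183.
```

Comparing a model's prediction on the lexicalized sentence with its prediction on the delexicalized one is one way to check whether it relies on entity names rather than on the relation itself.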
---
## Overview
Understanding the symmetry properties of relations is essential for robust reasoning. For example, if *A is the parent of B*, then *B is the parent of A* should clearly be false. Many LLMs, however, struggle to consistently apply this logic, particularly when the phrasing or entity names change.
**Asymmetricity v2** provides a structured and scalable testbed for evaluating this capability, drawing on real-world knowledge base relations and reformulating them as NLI-style sentence pairs. With the inclusion of reasoning chain lengths, v2 also supports evaluating multi-step relational reasoning.
---
## Motivation
Current language models often rely on surface patterns and statistical co-occurrence, which can obscure their understanding of logical constraints like symmetry and directionality. This benchmark tests models on:
- Recognizing whether a relation is symmetric or asymmetric
- Identifying correct entailments and contradictions in natural language
- Generalizing across entity names and abstract identifiers (Wikidata IDs)
- Handling reasoning chains of varying lengths
---
## Dataset Design
Each example is based on Wikidata triples involving entities and relations. The data is converted into a list of natural language premises and a hypothesis representing a logical consequence (or contradiction). A label indicates whether the hypothesis logically follows from the premises.
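The conversion described above can be sketched roughly as follows for a single triple. The sentence templates, the rule assignment per property, the entity names, and the `make_example` function are all assumptions made for illustration; the dataset's actual templates and construction code are not shown in this card.

```python
# Hypothetical sentence templates keyed by Wikidata property ID.
# P47 is "shares border with"; P40 is "child" (the subject has the object as a child).
TEMPLATES = {
    "P47": "{s} borders {o}.",
    "P40": "{s} is the parent of {o}.",
}

# Assumed symmetry rule for each property.
RULES = {
    "P47": "symmetry",
    "P40": "antisymmetry",
}


def make_example(subject: str, prop: str, obj: str) -> dict:
    """Turn one (subject, property, object) triple into an NLI-style example.

    The premise states the triple; the hypothesis states it with the
    arguments swapped. Under a symmetric relation the swapped statement is
    entailed, under an antisymmetric relation it is contradicted.
    """
    template = TEMPLATES[prop]
    rule = RULES[prop]
    return {
        "premises": [template.format(s=subject, o=obj)],
        "hypothesis": template.format(s=obj, o=subject),
        "label": "entailment" if rule == "symmetry" else "contradiction",
        "relation_ids": [prop],
        "rule": rule,
        "entities": [subject, obj],
        "chain_len": 1,  # assumed: one triple corresponds to a chain of length 1
    }


example = make_example("Alice", "P40", "Bob")
print(example["premises"], example["hypothesis"], example["label"])
```

Longer reasoning chains would need several premises per example; this sketch covers only the single-step case.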
---
## Evaluation Focus
This dataset supports research in:
- Logical consistency and relation reasoning in LLMs
- Sensitivity to relation directionality and symmetry
- Robustness across lexicalized and abstract (ID-based) inputs
- Pretraining biases related to relation semantics
- Multi-step reasoning capabilities (via chain length analysis)
It is suitable for prompting, zero/few-shot evaluation, embedding-based retrieval, and supervised fine-tuning.
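For the zero-shot and chain-length analyses mentioned above, an evaluation loop could look something like the sketch below. The prompt wording and both function names are assumptions, and the model call is left out: `predictions` stands in for whatever labels a model returns.

```python
from collections import defaultdict


def build_prompt(record: dict) -> str:
    """Format one record as a zero-shot NLI prompt (hypothetical wording)."""
    premises = "\n".join(f"- {premise}" for premise in record["premises"])
    return (
        f"Premises:\n{premises}\n"
        f"Hypothesis: {record['hypothesis']}\n"
        "Answer with 'entailment' or 'contradiction'."
    )


def accuracy_by_chain_len(records: list[dict], predictions: list[str]) -> dict[int, float]:
    """Compute accuracy separately for each value of `chain_len`."""
    correct: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for record, prediction in zip(records, predictions):
        total[record["chain_len"]] += 1
        correct[record["chain_len"]] += int(prediction == record["label"])
    return {length: correct[length] / total[length] for length in sorted(total)}


# Toy records using only the fields this sketch needs.
records = [
    {"premises": ["A borders B."], "hypothesis": "B borders A.",
     "label": "entailment", "chain_len": 1},
    {"premises": ["A is the parent of B."], "hypothesis": "B is the parent of A.",
     "label": "contradiction", "chain_len": 1},
]
print(build_prompt(records[0]))
print(accuracy_by_chain_len(records, ["entailment", "entailment"]))
# prints: {1: 0.5}
```

Grouping by `rule` or `lex` instead of `chain_len` would follow the same pattern.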
---
## Data Format
Each line in the dataset is a JSON object with the following fields:
- `tier`: A string indicating the difficulty tier or partition of the example.
- `lex`: The lexicalization type (e.g., `text` for natural language, `delex` for ID-based).
- `lang`: The language code of the text (e.g., `en`).
- `premises`: A list of natural language sentences acting as the logical basis for the inference.
- `hypothesis`: The target sentence to be validated against the premises.
- `label`: The inference label (e.g., `entailment`, `contradiction`).
- `relation_ids`: A list of Wikidata property IDs (e.g., `['P40']`) involved in the reasoning chain.
- `rule`: The specific logical rule being tested (e.g., `symmetry`, `antisymmetry`).
- `entities`: A list of entity identifiers or names present in the example.
- `chain_len`: An integer (`int64`) representing the length of the reasoning chain (number of steps/triples).
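As a rough illustration of the fields listed above, the following sketch parses one JSON line and checks that each field is present with the expected Python type. The sample record is constructed by hand for this example (its `tier` value in particular is a placeholder), and `parse_record` is a hypothetical helper rather than part of any official tooling.

```python
import json

# Expected Python type for each field, following the field list above.
FIELD_TYPES = {
    "tier": str,
    "lex": str,
    "lang": str,
    "premises": list,
    "hypothesis": str,
    "label": str,
    "relation_ids": list,
    "rule": str,
    "entities": list,
    "chain_len": int,
}


def parse_record(line: str) -> dict:
    """Parse one JSON line and validate its fields against FIELD_TYPES."""
    record = json.loads(line)
    for name, expected_type in FIELD_TYPES.items():
        if name not in record:
            raise KeyError(f"missing field: {name}")
        if not isinstance(record[name], expected_type):
            raise TypeError(f"field {name!r} should be {expected_type.__name__}")
    return record


# Hand-written sample line; the values are illustrative, not taken from the dataset.
sample_line = json.dumps({
    "tier": "example-tier",
    "lex": "text",
    "lang": "en",
    "premises": ["Alice is the parent of Bob."],
    "hypothesis": "Bob is the parent of Alice.",
    "label": "contradiction",
    "relation_ids": ["P40"],
    "rule": "antisymmetry",
    "entities": ["Alice", "Bob"],
    "chain_len": 1,
})

record = parse_record(sample_line)
print(record["label"], record["chain_len"])
# prints: contradiction 1
```

Because the dataset holds tens of millions of entries, reading it line by line like this (rather than loading everything into memory) is likely the more practical approach.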
---
## Citation
If you use this dataset in your work, please cite the following paper:
```bibtex
@article{yuan2025capturing,
title={Capturing Symmetry and Antisymmetry in Language Models through Symmetry-Aware Training Objectives},
author={Yuan, Zhangdie and Vlachos, Andreas},
journal={arXiv preprint arXiv:2504.16312},
year={2025}
}
```
Provider: MoyYuan



