klimzaporojets/emerge-benchmark
Source: Hugging Face. Updated 2026-04-06; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/klimzaporojets/emerge-benchmark
Resource summary:
---
license: cc-by-sa-4.0
task_categories:
- text-classification
- token-classification
language:
- en
tags:
- knowledge-graph
- information-extraction
- relation-extraction
- entity-linking
- wikidata
- wikipedia
- temporal
pretty_name: "EMERGE: Updating Knowledge Graphs with Emerging Textual Knowledge"
size_categories:
- 1K<n<10K
dataset_info:
- config_name: default
  features:
  - name: hash_id
    dtype: string
  - name: passage
    dtype: string
  - name: anchor_title
    dtype: string
  - name: anchor_page_qid
    dtype: string
  - name: revision_date
    dtype: string
  splits:
  - name: test
    num_examples: 3500
---
# EMERGE: A Benchmark for Updating Knowledge Graphs with Emerging Textual Knowledge
**[Paper](https://arxiv.org/abs/2507.03617)** | **[Code](https://github.com/klimzaporojets/emerge-benchmark)**
## Overview
EMERGE is a benchmark for **Text-driven KG Updating (TKGU)**: evaluating methods that update knowledge graphs from textual evidence.
Each instance pairs a textual passage with a KG snapshot and a set of update operations induced by the passage. EMERGE defines five TKGU operations:
| Operation | Code | Description |
|-----------|------|-------------|
| **Exists** | `x-triples` | Triple already present in the KG, supported by the textual passage |
| **Add** | `e-triples` | New triple involving entities that already exist in the KG |
| **Mint+Add** | `ee-triples` | New triple involving one or more entities not yet in the KG |
| **Infer** | `ee-kg-triples` | Triple linking a newly introduced entity to an existing KG entity, not explicitly stated in the passage |
| **Deprecate** | `d-triples` | Existing triple invalidated by updated information in the passage |
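Three of these operations (Exists, Add, Mint+Add) can be decided from KG membership alone, which makes the definitions easy to check. The sketch below is a hypothetical helper, not the EMERGE annotation pipeline. It assumes triples are `(subject, predicate, object)` tuples of Wikidata identifiers and that the object is an entity rather than a literal. Infer and Deprecate require passage-level and temporal reasoning, so the sketch leaves them out.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject QID, predicate PID, object QID)


def classify_triple(triple: Triple, kg_triples: Set[Triple], kg_entities: Set[str]) -> str:
    """Map a passage-supported triple to an EMERGE operation code.

    Hypothetical helper: only handles the operations that depend purely on
    KG membership. Infer (ee-kg-triples) and Deprecate (d-triples) are not covered.
    Assumes the object is an entity QID, not a literal value.
    """
    if triple in kg_triples:
        return "x-triples"  # Exists: triple already in the KG
    subj, _, obj = triple
    if subj in kg_entities and obj in kg_entities:
        return "e-triples"  # Add: new triple between entities already in the KG
    return "ee-triples"  # Mint+Add: at least one entity is not yet in the KG


# Toy identifiers for illustration only
kg = {("Q100", "P31", "Q5")}
entities = {"Q100", "Q200", "Q5"}
print(classify_triple(("Q100", "P31", "Q5"), kg, entities))  # x-triples
print(classify_triple(("Q200", "P31", "Q5"), kg, entities))  # e-triples
print(classify_triple(("Q300", "P31", "Q5"), kg, entities))  # ee-triples
```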
## Dataset Contents
### Test set (`evaluation_set/`)
3,500 instances across 7 annual Wikidata snapshots (2019-2025), organized as:
```
evaluation_set/
├── snapshot_2019-01-01/
│ ├── delta_2019-01-08.jsonl (100 instances)
│ ├── delta_2019-01-15.jsonl
│ ├── delta_2019-01-22.jsonl
│ ├── delta_2019-01-29.jsonl
│ └── delta_2019-02-05.jsonl
├── snapshot_2020-01-01/ ... snapshot_2025-01-01/
```
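The delta file names follow a weekly pattern. The sketch below assumes that every snapshot has five deltas starting one week after the snapshot date, as shown for `snapshot_2019-01-01/`, and that each delta file holds 100 instances. Under those assumptions it reproduces the expected names and the 3,500 total. `expected_delta_files` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import List


def expected_delta_files(snapshot: date, n_deltas: int = 5) -> List[str]:
    """Return delta file names for a snapshot, assuming weekly deltas.

    Assumption: deltas start 7 days after the snapshot date and are spaced
    7 days apart, matching the snapshot_2019-01-01/ listing.
    """
    return [
        f"delta_{(snapshot + timedelta(days=7 * k)).isoformat()}.jsonl"
        for k in range(1, n_deltas + 1)
    ]


files = expected_delta_files(date(2019, 1, 1))
print(files[0], files[-1])  # delta_2019-01-08.jsonl delta_2019-02-05.jsonl

# 7 snapshots x 5 deltas x 100 instances per delta
print(7 * 5 * 100)  # 3500
```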
Each instance (JSONL line) contains:
- **`passage`**: Wikipedia passage text
- **`mentions`**: Entity mentions with character offsets and Wikidata QIDs
- **`tkgu_triples`**: Ground-truth triples with TKGU operations and LLM assessments
- **`predictions`**: Outputs from 13 benchmark models
- **`hash_id`**: Unique instance identifier
### Annotations (`annotation/`)
Human annotation data for inter-annotator agreement statistics.
### KG Snapshots (`kg_snapshots/`)
7 yearly Wikidata KG snapshots (gzip-compressed TSV, ~3.7GB total).
Each row is a `(subject, predicate, object)` triple active at that snapshot date.
Required to evaluate the Exists operation for ReLiK CIE.
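The snapshots total about 3.7GB, so streaming a file is usually preferable to loading it whole. The sketch below assumes that each decompressed row holds exactly three tab-separated fields and that there is no header row. The snapshot file names inside `kg_snapshots/` are not listed in this card, so `path` must point to an actual downloaded file. `find_existing` is a hypothetical helper that performs the membership check behind the Exists operation.

```python
import gzip
from typing import Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]


def iter_snapshot_triples(path: str) -> Iterator[Triple]:
    """Stream (subject, predicate, object) rows from a gzip-compressed TSV.

    Assumes three tab-separated columns and no header; other rows are skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 3:
                yield parts[0], parts[1], parts[2]


def find_existing(path: str, candidates: Iterable[Triple]) -> Set[Triple]:
    """Return the candidate triples that are active in the snapshot.

    Reads the file once and keeps only the matches in memory.
    """
    wanted = set(candidates)
    return {t for t in iter_snapshot_triples(path) if t in wanted}
```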
### Relation Indices (`indices/`)
Per-snapshot relation embeddings (~400MB) used by the ReLiK and EDC+ benchmark models.
## Benchmark Models
The test set includes pre-computed predictions from 13 models:
| Model | Type | Backend |
|-------|------|---------|
| EDC+ GPT-5.1 | LLM (in-context learning) | GPT-5.1 |
| EDC+ Mistral-Large | LLM (in-context learning) | Mistral-Large |
| EDC+ Mistral-Small | LLM (in-context learning) | Mistral-Small |
| EDC+ ZS GPT-5.1 | LLM (zero-shot) | GPT-5.1 |
| EDC+ ZS Mistral-Large | LLM (zero-shot) | Mistral-Large |
| KGGen GPT-5.1 | LLM | GPT-5.1 |
| KGGen Mistral-Large | LLM | Mistral-Large |
| KGGen Mistral-Small | LLM | Mistral-Small |
| RAKG Mistral-Large | LLM | Mistral-Large |
| RAKG Mistral-Small | LLM | Mistral-Small |
| REBEL | Local seq2seq | Babelscape/rebel-large |
| ReLiK OIE | Local neural | sapienzanlp/relik-relation-extraction-nyt-large |
| ReLiK CIE | Local neural | sapienzanlp/relik-cie-large |
## Usage
### Download with the EMERGE repository
```bash
git clone https://github.com/klimzaporojets/emerge-benchmark.git
cd emerge-benchmark
./scripts/download_data.sh # test set + annotations
./scripts/download_data.sh --kg # + KG snapshots
./scripts/download_data.sh --all # + relation indices
```
### Download with Python
```python
from huggingface_hub import snapshot_download
# Download test set and annotations
snapshot_download(
repo_id="klimzaporojets/emerge-benchmark",
repo_type="dataset",
local_dir="./data",
allow_patterns=["evaluation_set/**", "annotation/**"],
)
```
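The three tiers of `download_data.sh` can be approximated with `allow_patterns`. This mapping is an assumption inferred from the folder names in this card, not taken from the script itself, and it treats `--all` as also including the KG snapshots. `build_allow_patterns` and `download` are hypothetical helpers.

```python
from typing import List


def build_allow_patterns(kg: bool = False, indices: bool = False) -> List[str]:
    """Approximate the download_data.sh tiers as snapshot_download patterns.

    Assumption: tiers are cumulative (test set + annotations, then KG
    snapshots, then relation indices).
    """
    patterns = ["evaluation_set/**", "annotation/**"]
    if kg or indices:
        patterns.append("kg_snapshots/**")
    if indices:
        patterns.append("indices/**")
    return patterns


def download(local_dir: str = "./data", kg: bool = False, indices: bool = False) -> str:
    """Download the selected tiers and return the local folder path."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id="klimzaporojets/emerge-benchmark",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=build_allow_patterns(kg=kg, indices=indices),
    )
```

For example, `download(kg=True)` would fetch the test set, the annotations, and the KG snapshots into `./data`.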
### Load a single instance
```python
import json
with open("data/evaluation_set/snapshot_2024-01-01/delta_2024-01-08.jsonl") as f:
instance = json.loads(f.readline())
print(instance["passage"][:200])
print(f"TKGU triples: {len(instance['tkgu_triples'])}")
print(f"Models with predictions: {list(instance['predictions'].keys())}")
```
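To work with the whole test set rather than a single line, all instances can be collected into one dictionary. This sketch assumes the `snapshot_*/delta_*.jsonl` layout shown under Test set and treats a repeated `hash_id` as an error. The `_snapshot` field it adds is hypothetical bookkeeping and is not part of the released schema.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def load_evaluation_set(root: str = "data/evaluation_set") -> Dict[str, dict]:
    """Load every instance under evaluation_set/, keyed by hash_id."""
    instances: Dict[str, dict] = {}
    for path in sorted(Path(root).glob("snapshot_*/delta_*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                inst = json.loads(line)
                if inst["hash_id"] in instances:
                    raise ValueError(f"duplicate hash_id {inst['hash_id']} in {path}")
                inst["_snapshot"] = path.parent.name  # hypothetical extra field
                instances[inst["hash_id"]] = inst
    return instances


def count_by_snapshot(instances: Dict[str, dict]) -> Counter:
    """Count loaded instances per snapshot directory."""
    return Counter(inst["_snapshot"] for inst in instances.values())
```

If every delta file holds 100 instances, `load_evaluation_set()` on the full download should return 3,500 instances, 500 per snapshot.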
## Instance Format
Each JSONL line contains:
| Field | Type | Description |
|-------|------|-------------|
| `hash_id` | string | Unique instance identifier |
| `passage` | string | Wikipedia passage text |
| `mentions` | list | Entity mentions with char offsets and Wikidata QIDs |
| `tkgu_triples` | list | Ground-truth triples with operations and LLM assessments |
| `predictions` | dict | Model predictions keyed by model name |
| `revision_date` | string | Wikipedia revision timestamp |
| `anchor_title` | string | Wikipedia article title |
| `delta_dates` | list | Start and end dates of the delta period |
See the [code repository](https://github.com/klimzaporojets/emerge-benchmark) for the full schema documentation (`data/README.md`).
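The `mentions` field stores character offsets into `passage`, so surface forms can be recovered by slicing. The key names `start`, `end`, and `qid` below are placeholders, and end offsets are assumed to be exclusive (Python slice style). Check `data/README.md` in the code repository for the actual mention schema before relying on this sketch.

```python
from typing import List, Tuple


def mention_surfaces(
    instance: dict,
    start_key: str = "start",  # hypothetical key names; check data/README.md
    end_key: str = "end",
    qid_key: str = "qid",
) -> List[Tuple[str, str]]:
    """Return (surface text, Wikidata QID) pairs for an instance's mentions.

    Assumes end offsets are exclusive, so passage[start:end] is the mention.
    """
    passage = instance["passage"]
    return [
        (passage[m[start_key]:m[end_key]], m[qid_key])
        for m in instance["mentions"]
    ]
```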
## Citation
```bibtex
@article{zaporojets2025emerge,
title={EMERGE: A Benchmark for Updating Knowledge Graphs with Emerging Textual Knowledge},
author={Zaporojets, Klim and Daza, Daniel and Barba, Edoardo and Assent, Ira and Navigli, Roberto and Groth, Paul},
journal={arXiv preprint arXiv:2507.03617},
year={2025}
}
```
## License
This dataset is licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
It is derived from [Wikipedia](https://en.wikipedia.org/) (CC BY-SA 3.0+) and [Wikidata](https://www.wikidata.org/) (CC0).
Provided by: klimzaporojets