tempo26/Tempo
Source: Hugging Face. Updated 2026-01-12; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/tempo26/Tempo
---
language:
- en
license: cc-by-4.0
size_categories:
- 1K<n<10K
task_categories:
- text-retrieval
- question-answering
tags:
- temporal-reasoning
- rag
- retrieval
- complex-reasoning
- stackexchange
configs:
- config_name: documents
  features:
  - name: id
    dtype: string
  - name: content
    dtype: string
  data_files:
  - split: bitcoin
    path: documents/bitcoin.parquet
  - split: cardano
    path: documents/cardano.parquet
  - split: iota
    path: documents/iota.parquet
  - split: monero
    path: documents/monero.parquet
  - split: economics
    path: documents/economics.parquet
  - split: law
    path: documents/law.parquet
  - split: politics
    path: documents/politics.parquet
  - split: history
    path: documents/history.parquet
  - split: quant
    path: documents/quant.parquet
  - split: travel
    path: documents/travel.parquet
  - split: workplace
    path: documents/workplace.parquet
  - split: genealogy
    path: documents/genealogy.parquet
  - split: hsm
    path: documents/hsm.parquet
- config_name: examples
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - split: bitcoin
    path: examples/bitcoin.parquet
  - split: cardano
    path: examples/cardano.parquet
  - split: iota
    path: examples/iota.parquet
  - split: monero
    path: examples/monero.parquet
  - split: economics
    path: examples/economics.parquet
  - split: law
    path: examples/law.parquet
  - split: politics
    path: examples/politics.parquet
  - split: history
    path: examples/history.parquet
  - split: quant
    path: examples/quant.parquet
  - split: travel
    path: examples/travel.parquet
  - split: workplace
    path: examples/workplace.parquet
  - split: genealogy
    path: examples/genealogy.parquet
  - split: hsm
    path: examples/hsm.parquet
- config_name: steps
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: gold_ids
    sequence: string
  data_files:
  - split: bitcoin
    path: steps/bitcoin.parquet
  - split: cardano
    path: steps/cardano.parquet
  - split: iota
    path: steps/iota.parquet
  - split: monero
    path: steps/monero.parquet
  - split: economics
    path: steps/economics.parquet
  - split: law
    path: steps/law.parquet
  - split: politics
    path: steps/politics.parquet
  - split: history
    path: steps/history.parquet
  - split: quant
    path: steps/quant.parquet
  - split: travel
    path: steps/travel.parquet
  - split: workplace
    path: steps/workplace.parquet
  - split: genealogy
    path: steps/genealogy.parquet
  - split: hsm
    path: steps/hsm.parquet
- config_name: deepseek_reason
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: original_query
    dtype: string
  - name: reformulation_model
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - split: bitcoin
    path: deepseek_reason/bitcoin.parquet
  - split: cardano
    path: deepseek_reason/cardano.parquet
  - split: iota
    path: deepseek_reason/iota.parquet
  - split: monero
    path: deepseek_reason/monero.parquet
  - split: economics
    path: deepseek_reason/economics.parquet
  - split: law
    path: deepseek_reason/law.parquet
  - split: politics
    path: deepseek_reason/politics.parquet
  - split: history
    path: deepseek_reason/history.parquet
  - split: quant
    path: deepseek_reason/quant.parquet
  - split: travel
    path: deepseek_reason/travel.parquet
  - split: workplace
    path: deepseek_reason/workplace.parquet
  - split: genealogy
    path: deepseek_reason/genealogy.parquet
  - split: hsm
    path: deepseek_reason/hsm.parquet
- config_name: gpt4o_reason
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: original_query
    dtype: string
  - name: reformulation_model
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - split: bitcoin
    path: gpt4o_reason/bitcoin.parquet
  - split: cardano
    path: gpt4o_reason/cardano.parquet
  - split: iota
    path: gpt4o_reason/iota.parquet
  - split: monero
    path: gpt4o_reason/monero.parquet
  - split: economics
    path: gpt4o_reason/economics.parquet
  - split: law
    path: gpt4o_reason/law.parquet
  - split: politics
    path: gpt4o_reason/politics.parquet
  - split: history
    path: gpt4o_reason/history.parquet
  - split: quant
    path: gpt4o_reason/quant.parquet
  - split: travel
    path: gpt4o_reason/travel.parquet
  - split: workplace
    path: gpt4o_reason/workplace.parquet
  - split: genealogy
    path: gpt4o_reason/genealogy.parquet
  - split: hsm
    path: gpt4o_reason/hsm.parquet
- config_name: llama70b_reason
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: original_query
    dtype: string
  - name: reformulation_model
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - split: bitcoin
    path: llama70b_reason/bitcoin.parquet
  - split: cardano
    path: llama70b_reason/cardano.parquet
  - split: iota
    path: llama70b_reason/iota.parquet
  - split: monero
    path: llama70b_reason/monero.parquet
  - split: economics
    path: llama70b_reason/economics.parquet
  - split: law
    path: llama70b_reason/law.parquet
  - split: politics
    path: llama70b_reason/politics.parquet
  - split: history
    path: llama70b_reason/history.parquet
  - split: quant
    path: llama70b_reason/quant.parquet
  - split: travel
    path: llama70b_reason/travel.parquet
  - split: workplace
    path: llama70b_reason/workplace.parquet
  - split: genealogy
    path: llama70b_reason/genealogy.parquet
  - split: hsm
    path: llama70b_reason/hsm.parquet
- config_name: qwen72b_reason
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: original_query
    dtype: string
  - name: reformulation_model
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - split: bitcoin
    path: qwen72b_reason/bitcoin.parquet
  - split: cardano
    path: qwen72b_reason/cardano.parquet
  - split: iota
    path: qwen72b_reason/iota.parquet
  - split: monero
    path: qwen72b_reason/monero.parquet
  - split: economics
    path: qwen72b_reason/economics.parquet
  - split: law
    path: qwen72b_reason/law.parquet
  - split: politics
    path: qwen72b_reason/politics.parquet
  - split: history
    path: qwen72b_reason/history.parquet
  - split: quant
    path: qwen72b_reason/quant.parquet
  - split: travel
    path: qwen72b_reason/travel.parquet
  - split: workplace
    path: qwen72b_reason/workplace.parquet
  - split: genealogy
    path: qwen72b_reason/genealogy.parquet
  - split: hsm
    path: qwen72b_reason/hsm.parquet
---
# TEMPO: A Realistic Multi-Domain Benchmark for Temporal Reasoning-Intensive Retrieval
**TEMPO** is the first benchmark combining **temporal reasoning** with **reasoning-intensive retrieval** across 13 diverse technical domains. Unlike existing benchmarks that focus on simple fact-seeking queries (e.g., "When did X happen?"), TEMPO targets complex, real-world information needs that require synthesizing evidence across time periods, tracking evolution, and comparing historical baselines with current states.
## Table of Contents
- [Dataset Summary](#dataset-summary)
- [Dataset Structure](#dataset-structure)
  - [Configurations](#configurations)
  - [Data Fields](#data-fields)
- [Statistics](#statistics)
- [Domains](#domains)
- [Key Features](#key-features)
- [Citation](#citation)
---
## Dataset Summary
Real-world information needs often require reasoning about temporal evolution and synthesizing evidence across time periods. However, existing benchmarks either focus on simple date-lookup (Temporal QA) or logical inference without temporal grounding (Reasoning-Intensive Retrieval).
**TEMPO** addresses this gap by providing:
1. **1,730 Complex Queries**: Naturally occurring, expert-level questions from Stack Exchange requiring deep temporal reasoning.
2. **1.6M+ Documents**: A realistic retrieval corpus containing positive evidence and "hard" temporal negatives.
3. **Step-wise Retrieval Planning**: 3,976 decomposed retrieval steps mapped to gold documents for multi-hop evaluation.
4. **Multi-Level Annotation**: Fine-grained annotations for temporal intent, reasoning classes (e.g., trend analysis, event localization), and key time anchors.
**Paper:** *TEMPO: A Realistic Multi-Domain Benchmark for Temporal Reasoning-Intensive Retrieval*
**Repository:** [GitHub Link](https://github.com/tempo-bench/Tempo)
---
## Dataset Structure
### Configurations
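The dataset is organized into seven configurations (subsets) to support different retrieval and reasoning tasks: the corpus (`documents`), the main queries (`examples`), the step-wise task (`steps`), and four reasoning-augmented query sets (`deepseek_reason`, `gpt4o_reason`, `llama70b_reason`, `qwen72b_reason`). Each configuration has one split per domain. You can load specific configurations as follows: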
```python
from datasets import load_dataset
# Load the Corpus (Knowledge Base)
corpus = load_dataset("tempo26/Tempo", "documents")
# Load the Main Retrieval Task (Query -> Docs)
queries = load_dataset("tempo26/Tempo", "examples")
# Load the Multi-Step Task (Query -> Step -> Docs)
steps = load_dataset("tempo26/Tempo", "steps")
# Load Reasoning/CoT Augmented Queries
reasoning = load_dataset("tempo26/Tempo", "deepseek_reason")
```
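According to the front matter, every configuration is split by domain (not into train/test), and each split is stored at `<config>/<split>.parquet`. The following is a minimal sketch for enumerating those pairs; the names `CONFIGS`, `SPLITS`, and `parquet_path` are illustrative helpers, not part of the dataset or of the `datasets` library.

```python
# Configuration and split names as listed in the dataset card's front matter.
CONFIGS = [
    "documents", "examples", "steps",
    "deepseek_reason", "gpt4o_reason", "llama70b_reason", "qwen72b_reason",
]
SPLITS = [
    "bitcoin", "cardano", "iota", "monero",
    "economics", "law", "politics", "history",
    "quant", "travel", "workplace", "genealogy",
    "hsm",
]


def parquet_path(config: str, split: str) -> str:
    """Return the repository-relative Parquet path for one config/split pair."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"{config}/{split}.parquet"


# Loading a single domain (needs network access, so it is left commented out):
# from datasets import load_dataset
# law_queries = load_dataset("tempo26/Tempo", "examples", split="law")
```

Passing `split=` returns a single `Dataset` for that domain instead of a `DatasetDict` keyed by all 13 domains.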
### Data Fields
#### 1. `documents` (The Corpus)
Contains the searchable knowledge base.
* **`id`**: Unique document identifier (e.g., `bitcoin/585a73c2.txt`).
* **`content`**: The full text content of the document.
#### 2. `examples` (Main Queries)
Standard retrieval queries with gold standards.
* **`id`**: Unique query identifier (e.g., `126019_1`).
* **`query`**: The complex natural language question/post.
* **`gold_ids`**: List of document IDs that contain the answer.
* **`gold_answers`**: The reference answer text (HTML/Markdown formatted).
* **`query_guidance`**: A dictionary containing rich temporal metadata:
* `temporal_intent`: (e.g., `when`, `duration`, `before_after`).
* `temporal_reasoning_class_primary`: The type of reasoning required (see below).
* `key_time_anchors`: List of explicit time expressions.
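Because each query carries a list of `gold_ids`, a retriever's ranked output can be scored against it directly. The sketch below uses recall@k, which is one common retrieval metric and not necessarily the one reported in the paper; `recall_at_k` is a hypothetical helper and the record and document IDs are invented for illustration.

```python
from typing import Sequence


def recall_at_k(ranked_ids: Sequence[str], gold_ids: Sequence[str], k: int) -> float:
    """Fraction of gold documents that appear among the top-k ranked IDs."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    hits = gold.intersection(ranked_ids[:k])
    return len(hits) / len(gold)


# Toy record shaped like one row of `examples`; the IDs are invented.
example = {"id": "q1", "query": "...", "gold_ids": ["law/a.txt", "law/b.txt"]}
ranked = ["law/x.txt", "law/a.txt", "law/y.txt", "law/b.txt"]

print(recall_at_k(ranked, example["gold_ids"], k=2))  # 0.5
print(recall_at_k(ranked, example["gold_ids"], k=4))  # 1.0
```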
#### 3. `steps` (Step-wise Evaluation)
Used for evaluating multi-hop temporal reasoning (Task 2).
* **`id`**: Step identifier.
* **`query`**: The context of the original query.
* **`step_instruction`**: (If available) The specific sub-question for this step.
* **`gold_ids`**: The specific documents relevant *only* to this reasoning step.
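Since each step row repeats the context of its original query, step-level gold documents can be merged back per query, for example to compare them with the query-level `gold_ids` in `examples`. This is a sketch under the assumption that all steps of one question carry an identical `query` string; `gold_ids_by_query` is a hypothetical helper and the rows are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def gold_ids_by_query(step_rows: Iterable[dict]) -> Dict[str, List[str]]:
    """Union the gold document IDs of all steps that share the same query text."""
    merged = defaultdict(set)
    for row in step_rows:
        merged[row["query"]].update(row["gold_ids"])
    # Sort for a deterministic result.
    return {query: sorted(ids) for query, ids in merged.items()}


# Toy rows shaped like the `steps` configuration; all values are invented.
toy_steps = [
    {"id": "s1", "query": "Q-A", "gold_ids": ["law/1.txt"]},
    {"id": "s2", "query": "Q-A", "gold_ids": ["law/2.txt", "law/1.txt"]},
    {"id": "s3", "query": "Q-B", "gold_ids": ["law/9.txt"]},
]
merged = gold_ids_by_query(toy_steps)
```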
#### 4. `deepseek_reason`
Queries augmented with Chain-of-Thought (CoT) reasoning paths generated by DeepSeek-32B.
* **`id`**: Query identifier.
* **`query`**: The original query.
* **`reasoning`**: The generated CoT reasoning trace (inside `<think>` tags) followed by the final reformulated answer/plan.
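A trace wrapped in `<think>` tags and followed by a final plan can be separated with a small parser. Note that the front matter declares `id`, `query`, `original_query`, `reformulation_model`, `gold_ids`, and `gold_answers` for this configuration and does not list `reasoning`, so the sketch below takes a plain string and leaves it to the caller to pick the column that actually holds the text. `split_reasoning` is a hypothetical helper.

```python
import re
from typing import Tuple

# Non-greedy match of the first <think>...</think> block, across newlines.
_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (trace, final_text); trace is empty when no <think> block exists."""
    match = _THINK.search(text)
    if match is None:
        return "", text.strip()
    trace = match.group(1).strip()
    final_text = text[match.end():].strip()
    return trace, final_text


trace, plan = split_reasoning("<think>compare 2015 with 2017</think>\nFind both fee rules.")
```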
---
## Statistics
| Domain Category | # Queries | # Documents | Avg. Docs/Query | Avg. Query Length |
| :--- | :---: | :---: | :---: | :---: |
| **Blockchain** | 226 | 335,957 | 3.0 | 176 words |
| **Social Sciences** | 1,069 | 676,931 | 3.5 | 316 words |
| **Applied** | 285 | 427,399 | 2.6 | 335 words |
| **STEM (HSM)** | 150 | 213,818 | 2.5 | 303 words |
| **TOTAL** | **1,730** | **1,654,055** | **--** | **~300 words** |
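The TOTAL row can be cross-checked from the per-category rows. The sketch below assumes the per-category averages are per-query means, so the overall figures are query-weighted. The query counts sum to 1,730 and the weighted query length comes out at about 299.7 words, matching the table. The per-category document counts as printed sum to 1,654,105, which is 50 more than the 1,654,055 in the TOTAL row; the table alone does not show which figure is off.

```python
# Rows copied from the Statistics table:
# (category, queries, documents, avg docs per query, avg query length in words)
ROWS = [
    ("Blockchain", 226, 335_957, 3.0, 176),
    ("Social Sciences", 1_069, 676_931, 3.5, 316),
    ("Applied", 285, 427_399, 2.6, 335),
    ("STEM (HSM)", 150, 213_818, 2.5, 303),
]

total_queries = sum(row[1] for row in ROWS)
total_documents = sum(row[2] for row in ROWS)


def weighted_mean(column: int) -> float:
    """Query-weighted mean of one numeric column of ROWS."""
    return sum(row[1] * row[column] for row in ROWS) / total_queries


avg_docs_per_query = weighted_mean(3)
avg_query_length = weighted_mean(4)

print(total_queries, total_documents)
print(round(avg_docs_per_query, 1), round(avg_query_length))
```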
---
## Domains
The dataset covers **13 expert domains** from Stack Exchange, split into 4 categories:
1. **Blockchain**: Bitcoin, Cardano, IOTA, Monero.
2. **Social Sciences**: Economics, Law, Politics, History (the largest category, with 1,069 queries).
3. **Applied Fields**: Quantitative Finance (Quant), Travel, Workplace, Genealogy.
4. **STEM**: History of Science and Mathematics (HSM).
This diverse coverage ensures that models are tested on specialized vocabulary and domain-specific temporal logic (e.g., block heights in Bitcoin vs. legislation enactment dates in Law).
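For reporting results per category instead of per domain, the grouping above can be written as a lookup from split name to category. This is a minimal sketch: the split names come from the front matter, while `CATEGORY_SPLITS`, `SPLIT_TO_CATEGORY`, and `average_by_category` are illustrative names and the scores in the usage line are invented.

```python
from typing import Dict

# Category -> split names, following the Domains list and the front matter.
CATEGORY_SPLITS = {
    "Blockchain": ["bitcoin", "cardano", "iota", "monero"],
    "Social Sciences": ["economics", "law", "politics", "history"],
    "Applied Fields": ["quant", "travel", "workplace", "genealogy"],
    "STEM": ["hsm"],
}
SPLIT_TO_CATEGORY = {
    split: category
    for category, splits in CATEGORY_SPLITS.items()
    for split in splits
}


def average_by_category(per_split_scores: Dict[str, float]) -> Dict[str, float]:
    """Unweighted mean of per-split scores within each category."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for split, score in per_split_scores.items():
        category = SPLIT_TO_CATEGORY[split]
        sums[category] = sums.get(category, 0.0) + score
        counts[category] = counts.get(category, 0) + 1
    return {category: sums[category] / counts[category] for category in sums}


# Invented scores, only to show the call shape.
by_category = average_by_category({"bitcoin": 0.5, "monero": 1.0, "hsm": 0.25})
```

The mean here is unweighted over splits; weighting by the number of queries per split would be an equally reasonable choice.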
---
## Key Features
### Temporal Reasoning Classes
Queries are annotated with 10 fine-grained reasoning classes:
* **Event Analysis & Localization (EAL)**: Pinpointing when events occurred.
* **Time Period Contextualization (TPC)**: Situating phenomena in historical periods.
* **Origins & Evolution Comparative (OEC)**: Tracking concept evolution.
* **Trends & Cross-Period Comparison (TCP)**: Comparing states across eras.
* *(Plus: Event Verification, Causation Analysis, Materials Provenance, etc.)*
### Hard Negatives
To prevent simple lexical matching, TEMPO includes "Hard Negatives" (mined via GPT-4o). These documents are topically relevant (sharing keywords and entities) but are **temporally mismatched** (e.g., discussing the wrong year, an outdated version of a protocol, or a different legislative era).
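A toy illustration of why such negatives are hard for lexical matching: in the invented texts below, once year tokens are ignored the positive and the hard negative overlap with the query to exactly the same degree, so the year is the only lexical signal separating them. The texts, the Jaccard scoring, and the helper names are assumptions made for this sketch and say nothing about how TEMPO's negatives were actually mined.

```python
import re

# Four-digit years from 1000 to 2099.
YEAR = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")


def tokens(text: str, drop_years: bool = False) -> set:
    """Lowercased alphanumeric tokens, optionally with year tokens removed."""
    if drop_years:
        text = YEAR.sub(" ", text)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented query and documents.
query = "How did the fee policy change after the 2017 upgrade?"
positive = "After the 2017 upgrade the fee policy changed to a dynamic model."
hard_negative = "After the 2015 upgrade the fee policy changed to a fixed model."

for name, doc in [("positive", positive), ("hard_negative", hard_negative)]:
    with_years = jaccard(tokens(query), tokens(doc))
    without_years = jaccard(tokens(query, True), tokens(doc, True))
    print(name, YEAR.findall(doc), round(with_years, 3), round(without_years, 3))
```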
---
## Citation
If you use TEMPO in your research, please cite our paper:
```bibtex
soon
```
*Note: This dataset is derived from public Stack Exchange data (CC-BY-SA).*
Provider: tempo26



