
tempo26/Tempo

Source: Hugging Face | Updated: 2026-01-12 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/tempo26/Tempo
Resource description:
---
language:
- en
license: cc-by-4.0
size_categories:
- 1K<n<10K
task_categories:
- text-retrieval
- question-answering
tags:
- temporal-reasoning
- rag
- retrieval
- complex-reasoning
- stackexchange
configs:
- config_name: documents
  features:
  - name: id
    dtype: string
  - name: content
    dtype: string
  data_files:
  - {split: bitcoin, path: documents/bitcoin.parquet}
  - {split: cardano, path: documents/cardano.parquet}
  - {split: iota, path: documents/iota.parquet}
  - {split: monero, path: documents/monero.parquet}
  - {split: economics, path: documents/economics.parquet}
  - {split: law, path: documents/law.parquet}
  - {split: politics, path: documents/politics.parquet}
  - {split: history, path: documents/history.parquet}
  - {split: quant, path: documents/quant.parquet}
  - {split: travel, path: documents/travel.parquet}
  - {split: workplace, path: documents/workplace.parquet}
  - {split: genealogy, path: documents/genealogy.parquet}
  - {split: hsm, path: documents/hsm.parquet}
- config_name: examples
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - {split: bitcoin, path: examples/bitcoin.parquet}
  - {split: cardano, path: examples/cardano.parquet}
  - {split: iota, path: examples/iota.parquet}
  - {split: monero, path: examples/monero.parquet}
  - {split: economics, path: examples/economics.parquet}
  - {split: law, path: examples/law.parquet}
  - {split: politics, path: examples/politics.parquet}
  - {split: history, path: examples/history.parquet}
  - {split: quant, path: examples/quant.parquet}
  - {split: travel, path: examples/travel.parquet}
  - {split: workplace, path: examples/workplace.parquet}
  - {split: genealogy, path: examples/genealogy.parquet}
  - {split: hsm, path: examples/hsm.parquet}
- config_name: steps
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: gold_ids
    sequence: string
  data_files:
  - {split: bitcoin, path: steps/bitcoin.parquet}
  - {split: cardano, path: steps/cardano.parquet}
  - {split: iota, path: steps/iota.parquet}
  - {split: monero, path: steps/monero.parquet}
  - {split: economics, path: steps/economics.parquet}
  - {split: law, path: steps/law.parquet}
  - {split: politics, path: steps/politics.parquet}
  - {split: history, path: steps/history.parquet}
  - {split: quant, path: steps/quant.parquet}
  - {split: travel, path: steps/travel.parquet}
  - {split: workplace, path: steps/workplace.parquet}
  - {split: genealogy, path: steps/genealogy.parquet}
  - {split: hsm, path: steps/hsm.parquet}
- config_name: deepseek_reason
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: original_query
    dtype: string
  - name: reformulation_model
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - {split: bitcoin, path: deepseek_reason/bitcoin.parquet}
  - {split: cardano, path: deepseek_reason/cardano.parquet}
  - {split: iota, path: deepseek_reason/iota.parquet}
  - {split: monero, path: deepseek_reason/monero.parquet}
  - {split: economics, path: deepseek_reason/economics.parquet}
  - {split: law, path: deepseek_reason/law.parquet}
  - {split: politics, path: deepseek_reason/politics.parquet}
  - {split: history, path: deepseek_reason/history.parquet}
  - {split: quant, path: deepseek_reason/quant.parquet}
  - {split: travel, path: deepseek_reason/travel.parquet}
  - {split: workplace, path: deepseek_reason/workplace.parquet}
  - {split: genealogy, path: deepseek_reason/genealogy.parquet}
  - {split: hsm, path: deepseek_reason/hsm.parquet}
- config_name: gpt4o_reason
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: original_query
    dtype: string
  - name: reformulation_model
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - {split: bitcoin, path: gpt4o_reason/bitcoin.parquet}
  - {split: cardano, path: gpt4o_reason/cardano.parquet}
  - {split: iota, path: gpt4o_reason/iota.parquet}
  - {split: monero, path: gpt4o_reason/monero.parquet}
  - {split: economics, path: gpt4o_reason/economics.parquet}
  - {split: law, path: gpt4o_reason/law.parquet}
  - {split: politics, path: gpt4o_reason/politics.parquet}
  - {split: history, path: gpt4o_reason/history.parquet}
  - {split: quant, path: gpt4o_reason/quant.parquet}
  - {split: travel, path: gpt4o_reason/travel.parquet}
  - {split: workplace, path: gpt4o_reason/workplace.parquet}
  - {split: genealogy, path: gpt4o_reason/genealogy.parquet}
  - {split: hsm, path: gpt4o_reason/hsm.parquet}
- config_name: llama70b_reason
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: original_query
    dtype: string
  - name: reformulation_model
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - {split: bitcoin, path: llama70b_reason/bitcoin.parquet}
  - {split: cardano, path: llama70b_reason/cardano.parquet}
  - {split: iota, path: llama70b_reason/iota.parquet}
  - {split: monero, path: llama70b_reason/monero.parquet}
  - {split: economics, path: llama70b_reason/economics.parquet}
  - {split: law, path: llama70b_reason/law.parquet}
  - {split: politics, path: llama70b_reason/politics.parquet}
  - {split: history, path: llama70b_reason/history.parquet}
  - {split: quant, path: llama70b_reason/quant.parquet}
  - {split: travel, path: llama70b_reason/travel.parquet}
  - {split: workplace, path: llama70b_reason/workplace.parquet}
  - {split: genealogy, path: llama70b_reason/genealogy.parquet}
  - {split: hsm, path: llama70b_reason/hsm.parquet}
- config_name: qwen72b_reason
  features:
  - name: id
    dtype: string
  - name: query
    dtype: string
  - name: original_query
    dtype: string
  - name: reformulation_model
    dtype: string
  - name: gold_ids
    sequence: string
  - name: gold_answers
    sequence: string
  data_files:
  - {split: bitcoin, path: qwen72b_reason/bitcoin.parquet}
  - {split: cardano, path: qwen72b_reason/cardano.parquet}
  - {split: iota, path: qwen72b_reason/iota.parquet}
  - {split: monero, path: qwen72b_reason/monero.parquet}
  - {split: economics, path: qwen72b_reason/economics.parquet}
  - {split: law, path: qwen72b_reason/law.parquet}
  - {split: politics, path: qwen72b_reason/politics.parquet}
  - {split: history, path: qwen72b_reason/history.parquet}
  - {split: quant, path: qwen72b_reason/quant.parquet}
  - {split: travel, path: qwen72b_reason/travel.parquet}
  - {split: workplace, path: qwen72b_reason/workplace.parquet}
  - {split: genealogy, path: qwen72b_reason/genealogy.parquet}
  - {split: hsm, path: qwen72b_reason/hsm.parquet}
---

# TEMPO: A Realistic Multi-Domain Benchmark for Temporal Reasoning-Intensive Retrieval

**TEMPO** is the first benchmark combining **temporal reasoning** with **reasoning-intensive retrieval** across 13 diverse technical domains. Unlike existing benchmarks that focus on simple fact-seeking queries (e.g., "When did X happen?"), TEMPO targets complex, real-world information needs that require synthesizing evidence across time periods, tracking evolution, and comparing historical baselines with current states.

## Table of Contents

- [Dataset Summary](#dataset-summary)
- [Dataset Structure](#dataset-structure)
  - [Configurations](#configurations)
  - [Data Fields](#data-fields)
- [Statistics](#statistics)
- [Domains](#domains)
- [Key Features](#key-features)
- [Citation](#citation)

---

## Dataset Summary

Real-world information needs often require reasoning about temporal evolution and synthesizing evidence across time periods. However, existing benchmarks focus either on simple date lookup (Temporal QA) or on logical inference without temporal grounding (Reasoning-Intensive Retrieval). **TEMPO** addresses this gap by providing:

1. **1,730 Complex Queries**: Naturally occurring, expert-level questions from Stack Exchange that require deep temporal reasoning.
2. **1.6M+ Documents**: A realistic retrieval corpus containing positive evidence and "hard" temporal negatives.
3. **Step-wise Retrieval Planning**: 3,976 decomposed retrieval steps mapped to gold documents for multi-hop evaluation.
4. **Multi-Level Annotation**: Fine-grained annotations for temporal intent, reasoning classes (e.g., trend analysis, event localization), and key time anchors.
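Because each query, and each decomposed retrieval step, is mapped to gold documents, a retriever's ranked output can be scored directly against those gold IDs. The sketch below is illustrative only: it assumes binary relevance and uses Recall@k and nDCG@k, which this card does not name as TEMPO's official metrics. The helper names (`recall_at_k`, `ndcg_at_k`, `evaluate`) and all toy IDs except `bitcoin/585a73c2.txt` are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def _top_k_unique(ranked_ids: Sequence[str], k: int) -> list[str]:
    """Drop repeated IDs (keeping the first occurrence), then truncate to k."""
    return list(dict.fromkeys(ranked_ids))[:k]


def recall_at_k(ranked_ids: Sequence[str], gold_ids: Sequence[str], k: int) -> float:
    """Fraction of gold documents that appear among the top-k retrieved IDs."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    hits = sum(1 for doc_id in _top_k_unique(ranked_ids, k) if doc_id in gold)
    return hits / len(gold)


def ndcg_at_k(ranked_ids: Sequence[str], gold_ids: Sequence[str], k: int) -> float:
    """nDCG@k with binary relevance: gold documents have gain 1, all others 0."""
    gold = set(gold_ids)
    if not gold or k <= 0:
        return 0.0
    # A hit at 1-based rank r is discounted by 1 / log2(r + 1)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(_top_k_unique(ranked_ids, k), start=1)
        if doc_id in gold
    )
    # Ideal ranking: all gold documents first (at most k of them can count)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(gold), k) + 1))
    return dcg / idcg


def evaluate(run: dict[str, list[str]], qrels: dict[str, list[str]], k: int = 10) -> dict[str, float]:
    """Macro-average Recall@k and nDCG@k over every query ID in `qrels`.

    `run` maps a query ID to the retriever's ranked document IDs; `qrels` maps a
    query ID to its gold document IDs. Queries missing from `run` score 0.
    """
    n = max(len(qrels), 1)
    recall = sum(recall_at_k(run.get(qid, []), gold, k) for qid, gold in qrels.items())
    ndcg = sum(ndcg_at_k(run.get(qid, []), gold, k) for qid, gold in qrels.items())
    return {f"recall@{k}": recall / n, f"ndcg@{k}": ndcg / n}


if __name__ == "__main__":
    # Toy data: only "bitcoin/585a73c2.txt" appears in the card; other IDs are made up
    qrels = {"q1": ["bitcoin/585a73c2.txt", "bitcoin/doc_b.txt"]}
    run = {"q1": ["bitcoin/doc_x.txt", "bitcoin/585a73c2.txt", "bitcoin/doc_b.txt"]}
    print(evaluate(run, qrels, k=3))
```

With the `examples` configuration loaded via `load_dataset("tempo26/Tempo", "examples")`, a per-domain `qrels` mapping could be built as `{row["id"]: row["gold_ids"] for row in queries["bitcoin"]}`, and the same pattern would apply to the `steps` configuration. Macro-averaging gives every query equal weight regardless of how many gold documents it has.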
**Paper:** *TEMPO: A Realistic Multi-Domain Benchmark for Temporal Reasoning-Intensive Retrieval*

**Repository:** [GitHub Link](https://github.com/tempo-bench/Tempo)

---

## Dataset Structure

### Configurations

The dataset is organized into seven configurations (subsets) that support different retrieval and reasoning tasks: the corpus (`documents`), the main queries (`examples`), the step-wise queries (`steps`), and four sets of reasoning-augmented queries (`deepseek_reason`, `gpt4o_reason`, `llama70b_reason`, `qwen72b_reason`). Every configuration has one split per domain: `bitcoin`, `cardano`, `iota`, `monero`, `economics`, `law`, `politics`, `history`, `quant`, `travel`, `workplace`, `genealogy`, and `hsm`. You can load specific configurations as follows:

```python
from datasets import load_dataset

# Load the corpus (knowledge base); returns a DatasetDict with one split per domain
corpus = load_dataset("tempo26/Tempo", "documents")

# Load the main retrieval task (query -> docs)
queries = load_dataset("tempo26/Tempo", "examples")

# Load the multi-step task (query -> step -> docs)
steps = load_dataset("tempo26/Tempo", "steps")

# Load reasoning/CoT-augmented queries
reasoning = load_dataset("tempo26/Tempo", "deepseek_reason")

# Load a single domain by passing its split name
bitcoin_queries = load_dataset("tempo26/Tempo", "examples", split="bitcoin")
```

### Data Fields

#### 1. `documents` (The Corpus)

Contains the searchable knowledge base.

* **`id`**: Unique document identifier (e.g., `bitcoin/585a73c2.txt`).
* **`content`**: The full text content of the document.

#### 2. `examples` (Main Queries)

Standard retrieval queries with gold standards.

* **`id`**: Unique query identifier (e.g., `126019_1`).
* **`query`**: The complex natural-language question/post.
* **`gold_ids`**: List of document IDs that contain the answer.
* **`gold_answers`**: The reference answer texts (HTML/Markdown formatted).
* **`query_guidance`**: A dictionary containing rich temporal metadata:
  * `temporal_intent`: The temporal intent of the query (e.g., `when`, `duration`, `before_after`).
  * `temporal_reasoning_class_primary`: The type of reasoning required (see below).
  * `key_time_anchors`: List of explicit time expressions.

#### 3. `steps` (Step-wise Evaluation)

Used for evaluating multi-hop temporal reasoning (Task 2).

* **`id`**: Step identifier.
* **`query`**: The context of the original query.
* **`step_instruction`**: (If available) The specific sub-question for this step.
* **`gold_ids`**: The specific documents relevant *only* to this reasoning step.

#### 4. `deepseek_reason` (Reasoning-Augmented Queries)

Queries augmented with Chain-of-Thought (CoT) reasoning paths generated by DeepSeek-32B: the generated reasoning trace (inside `<think>` tags) followed by the final reformulated answer/plan. The `gpt4o_reason`, `llama70b_reason`, and `qwen72b_reason` configurations share the same schema, with the reformulating model recorded in `reformulation_model`.

* **`id`**: Query identifier.
* **`query`**: The reasoning-augmented (reformulated) query.
* **`original_query`**: The original query.
* **`reformulation_model`**: The model used to produce the reformulation.
* **`gold_ids`** and **`gold_answers`**: As in `examples`.

---

## Statistics

| Domain Category | # Queries | # Documents | Avg. Gold Docs/Query | Avg. Query Length |
| :--- | :---: | :---: | :---: | :---: |
| **Blockchain** | 226 | 335,957 | 3.0 | 176 words |
| **Social Sciences** | 1,069 | 676,931 | 3.5 | 316 words |
| **Applied** | 285 | 427,399 | 2.6 | 335 words |
| **STEM (HSM)** | 150 | 213,818 | 2.5 | 303 words |
| **TOTAL** | **1,730** | **1,654,055** | **--** | **~300 words** |

---

## Domains

The dataset covers **13 expert domains** from Stack Exchange, grouped into 4 categories:

1. **Blockchain**: Bitcoin, Cardano, IOTA, Monero.
2. **Social Sciences**: Economics, Law, Politics, History (largest category).
3. **Applied Fields**: Quantitative Finance (Quant), Travel, Workplace, Genealogy.
4. **STEM**: History of Science and Mathematics (HSM).

This diverse coverage ensures that models are tested on specialized vocabulary and domain-specific temporal logic (e.g., block heights in Bitcoin vs. legislation enactment dates in Law).

---

## Key Features

### Temporal Reasoning Classes

Queries are annotated with 10 fine-grained reasoning classes, including:

* **Event Analysis & Localization (EAL)**: Pinpointing when events occurred.
* **Time Period Contextualization (TPC)**: Situating phenomena in historical periods.
* **Origins & Evolution Comparative (OEC)**: Tracking concept evolution.
* **Trends & Cross-Period Comparison (TCP)**: Comparing states across eras.
* *(Plus: Event Verification, Causation Analysis, Materials Provenance, etc.)*

### Hard Negatives

To penalize retrieval based on lexical matching alone, TEMPO includes "hard negatives" (mined via GPT-4o).
These documents are topically relevant (sharing keywords and entities) but are **temporally mismatched** (e.g., discussing the wrong year, an outdated version of a protocol, or a different legislative era).

---

## Citation

If you use TEMPO in your research, please cite our paper:

```bibtex
soon
```

*Note: This dataset is derived from public Stack Exchange data (CC-BY-SA).*
Provided by: tempo26