RECOR
RECOR: Reasoning-focused Multi-turn Conversational Retrieval Benchmark
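Overview
RECOR is a benchmark for evaluating reasoning-intensive conversational information retrieval systems. It aims to close the gap between traditional conversational search evaluation and the complex reasoning required in real-world information-seeking scenarios.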
Dataset Statistics
| Metric | Value |
|---|---|
| Total conversations | 707 |
| Total turns | 2,971 |
| Domains | 11 |
| Average turns per conversation | 4.2 |
Domains
| Source | Domains |
|---|---|
| BRIGHT | biology, earth_science, economics, psychology, robotics, sustainable_living |
| StackExchange | Drones, hardware, law, medicalsciences, politics |
Getting the Data
Option 1: Python (recommended)
python
from datasets import load_dataset

# Load a single domain
benchmark = load_dataset("RECOR-Benchmark/RECOR", "benchmark", split="biology")
corpus = load_dataset("RECOR-Benchmark/RECOR", "corpus", split="biology")

# Load every domain at once
all_benchmarks = load_dataset("RECOR-Benchmark/RECOR", "benchmark")
all_corpus = load_dataset("RECOR-Benchmark/RECOR", "corpus")
Available domains: biology, earth_science, economics, psychology, robotics, sustainable_living, Drones, hardware, law, medicalsciences, politics
Option 2: Command line
bash
huggingface-cli download RECOR-Benchmark/RECOR --repo-type dataset --local-dir ./RECOR-data
Option 3: Browse and download files
Visit https://huggingface.co/datasets/RECOR-Benchmark/RECOR/tree/main/data to browse and download individual files.
Data Format
Benchmark files ({domain}_benchmark.jsonl):
json
{
  "id": "biology_0",
  "task": "biology",
  "original_query": "How do mitochondria generate ATP?",
  "original_answer": "Mitochondria generate ATP through...",
  "turns": [
    {
      "turn_id": 1,
      "query": "What happens during the electron transport chain?",
      "answer": "The electron transport chain...",
      "gold_doc_ids": ["doc_123", "doc_456"],
      "conversation_history": "No previous conversation.",
      "subquestion_reasoning": "Understanding ETC is foundational...",
      "subquestion_reasoning_metadata": {
        "target_information": "...",
        "relevance_signals": ["..."],
        "irrelevance_signals": ["..."]
      }
    }
  ],
  "metadata": {"num_turns": 3, "created_at": "..."}
}
Note: BRIGHT domains store the relevant document IDs in gold_doc_ids, while StackExchange domains use supporting_doc_ids.
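Because the field name differs by source, code that consumes both kinds of domains needs to check for either one. The following is a minimal sketch, not part of the RECOR tooling: the helper names `relevant_doc_ids` and `iter_turns` are hypothetical, the two sample records are made up, and returning an empty list when neither field is present is an assumption.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Field names taken from the note above; which one is present depends on the domain.
DOC_ID_FIELDS = ("gold_doc_ids", "supporting_doc_ids")


def relevant_doc_ids(turn: Dict[str, Any]) -> List[str]:
    """Return the relevant document IDs of a turn, whichever field holds them."""
    for field in DOC_ID_FIELDS:
        if field in turn:
            return list(turn[field])
    return []  # assumption: a turn with neither field has no labelled documents


def iter_turns(record: Dict[str, Any]) -> Iterator[Tuple[str, int, str, List[str]]]:
    """Yield (conversation id, turn id, query, relevant doc IDs) for each turn."""
    for turn in record["turns"]:
        yield record["id"], turn["turn_id"], turn["query"], relevant_doc_ids(turn)


# Made-up records illustrating both labelling schemes.
bright_record = {
    "id": "biology_0",
    "turns": [{"turn_id": 1, "query": "q1", "gold_doc_ids": ["doc_123", "doc_456"]}],
}
stackexchange_record = {
    "id": "law_0",
    "turns": [{"turn_id": 1, "query": "q1", "supporting_doc_ids": ["doc_9"]}],
}

bright_rows = list(iter_turns(bright_record))
stackexchange_rows = list(iter_turns(stackexchange_record))
print(bright_rows)
print(stackexchange_rows)
```

Normalizing the field at read time keeps downstream evaluation code identical across all 11 domains.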
Document files ({domain}_documents.jsonl):
json
{"doc_id": "document_id", "content": "Document text content..."}
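For local files, each line of a documents file can be parsed on its own and indexed by `doc_id`. The sketch below assumes only the two fields shown above (`doc_id` and `content`). The function name `load_documents` is hypothetical, and the demo writes a throwaway file with made-up rows rather than reading real RECOR data.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def load_documents(path: Path) -> Dict[str, str]:
    """Read a {domain}_documents.jsonl file into a doc_id -> content mapping."""
    documents: Dict[str, str] = {}
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            row = json.loads(line)
            documents[row["doc_id"]] = row["content"]
    return documents


# Demo on a temporary file with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_documents.jsonl"
    demo_path.write_text(
        '{"doc_id": "doc_123", "content": "First document."}\n'
        '{"doc_id": "doc_456", "content": "Second document."}\n',
        encoding="utf-8",
    )
    demo_docs = load_documents(demo_path)

print(sorted(demo_docs))
```

A dictionary keyed by `doc_id` makes it cheap to resolve the IDs listed in each benchmark turn back to document text.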
File Structure
data/
├── benchmark/                      # Conversation benchmarks (11 files)
│   └── {domain}_benchmark.jsonl
└── corpus/                         # Document corpus (11 files)
    └── {domain}_documents.jsonl
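Given this layout, the two files for a domain can be located from a local copy of the repository. This is a sketch under stated assumptions: `domain_files` is a hypothetical helper, `RECOR-data` is the `--local-dir` value from the download command, the `data/` folder is assumed to sit directly under that directory, and file names are assumed to use the domain strings exactly as listed (including the capitalized `Drones`).

```python
from pathlib import Path
from typing import Tuple

# Domain names as listed in this README.
DOMAINS = [
    "biology", "earth_science", "economics", "psychology", "robotics",
    "sustainable_living", "Drones", "hardware", "law", "medicalsciences",
    "politics",
]


def domain_files(root: Path, domain: str) -> Tuple[Path, Path]:
    """Return (benchmark file, documents file) for one domain under a local root."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    data_dir = root / "data"  # assumption: data/ sits directly under the root
    benchmark_file = data_dir / "benchmark" / f"{domain}_benchmark.jsonl"
    documents_file = data_dir / "corpus" / f"{domain}_documents.jsonl"
    return benchmark_file, documents_file


benchmark_path, corpus_path = domain_files(Path("RECOR-data"), "biology")
print(benchmark_path.as_posix())
print(corpus_path.as_posix())
```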
Evaluation Metrics
Retrieval: Recall@K, MRR, nDCG@10
Generation (automatic): ROUGE-L, METEOR, BERTScore
Generation (LLM-as-judge): Correctness, Completeness, Relevance, Coherence, Faithfulness
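As an illustration of the retrieval metrics, the sketch below computes Recall@K, reciprocal rank (MRR is its mean over all turns), and nDCG@10 for a single turn. It uses the common textbook definitions with binary relevance and assumes the ranked list has no duplicate IDs. It is not the benchmark's official scoring code, and the function names and example ranking are hypothetical.

```python
import math
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Made-up ranking for one turn whose relevant documents are doc_123 and doc_456.
ranked_ids = ["doc_9", "doc_123", "doc_7", "doc_456"]
relevant_ids = {"doc_123", "doc_456"}

recall_2 = recall_at_k(ranked_ids, relevant_ids, k=2)
rr = reciprocal_rank(ranked_ids, relevant_ids)
ndcg_10 = ndcg_at_k(ranked_ids, relevant_ids, k=10)
print(recall_2, rr, round(ndcg_10, 4))
```

Averaging each per-turn value over all turns of a domain gives the corresponding domain-level score.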
License
MIT License




