AIM-Harvard/proof-of-time
Source: Hugging Face. Updated 2026-01-21; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/AIM-Harvard/proof-of-time
Description:
This dataset contains benchmarks for evaluating LLM agents on academic paper analysis tasks that require understanding research trends, citations, and future directions. All evaluation data uses **post-training-cutoff** (2025) papers to avoid data contamination. The dataset includes:

1. **Benchmark Tasks** (3.8 MB): JSONL files with multiple-choice questions and evaluation samples.
2. **Sandbox Data** (66 MB): historical paper data, faculty publications, and SOTA metrics for agent evaluation.

The benchmark suite focuses on temporal reasoning: agents must analyze historical patterns to make predictions about future research directions, award recipients, and citation impact. Tasks require genuine understanding of research trends rather than memorization.
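Since the benchmark tasks are described as JSONL files of multiple-choice questions, a minimal sketch of reading such a file and scoring predictions might look like the following. The field names `question`, `choices`, and `answer` are assumptions made for illustration; the description does not state the actual schema, so check the real files before relying on them.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def accuracy(samples, predict):
    """Fraction of samples where predict(sample) equals sample["answer"]."""
    samples = list(samples)
    if not samples:
        return 0.0
    correct = sum(1 for s in samples if predict(s) == s["answer"])
    return correct / len(samples)


# Hypothetical records: the keys below are illustrative, not the dataset's schema.
EXAMPLE = io.StringIO(
    '{"question": "Q1", "choices": ["A", "B"], "answer": "A"}\n'
    '{"question": "Q2", "choices": ["A", "B"], "answer": "B"}\n'
)

samples = list(read_jsonl(EXAMPLE))

# Trivial baseline that always picks the first choice.
baseline = accuracy(samples, lambda s: s["choices"][0])
```

With a real task file, the in-memory `EXAMPLE` stream would be replaced by `open(path, encoding="utf-8")`, and the baseline lambda by a call into the agent under evaluation.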
Provider:
AIM-Harvard



