AIM-Harvard/proof-of-time

Name: AIM-Harvard/proof-of-time
Creator: AIM-Harvard
Published: 2026-01-21 19:03:30
License: not specified

Hugging Face | Updated: 2026-01-21 | Indexed: 2026-02-07

Download link:

https://hf-mirror.com/datasets/AIM-Harvard/proof-of-time

Description:

This dataset contains benchmarks for evaluating LLM agents on academic paper analysis tasks that require understanding research trends, citations, and future directions. All evaluation data uses **post-training-cutoff** (2025) papers to avoid data contamination.

The dataset includes:

1. **Benchmark Tasks** (3.8 MB): JSONL files with multiple-choice questions and evaluation samples.
2. **Sandbox Data** (66 MB): Historical paper data, faculty publications, and SOTA metrics for agent evaluation.

The benchmark suite focuses on temporal reasoning: agents must analyze historical patterns to make predictions about future research directions, award recipients, and citation impact. Tasks require genuine understanding of research trends rather than memorization.
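Since the benchmark tasks ship as JSONL files of multiple-choice questions, a minimal sketch of reading such records and scoring predictions might look like the following. The field names (`id`, `question`, `choices`, `answer`) and the in-memory demo records are assumptions made for illustration; the listing does not document the actual schema or file names, so check them against the real files before use.

```python
import io
import json


def load_jsonl(stream):
    """Parse one JSON object per non-empty line of a text stream."""
    return [json.loads(line) for line in stream if line.strip()]


def accuracy(samples, predictions):
    """Fraction of samples whose predicted choice equals the gold answer.

    `predictions` maps a sample id to the chosen option label.
    Missing predictions count as incorrect.
    """
    if not samples:
        return 0.0
    correct = sum(1 for s in samples if predictions.get(s["id"]) == s["answer"])
    return correct / len(samples)


# Hypothetical records standing in for a benchmark file (assumed schema).
demo = io.StringIO(
    '{"id": "q1", "question": "placeholder", "choices": ["A", "B", "C", "D"], "answer": "B"}\n'
    '{"id": "q2", "question": "placeholder", "choices": ["A", "B", "C", "D"], "answer": "D"}\n'
)

samples = load_jsonl(demo)
score = accuracy(samples, {"q1": "B", "q2": "A"})
print(score)
```

To run this against a downloaded file, pass an open file handle (for example `open(path, encoding="utf-8")`) to `load_jsonl` in place of the in-memory stream, and adjust the field names to match the actual records.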

Provider:

AIM-Harvard
