AutoExperiment

Name: AutoExperiment
Creator: Open-sourced by the authors
License: No description available

arXiv, indexed 2025-09-30

Download link:

https://github.com/j1mk1m/AutoExperiment

Resource description:

AutoExperiment is a dataset for evaluating how well AI agents can implement and run machine learning experiments described in peer-reviewed research papers. Key functions are masked, and the agent is assessed on its ability to reproduce the experimental results. The dataset comprises 85 unique functions, which yield up to 275,990 possible samples across masking levels. For each setting, defined by the number of masked functions, up to 100 samples can be selected for evaluation.
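The relationship between a pool of maskable functions, the number of masked functions per setting, and the 100-sample cap can be illustrated with a minimal sketch. It assumes that a sample is formed by masking a combination of functions within a single paper; this is an assumption for illustration, not a confirmed description of how the authors construct samples. The paper IDs, function names, and helper names below are all hypothetical.

```python
import itertools
import math
import random

# Hypothetical toy pool: paper id -> maskable function names.
# These are NOT the real contents of AutoExperiment.
FUNCTIONS_BY_PAPER = {
    "paper_a": ["load_data", "build_model", "train_step", "evaluate"],
    "paper_b": ["tokenize", "compute_loss", "run_eval"],
}

# Cap per setting, as stated in the dataset description.
MAX_SAMPLES_PER_SETTING = 100


def count_samples(functions_by_paper, n_masked):
    """Count the ways to mask exactly n_masked functions within one paper.

    Assumption: a sample never mixes functions from different papers.
    """
    return sum(
        math.comb(len(funcs), n_masked) for funcs in functions_by_paper.values()
    )


def draw_samples(functions_by_paper, n_masked, cap=MAX_SAMPLES_PER_SETTING, seed=0):
    """Enumerate (paper, masked functions) candidates and keep at most `cap`."""
    candidates = [
        (paper, combo)
        for paper, funcs in functions_by_paper.items()
        for combo in itertools.combinations(funcs, n_masked)
    ]
    if len(candidates) <= cap:
        return candidates
    # Seeded generator so the drawn subset is reproducible.
    return random.Random(seed).sample(candidates, cap)


if __name__ == "__main__":
    for n in range(1, 5):
        total = count_samples(FUNCTIONS_BY_PAPER, n)
        drawn = draw_samples(FUNCTIONS_BY_PAPER, n)
        print(f"n_masked={n}: {total} possible, {len(drawn)} drawn")
```

The sketch enumerates every candidate before sampling, which is fine for a toy pool; with a sample space in the hundreds of thousands, drawing combinations lazily would be the more practical choice.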

Provided by:

Open-sourced by the authors
