aisa-group/PostTrainBench-Trajectories
Source: Hugging Face. Updated 2026-04-22. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/aisa-group/PostTrainBench-Trajectories
Description:
The PostTrainBench Agent Traces dataset contains agent traces from the PostTrainBench benchmark, which measures how well CLI agents can post-train pre-trained LLMs. Each agent is given a pre-trained base LLM, an evaluation script for a specific benchmark, and 10 hours on an NVIDIA H100 80GB GPU. The agent must autonomously improve the model's performance on the target benchmark using any post-training strategy it chooses (SFT, LoRA, RLHF, etc.). The dataset includes 224 traces across 8 agent runs, with 28 tasks per run (7 benchmarks × 4 base models). Each trace records up to 10 hours of autonomous agent activity.
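The trace count in the description follows from a simple grid: one task per (benchmark, base model) pair, repeated for each agent run. A minimal sketch of that arithmetic is below. The labels such as `benchmark_0` and `base_model_0` are hypothetical placeholders, since the actual benchmark and model names are not listed here.

```python
from itertools import product

# Counts stated in the dataset description.
NUM_AGENT_RUNS = 8
NUM_BENCHMARKS = 7
NUM_BASE_MODELS = 4
HOURS_PER_TRACE = 10  # each trace covers *up to* this many hours

# Placeholder labels (hypothetical; the real names are not given in this listing).
benchmarks = [f"benchmark_{i}" for i in range(NUM_BENCHMARKS)]
base_models = [f"base_model_{j}" for j in range(NUM_BASE_MODELS)]

# One task per (benchmark, base model) pair.
tasks = list(product(benchmarks, base_models))
tasks_per_run = len(tasks)

# Every agent run attempts every task once.
total_traces = NUM_AGENT_RUNS * tasks_per_run

# Upper bound on recorded agent activity across the whole dataset.
max_gpu_hours = total_traces * HOURS_PER_TRACE

print(tasks_per_run, total_traces, max_gpu_hours)  # prints: 28 224 2240
```

The last figure is only an upper bound, because a trace may end before the 10-hour budget is used up.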
Provider:
aisa-group



