jablonkagroup/corral-intervention-reports
Hugging Face | Updated 2026-04-10 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/jablonkagroup/corral-intervention-reports
Resource description:
---
dataset_info:
- config_name: catalyst
features:
- name: model
dtype: string
- name: agent_type
dtype: string
- name: environment
dtype: string
- name: condition
dtype: string
- name: condition_type
dtype: string
- name: step
dtype: int64
- name: prompt_tokens
dtype: int64
- name: completion_tokens
dtype: int64
- name: total_tokens
dtype: int64
- name: Total Tool Calls
dtype: string
- name: Average Score
dtype: float64
- name: Overall Average Duration
dtype: float64
- name: Overall Success Rate
dtype: int64
- name: Overall Total Duration
dtype: float64
- name: Pass@1
dtype: float64
- name: Pass@10
dtype: float64
- name: Pass@11
dtype: float64
- name: Pass@12
dtype: float64
- name: Pass@13
dtype: float64
- name: Pass@14
dtype: float64
- name: Pass@15
dtype: float64
- name: Pass@2
dtype: float64
- name: Pass@3
dtype: float64
- name: Pass@4
dtype: float64
- name: Pass@5
dtype: float64
- name: Pass@6
dtype: float64
- name: Pass@7
dtype: float64
- name: Pass@8
dtype: float64
- name: Pass@9
dtype: float64
- name: Pass^1
dtype: float64
- name: Pass^10
dtype: float64
- name: Pass^11
dtype: float64
- name: Pass^12
dtype: float64
- name: Pass^13
dtype: float64
- name: Pass^14
dtype: float64
- name: Pass^15
dtype: float64
- name: Pass^2
dtype: float64
- name: Pass^3
dtype: float64
- name: Pass^4
dtype: float64
- name: Pass^5
dtype: float64
- name: Pass^6
dtype: float64
- name: Pass^7
dtype: float64
- name: Pass^8
dtype: float64
- name: Pass^9
dtype: float64
- name: Total Surrendered Trials
dtype: int64
- name: Total Tasks
dtype: int64
- name: Total Tool Execution Duration
dtype: float64
- name: Tool Verbosity
dtype: string
- name: Total Benchmark Duration
dtype: float64
- name: Task Results
dtype: string
splits:
- name: train
num_examples: 2
- config_name: md
features:
- name: model
dtype: string
- name: agent_type
dtype: string
- name: environment
dtype: string
- name: condition
dtype: string
- name: condition_type
dtype: string
- name: step
dtype: int64
- name: prompt_tokens
dtype: int64
- name: completion_tokens
dtype: int64
- name: total_tokens
dtype: int64
- name: Total Tool Calls
dtype: string
- name: Average Score
dtype: float64
- name: Overall Average Duration
dtype: float64
- name: Overall Success Rate
dtype: float64
- name: Overall Total Duration
dtype: float64
- name: Pass@1
dtype: float64
- name: Pass@10
dtype: float64
- name: Pass@11
dtype: float64
- name: Pass@12
dtype: float64
- name: Pass@13
dtype: float64
- name: Pass@14
dtype: float64
- name: Pass@15
dtype: float64
- name: Pass@2
dtype: float64
- name: Pass@3
dtype: float64
- name: Pass@4
dtype: float64
- name: Pass@5
dtype: float64
- name: Pass@6
dtype: float64
- name: Pass@7
dtype: float64
- name: Pass@8
dtype: float64
- name: Pass@9
dtype: float64
- name: Pass^1
dtype: float64
- name: Pass^10
dtype: float64
- name: Pass^11
dtype: float64
- name: Pass^12
dtype: float64
- name: Pass^13
dtype: float64
- name: Pass^14
dtype: float64
- name: Pass^15
dtype: float64
- name: Pass^2
dtype: float64
- name: Pass^3
dtype: float64
- name: Pass^4
dtype: float64
- name: Pass^5
dtype: float64
- name: Pass^6
dtype: float64
- name: Pass^7
dtype: float64
- name: Pass^8
dtype: float64
- name: Pass^9
dtype: float64
- name: Total Surrendered Trials
dtype: int64
- name: Total Tasks
dtype: int64
- name: Total Tool Execution Duration
dtype: float64
- name: Tool Verbosity
dtype: string
- name: Total Benchmark Duration
dtype: float64
- name: Task Results
dtype: string
splits:
- name: train
num_examples: 18
- config_name: ml
features:
- name: model
dtype: string
- name: agent_type
dtype: string
- name: environment
dtype: string
- name: condition
dtype: string
- name: condition_type
dtype: string
- name: step
dtype: int64
- name: prompt_tokens
dtype: int64
- name: completion_tokens
dtype: int64
- name: total_tokens
dtype: int64
- name: Total Tool Calls
dtype: string
- name: Average Score
dtype: float64
- name: Overall Average Duration
dtype: float64
- name: Overall Success Rate
dtype: float64
- name: Overall Total Duration
dtype: float64
- name: Pass@1
dtype: float64
- name: Pass@10
dtype: float64
- name: Pass@11
dtype: float64
- name: Pass@12
dtype: float64
- name: Pass@13
dtype: float64
- name: Pass@14
dtype: float64
- name: Pass@15
dtype: float64
- name: Pass@2
dtype: float64
- name: Pass@3
dtype: float64
- name: Pass@4
dtype: float64
- name: Pass@5
dtype: float64
- name: Pass@6
dtype: float64
- name: Pass@7
dtype: float64
- name: Pass@8
dtype: float64
- name: Pass@9
dtype: float64
- name: Pass^1
dtype: float64
- name: Pass^10
dtype: float64
- name: Pass^11
dtype: float64
- name: Pass^12
dtype: float64
- name: Pass^13
dtype: float64
- name: Pass^14
dtype: float64
- name: Pass^15
dtype: float64
- name: Pass^2
dtype: float64
- name: Pass^3
dtype: float64
- name: Pass^4
dtype: float64
- name: Pass^5
dtype: float64
- name: Pass^6
dtype: float64
- name: Pass^7
dtype: float64
- name: Pass^8
dtype: float64
- name: Pass^9
dtype: float64
- name: Total Surrendered Trials
dtype: int64
- name: Total Tasks
dtype: int64
- name: Total Tool Execution Duration
dtype: float64
- name: Tool Verbosity
dtype: string
- name: Total Benchmark Duration
dtype: float64
- name: Task Results
dtype: string
splits:
- name: train
num_examples: 18
- config_name: resistor
features:
- name: model
dtype: string
- name: agent_type
dtype: string
- name: environment
dtype: string
- name: condition
dtype: string
- name: condition_type
dtype: string
- name: step
dtype: int64
- name: prompt_tokens
dtype: int64
- name: completion_tokens
dtype: int64
- name: total_tokens
dtype: int64
- name: Total Tool Calls
dtype: string
- name: Average Score
dtype: float64
- name: Overall Average Duration
dtype: float64
- name: Overall Success Rate
dtype: float64
- name: Overall Total Duration
dtype: float64
- name: Pass@1
dtype: float64
- name: Pass@10
dtype: float64
- name: Pass@11
dtype: float64
- name: Pass@12
dtype: float64
- name: Pass@13
dtype: float64
- name: Pass@14
dtype: float64
- name: Pass@15
dtype: float64
- name: Pass@2
dtype: float64
- name: Pass@3
dtype: float64
- name: Pass@4
dtype: float64
- name: Pass@5
dtype: float64
- name: Pass@6
dtype: float64
- name: Pass@7
dtype: float64
- name: Pass@8
dtype: float64
- name: Pass@9
dtype: float64
- name: Pass^1
dtype: float64
- name: Pass^10
dtype: float64
- name: Pass^11
dtype: float64
- name: Pass^12
dtype: float64
- name: Pass^13
dtype: float64
- name: Pass^14
dtype: float64
- name: Pass^15
dtype: float64
- name: Pass^2
dtype: float64
- name: Pass^3
dtype: float64
- name: Pass^4
dtype: float64
- name: Pass^5
dtype: float64
- name: Pass^6
dtype: float64
- name: Pass^7
dtype: float64
- name: Pass^8
dtype: float64
- name: Pass^9
dtype: float64
- name: Total Surrendered Trials
dtype: int64
- name: Total Tasks
dtype: int64
- name: Total Tool Execution Duration
dtype: float64
- name: Tool Verbosity
dtype: string
- name: Total Benchmark Duration
dtype: float64
- name: Task Results
dtype: string
splits:
- name: train
num_examples: 18
- config_name: retrosynthesis
features:
- name: model
dtype: string
- name: agent_type
dtype: string
- name: environment
dtype: string
- name: condition
dtype: string
- name: condition_type
dtype: string
- name: step
dtype: int64
- name: prompt_tokens
dtype: int64
- name: completion_tokens
dtype: int64
- name: total_tokens
dtype: int64
- name: Total Tool Calls
dtype: string
- name: Average Score
dtype: float64
- name: Overall Average Duration
dtype: float64
- name: Overall Success Rate
dtype: float64
- name: Overall Total Duration
dtype: float64
- name: Pass@1
dtype: float64
- name: Pass@10
dtype: float64
- name: Pass@11
dtype: float64
- name: Pass@12
dtype: float64
- name: Pass@13
dtype: float64
- name: Pass@14
dtype: float64
- name: Pass@15
dtype: float64
- name: Pass@2
dtype: float64
- name: Pass@3
dtype: float64
- name: Pass@4
dtype: float64
- name: Pass@5
dtype: float64
- name: Pass@6
dtype: float64
- name: Pass@7
dtype: float64
- name: Pass@8
dtype: float64
- name: Pass@9
dtype: float64
- name: Pass^1
dtype: float64
- name: Pass^10
dtype: float64
- name: Pass^11
dtype: float64
- name: Pass^12
dtype: float64
- name: Pass^13
dtype: float64
- name: Pass^14
dtype: float64
- name: Pass^15
dtype: float64
- name: Pass^2
dtype: float64
- name: Pass^3
dtype: float64
- name: Pass^4
dtype: float64
- name: Pass^5
dtype: float64
- name: Pass^6
dtype: float64
- name: Pass^7
dtype: float64
- name: Pass^8
dtype: float64
- name: Pass^9
dtype: float64
- name: Total Surrendered Trials
dtype: int64
- name: Total Tasks
dtype: int64
- name: Total Tool Execution Duration
dtype: float64
- name: Tool Verbosity
dtype: string
- name: Total Benchmark Duration
dtype: float64
- name: Task Results
dtype: string
splits:
- name: train
num_examples: 18
- config_name: spectra
features:
- name: model
dtype: string
- name: agent_type
dtype: string
- name: environment
dtype: string
- name: condition
dtype: string
- name: condition_type
dtype: string
- name: step
dtype: int64
- name: prompt_tokens
dtype: int64
- name: completion_tokens
dtype: int64
- name: total_tokens
dtype: int64
- name: Total Tool Calls
dtype: string
- name: Average Score
dtype: float64
- name: Overall Average Duration
dtype: float64
- name: Overall Success Rate
dtype: float64
- name: Overall Total Duration
dtype: float64
- name: Pass@1
dtype: float64
- name: Pass@10
dtype: float64
- name: Pass@11
dtype: float64
- name: Pass@12
dtype: float64
- name: Pass@13
dtype: float64
- name: Pass@14
dtype: float64
- name: Pass@15
dtype: float64
- name: Pass@2
dtype: float64
- name: Pass@3
dtype: float64
- name: Pass@4
dtype: float64
- name: Pass@5
dtype: float64
- name: Pass@6
dtype: float64
- name: Pass@7
dtype: float64
- name: Pass@8
dtype: float64
- name: Pass@9
dtype: float64
- name: Pass^1
dtype: float64
- name: Pass^10
dtype: float64
- name: Pass^11
dtype: float64
- name: Pass^12
dtype: float64
- name: Pass^13
dtype: float64
- name: Pass^14
dtype: float64
- name: Pass^15
dtype: float64
- name: Pass^2
dtype: float64
- name: Pass^3
dtype: float64
- name: Pass^4
dtype: float64
- name: Pass^5
dtype: float64
- name: Pass^6
dtype: float64
- name: Pass^7
dtype: float64
- name: Pass^8
dtype: float64
- name: Pass^9
dtype: float64
- name: Total Surrendered Trials
dtype: int64
- name: Total Tasks
dtype: int64
- name: Total Tool Execution Duration
dtype: float64
- name: Tool Verbosity
dtype: string
- name: Total Benchmark Duration
dtype: float64
- name: Task Results
dtype: string
splits:
- name: train
num_examples: 18
- config_name: wetlab
features:
- name: model
dtype: string
- name: agent_type
dtype: string
- name: environment
dtype: string
- name: condition
dtype: string
- name: condition_type
dtype: string
- name: step
dtype: int64
- name: prompt_tokens
dtype: int64
- name: completion_tokens
dtype: int64
- name: total_tokens
dtype: int64
- name: Total Tool Calls
dtype: string
- name: Average Score
dtype: float64
- name: Overall Average Duration
dtype: float64
- name: Overall Success Rate
dtype: float64
- name: Overall Total Duration
dtype: float64
- name: Pass@1
dtype: float64
- name: Pass@10
dtype: float64
- name: Pass@11
dtype: float64
- name: Pass@12
dtype: float64
- name: Pass@13
dtype: float64
- name: Pass@14
dtype: float64
- name: Pass@15
dtype: float64
- name: Pass@2
dtype: float64
- name: Pass@3
dtype: float64
- name: Pass@4
dtype: float64
- name: Pass@5
dtype: float64
- name: Pass@6
dtype: float64
- name: Pass@7
dtype: float64
- name: Pass@8
dtype: float64
- name: Pass@9
dtype: float64
- name: Pass^1
dtype: float64
- name: Pass^10
dtype: float64
- name: Pass^11
dtype: float64
- name: Pass^12
dtype: float64
- name: Pass^13
dtype: float64
- name: Pass^14
dtype: float64
- name: Pass^15
dtype: float64
- name: Pass^2
dtype: float64
- name: Pass^3
dtype: float64
- name: Pass^4
dtype: float64
- name: Pass^5
dtype: float64
- name: Pass^6
dtype: float64
- name: Pass^7
dtype: float64
- name: Pass^8
dtype: float64
- name: Pass^9
dtype: float64
- name: Total Surrendered Trials
dtype: int64
- name: Total Tasks
dtype: int64
- name: Total Tool Execution Duration
dtype: float64
- name: Tool Verbosity
dtype: string
- name: Total Benchmark Duration
dtype: float64
- name: Task Results
dtype: string
splits:
- name: train
num_examples: 18
---
Provided by:
jablonkagroup
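Collected summary
Dataset introduction
Construction
In cheminformatics, evaluating how well agents handle interventions in complex scientific tasks is critical. The corral-intervention-reports dataset was built through a systematic experimental design spanning several specialist sub-domains: catalyst design, molecular dynamics, materials science, resistor synthesis, retrosynthesis analysis, spectral interpretation, and wet-lab operations. Each sub-domain has its own experimental environment and records key metrics for the agent under different intervention conditions, including execution steps, tool-call counts, task completion rate, and duration. Data collection is described as following a standardized protocol that keeps experimental conditions consistent, so the resulting evaluation reports are highly reproducible. The metadata above lists seven configurations (catalyst, md, ml, resistor, retrosynthesis, spectra, wetlab); the catalyst configuration has 2 examples and each of the others has 18, all in a single train split.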
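Features
The dataset is notable for its multi-dimensional evaluation framework, which ranges from basic performance metrics to behavioral analysis. Core features include the agent type, environment, intervention condition and condition type, and step, together with quantitative metrics such as token consumption (prompt, completion, and total tokens), tool-call counts, and task success rate. In particular, it provides fine-grained pass-rate columns from Pass@1 to Pass@15, with matching Pass^1 to Pass^15 columns, plus behavioral parameters such as task duration and tool verbosity. Together these support detailed analysis of agents' decision patterns and efficiency bottlenecks in scientific workflows.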
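The card does not say how the Pass@k and Pass^k columns are computed. As a hedged sketch, assume the common conventions: Pass@k is the probability that at least one of k sampled trials succeeds, and Pass^k is the probability that all k succeed, both estimated from n recorded trials of which c succeeded. The function names and the n and c inputs below are illustrative assumptions, not part of the dataset.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k trials, drawn without replacement
    from n recorded trials (c of them successful), succeeds.
    Assumed definition; the dataset card does not confirm it."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when there are fewer than k failed trials.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Chance that all k drawn trials succeed (one reading of Pass^k).
    Assumed definition; the dataset card does not confirm it."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)
```

Under these assumptions both quantities equal c/n at k = 1, Pass@k does not decrease as k grows, and Pass^k does not increase.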
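Usage
Researchers can use the dataset to evaluate agent capabilities systematically in cheminformatics settings. Loading different sub-domain configurations (for example catalyst or retrosynthesis) allows side-by-side comparison of models on specific scientific tasks. Typical applications include analyzing how intervention conditions affect task success rate, examining the relationship between tool-call patterns and execution efficiency, and using the step and duration data to optimize agents' decision paths. The dataset can be imported directly through standard data pipelines, and its structured features make it straightforward to build custom evaluation metrics, giving an empirical basis for improving the next generation of scientific agents.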
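As a minimal sketch of the workflow described above, the snippet below loads one configuration with the Hugging Face `datasets` library and averages a metric per intervention condition. The repository ID, configuration names, and column names come from the metadata on this page; the helper names `load_config` and `mean_metric_by` are hypothetical, and the loader assumes the `datasets` package is installed and the Hub is reachable.

```python
from collections import defaultdict
from statistics import mean

REPO_ID = "jablonkagroup/corral-intervention-reports"


def load_config(config_name: str):
    """Load one configuration (for example "catalyst") from the Hub.
    Needs the `datasets` package and network access."""
    from datasets import load_dataset  # lazy import: only needed here
    return load_dataset(REPO_ID, config_name, split="train")


def mean_metric_by(rows, group_key: str, metric: str) -> dict:
    """Average `metric` over rows that share the same `group_key`
    value, skipping rows where the metric is missing."""
    buckets = defaultdict(list)
    for row in rows:
        value = row.get(metric)
        if value is not None:
            buckets[row[group_key]].append(float(value))
    return {group: mean(values) for group, values in buckets.items()}
```

For instance, `mean_metric_by(load_config("retrosynthesis"), "condition", "Overall Success Rate")` would return one average success rate per condition. The aggregation works on any iterable of dict rows, so it can be tried on hand-written rows without downloading anything.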
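Background and challenges
Background overview
At the intersection of artificial intelligence and chemistry, evaluating how well autonomous agents carry out experimental tasks has become an increasingly active research focus. The corral-intervention-reports dataset was created to record, systematically, intervention reports and performance metrics for agents across a variety of chemistry experiment environments. Built by the associated research team, it centers on the decision-making effectiveness and robustness of agents in complex scenarios such as catalyst design, materials discovery, resistor optimization, retrosynthesis analysis, spectral interpretation, and wet-lab operations. By combining multi-dimensional evaluation features such as task success rate, tool-call efficiency, and duration analysis, the dataset offers a benchmark for quantifying how well agents adapt to real chemistry research, and supports the development of automated experimentation and intelligent cheminformatics systems.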
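Current challenges
The dataset targets the problem of evaluating whether chemistry agents can intervene effectively in dynamic, uncertain experimental environments. The challenge for agents is to handle high-dimensional, heterogeneous chemical data and experimental constraints while keeping their decisions accurate and interpretable. On the construction side, the challenges stem mainly from the complexity and safety requirements of chemistry experiments: standardized tasks must cover many kinds of experimental scenarios, and intervention reports must be complete and consistent. Data collection also has to coordinate experimental resources with real-time recording of agent interactions, and balance the breadth of the evaluation metrics against computational feasibility, in order to produce a reliable and scalable benchmark.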
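Common scenarios
Classic use cases
In chemistry and materials science, AI-driven autonomous agents are gradually becoming important tools for experiment design and analysis. By recording agents' intervention reports across a range of chemistry tasks, the corral-intervention-reports dataset provides a standardized benchmark for evaluating agent performance in complex scenarios such as catalyst design, spectral interpretation, and wet-lab operations. It is typically used to compare different models or agent types systematically on success rate, task completion time, and tool-call efficiency within a given chemistry environment, which reveals the behavior patterns and capability limits of agents in structured scientific workflows.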
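One practical detail when comparing models: the metadata lists the pass-rate columns in lexicographic order (Pass@1, Pass@10, ..., Pass@15, Pass@2, ...), so plotting or comparing them across k needs a numeric sort first. The helper below is a small sketch of that step; `pass_curve` is a hypothetical name, and it assumes each record is a plain dict keyed by the column names shown in the metadata.

```python
import re


def pass_curve(row: dict, prefix: str = "Pass@") -> list:
    """Return [(k, value), ...] for the columns named prefix + k,
    sorted by numeric k and skipping missing values."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    points = []
    for column, value in row.items():
        match = pattern.fullmatch(column)
        if match and value is not None:
            points.append((int(match.group(1)), float(value)))
    return sorted(points)
```

Passing `prefix="Pass^"` extracts the Pass^k columns in the same way, so the two curves for one record can be compared point by point.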
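Academic problems addressed
The dataset addresses the lack of a systematic evaluation framework in research that combines cheminformatics and artificial intelligence. By providing fine-grained performance metrics across several chemistry sub-domains, including retrosynthesis analysis, resistor design, and materials discovery, it lets researchers quantify the reliability, generalization, and computational efficiency of agents on unstructured scientific problems. This gives an empirical basis for understanding the practical reasoning limits of machine learning models in specialist domains, for optimizing agent architectures, and for designing more robust systems that automate chemistry tasks, supporting the move of chemistry AI from proof of concept toward practical use.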
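Derived related work
According to this summary, work derived from the dataset focuses on benchmarking and algorithmic improvement of agents specialized for chemistry. For example, research teams are said to have used its multi-task evaluation framework to develop reinforcement learning agents for retrosynthesis, significantly improving route-planning accuracy. Other work is said to have combined the dataset's intervention logs with new tool-calling strategies that reduce ineffective operations and streamline task execution. These derived studies extend the range of applications for chemistry AI and offer methodological guidance and performance reference points for building more general and more reliable cross-disciplinary scientific agents.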
The content above was collected and summarized by 遇见数据集.



