kshitijthakkar/smoltrace-results-20260424_122614
Source: Hugging Face | Updated: 2026-04-24 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/kshitijthakkar/smoltrace-results-20260424_122614
Description:
This dataset contains evaluation results from a SMOLTRACE benchmark run. Each record holds the model identifier, evaluation date, task ID, agent type, test difficulty level, test prompt/question, whether the test passed, whether a tool was invoked, whether the correct tool was used, whether the final answer was called, whether the response was correct, the list of tools used, the number of agent steps taken, the agent's final response, the error message (if the test failed), the OpenTelemetry trace ID, the execution time in milliseconds, total token consumption, API cost in USD, and detailed trace data as JSON. The dataset is intended to support a comprehensive benchmarking and evaluation framework for Smolagents (Hugging Face's lightweight agent library), covering automated agent evaluation, detailed execution insights, GPU metrics collection, CO2 emissions and power cost tracking, and leaderboard aggregation and comparison.
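To illustrate the kind of leaderboard-style aggregation the description mentions, here is a minimal sketch that groups result records by difficulty and computes pass rate, mean execution time, token totals, and cost. The field names (`difficulty`, `success`, `execution_time_ms`, `total_tokens`, `cost_usd`) and the sample rows are assumptions made for illustration; the listing does not give the dataset's actual column names or any real values.

```python
from collections import defaultdict

# Hypothetical sample rows. Field names and values are illustrative
# assumptions, not the dataset's confirmed columns or real results.
records = [
    {"task_id": "t1", "difficulty": "easy", "success": True,
     "execution_time_ms": 1000.0, "total_tokens": 300, "cost_usd": 0.001},
    {"task_id": "t2", "difficulty": "easy", "success": False,
     "execution_time_ms": 2000.0, "total_tokens": 500, "cost_usd": 0.002},
    {"task_id": "t3", "difficulty": "hard", "success": True,
     "execution_time_ms": 4000.0, "total_tokens": 900, "cost_usd": 0.004},
]


def summarize_by_difficulty(rows):
    """Aggregate per-difficulty pass rate, mean time, tokens, and cost."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["difficulty"]].append(row)

    summary = {}
    for difficulty, items in groups.items():
        n = len(items)
        summary[difficulty] = {
            "runs": n,
            "pass_rate": sum(1 for r in items if r["success"]) / n,
            "mean_time_ms": sum(r["execution_time_ms"] for r in items) / n,
            "total_tokens": sum(r["total_tokens"] for r in items),
            "total_cost_usd": sum(r["cost_usd"] for r in items),
        }
    return summary


if __name__ == "__main__":
    for difficulty, stats in sorted(summarize_by_difficulty(records).items()):
        print(difficulty, stats)
```

The same function should work on rows obtained from the actual dataset once the keys are adjusted to match its real column names.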
Provider:
kshitijthakkar



