kshitijthakkar/smoltrace-results-20260424_111528

Hugging Face | Updated 2026-04-24 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/kshitijthakkar/smoltrace-results-20260424_111528
Resource summary:
This dataset contains the evaluation results of a SMOLTRACE benchmark run against the `openai/gpt-5.4-nano` model. Each record holds the model identifier, evaluation date, task ID, agent type, test difficulty level, test prompt/question, whether the test passed, whether a tool was invoked, whether the correct tool was used, whether final_answer was called, whether the response was correct, a comma-separated list of the tools used, the number of agent steps taken, the agent's final response, the error message (if the test failed), the OpenTelemetry trace ID, the execution time in milliseconds, the total tokens consumed, the API cost in USD, and a JSON field with detailed trace data. The dataset README also gives usage instructions: how to load the dataset, filter successful tests, and calculate the success rate. In addition, it introduces the key features of the SMOLTRACE framework: automated agent evaluation, OpenTelemetry-based tracing, GPU metrics collection, CO2 emissions and power cost tracking, and leaderboard aggregation and comparison.
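The three usage steps named in the summary (load, filter successful tests, compute the success rate) could look roughly like the sketch below. It is not taken from the dataset README. The column name `success` and the split name `train` are assumptions, since this page describes the fields without naming them, so check the actual schema before relying on them. `load_results`, `successful`, and `success_rate` are hypothetical helper names.

```python
from typing import Any, Iterable, List, Mapping

REPO_ID = "kshitijthakkar/smoltrace-results-20260424_111528"


def load_results(repo_id: str = REPO_ID, split: str = "train"):
    """Load the results from the Hugging Face Hub.

    Needs the `datasets` package and network access. The split name
    "train" is an assumption, not something this page confirms.
    """
    # Imported lazily so the helpers below also work without `datasets`.
    from datasets import load_dataset
    return load_dataset(repo_id, split=split)


def successful(rows: Iterable[Mapping[str, Any]],
               success_key: str = "success") -> List[Mapping[str, Any]]:
    """Keep only the rows whose pass/fail flag is truthy.

    `success_key` is an assumed column name for "whether the test passed".
    """
    return [row for row in rows if row.get(success_key)]


def success_rate(rows: Iterable[Mapping[str, Any]],
                 success_key: str = "success") -> float:
    """Return the percentage of rows that passed (0.0 for an empty input)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return 100.0 * len(successful(rows, success_key)) / len(rows)
```

With network access, `success_rate(load_results())` would give the pass percentage for the whole run, assuming the column and split names match. The helpers accept any iterable of dict-like rows, so they can be tried on a few hand-written records first.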
Provider:
kshitijthakkar