T-Eval

Name: T-Eval
Creator: OpenDataLab
Published: 2026-05-24 12:30:56
License: No description available

OpenDataLab; updated 2026-05-24; indexed 2024-05-18

Download link:

https://opendatalab.org.cn/OpenDataLab/T-Eval

Resource description:

T-Eval disentangles tool-utilization evaluation into several sub-domains along model capabilities, which makes it easier to understand both the holistic and the isolated competencies of LLMs. We conduct extensive experiments on T-Eval and an in-depth analysis of various LLMs. T-Eval is not only consistent with outcome-oriented evaluation but also provides a more fine-grained analysis of LLM capabilities, offering a new perspective on evaluating the tool-utilization ability of LLMs.

Provider:

OpenDataLab

Created:

2024-05-14

Collected summary

Dataset introduction