madrylab/platinum-bench-paper-version
Source: Hugging Face. Updated: 2025-02-06. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/madrylab/platinum-bench-paper-version
Description:
Platinum Benchmarks are a set of carefully curated benchmarks designed to measure model reliability by minimizing label errors and ambiguity. The dataset comprises fifteen subsets, each created by manually revising questions from an existing dataset. The subsets contain questions and answers of various types, along with task-specific fields such as tables, images, and equations.
Provided by:
madrylab



