
madrylab/platinum-bench

Hugging Face · Updated 2025-04-14 · Indexed 2025-02-15
Download link:
https://hf-mirror.com/datasets/madrylab/platinum-bench
Resource description:
Platinum Benchmarks are a set of carefully curated benchmarks designed to minimize label errors and ambiguity so that model reliability can be measured. The dataset contains fifteen benchmarks, each manually revised from an existing dataset. During revision, every example on which at least one model gave an incorrect answer was manually re-annotated. Each example includes fields such as the question prompt and the target answer, along with the fields of the original dataset. To use the revised benchmarks, filter out the questions marked as rejected.
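The two steps described above (flagging examples for re-annotation, then dropping rejected questions before evaluation) can be sketched in plain Python. This is a minimal, non-authoritative sketch: the status field name `cleaning_status`, the value `"rejected"`, the prompt/target keys, and the exact-match comparison in `needs_review` are all assumptions made for illustration, so check the dataset card for the real schema.

```python
from typing import Any, Dict, Iterable, List, Sequence

# Hypothetical field name and value; confirm against the dataset card.
STATUS_FIELD = "cleaning_status"
REJECTED = "rejected"


def needs_review(model_answers: Sequence[str], target: str) -> bool:
    """Flag an example for manual re-annotation if at least one model's
    answer disagrees with the current label (exact string match assumed)."""
    return any(answer != target for answer in model_answers)


def filter_rejected(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop rows whose status is marked as rejected; rows without the field are kept."""
    return [row for row in rows if row.get(STATUS_FIELD) != REJECTED]


if __name__ == "__main__":
    # Toy in-memory rows standing in for one revised benchmark
    # ("prompt" and "target" are hypothetical key names).
    rows = [
        {"prompt": "What is 2 + 2?", "target": "4", STATUS_FIELD: "verified"},
        {"prompt": "An ambiguous question", "target": "", STATUS_FIELD: REJECTED},
        {"prompt": "What is 3 * 3?", "target": "9", STATUS_FIELD: "revised"},
    ]
    print(needs_review(["4", "5"], "4"))  # True: one model disagrees
    print(len(filter_rejected(rows)))     # 2
```

With the Hugging Face `datasets` library, the same predicate could be passed to `Dataset.filter`, for example `ds.filter(lambda row: row["cleaning_status"] != "rejected")`, again assuming that field name and value.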
Provided by:
madrylab