
cxzdsa1234/measurement-db

Source: Hugging Face | Updated 2026-04-24 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/cxzdsa1234/measurement-db
Resource summary:

The Measurement Data Bank (MDB) is a curated collection of response matrices from 146 AI evaluation benchmarks, each standardized as a `(subjects × items)` matrix for IRT / psychometric analysis. The benchmarks fall into three categories: 92 ready benchmarks (`BENCHMARKS`) with real per-(model, item) response matrices; 14 aggregate-only benchmarks (`BENCHMARKS_AGGREGATE`) that have multi-model data only at the level of conditions or categories, not individual items; and 40 pending benchmarks (`BENCHMARKS_PENDING`) that provide questions or catalogs but no multi-model evaluation data yet. Each benchmark has a single self-contained `build.py` that downloads the raw data, builds the response matrix, generates a heatmap, converts the result to a `.pt` payload, and uploads it to the Hugging Face Hub. The dataset also reports summary statistics (number of benchmarks, unique items, total items, total cells, and others) and includes a quick-start guide and a description of the directory structure.
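The `(subjects × items)` layout is the standard input for item response theory. The sketch below is a minimal illustration, not the dataset's own code: it assumes long-format `(subject, item, score)` records with binary 0/1 scores, and the helpers `build_response_matrix` and `fit_rasch` are hypothetical names that do not come from the MDB `build.py` scripts or describe its `.pt` payload format. It pivots the records into a matrix with `None` for unevaluated cells, then fits a 1PL (Rasch) model, P(correct) = sigmoid(theta_s - b_i), by plain gradient ascent. The small L2 penalty is our own assumption: it keeps abilities finite for a model that answers every item correctly and fixes the location that the unpenalized Rasch likelihood leaves free.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical long-format record: (subject/model, item id, score in {0, 1}).
Record = Tuple[str, str, float]
Matrix = List[List[Optional[float]]]


def build_response_matrix(
    records: Sequence[Record],
) -> Tuple[List[str], List[str], Matrix]:
    """Pivot records into a (subjects x items) matrix; unevaluated cells stay None."""
    subjects = sorted({s for s, _, _ in records})
    items = sorted({i for _, i, _ in records})
    row = {s: k for k, s in enumerate(subjects)}
    col = {i: k for k, i in enumerate(items)}
    matrix: Matrix = [[None] * len(items) for _ in subjects]
    for s, i, y in records:
        matrix[row[s]][col[i]] = float(y)  # a later duplicate overwrites an earlier one
    return subjects, items, matrix


def fit_rasch(
    matrix: Matrix, steps: int = 500, lr: float = 0.1, l2: float = 0.01
) -> Tuple[List[float], List[float]]:
    """Fit P(y=1) = sigmoid(theta_s - b_i) by gradient ascent on the
    L2-penalized joint log-likelihood, skipping missing (None) cells."""
    n_s, n_i = len(matrix), len(matrix[0])
    theta = [0.0] * n_s  # subject (model) abilities
    b = [0.0] * n_i  # item difficulties
    for _ in range(steps):
        # Gradient of the penalty term -l2/2 * (sum theta^2 + sum b^2).
        g_theta = [-l2 * t for t in theta]
        g_b = [-l2 * d for d in b]
        for s in range(n_s):
            for i in range(n_i):
                y = matrix[s][i]
                if y is None:
                    continue
                p = 1.0 / (1.0 + math.exp(b[i] - theta[s]))
                g_theta[s] += y - p  # d logL / d theta_s
                g_b[i] -= y - p  # d logL / d b_i
        theta = [t + lr * g for t, g in zip(theta, g_theta)]
        b = [d + lr * g for d, g in zip(b, g_b)]
    return theta, b


if __name__ == "__main__":
    records = [
        ("model-a", "q1", 1), ("model-a", "q2", 1), ("model-a", "q3", 1),
        ("model-b", "q1", 1), ("model-b", "q2", 1), ("model-b", "q3", 0),
        ("model-c", "q1", 1), ("model-c", "q2", 0), ("model-c", "q3", 0),
    ]
    subjects, items, m = build_response_matrix(records)
    theta, b = fit_rasch(m)
    print([s for _, s in sorted(zip(theta, subjects), reverse=True)])  # ['model-a', 'model-b', 'model-c']
    print([i for _, i in sorted(zip(b, items), reverse=True)])  # ['q3', 'q2', 'q1']
```

Joint estimation like this is only a teaching device; for real benchmark matrices, marginal maximum likelihood or Bayesian IRT fitting is the more standard choice, since joint estimates are biased when each subject sees relatively few items.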
Provider:
cxzdsa1234