cxzdsa1234/measurement-db

Name: cxzdsa1234/measurement-db
Creator: cxzdsa1234
Published: 2026-04-24 07:03:23
License: No description available

Hugging Face | Updated 2026-04-24 | Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/cxzdsa1234/measurement-db


Description:


The Measurement Data Bank (MDB) is a curated collection of response matrices from 146 AI evaluation benchmarks, each standardized as a `(subjects × items)` matrix for IRT / psychometric analysis. The benchmarks fall into three categories: 92 ready benchmarks (BENCHMARKS) with real per-(model, item) response matrices; 14 aggregate-only benchmarks (BENCHMARKS_AGGREGATE) with multi-model data reported only at the condition/category level, not for individual items; and 40 pending benchmarks (BENCHMARKS_PENDING), which are question sets or catalogs with no multi-model evaluation data yet. Each benchmark has a single self-contained `build.py` that downloads the raw data, builds the response matrix, generates a heatmap, converts the result to a `.pt` payload, and uploads it to the Hugging Face Hub. The dataset card also reports summary statistics (number of benchmarks, unique items, total items, total cells, and so on) and includes a quick-start guide and an explanation of the directory structure.
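To make the `(subjects × items)` layout concrete, here is a minimal sketch of a response matrix and the kind of marginal statistics that usually precede an IRT fit. The matrix values, the use of NaN for unevaluated cells, and the helper names `mean_observed` and `summarize` are all assumptions made for illustration; none of them come from MDB or its `build.py` scripts.

```python
import math

# Hypothetical toy response matrix (not MDB data).
# Rows are subjects (models), columns are items.
# 1.0 = correct, 0.0 = incorrect, NaN = subject was not evaluated on the item.
responses = [
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, math.nan],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]


def mean_observed(values):
    """Mean over non-missing entries; NaN if nothing was observed."""
    observed = [v for v in values if not math.isnan(v)]
    return sum(observed) / len(observed) if observed else math.nan


def summarize(matrix):
    """Return per-subject accuracy, per-item pass rate, and observed-cell count."""
    subject_accuracy = [mean_observed(row) for row in matrix]
    # zip(*matrix) transposes the matrix so each tuple is one item column.
    item_pass_rate = [mean_observed(col) for col in zip(*matrix)]
    observed_cells = sum(not math.isnan(v) for row in matrix for v in row)
    return subject_accuracy, item_pass_rate, observed_cells


subject_accuracy, item_pass_rate, observed_cells = summarize(responses)
print(subject_accuracy)
print(item_pass_rate)
print(observed_cells)
```

Row means act as a crude ability estimate and column means as a crude (inverse) difficulty estimate. An actual IRT model would estimate both jointly, and an aggregate-only benchmark would not support the per-item column at all.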

Provider:

cxzdsa1234
