aims-foundations/measurement-db
Source: Hugging Face | Updated: 2026-04-24 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/aims-foundations/measurement-db
Description:
The Measurement Data Bank (MDB) is a dataset of response matrices from 146 AI evaluation benchmarks, each standardized as a `(subjects × items)` matrix for item response theory (IRT) and psychometric analysis. It is curated by the AIMS Foundation and hosted on the Hugging Face Hub. Of the 146 benchmarks, 92 are ready, 14 are aggregate-only, and 40 are pending. Each benchmark has a single self-contained `build.py` that downloads the raw data, builds the response matrix, generates a heatmap, converts the result to a `.pt` payload, and uploads it to the Hugging Face Hub. The dataset is organized into raw data directories and processed data directories; the latter hold response matrices, heatmap visualizations, item content, model summaries, and task metadata. The dataset also reports detailed statistics, including the number of benchmarks, unique items, total items, total cells, and response matrices.
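To make the `(subjects × items)` layout concrete, here is a minimal sketch of how such a matrix might be summarized before fitting an IRT model. It is not taken from MDB: the toy data, the reading of "subjects" as evaluated models, the binary 0/1 scoring, and the helper names (`subject_scores`, `item_pass_rates`, `rasch_probability`) are all assumptions made for illustration.

```python
import math

# Hypothetical toy data, NOT from MDB: rows are subjects (assumed here to be
# models), columns are items (benchmark questions). 1 = correct, 0 = incorrect.
subjects = ["model_a", "model_b", "model_c"]
items = ["q1", "q2", "q3", "q4"]
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
]


def subject_scores(matrix):
    """Mean correctness per subject (row means)."""
    return [sum(row) / len(row) for row in matrix]


def item_pass_rates(matrix):
    """Fraction of subjects that answered each item correctly (column means)."""
    n_subjects = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_subjects for j in range(n_items)]


def rasch_probability(ability, difficulty):
    """Rasch (1PL) model: P(correct) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


scores = subject_scores(responses)
rates = item_pass_rates(responses)

for name, score in zip(subjects, scores):
    print(f"{name}: mean score {score:.2f}")
for name, rate in zip(items, rates):
    print(f"{name}: pass rate {rate:.2f}")
```

Row and column means are the classical starting point; an IRT fit would replace them with estimated ability and difficulty parameters, with `rasch_probability` giving the predicted chance of a correct response for any subject-item pair.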
Provider:
aims-foundations



