stair-lab/reeval
Source: Hugging Face | Updated: 2025-06-21 | Indexed: 2025-07-05
Download link:
https://hf-mirror.com/datasets/stair-lab/reeval
Description:
This dataset was built following the paper Reliable and Efficient Amortized Model-based Evaluation. It converts HELM data into long format and response-matrix format, and uses two language models, Llama-3.1-8B-Instruct and Mistral-7B-Instruct-v0.3, to compute embeddings for the questions. It also includes test-taker ability parameters and question difficulty parameters for adaptive testing experiments.
Provider:
stair-lab
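
The description mentions a long format, a response-matrix format, and ability and difficulty parameters for adaptive testing. The sketch below illustrates how those pieces could relate. It is not the dataset's actual schema or the paper's code: the field names (`model`, `question`, `correct`), the sample values, and the use of a one-parameter Rasch model with a nearest-difficulty selection rule are all assumptions made for illustration.

```python
import math

# Hypothetical long-format records: one row per (model, question) pair.
# Field names and values are illustrative, not the dataset's real schema.
long_rows = [
    {"model": "model_a", "question": "q1", "correct": 1},
    {"model": "model_a", "question": "q2", "correct": 0},
    {"model": "model_b", "question": "q1", "correct": 1},
]


def to_response_matrix(rows):
    """Pivot long-format rows into a models x questions matrix.

    Cells with no recorded response are left as None.
    """
    models = sorted({r["model"] for r in rows})
    questions = sorted({r["question"] for r in rows})
    cell = {(r["model"], r["question"]): r["correct"] for r in rows}
    matrix = [[cell.get((m, q)) for q in questions] for m in models]
    return models, questions, matrix


def rasch_probability(ability, difficulty):
    """Probability of a correct response under a Rasch (1PL) model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def next_question(ability, difficulties, asked):
    """Pick the unasked question whose difficulty is closest to the ability estimate.

    Under a Rasch model this is the question with the highest Fisher information.
    """
    candidates = [q for q in difficulties if q not in asked]
    return min(candidates, key=lambda q: abs(difficulties[q] - ability))


models, questions, matrix = to_response_matrix(long_rows)
```

In an adaptive testing loop, `next_question` would be called after each update of the ability estimate, so that each test taker answers only the questions that are most informative at their current estimated ability.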



