JianhaoNJU/IrtNet-Dataset
Source: Hugging Face | Updated: 2025-10-17 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/JianhaoNJU/IrtNet-Dataset
Description:
This dataset supports learning compact representations of LLM abilities via Item Response Theory. It is largely similar to the EmbedLLM dataset, with multiple answers from a model to the same query consolidated by majority vote; if the counts of 0 and 1 are tied, 1 is preferred. This step guarantees a single ground-truth label for each model-query pair, which matters most for the test set. The dataset comprises 35,673 queries from 10 public benchmarks: ASDiv, GPQA, GSM8K, MathQA, LogiQA, MedMCQA, MMLU, SocialIQA, PIQA, and TruthfulQA. Answers from 112 open-source language models to these queries were evaluated for correctness. Each query was converted into a 768-dimensional embedding using the all-mpnet-base-v2 sentence transformer. The queries are split into a training set of 29,673, a validation set of 3,000, and a test set of 3,000.
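The majority-vote consolidation described above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the function name `consolidate_labels` and the `(model_id, query_id, label)` record layout are assumptions made for the example, and ties are resolved toward 1.

```python
from collections import defaultdict


def consolidate_labels(records):
    """Collapse repeated (model, query) correctness labels into one label each.

    `records` is an iterable of (model_id, query_id, label) tuples,
    where label is 0 (incorrect) or 1 (correct). This layout is hypothetical.
    """
    # (model_id, query_id) -> [count of 0 labels, count of 1 labels]
    counts = defaultdict(lambda: [0, 0])
    for model_id, query_id, label in records:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        counts[(model_id, query_id)][label] += 1

    # Majority vote; a tie between 0s and 1s is resolved in favor of 1.
    return {
        pair: 1 if ones >= zeros else 0
        for pair, (zeros, ones) in counts.items()
    }


# Hypothetical records: model-a answered q1 three times, q2 twice.
records = [
    ("model-a", "q1", 1),
    ("model-a", "q1", 0),
    ("model-a", "q1", 0),
    ("model-a", "q2", 0),
    ("model-a", "q2", 1),
    ("model-b", "q1", 1),
]
labels = consolidate_labels(records)
```

After this step every model-query pair maps to exactly one label, so a query cannot appear in the test set with conflicting ground truth for the same model.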
Provided by:
JianhaoNJU



