Catlaugh/idk_eval

Name: Catlaugh/idk_eval
Creator: Catlaugh
Published: 2025-12-17 11:57:46
License: Not specified

Hugging Face, updated 2025-12-17, indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/Catlaugh/idk_eval

Description:

A multiple-choice question (MCQ) dataset for evaluating the ability of large language models (LLMs) to express uncertainty (for example, by answering "I don't know"). The dataset merges MCQs sampled from `MMLU-Pro`, `LEXam`, and `MedXpertQA`, and supports variable option-list lengths (k = 4 to 10). It covers multiple subcategories from the legal and medical domains and excludes MMLU's `other` category. From each category, 115 questions were sampled, and variants with different option-list lengths were generated. Each record includes a question ID, the question text, a list of options, the answer index, and a category.
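The listing does not say how the variants with different option-list lengths were produced. As a minimal sketch of one plausible approach, the snippet below keeps the correct option and randomly drops distractors until k options remain. The field names (`question_id`, `question`, `options`, `answer_index`, `category`) and the helper `make_variant` are assumptions chosen to mirror the features described above; they are not confirmed column names or the authors' actual procedure.

```python
import random


def make_variant(record, k, seed=0):
    """Return a copy of `record` with exactly k options, keeping the correct one.

    Hypothetical helper: field names are assumed, not taken from the dataset.
    """
    options = record["options"]
    answer = record["answer_index"]
    if not 2 <= k <= len(options):
        raise ValueError(f"k must be between 2 and {len(options)}, got {k}")

    rng = random.Random(seed)
    # Indices of all wrong options (distractors).
    distractors = [i for i in range(len(options)) if i != answer]
    # Keep k - 1 random distractors plus the correct option, in original order.
    kept = sorted(rng.sample(distractors, k - 1) + [answer])

    return {
        **record,
        "options": [options[i] for i in kept],
        # Re-index the answer within the shortened option list.
        "answer_index": kept.index(answer),
    }


# Illustrative record, not taken from the dataset.
example = {
    "question_id": "demo-1",
    "question": "Which planet is closest to the Sun?",
    "options": ["Venus", "Mercury", "Earth", "Mars", "Jupiter", "Saturn"],
    "answer_index": 1,
    "category": "demo",
}

variant = make_variant(example, k=4)
print(len(variant["options"]), variant["options"][variant["answer_index"]])
```

Sorting the kept indices preserves the original relative order of the options, so the only differences between variants are the number of distractors and the re-indexed answer.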

Provider:

Catlaugh
