m-a-p/Encyclo-K
Source: Hugging Face | Updated: 2026-02-09 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/m-a-p/Encyclo-K
Overview:
**_Encyclo-K_** is a statement-based benchmark that rethinks benchmark construction from the ground up. Our key observation is that the question itself need not be the atomic unit of curation—individual knowledge statements can be.
### Key Features
- **Dynamic Evaluation**: We extract standalone knowledge statements from authoritative textbooks and dynamically compose them into evaluation questions through random sampling at test time. The combinatorial space is too vast to memorize, enabling reliable periodic dataset refresh.
- **Multi-Statement Comprehension**: Each question aggregates 8-10 statements for comprehensive multi-knowledge assessment, going beyond what single-statement questions can probe.
- **Cost-Effective Annotation**: Annotators only verify formatting compliance without requiring domain expertise, substantially reducing annotation costs.
- **Contamination Resistance**: Even if individual statements appear in training data, their compositions form a combinatorial space too vast to memorize.
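
The sampling idea behind the dynamic evaluation and contamination resistance points can be illustrated with a minimal sketch. This is not the authors' pipeline: the pool contents, the pool size of 1,000, and the helper names `compose_question` and `count_statement_sets` are all hypothetical, and only the 8-10 statements-per-question range comes from the description above.

```python
import math
import random


def compose_question(statement_pool, rng, min_k=8, max_k=10):
    """Draw between min_k and max_k distinct statements for one question."""
    k = rng.randint(min_k, max_k)  # inclusive on both ends
    return rng.sample(statement_pool, k)  # sampling without replacement


def count_statement_sets(pool_size, min_k=8, max_k=10):
    """Count the unordered statement sets of size min_k..max_k."""
    return sum(math.comb(pool_size, k) for k in range(min_k, max_k + 1))


# Hypothetical pool: placeholder strings stand in for textbook statements.
pool = [f"statement-{i}" for i in range(1000)]

rng = random.Random(0)  # a fixed seed makes a given refresh reproducible
question = compose_question(pool, rng)

print(len(question), "statements sampled")
print(count_statement_sets(len(pool)), "possible statement sets")
```

Even with this small assumed pool, the number of possible statement sets is far larger than any fixed question list, which is the intuition behind refreshing the benchmark by resampling.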
### Question Distribution
The dataset comprises 5,038 questions across 11 disciplines, 44 fields, and 62 subfields. Questions are distributed across disciplines in proportion to each discipline's share of statements: Science has the most questions (1,242, 24.7%), while Philosophy has the fewest (61, 1.2%). Each question contains 8–10 statements, 4–8 options, and 2–4 combinations.
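
As a quick check on the quoted shares, the percentages can be recomputed from the counts given above. The function name `share_percent` is illustrative only, and only the two disciplines whose counts are stated here are included.

```python
TOTAL_QUESTIONS = 5038

# Counts stated in the dataset description; other disciplines are not listed.
counts = {"Science": 1242, "Philosophy": 61}


def share_percent(n, total=TOTAL_QUESTIONS):
    """Return n as a percentage of total."""
    return 100 * n / total


for discipline, n in counts.items():
    print(f"{discipline}: {n} questions, {share_percent(n):.1f}%")
```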
Provider:
m-a-p



