omega-explorative
ModelScope community | Updated 2025-12-05 | Indexed 2025-06-28
Download link: https://modelscope.cn/datasets/allenai/omega-explorative
Description:
# Explorative Math Problems
This dataset contains the explorative mathematical problem settings from the paper "OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization". These settings assess whether a model can **faithfully extend** a single reasoning strategy beyond the range of complexities seen during training.
## Overview
Exploratory generalization assesses whether a model can faithfully extend a single reasoning strategy beyond the range of complexities seen during training. Concretely, the model is exposed to problems drawn from one template, all lying within a "low-complexity" regime, and is then evaluated on **harder** instances from the same family. This axis probes robustness: does the model generalize the same algorithm to higher-complexity problems, or does it merely memorize solutions at a fixed difficulty level?
Each explorative setting consists of:
- **Training Dataset**: Low-complexity problems from a specific mathematical domain
- **Test In-Distribution**: Problems of similar complexity to training data (test-in)
- **Test Out-of-Distribution**: Higher complexity problems from the same domain/template (test-out)
## Quick Start
```python
from datasets import load_dataset
# Load all explorative settings
dataset = load_dataset("allenai/omega-explorative")
# Load a specific explorative setting with all splits
func_area_data = load_dataset("allenai/omega-explorative", "algebra_func_area")
train_data = func_area_data["train"] # Low complexity training problems
test_in_data = func_area_data["test_in"] # Similar complexity test problems
test_out_data = func_area_data["test_out"] # Higher complexity test problems
# Load just specific splits
train_only = load_dataset("allenai/omega-explorative", "algebra_func_area", split="train")
test_out_only = load_dataset("allenai/omega-explorative", "algebra_func_area", split="test_out")
```
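After loading a setting, it can help to confirm that all three splits are present and to see how many problems each one holds. The snippet below is a minimal sketch, assuming only that each split supports `len()` (as `datasets.Dataset` objects do). `split_sizes` is a hypothetical helper, not part of the dataset's tooling, and the toy dictionary stands in for a loaded setting so the example runs offline.

```python
from typing import Dict, Mapping, Sized


def split_sizes(setting: Mapping[str, Sized]) -> Dict[str, int]:
    """Return the number of examples in each split of one setting."""
    return {name: len(split) for name, split in setting.items()}


# Toy stand-in for load_dataset("allenai/omega-explorative", "algebra_func_area");
# the sizes here are made up for illustration.
toy_setting = {"train": [0] * 5, "test_in": [0] * 2, "test_out": [0] * 3}
sizes = split_sizes(toy_setting)
print(sizes)
```

With a real setting, pass the object returned by `load_dataset` in place of `toy_setting`. If calling `load_dataset` without a setting name raises a missing-config error, `datasets.get_dataset_config_names("allenai/omega-explorative")` lists the available names.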
## Dataset Description
Each explorative setting provides three splits to evaluate different aspects of generalization:
- **Train**: Low-complexity problems used for training/few-shot learning
- **Test-in**: Problems of similar complexity to training data (in-distribution evaluation)
- **Test-out**: Higher complexity problems from the same template (out-of-distribution evaluation)
The key challenge is whether models can extend the same reasoning strategy from low-complexity training problems to higher-complexity test problems in the same domain.
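One way to quantify this challenge is to compare accuracy on the test-in and test-out splits. The sketch below assumes exact string match as the scoring rule, which is an illustrative choice and not necessarily the paper's metric. The function names and the toy predictions are hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that match the reference answer exactly
    (after stripping surrounding whitespace)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty split")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def generalization_gap(acc_in: float, acc_out: float) -> float:
    """In-distribution accuracy minus out-of-distribution accuracy."""
    return acc_in - acc_out


# Toy model outputs and answers (made up for illustration).
in_acc = accuracy(["4", "9", "16", "25"], ["4", "9", "16", "25"])
out_acc = accuracy(["36", "50", "64", "80"], ["36", "49", "64", "81"])
gap = generalization_gap(in_acc, out_acc)
print(f"test_in={in_acc:.2f} test_out={out_acc:.2f} gap={gap:.2f}")
```

A gap near zero suggests the strategy carries over to harder instances, while a large positive gap suggests the model has fit the low-complexity regime without extending it.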
## Citation
If you use this dataset, please cite the original work:
```bibtex
@article{sun2024omega,
title = {OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization},
author = {Yiyou Sun and Shawn Hu and Georgia Zhou and Ken Zheng and Hannaneh Hajishirzi and Nouha Dziri and Dawn Song},
journal = {arXiv preprint arXiv:2506.18880},
year = {2025},
}
```
## Related Resources
- **Transformative Dataset**: See [omega-transformative](https://huggingface.co/datasets/sunyiyou/omega-transformative) for transformative reasoning challenges
- **Compositional Dataset**: See [omega-compositional](https://huggingface.co/datasets/sunyiyou/omega-compositional) for compositional reasoning challenges
- **Paper**: Full details are in the [paper](https://arxiv.org/pdf/2506.18880)
- **Code Repository**: See generation code on [github](https://github.com/sunblaze-ucb/math_ood)
Provider: maas
Created: 2025-06-25



