mismayil/cresowlve
Source: Hugging Face. Updated 2026-04-07; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/mismayil/cresowlve
Resource description:
---
dataset_info:
- config_name: en
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: difficulty
    dtype: int64
  - name: explanation
    dtype: string
  - name: other_answers
    dtype: string
  - name: knowledge_domains
    list: string
  - name: creative_domains
    list: string
  - name: cultures
    list: string
  splits:
  - name: test
    num_bytes: 1156563
    num_examples: 2061
  download_size: 619691
  dataset_size: 1156563
- config_name: ru
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: difficulty
    dtype: int64
  - name: explanation
    dtype: string
  - name: other_answers
    dtype: string
  - name: knowledge_domains
    list: string
  - name: creative_domains
    list: string
  - name: cultures
    list: string
  splits:
  - name: test
    num_bytes: 1759010
    num_examples: 2061
  download_size: 852319
  dataset_size: 1759010
configs:
- config_name: en
  data_files:
  - split: test
    path: en/test-*
- config_name: ru
  data_files:
  - split: test
    path: ru/test-*
license: apache-2.0
task_categories:
- question-answering
- text-generation
language:
- en
- ru
tags:
- creativity
- problem-solving
size_categories:
- 1K<n<10K
---
# CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge
## Dataset Description
This is a bilingual (English and Russian) benchmark of creative problem-solving questions that are grounded in real-world knowledge and solvable by human experts.
CresOWLve spans a diverse range of knowledge and creative domains, varies in difficulty, requires multiple creative thinking strategies, and is manually validated to ensure quality.
It contains 2,061 open-ended questions per language, each with a reference answer and, where available, an explanation.
## Dataset Sources
- **Repository:** https://github.com/mismayil/cresowlve
- **Paper:** https://arxiv.org/abs/2604.03374
## Dataset Structure
Each sample has the following fields:
- `id`: Unique sample ID.
- `question`: Open-ended question in text.
- `answer`: Original answer to the question in text.
- `difficulty`: Difficulty level of the question, an integer from 1 to 5.
- `explanation`: Explanation for the reference answer if available.
- `other_answers`: Other acceptable answers if any.
- `knowledge_domains`: List of knowledge domains (subjects, topics) involved in answering the question.
- `creative_domains`: List of creative domains/thinking strategies involved in answering the question.
- `cultures`: List of cultures/demographics involved in answering the question.
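
As a minimal sketch of how the fields listed above might be checked in code, the snippet below validates one record given as a plain Python dict. The `load_split` helper assumes the Hugging Face `datasets` library is installed and that the repository ID `mismayil/cresowlve` with the config names `en` and `ru` resolves as declared in the card metadata. The `validate_sample` function, its treatment of `explanation` and `other_answers` as possibly missing, and the placeholder record are illustrative assumptions, not part of the dataset.

```python
from typing import Any

# Field groups follow the dataset card; the grouping itself is an assumption.
REQUIRED_TEXT = ("id", "question", "answer")
OPTIONAL_TEXT = ("explanation", "other_answers")  # "if available" / "if any"
TAG_LISTS = ("knowledge_domains", "creative_domains", "cultures")


def load_split(lang: str = "en"):
    """Load the test split of one language config ("en" or "ru").

    Needs the `datasets` package and network access, so it is not called here.
    """
    from datasets import load_dataset  # lazy import: optional dependency

    return load_dataset("mismayil/cresowlve", lang, split="test")


def validate_sample(sample: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems: list[str] = []
    for name in REQUIRED_TEXT:
        value = sample.get(name)
        if not isinstance(value, str) or not value:
            problems.append(f"{name}: expected a non-empty string")
    for name in OPTIONAL_TEXT:
        value = sample.get(name)
        if value is not None and not isinstance(value, str):
            problems.append(f"{name}: expected a string or None")
    difficulty = sample.get("difficulty")
    if not isinstance(difficulty, int) or not 1 <= difficulty <= 5:
        problems.append("difficulty: expected an integer from 1 to 5")
    for name in TAG_LISTS:
        value = sample.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{name}: expected a list of strings")
    return problems


# Invented placeholder record, not taken from the benchmark.
example = {
    "id": "placeholder-0",
    "question": "Placeholder question text",
    "answer": "Placeholder answer",
    "difficulty": 3,
    "explanation": None,
    "other_answers": None,
    "knowledge_domains": ["History"],
    "creative_domains": ["lateral thinking"],
    "cultures": ["Russian"],
}
print(validate_sample(example))  # → []
```

With the library available, `validate_sample(load_split("en")[0])` would run the same check on a real record.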
## Knowledge Domains
The benchmark questions involve knowledge about at least one of the following:
Literature, History, Film & Media Studies, Languages & Linguistics, Human Geography,
Religious Studies, Anthropology, Physical Education & Sports, Biology, Engineering & Technology,
Visual Arts, Music, Political Science, Home Economics & Daily Life, Performing Arts,
Psychology, Sociology, Earth & Environmental Science, Military, Physics,
Astronomy & Space Science, Business Studies, Philosophy, Design & Architecture,
Medicine & Health Sciences, Economics, Mathematics, Chemistry, Law & Criminology, Other Sciences,
Art History & Visual Culture, Education, Communication, Archaeology
## Creative Language & Thinking
The benchmark questions involve several creative language domains and skills:
lateral thinking, analogy, abstraction, joke, pun, metaphor, commonsense reasoning, poem, idiom,
neologism, sarcasm, proverb, divergent thinking, compositionality, simile
## Cultures & Demographics
The benchmark questions require knowledge about entities and people from diverse cultures:
English, Russian, French, German, Italian, Greek, Latin, American, Spanish, Japanese, Polish, Arabic,
Dutch, Swedish, Chinese, Hebrew, Ukrainian, Roman, Indian, Norwegian, Danish, Scottish, Portuguese,
Turkish, Czech, Swiss, Egyptian, Georgian, Irish, Persian, Brazilian, European, Armenian and many others.
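
Because `knowledge_domains`, `creative_domains`, and `cultures` are list-valued, a single question can count toward several categories at once. The sketch below shows one way to tally tags and to slice the benchmark by tag and difficulty using only the standard library. The helper names `tag_counts` and `select` are hypothetical, the three records are invented placeholders, and tag spellings are assumed to match the lists in this card exactly.

```python
from collections import Counter
from typing import Any, Iterable


def tag_counts(samples: Iterable[dict[str, Any]], field: str) -> Counter:
    """Count how many samples carry each tag in a list-valued field."""
    counts: Counter = Counter()
    for sample in samples:
        # set() so that a tag repeated within one sample is counted once
        counts.update(set(sample.get(field) or []))
    return counts


def select(samples: Iterable[dict[str, Any]], field: str, tag: str,
           max_difficulty: int = 5) -> list[dict[str, Any]]:
    """Keep samples tagged with `tag` in `field` at or below `max_difficulty`."""
    return [
        s for s in samples
        if tag in (s.get(field) or []) and s["difficulty"] <= max_difficulty
    ]


# Invented placeholder records with only the fields used here.
rows = [
    {"id": "a", "difficulty": 2, "knowledge_domains": ["History", "Music"],
     "creative_domains": ["pun"], "cultures": ["Russian"]},
    {"id": "b", "difficulty": 4, "knowledge_domains": ["History"],
     "creative_domains": ["analogy", "pun"], "cultures": ["French"]},
    {"id": "c", "difficulty": 5, "knowledge_domains": ["Physics"],
     "creative_domains": ["lateral thinking"], "cultures": ["Russian", "German"]},
]

print(tag_counts(rows, "knowledge_domains")["History"])  # → 2
print([s["id"] for s in select(rows, "creative_domains", "pun", max_difficulty=3)])  # → ['a']
```

Counting per sample rather than per tag occurrence keeps the totals interpretable as "number of questions involving this category", which is usually what a per-domain breakdown reports.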
## Source Data
The benchmark questions are sourced from the well-known Russian intellectual game [What?Where?When?](https://en.wikipedia.org/wiki/What%3F_Where%3F_When%3F).
To ensure accessibility and relevance, we design a multi-stage benchmark construction pipeline that filters out unsuitable
and non-creative questions and translates the remaining puzzles into English, with manual validation of the result.
The resulting dataset provides a diverse and high-quality benchmark for evaluating creative problem-solving grounded in real-world knowledge.
See the paper for more details on the benchmark construction.
## Citation
```
@misc{ismayilzada2026cresowlvebenchmarkingcreativeproblemsolving,
title={CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge},
author={Mete Ismayilzada and Renqing Cuomao and Daniil Yurshevich and Anna Sotnikova and Lonneke van der Plas and Antoine Bosselut},
year={2026},
eprint={2604.03374},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.03374},
}
```
Provided by:
mismayil



