
mismayil/cresowlve

Source: Hugging Face · Updated: 2026-04-07 · Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/mismayil/cresowlve
Resource description:
---
dataset_info:
- config_name: en
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: difficulty
    dtype: int64
  - name: explanation
    dtype: string
  - name: other_answers
    dtype: string
  - name: knowledge_domains
    list: string
  - name: creative_domains
    list: string
  - name: cultures
    list: string
  splits:
  - name: test
    num_bytes: 1156563
    num_examples: 2061
  download_size: 619691
  dataset_size: 1156563
- config_name: ru
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: difficulty
    dtype: int64
  - name: explanation
    dtype: string
  - name: other_answers
    dtype: string
  - name: knowledge_domains
    list: string
  - name: creative_domains
    list: string
  - name: cultures
    list: string
  splits:
  - name: test
    num_bytes: 1759010
    num_examples: 2061
  download_size: 852319
  dataset_size: 1759010
configs:
- config_name: en
  data_files:
  - split: test
    path: en/test-*
- config_name: ru
  data_files:
  - split: test
    path: ru/test-*
license: apache-2.0
task_categories:
- question-answering
- text-generation
language:
- en
- ru
tags:
- creativity
- problem-solving
size_categories:
- 1K<n<10K
---

# CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge

## Dataset Description

This is a bilingual benchmark for creative problem-solving grounded in real-world knowledge and solvable by human experts. CresOWLve spans a diverse range of knowledge and creative domains, varies in difficulty, requires multiple creative thinking strategies, and is manually validated to ensure quality. It contains ~2K open-ended questions with answers and explanations.

## Dataset Sources

- **Repository:** https://github.com/mismayil/cresowlve
- **Paper:** https://arxiv.org/abs/2604.03374

## Dataset Structure

Each sample has the following fields:

- `id`: Unique sample ID.
- `question`: Open-ended question in text.
- `answer`: Original answer to the question in text.
- `difficulty`: Difficulty level of the question (1-5).
- `explanation`: Explanation for the reference answer if available.
- `other_answers`: Other acceptable answers if any.
- `knowledge_domains`: List of knowledge domains (subjects, topics) involved in answering the question.
- `creative_domains`: List of creative domains/thinking strategies involved in answering the question.
- `cultures`: List of cultures/demographics involved in answering the question.

## Knowledge Domains

The benchmark questions involve knowledge about at least one of the following: Literature, History, Film & Media Studies, Languages & Linguistics, Human Geography, Religious Studies, Anthropology, Physical Education & Sports, Biology, Engineering & Technology, Visual Arts, Music, Political Science, Home Economics & Daily Life, Performing Arts, Psychology, Sociology, Earth & Environmental Science, Military, Physics, Astronomy & Space Science, Business Studies, Philosophy, Design & Architecture, Medicine & Health Sciences, Economics, Mathematics, Chemistry, Law & Criminology, Other Sciences, Art History & Visual Culture, Education, Communication, Archaeology

## Creative Language & Thinking

The benchmark questions involve several creative language domains and skills: lateral thinking, analogy, abstraction, joke, pun, metaphor, commonsense reasoning, poem, idiom, neologism, sarcasm, proverb, divergent thinking, compositionality, simile

## Cultures & Demographics

The benchmark questions require knowledge about entities and people from diverse cultures: English, Russian, French, German, Italian, Greek, Latin, American, Spanish, Japanese, Polish, Arabic, Dutch, Swedish, Chinese, Hebrew, Ukrainian, Roman, Indian, Norwegian, Danish, Scottish, Portuguese, Turkish, Czech, Swiss, Egyptian, Georgian, Irish, Persian, Brazilian, European, Armenian and many others.

## Source Data

The benchmark questions are sourced from the well-known Russian intellectual game [What?Where?When?](https://en.wikipedia.org/wiki/What%3F_Where%3F_When%3F). To ensure accessibility and relevance, we design a multi-stage benchmark construction pipeline that filters unsuitable and non-creative questions and translates the remaining puzzles into English with manual validation. The resulting dataset provides a diverse and high-quality benchmark for evaluating creative problem-solving grounded in real-world knowledge. See the paper for more details on the benchmark construction.

## Citation

```
@misc{ismayilzada2026cresowlvebenchmarkingcreativeproblemsolving,
  title={CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge},
  author={Mete Ismayilzada and Renqing Cuomao and Daniil Yurshevich and Anna Sotnikova and Lonneke van der Plas and Antoine Bosselut},
  year={2026},
  eprint={2604.03374},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.03374},
}
```
Provided by:
mismayil