
alexandrainst/multi-zebra-logic

Hugging Face · Updated 2026-02-18 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/alexandrainst/multi-zebra-logic
Resource description:
--- license: apache-2.0 configs: - config_name: dataset_da_huse_2x3_5rh data_files: - split: train path: dataset_da_huse_2x3_5rh/train-* - split: val path: dataset_da_huse_2x3_5rh/val-* - split: test path: dataset_da_huse_2x3_5rh/test-* - config_name: dataset_da_huse_4x5_5rh data_files: - split: train path: dataset_da_huse_4x5_5rh/train-* - split: val path: dataset_da_huse_4x5_5rh/val-* - split: test path: dataset_da_huse_4x5_5rh/test-* - config_name: dataset_da_smoerrebroed_2x3_5rh data_files: - split: train path: dataset_da_smoerrebroed_2x3_5rh/train-* - split: val path: dataset_da_smoerrebroed_2x3_5rh/val-* - split: test path: dataset_da_smoerrebroed_2x3_5rh/test-* - config_name: dataset_da_smoerrebroed_4x5_5rh data_files: - split: train path: dataset_da_smoerrebroed_4x5_5rh/train-* - split: val path: dataset_da_smoerrebroed_4x5_5rh/val-* - split: test path: dataset_da_smoerrebroed_4x5_5rh/test-* - config_name: dataset_de_hauser_2x3_5rh data_files: - split: train path: dataset_de_hauser_2x3_5rh/train-* - split: val path: dataset_de_hauser_2x3_5rh/val-* - split: test path: dataset_de_hauser_2x3_5rh/test-* - config_name: dataset_de_hauser_4x5_5rh data_files: - split: train path: dataset_de_hauser_4x5_5rh/train-* - split: val path: dataset_de_hauser_4x5_5rh/val-* - split: test path: dataset_de_hauser_4x5_5rh/test-* - config_name: dataset_en_houses_2x3_5rh data_files: - split: train path: dataset_en_houses_2x3_5rh/train-* - split: val path: dataset_en_houses_2x3_5rh/val-* - split: test path: dataset_en_houses_2x3_5rh/test-* - config_name: dataset_en_houses_4x5_5rh data_files: - split: train path: dataset_en_houses_4x5_5rh/train-* - split: val path: dataset_en_houses_4x5_5rh/val-* - split: test path: dataset_en_houses_4x5_5rh/test-* - config_name: dataset_fo_hus_2x3_5rh data_files: - split: train path: dataset_fo_hus_2x3_5rh/train-* - split: val path: dataset_fo_hus_2x3_5rh/val-* - split: test path: dataset_fo_hus_2x3_5rh/test-* - config_name: dataset_fo_hus_4x5_5rh 
data_files: - split: train path: dataset_fo_hus_4x5_5rh/train-* - split: val path: dataset_fo_hus_4x5_5rh/val-* - split: test path: dataset_fo_hus_4x5_5rh/test-* - config_name: dataset_is_husum_2x3_5rh data_files: - split: train path: dataset_is_husum_2x3_5rh/train-* - split: val path: dataset_is_husum_2x3_5rh/val-* - split: test path: dataset_is_husum_2x3_5rh/test-* - config_name: dataset_is_husum_4x5_5rh data_files: - split: train path: dataset_is_husum_4x5_5rh/train-* - split: val path: dataset_is_husum_4x5_5rh/val-* - split: test path: dataset_is_husum_4x5_5rh/test-* - config_name: dataset_nb_hus_2x3_5rh data_files: - split: train path: dataset_nb_hus_2x3_5rh/train-* - split: val path: dataset_nb_hus_2x3_5rh/val-* - split: test path: dataset_nb_hus_2x3_5rh/test-* - config_name: dataset_nb_hus_4x5_5rh data_files: - split: train path: dataset_nb_hus_4x5_5rh/train-* - split: val path: dataset_nb_hus_4x5_5rh/val-* - split: test path: dataset_nb_hus_4x5_5rh/test-* - config_name: dataset_nl_huizen_2x3_5rh data_files: - split: train path: dataset_nl_huizen_2x3_5rh/train-* - split: val path: dataset_nl_huizen_2x3_5rh/val-* - split: test path: dataset_nl_huizen_2x3_5rh/test-* - config_name: dataset_nl_huizen_4x5_5rh data_files: - split: train path: dataset_nl_huizen_4x5_5rh/train-* - split: val path: dataset_nl_huizen_4x5_5rh/val-* - split: test path: dataset_nl_huizen_4x5_5rh/test-* - config_name: dataset_nn_hus_2x3_5rh data_files: - split: train path: dataset_nn_hus_2x3_5rh/train-* - split: val path: dataset_nn_hus_2x3_5rh/val-* - split: test path: dataset_nn_hus_2x3_5rh/test-* - config_name: dataset_nn_hus_4x5_5rh data_files: - split: train path: dataset_nn_hus_4x5_5rh/train-* - split: val path: dataset_nn_hus_4x5_5rh/val-* - split: test path: dataset_nn_hus_4x5_5rh/test-* - config_name: dataset_sv_hus_2x3_5rh data_files: - split: train path: dataset_sv_hus_2x3_5rh/train-* - split: val path: dataset_sv_hus_2x3_5rh/val-* - split: test path: 
dataset_sv_hus_2x3_5rh/test-* - config_name: dataset_sv_hus_4x5_5rh data_files: - split: train path: dataset_sv_hus_4x5_5rh/train-* - split: val path: dataset_sv_hus_4x5_5rh/val-* - split: test path: dataset_sv_hus_4x5_5rh/test-* dataset_info: - config_name: dataset_da_huse_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 186251 num_examples: 128 - name: val num_bytes: 185855 num_examples: 128 - name: test num_bytes: 1486529 num_examples: 1024 download_size: 235998 dataset_size: 1858635 - config_name: dataset_da_huse_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 386741 num_examples: 128 - name: val num_bytes: 382961 num_examples: 128 - name: test num_bytes: 3057562 num_examples: 1024 download_size: 582625 dataset_size: 3827264 - config_name: dataset_da_smoerrebroed_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 226795 num_examples: 128 - name: val num_bytes: 226605 num_examples: 
128 - name: test num_bytes: 1806513 num_examples: 1024 download_size: 252762 dataset_size: 2259913 - config_name: dataset_da_smoerrebroed_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 445587 num_examples: 128 - name: val num_bytes: 446334 num_examples: 128 - name: test num_bytes: 3571817 num_examples: 1024 download_size: 621751 dataset_size: 4463738 - config_name: dataset_de_hauser_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 210461 num_examples: 128 - name: val num_bytes: 209458 num_examples: 128 - name: test num_bytes: 1681221 num_examples: 1024 download_size: 245877 dataset_size: 2101140 - config_name: dataset_de_hauser_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 414711 num_examples: 128 - name: val num_bytes: 411765 num_examples: 128 - name: test num_bytes: 3306651 num_examples: 1024 download_size: 
598027 dataset_size: 4133127 - config_name: dataset_en_houses_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 187633 num_examples: 128 - name: val num_bytes: 187789 num_examples: 128 - name: test num_bytes: 1499066 num_examples: 1024 download_size: 232744 dataset_size: 1874488 - config_name: dataset_en_houses_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 390702 num_examples: 128 - name: val num_bytes: 391538 num_examples: 128 - name: test num_bytes: 3132404 num_examples: 1024 download_size: 583096 dataset_size: 3914644 - config_name: dataset_fo_hus_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 206735 num_examples: 128 - name: val num_bytes: 206826 num_examples: 128 - name: test num_bytes: 1647768 num_examples: 1024 download_size: 250344 dataset_size: 2061329 - config_name: dataset_fo_hus_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: 
question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 419235 num_examples: 128 - name: val num_bytes: 419310 num_examples: 128 - name: test num_bytes: 3352975 num_examples: 1024 download_size: 614766 dataset_size: 4191520 - config_name: dataset_is_husum_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 196218 num_examples: 128 - name: val num_bytes: 196952 num_examples: 128 - name: test num_bytes: 1579934 num_examples: 1024 download_size: 236885 dataset_size: 1973104 - config_name: dataset_is_husum_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 416366 num_examples: 128 - name: val num_bytes: 415324 num_examples: 128 - name: test num_bytes: 3329711 num_examples: 1024 download_size: 605274 dataset_size: 4161401 - config_name: dataset_nb_hus_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: 
format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 183889 num_examples: 128 - name: val num_bytes: 182943 num_examples: 128 - name: test num_bytes: 1464132 num_examples: 1024 download_size: 232180 dataset_size: 1830964 - config_name: dataset_nb_hus_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 376601 num_examples: 128 - name: val num_bytes: 377016 num_examples: 128 - name: test num_bytes: 3022073 num_examples: 1024 download_size: 576540 dataset_size: 3775690 - config_name: dataset_nl_huizen_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 195096 num_examples: 128 - name: val num_bytes: 195899 num_examples: 128 - name: test num_bytes: 1567462 num_examples: 1024 download_size: 239606 dataset_size: 1958457 - config_name: dataset_nl_huizen_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: 
string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 403493 num_examples: 128 - name: val num_bytes: 405850 num_examples: 128 - name: test num_bytes: 3237891 num_examples: 1024 download_size: 591186 dataset_size: 4047234 - config_name: dataset_nn_hus_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 183493 num_examples: 128 - name: val num_bytes: 184052 num_examples: 128 - name: test num_bytes: 1478353 num_examples: 1024 download_size: 232646 dataset_size: 1845898 - config_name: dataset_nn_hus_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 379289 num_examples: 128 - name: val num_bytes: 379321 num_examples: 128 - name: test num_bytes: 3036246 num_examples: 1024 download_size: 580227 dataset_size: 3794856 - config_name: dataset_sv_hus_2x3_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 186037 
num_examples: 128 - name: val num_bytes: 184058 num_examples: 128 - name: test num_bytes: 1477385 num_examples: 1024 download_size: 235852 dataset_size: 1847480 - config_name: dataset_sv_hus_4x5_5rh features: - name: introduction dtype: string - name: clues sequence: string - name: question dtype: string - name: format_instructions dtype: string - name: format_example dtype: string - name: solution struct: - name: object_1 sequence: string - name: object_2 sequence: string - name: object_3 sequence: string - name: object_4 sequence: string - name: clue_types sequence: string - name: red_herrings sequence: int64 splits: - name: train num_bytes: 386173 num_examples: 128 - name: val num_bytes: 384286 num_examples: 128 - name: test num_bytes: 3078935 num_examples: 1024 download_size: 590601 dataset_size: 3849394 task_categories: - text-generation language: - da - en - nl - de - fo - is - nn - sv - nb pretty_name: MultiZebraLogic size_categories: - 10K<n<100K ---

# Dataset Card for the MultiZebraLogic dataset

This dataset includes zebra puzzles in multiple European languages and in two sizes, 2x3 and 4x5 (objects x attributes). It can be used for evaluating logical reasoning ability. The data has been generated using the code in [this repo](https://github.com/alexandrainst/zebra_puzzles).

## Dataset Details

### Dataset Description

Zebra puzzles are a type of constraint satisfaction problem. They describe a number of objects, N_objects, each of which has a number of attributes, N_attributes. The goal is to couple the objects with the correct attributes, given some clues. Each solution can be described as an N_objects x N_attributes matrix.

To increase difficulty, we include "red herrings", which follow the same structure as true clues but contain no relevant information. We use 5 red herrings per puzzle.

Most dataset folders contain puzzles with the "houses" theme, where the objects are houses and each attribute describes an inhabitant.
Attributes are randomly selected categories such as nationalities and jobs. This theme is included in Danish, Dutch (draft version), English, Faroese, German, Icelandic, Norwegian Bokmål, Norwegian Nynorsk and Swedish. We are currently testing changes to template phrasing in the Danish version, so the Danish puzzles may differ slightly from the rest.

We also include data with the "smørrebrød" theme, where the objects are smørrebrød (open sandwiches) and each attribute is an ingredient. Categories are ingredient types such as bread or garnish. This theme is only included in Danish.

The dataset includes puzzles, solutions, lists of included clue types and indices of red herring clues. The training sets contain 128 puzzles each, which are meant as examples for practice. The validation sets also contain 128 puzzles each, and the test sets contain 1024 puzzles each.

- **Created by:** Sofie Helene Bruun (sofie.bruun@alexandra.dk) and Dan Saattrup Smart (dan.smart@alexandra.dk) from the Alexandra Institute.
- **Funded by:** The EU Horizon project TrustLLM (grant agreement number 101135671) and [Danish Foundation Models](https://www.foundationmodels.dk/)
- **Language(s) (NLP):** Danish (da), Dutch (nl), English (en), Faroese (fo), German (de), Icelandic (is), Norwegian Bokmål (nb), Norwegian Nynorsk (nn) and Swedish (sv).
- **License:** apache-2.0

### Dataset Sources

- **Repository:** https://github.com/alexandrainst/zebra_puzzles
- **Paper:** https://arxiv.org/abs/2511.03553

## Uses

<!-- Address questions around how the dataset is intended to be used. -->

Logical reasoning ability can be evaluated by comparing responses to puzzles with the true solutions. For examples of how this can be done, see the associated repository. The dataset contains examples of the suggested JSON response format for evaluating LLMs.

Part of the dataset is intended for use in [EuroEval](https://github.com/EuroEval/EuroEval).

### Direct Use

<!-- This section describes suitable use cases for the dataset. -->

Each puzzle can be assembled by combining the introduction, clues and question columns. Evaluation can be performed by comparing a response to the solution column. When evaluating LLMs, consider including the format_instructions and format_example columns in each puzzle, so that it is clear how the response should be formatted.

To create puzzles without red herrings, remove the clues whose indices are listed in the red_herrings column.

To analyse the effect of different clue types and red herrings, the clue_types and red_herrings columns can be compared to model performance.

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->

The clue_types and red_herrings columns should not be included during the solving process, as they reduce the need to understand the natural language prompt. Of course, the solution column should not be included either.

## Dataset Structure

<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->

Each puzzle is generated randomly and independently of other puzzles.

The columns are:

- *introduction* (str): Defines the overall rules and introduces the attributes.
- *clues* (list[str]): Clues relating the attributes and objects. 5 red herrings are included per puzzle.
- *question* (str): The question to be answered by a solution.
- *format_instructions* (str): Instructions on how to respond in JSON format. This is relevant for LLMs.
- *format_example* (str): An example of the response format with the attribute categories from the puzzle (but not the exact attributes).
- *solution* (dict[str,list[str]]): The solution matrix in JSON format.
- *clue_types* (list[str]): The list of clue types matching the clues column.
- *red_herrings* (list[int]): A list of indices of the red herring clues.
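
The column descriptions above suggest a simple workflow: assemble a prompt from the text columns, optionally strip the red herrings, and compare a JSON response with the solution column. The following is a minimal sketch of that workflow, not the evaluation code from the associated repository. The helper names (`build_prompt`, `drop_red_herrings`, `score_response`), the way the columns are joined, the assumption that `red_herrings` holds 0-based indices, and the two scores returned are all illustrative assumptions, and the toy row is invented rather than taken from the dataset.

```python
import json


def build_prompt(row, include_format=True):
    """Join the text columns of one puzzle row into a single prompt.

    The separators used here are an assumption; the dataset card does not
    prescribe how the columns should be concatenated.
    """
    parts = [row["introduction"], "\n".join(row["clues"]), row["question"]]
    if include_format:
        parts += [row["format_instructions"], row["format_example"]]
    return "\n\n".join(parts)


def drop_red_herrings(row):
    """Return a copy of the row whose clues exclude the red herrings.

    Assumption: red_herrings holds 0-based positions in the clues list.
    The clue_types and red_herrings columns are copied unchanged, so they
    no longer line up with the shortened clues list.
    """
    skip = set(row["red_herrings"])
    cleaned = dict(row)
    cleaned["clues"] = [c for i, c in enumerate(row["clues"]) if i not in skip]
    return cleaned


def score_response(response_text, solution):
    """Compare a JSON response with the solution.

    Returns (puzzle_correct, cell_accuracy). Responses that are not a JSON
    object score (False, 0.0). Both scores are illustrative choices.
    """
    try:
        response = json.loads(response_text)
    except json.JSONDecodeError:
        return False, 0.0
    if not isinstance(response, dict):
        return False, 0.0
    total = sum(len(attrs) for attrs in solution.values())
    hits = 0
    for obj, attrs in solution.items():
        guess = response.get(obj, [])
        if not isinstance(guess, list):
            guess = []
        # Count attribute positions where the response matches the solution.
        hits += sum(1 for want, got in zip(attrs, guess) if want == got)
    return response == solution, hits / total


# Invented toy row that only mimics the column layout of a 2x3 puzzle.
toy_row = {
    "introduction": "Two houses stand in a row. Each has a nationality, a job and a pet.",
    "clues": ["The baker lives in house 1.", "The nurse likes tea.", "The Swede owns the dog."],
    "question": "Who lives where?",
    "format_instructions": "Answer as a JSON object.",
    "format_example": '{"object_1": ["nationality", "job", "pet"]}',
    "solution": {"object_1": ["Dane", "baker", "cat"], "object_2": ["Swede", "nurse", "dog"]},
    "clue_types": ["found_at", "red_herring", "same_object"],
    "red_herrings": [1],
}

prompt = build_prompt(drop_red_herrings(toy_row))
correct, accuracy = score_response(json.dumps(toy_row["solution"]), toy_row["solution"])
```

Only the text columns enter the prompt here; `clue_types`, `red_herrings` and `solution` are kept out of it, in line with the out-of-scope guidance above. Before relying on `drop_red_herrings`, it is worth checking on a few real rows whether the indices are in fact 0-based.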
## Dataset Creation

### Creation Rationale

<!-- Motivation for the creation of this dataset. -->

The motivation is to create a multilingual benchmark for logical reasoning. The data allows us to compare the logical reasoning ability of LLMs and to compare scores across languages. Most of the dataset follows the traditional house theme, which is easy to translate. The smørrebrød theme is included to make it possible to compare the house theme to puzzles matching a European culture tied to a specific language.

### Source Data

<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->

The data is created from words and phrases defined in the zebra puzzle repository.

#### Data Collection and Processing

<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->

The included words and phrases have been drafted by the author with the help of Google Translate, GPT-4.1 in GitHub Copilot, dictionaries and Wikipedia. Relevant code and a few puzzles have been reviewed by native or fluent speakers of each included language (except Dutch). More details are included in the [paper](https://arxiv.org/abs/2511.03553).

#### Who are the source data producers?

<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->

Sofie Helene Bruun from the Alexandra Institute, with help from other people involved in LLM evaluation across Europe.
#### Personal and Sensitive Information

<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->

No personal or sensitive information is included.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Not every combination of words in the dataset has been read by a native speaker of each language, so there is a risk that an included combination sounds unnatural or creates an unintended meaning.

Attributes are combined randomly and might accidentally match stereotypes or traits of real people.

The randomly generated smørrebrød are typically not representative of traditional Danish cuisine, although many of the ingredients are.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation

<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->

If you use this dataset in your research, please cite our paper:

**BibTeX:**

    @misc{bruun2025multizebralogicmultilinguallogicalreasoning,
          title={MultiZebraLogic: A Multilingual Logical Reasoning Benchmark},
          author={Sofie Helene Bruun and Dan Saattrup Smart},
          year={2025},
          eprint={2511.03553},
          archivePrefix={arXiv},
          primaryClass={cs.CL},
          url={https://arxiv.org/abs/2511.03553},
    }

## Dataset Card Contact

sofie.bruun@alexandra.dk

dan.smart@alexandra.dk
Provider:
alexandrainst