jonasaise/swesat-skolprov-superlim-merged

Name: jonasaise/swesat-skolprov-superlim-merged
Creator: jonasaise
Published: 2026-02-25 11:03:44
License: 暂无描述

Hugging Face2026-02-25 更新2026-03-29 收录

下载链接：

https://hf-mirror.com/datasets/jonasaise/swesat-skolprov-superlim-merged

下载链接

链接失效反馈

官方服务：

资源简介：

--- language: - sv task_categories: - question-answering - multiple-choice - text-classification - text-generation pretty_name: Swedish NLU Benchmark Collection --- # Dataset Card for the Swedish NLU Benchmark Collection ## Dataset Description This dataset is a comprehensive, deduplicated benchmark collection specifically designed for evaluating the Swedish Natural Language Understanding (NLU) capabilities of Large Language Models (LLMs). The dataset merges high-quality multiple-choice scholastic examinations with a diverse suite of NLP and reasoning tasks. The benchmark contains over 450,000 unique queries compiled into a unified JSONL format, making it plug-and-play ready for LLM generation evaluation through standardized [system_prompt](cci:1://file:///home/jonas/Projects/swesat_nlu/scripts/merge_benchmarks.py:7:0-18:17), [prompt](cci:1://file:///home/jonas/Projects/swesat_nlu/scripts/merge_benchmarks.py:21:0-32:21), and string [answer](cci:1://file:///home/jonas/Projects/swesat_nlu/scripts/merge_benchmarks.py:35:0-51:71) fields. Structural data labels have been translated to Swedish (e.g., `inkorrekt` to `Inkorrekt`) to minimize false penalizations for monolingual Swedish models. ### Dataset Sources This merged dataset consolidates data from three primary sources: 1. **[Swesat](https://github.com/NLP-RISE/swesat):** A collection of multiple-choice Swedish scholastic aptitude tests (Högskoleprovet). 2. **[Swedish Skolprov](https://huggingface.co/datasets/Ekgren/swedish_skolprov):** Additional Swedish national school exams mapped to standard A-E multiple-choice formats. 3. **[SuperLim-2](https://huggingface.co/datasets/sbx/superlim-2):** A Swedish counterpart to the GLUE/SuperGLUE evaluation suites featuring 15 diverse NLI, QA, semantic similarity, and text generation tasks. ## Format and Structure Every instance in the dataset has been unified into the following uniform schema, optimizing it for immediate zero-shot or few-shot inference: ```json { "uid": "Unique identifier for the question", "test_id": "Original exam/dataset identifier", "section": "Section or task name within the source", "subsection": "Specific subset or task type", "question_id": "Original ID (if applicable)", "question_resource": "Any associated reading materials/context", "question": "The raw question or prompt", "option_a": "Multiple choice option A (if applicable)", "option_b": "Multiple choice option B (if applicable)", "option_c": "Multiple choice option C (if applicable)", "option_d": "Multiple choice option D (if applicable)", "option_e": "Multiple choice option E (if applicable)", "system_prompt": "Recommended system instruction to precisely guide the LLM's task", "prompt": "Constructed user prompt combining the context, question, and formatted options", "answer": "The expected exact-match target answer or normalized label (in Swedish)", "source": "swesat | skolprov | superlim-2" } ``` ```bib @inproceedings{berdicevskis-etal-2023-superlim, title = "Superlim: A {S}wedish Language Understanding Evaluation Benchmark", author = {Berdicevskis, Aleksandrs and Bouma, Gerlof and Kurtz, Robin and Morger, Felix and {\"O}hman, Joey and Adesam, Yvonne and Borin, Lars and Dann{\'e}lls, Dana and Forsberg, Markus and Isbister, Tim and Lindahl, Anna and Malmsten, Martin and Rekathati, Faton and Sahlgren, Magnus and Volodina, Elena and B{\"o}rjeson, Love and Hengchen, Simon and Tahmasebi, Nina}, editor = "Bouamor, Houda and Pino, Juan and Bali, Kalika", booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing", month = dec, year = "2023", address = "Singapore", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.emnlp-main.506", doi = "10.18653/v1/2023.emnlp-main.506", pages = "8137--8153", abstract = "We present Superlim, a multi-task NLP benchmark and analysis platform for evaluating Swedish language models, a counterpart to the English-language (Super)GLUE suite. We describe the dataset, the tasks, the leaderboard and report the baseline results yielded by a reference implementation. The tested models do not approach ceiling performance on any of the tasks, which suggests that Superlim is truly difficult, a desirable quality for a benchmark. We address methodological challenges, such as mitigating the Anglocentric bias when creating datasets for a less-resourced language; choosing the most appropriate measures; documenting the datasets and making the leaderboard convenient and transparent. We also highlight other potential usages of the dataset, such as, for instance, the evaluation of cross-lingual transfer learning.", } ``` ```bib @article{SweSAT2024, title={SweSAT-1.0: The Swedish University Entrance Exam as a Benchmark for Large Language Models}, author={Kurfalı, Murathan and Zahra, Shorouq and Gogoulou, Evangelia and Dürlich, Luise and Carlsson, Fredrik and Nivre, Joakim}, booktitle = "Proceedings of The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)", month = march, year = "2025", address = "Talinn, Estonia" } ```

提供机构：

jonasaise

5,000+

优质数据集

54 个

任务类型

进入经典数据集