jonasaise/swesat-skolprov-superlim-merged
收藏Hugging Face2026-02-25 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/jonasaise/swesat-skolprov-superlim-merged
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- sv
task_categories:
- question-answering
- multiple-choice
- text-classification
- text-generation
pretty_name: Swedish NLU Benchmark Collection
---
# Dataset Card for the Swedish NLU Benchmark Collection
## Dataset Description
This dataset is a comprehensive, deduplicated benchmark collection specifically designed for evaluating the Swedish Natural Language Understanding (NLU) capabilities of Large Language Models (LLMs). The dataset merges high-quality multiple-choice scholastic examinations with a diverse suite of NLP and reasoning tasks.
The benchmark contains over 450,000 unique queries compiled into a unified JSONL format, making it plug-and-play ready for LLM generation evaluation through standardized [system_prompt](cci:1://file:///home/jonas/Projects/swesat_nlu/scripts/merge_benchmarks.py:7:0-18:17), [prompt](cci:1://file:///home/jonas/Projects/swesat_nlu/scripts/merge_benchmarks.py:21:0-32:21), and string [answer](cci:1://file:///home/jonas/Projects/swesat_nlu/scripts/merge_benchmarks.py:35:0-51:71) fields. Structural data labels have been translated to Swedish (e.g., `inkorrekt` to `Inkorrekt`) to minimize false penalizations for monolingual Swedish models.
### Dataset Sources
This merged dataset consolidates data from three primary sources:
1. **[Swesat](https://github.com/NLP-RISE/swesat):** A collection of multiple-choice Swedish scholastic aptitude tests (Högskoleprovet).
2. **[Swedish Skolprov](https://huggingface.co/datasets/Ekgren/swedish_skolprov):** Additional Swedish national school exams mapped to standard A-E multiple-choice formats.
3. **[SuperLim-2](https://huggingface.co/datasets/sbx/superlim-2):** A Swedish counterpart to the GLUE/SuperGLUE evaluation suites featuring 15 diverse NLI, QA, semantic similarity, and text generation tasks.
## Format and Structure
Every instance in the dataset has been unified into the following uniform schema, optimizing it for immediate zero-shot or few-shot inference:
```json
{
"uid": "Unique identifier for the question",
"test_id": "Original exam/dataset identifier",
"section": "Section or task name within the source",
"subsection": "Specific subset or task type",
"question_id": "Original ID (if applicable)",
"question_resource": "Any associated reading materials/context",
"question": "The raw question or prompt",
"option_a": "Multiple choice option A (if applicable)",
"option_b": "Multiple choice option B (if applicable)",
"option_c": "Multiple choice option C (if applicable)",
"option_d": "Multiple choice option D (if applicable)",
"option_e": "Multiple choice option E (if applicable)",
"system_prompt": "Recommended system instruction to precisely guide the LLM's task",
"prompt": "Constructed user prompt combining the context, question, and formatted options",
"answer": "The expected exact-match target answer or normalized label (in Swedish)",
"source": "swesat | skolprov | superlim-2"
}
```
```bib
@inproceedings{berdicevskis-etal-2023-superlim,
title = "Superlim: A {S}wedish Language Understanding Evaluation Benchmark",
author = {Berdicevskis, Aleksandrs and
Bouma, Gerlof and
Kurtz, Robin and
Morger, Felix and
{\"O}hman, Joey and
Adesam, Yvonne and
Borin, Lars and
Dann{\'e}lls, Dana and
Forsberg, Markus and
Isbister, Tim and
Lindahl, Anna and
Malmsten, Martin and
Rekathati, Faton and
Sahlgren, Magnus and
Volodina, Elena and
B{\"o}rjeson, Love and
Hengchen, Simon and
Tahmasebi, Nina},
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.506",
doi = "10.18653/v1/2023.emnlp-main.506",
pages = "8137--8153",
abstract = "We present Superlim, a multi-task NLP benchmark and analysis platform for evaluating Swedish language models, a counterpart to the English-language (Super)GLUE suite. We describe the dataset, the tasks, the leaderboard and report the baseline results yielded by a reference implementation. The tested models do not approach ceiling performance on any of the tasks, which suggests that Superlim is truly difficult, a desirable quality for a benchmark. We address methodological challenges, such as mitigating the Anglocentric bias when creating datasets for a less-resourced language; choosing the most appropriate measures; documenting the datasets and making the leaderboard convenient and transparent. We also highlight other potential usages of the dataset, such as, for instance, the evaluation of cross-lingual transfer learning.",
}
```
```bib
@article{SweSAT2024,
title={SweSAT-1.0: The Swedish University Entrance Exam as a Benchmark for Large Language Models},
author={Kurfalı, Murathan and Zahra, Shorouq and Gogoulou, Evangelia and Dürlich, Luise and Carlsson, Fredrik and Nivre, Joakim},
booktitle = "Proceedings of The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)",
month = march,
year = "2025",
address = "Talinn, Estonia"
}
```
提供机构:
jonasaise



