HiTZ/BASSE

Name: HiTZ/BASSE
Creator: HiTZ
Published: 2025-11-21 11:17:29
License: No description provided

Hugging Face · Updated 2025-11-21 · Indexed 2026-01-03

Download link:

https://hf-mirror.com/datasets/HiTZ/BASSE

Resource description:

---
dataset_info:
- config_name: es
  features:
  - name: idx
    dtype: int32
  - name: url
    dtype: string
  - name: document
    dtype: string
  - name: summary
    dtype: string
  - name: model
    dtype: string
  - name: prompt
    dtype: string
  - name: coherence
    list: float32
  - name: consistency
    list: float32
  - name: fluency
    list: float32
  - name: relevance
    list: float32
  - name: 5W1H
    list: float32
  - name: round
    dtype: int32
  - name: references
    list: string
  splits:
  - name: test
    num_bytes: 7362289
    num_examples: 990
  download_size: 570893
  dataset_size: 7362289
- config_name: es-round-0
  features:
  - name: idx
    dtype: int32
  - name: url
    dtype: string
  - name: document
    dtype: string
  - name: summary
    dtype: string
  - name: model
    dtype: string
  - name: prompt
    dtype: string
  - name: coherence
    list: float32
  - name: consistency
    list: float32
  - name: fluency
    list: float32
  - name: relevance
    list: float32
  - name: 5W1H
    list: float32
  - name: round
    dtype: int32
  - name: references
    list: string
  splits:
  - name: test
    num_bytes: 1278681
    num_examples: 210
  download_size: 119614
  dataset_size: 1278681
- config_name: eu
  features:
  - name: idx
    dtype: int32
  - name: url
    dtype: string
  - name: document
    dtype: string
  - name: summary
    dtype: string
  - name: model
    dtype: string
  - name: prompt
    dtype: string
  - name: coherence
    list: float32
  - name: consistency
    list: float32
  - name: fluency
    list: float32
  - name: relevance
    list: float32
  - name: 5W1H
    list: float32
  - name: round
    dtype: int32
  - name: references
    list: string
  splits:
  - name: test
    num_bytes: 5652475
    num_examples: 990
  download_size: 539955
  dataset_size: 5652475
- config_name: eu-round-0
  features:
  - name: idx
    dtype: int32
  - name: url
    dtype: string
  - name: document
    dtype: string
  - name: summary
    dtype: string
  - name: model
    dtype: string
  - name: prompt
    dtype: string
  - name: coherence
    list: float32
  - name: consistency
    list: float32
  - name: fluency
    list: float32
  - name: relevance
    list: float32
  - name: 5W1H
    list: float32
  - name: round
    dtype: int32
  - name: references
    list: string
  splits:
  - name: test
    num_bytes: 911972
    num_examples: 210
  download_size: 102957
  dataset_size: 911972
configs:
- config_name: es
  data_files:
  - split: test
    path: es/test-*
- config_name: es-round-0
  data_files:
  - split: test
    path: es-round-0/test-*
- config_name: eu
  data_files:
  - split: test
    path: eu/test-*
- config_name: eu-round-0
  data_files:
  - split: test
    path: eu-round-0/test-*
license: cc-by-nc-sa-4.0
task_categories:
- summarization
- text-generation
language:
- eu
- es
pretty_name: BASSE
size_categories:
- 1K<n<10K
---

# BASSE: BAsque and Spanish Summarization Evaluation

BASSE is a multilingual (Basque and Spanish) dataset designed primarily for the **meta-evaluation of automatic summarization metrics and LLM-as-a-Judge models**.

## Dataset Details

### Dataset Description

We generated automatic summaries for 90 news documents in Basque and Spanish (45 each) using Anthropic's **Claude**, OpenAI's **GPT-4o**, Reka AI's **Reka**, Meta's **Llama 3.1 Instruct** and Cohere's **Command R+**. For each of these models, we use four different prompts (**base**, **core**, **5W1H**, **tldr**; [see paper for more details](https://arxiv.org/abs/2503.17039)), with the goal of generating a diverse array of summaries in terms of both quality and style. We also include human-generated reference summaries for each news document.

After generating these summaries, we annotated them for **Coherence**, **Consistency**, **Fluency**, **Relevance**, and **5W1H** on a 5-point Likert scale, largely following the annotation protocol from [SummEval](https://github.com/Yale-LILY/SummEval).
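As a rough illustration of how the configurations listed in the metadata might be loaded, here is a minimal sketch using the Hugging Face `datasets` library. The helper name `load_basse` and the row-count check are assumptions made for this example, not part of the dataset's tooling; the config names and expected sizes are copied from the metadata above. The card does not describe what the `*-round-0` configurations contain, so the sketch treats them only as valid names.

```python
# Config names and test-split sizes copied from the dataset card's metadata.
EXPECTED_TEST_ROWS = {
    "es": 990,
    "es-round-0": 210,
    "eu": 990,
    "eu-round-0": 210,
}


def load_basse(config: str = "eu"):
    """Load the test split of one BASSE configuration (hypothetical helper)."""
    if config not in EXPECTED_TEST_ROWS:
        raise ValueError(
            f"unknown config {config!r}; expected one of {sorted(EXPECTED_TEST_ROWS)}"
        )
    # Imported lazily so this module can be imported without the library installed.
    from datasets import load_dataset

    ds = load_dataset("HiTZ/BASSE", config, split="test")
    # Sanity check against the num_examples value stated in the metadata.
    if len(ds) != EXPECTED_TEST_ROWS[config]:
        raise RuntimeError(
            f"{config}: got {len(ds)} rows, metadata states {EXPECTED_TEST_ROWS[config]}"
        )
    return ds
```

Under these assumptions, `load_basse("es")` would return the Spanish test split, and indexing it (for example `load_basse("es")[0]["coherence"]`) would give the list of annotator scores for the first summary. The dataset is released under CC BY-NC-SA 4.0, so any use should respect those terms.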
* **Curated by**: Jeremy Barnes, Begoña Altuna, Alba Bonet, and Naiara Perez
* **Language(s) (NLP)**: Spanish (`es-ES`), Basque (`eu-ES`)
* **License**: [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)

## Dataset Sources

* **Repository**: https://github.com/hitz-zentroa/summarization
* **Paper**: [Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?](https://arxiv.org/abs/2503.17039)

## Dataset Structure

Each BASSE instance corresponds to one hand-annotated summary. There are three types of summaries:

* LLM-generated summaries (20 generation setups: 5 LLMs × 4 prompts)
* Human-generated summaries
* Lead-ins or subheads belonging to the original document

### Data Splits

BASSE consists of a **test** split for Basque and a **test** split for Spanish. Each was annotated in three consecutive rounds: rounds 1 and 2 involved three annotators each (for an inter-annotator agreement study), and round 3 involved a single annotator. The composition of each split is as follows:

|                                   | Round 1 | Round 2 | Round 3 | Total |
|-----------------------------------|--------:|--------:|--------:|------:|
| Annotators                        |       3 |       3 |       1 |     - |
| Documents                         |      10 |       5 |      30 |    45 |
| Summaries, of which               |     240 |     120 |     630 |   990 |
| &nbsp;&nbsp;&nbsp;Subheads        |      10 |       5 |      30 |    45 |
| &nbsp;&nbsp;&nbsp;Human summaries |      30 |      15 |       0 |    45 |
| &nbsp;&nbsp;&nbsp;LLM summaries   |     200 |     100 |     600 |   900 |

### Data Instances

Alongside the hand-annotated summary, we provide the original document, one or more reference summaries, and information about how the annotated summary was obtained. Each instance includes the following fields:

* `"idx"` (int): A unique identifier for the summary.
* `"url"` (str): URL of the original document.
* `"round"` (int): `1`, `2`, or `3` - the annotation round this example comes from.
* `"document"` (str): The original news document to be summarized.
* `"references"` (list[str]): The human-generated reference summaries.
* `"summary"` (str): The annotated summary of the original document.
* `"model"` (str): `human`, `subhead`, `claude`, `commandr`, `gpt4o`, `reka`, `llama3` - who generated the summary.
* `"prompt"` (str): `base`, `core`, `5w1h`, or `tldr` - the prompt type used to generate the summary with an LLM; or the human annotator's identifier.
* `"coherence"` (list[float]): Human coherence scores on a 5-point Likert scale.
* `"consistency"` (list[float]): Human consistency scores on a 5-point Likert scale.
* `"fluency"` (list[float]): Human fluency scores on a 5-point Likert scale.
* `"relevance"` (list[float]): Human relevance scores on a 5-point Likert scale.
* `"5W1H"` (list[float]): Human 5W1H scores on a 5-point Likert scale.

## Acknowledgements

This work has been partially supported by the Basque Government (IKER-GAITU project), the Spanish Ministry for Digital Transformation and of Civil Service, and the EU-funded NextGenerationEU Recovery, Transformation and Resilience Plan (ILENIA project, 2022/TL-22/00215335 and 2022/TL22/00215334). Additional support was provided through DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGeneration EU/PRTR; also through NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/021) and the grant CIBEST/2023/8, both funded by the Generalitat Valenciana.

## Licensing

We release BASSE under a [CC BY-NC-SA 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/).
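Because each score field holds one value per annotator, a meta-evaluation typically starts by collapsing those lists into a single human score per summary and then correlating a metric's outputs with it. The sketch below shows one way to do that with the standard library only. The choice of the mean as aggregate and of Spearman's rank correlation is an assumption made for illustration (the card does not state which aggregation or coefficient the authors use), and the toy instances and metric values are invented, not taken from BASSE.

```python
from statistics import mean

# Score fields as named in the Data Instances list.
CRITERIA = ("coherence", "consistency", "fluency", "relevance", "5W1H")


def mean_scores(instance):
    """Average each criterion's per-annotator ratings into a single score."""
    return {name: mean(instance[name]) for name in CRITERIA}


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation on average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Invented toy data: two summaries rated by three annotators, one by a single annotator.
toy_instances = [
    {"coherence": [4.0, 5.0, 3.0], "consistency": [5.0, 5.0, 5.0],
     "fluency": [4.0, 4.0, 5.0], "relevance": [3.0, 4.0, 4.0], "5W1H": [2.0, 3.0, 3.0]},
    {"coherence": [2.0, 2.0, 1.0], "consistency": [3.0, 2.0, 3.0],
     "fluency": [3.0, 3.0, 2.0], "relevance": [2.0, 1.0, 2.0], "5W1H": [1.0, 1.0, 2.0]},
    {"coherence": [5.0], "consistency": [4.0],
     "fluency": [5.0], "relevance": [5.0], "5W1H": [4.0]},
]
toy_metric_scores = [0.61, 0.20, 0.85]  # hypothetical outputs of some automatic metric

human_coherence = [mean_scores(inst)["coherence"] for inst in toy_instances]
rho = spearman(toy_metric_scores, human_coherence)
```

With real data, `toy_instances` would be replaced by rows of a BASSE split and `toy_metric_scores` by the metric or LLM-judge outputs for the same summaries. Other aggregates (for example the median) or coefficients (such as Kendall's tau) could be substituted without changing the overall structure.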
## Citation

**BibTeX:**

```
@misc{barnes2025summarizationmetricsspanishbasque,
  title={Summarization Metrics for {S}panish and {B}asque: Do Automatic Scores and {LLM}-Judges Correlate with Humans?},
  author={Jeremy Barnes and Naiara Perez and Alba Bonet-Jover and Begoña Altuna},
  year={2025},
  eprint={2503.17039},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.17039},
}
```

**APA:**

Barnes, J., Perez, N., Bonet-Jover, A., & Altuna, B. (2025). Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? _arXiv preprint arXiv:2503.17039_.

Provider:

HiTZ
