
HiTZ/BasqueSumm

Source: Hugging Face. Updated 2025-11-21. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/HiTZ/BasqueSumm
Description:
---
license: cc-by-nc-sa-4.0
task_categories:
- summarization
- text-generation
- fill-mask
language:
- eu
pretty_name: BasqueSumm
size_categories:
- 100K<n<1M
---

# BasqueSumm

BasqueSumm was automatically compiled from www.berria.eus, using [trafilatura](https://trafilatura.readthedocs.io) to extract the texts. Each instance has the following key-value pairs:

* `"date"` (str): When the article was published, formatted as `"yyyy-mm-dd"`.
* `"url"` (str): The URL of the original publication.
* `"category"` (str): The article's topic, e.g., economy, society.
* `"title"` (str): The title of the article.
* `"subtitle"` (str): The subtitle of the article.
* `"summary"` (str): The combined title + subtitle, which acts as a proxy for a reference summary.
* `"text"` (str): The news article.

## Dataset Details

* **Curated by**: Jeremy Barnes
* **Language(s) (NLP)**: Basque (`es-EU`)
* **License**: [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)

## Dataset Sources

* **Repository**: https://github.com/hitz-zentroa/summarization
* **Paper**: [Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?](https://arxiv.org/abs/2503.17039)

## Acknowledgements

This work has been partially supported by the Basque Government (IKER-GAITU project), the Spanish Ministry for Digital Transformation and of Civil Service, and the EU-funded NextGenerationEU Recovery, Transformation and Resilience Plan (ILENIA project, 2022/TL-22/00215335 and 2022/TL22/00215334). Additional support was provided through DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGeneration EU/PRTR; also through NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/021) and the grant CIBEST/2023/8, both funded by the Generalitat Valenciana.

## Licensing

We release BASSE under a [CC BY-NC-SA 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/).

## Citation

**BibTeX:**

```
@misc{barnes2025summarizationmetricsspanishbasque,
  title={Summarization Metrics for {S}panish and {B}asque: Do Automatic Scores and {LLM}-Judges Correlate with Humans?},
  author={Jeremy Barnes and Naiara Perez and Alba Bonet-Jover and Begoña Altuna},
  year={2025},
  eprint={2503.17039},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.17039},
}
```

**APA:**

Barnes, J., Perez, N., Bonet-Jover, A., & Altuna, B. (2025). Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? _arXiv preprint arXiv:2503.17039_.
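
## Example: checking a record against the documented fields

The key-value pairs listed in the card can be checked with a few lines of standard-library Python. This is a minimal sketch and not part of the official release: the helper name `validate_record` and the sample record are hypothetical, invented here for illustration. The card does not say how title and subtitle are joined to form `"summary"`, so the sketch does not test that relationship.

```python
from datetime import datetime

# Field names come from the dataset card; every field is documented as a string.
EXPECTED_FIELDS = ("date", "url", "category", "title", "subtitle", "summary", "text")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if none).

    Hypothetical helper, written for illustration only.
    """
    problems = []
    for field in EXPECTED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], str):
            problems.append(f"field is not a string: {field}")

    # The card documents the date format as "yyyy-mm-dd".
    date = record.get("date")
    if isinstance(date, str):
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            problems.append(f"date does not parse as yyyy-mm-dd: {date!r}")
    return problems


# Made-up record for illustration; it is NOT taken from the dataset.
example = {
    "date": "2024-01-15",
    "url": "https://www.berria.eus/example",
    "category": "economy",
    "title": "Example title",
    "subtitle": "Example subtitle",
    "summary": "Example title Example subtitle",
    "text": "Example article body.",
}

print(validate_record(example))
```

Records would presumably be obtained through the Hugging Face `datasets` library; the card does not state the split names, so none are assumed here.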
Provider:
HiTZ