proxectonos/calame-gl

Name: proxectonos/calame-gl
Creator: proxectonos
Published: 2026-04-21 13:52:55
License: no description available

Source: Hugging Face. Updated: 2026-04-21. Indexed: 2026-04-26.

Download link:

https://hf-mirror.com/datasets/proxectonos/calame-gl


Description:

---
language:
- gl
pretty_name: calame-gl
task_categories:
- text-generation
task_ids:
- language-modeling
tags:
- galician
- evaluation
- benchmark
- language-modeling
- text-completion
- calame
license: mit
size_categories:
- 1K<n<10K
---

# CALAME Galician

## Dataset description

CALAME-gl is a Galician translation/adaptation of the Portuguese [CALAME-PT](https://huggingface.co/datasets/NOVA-vision-language/calame-pt) benchmark.

The dataset is composed of short texts or contexts and their respective last words. These contexts are designed to contain enough information for a human or a model to infer the final word, while avoiding contexts that are excessively specific or overly ambiguous.

This release contains 930 instances in JSON format and is intended primarily for evaluation.

## Dataset structure

The dataset is distributed in JSON format as a list of examples. Each instance contains the following fields:

- `id`: example identifier
- `sentence`: context in Galician
- `last_word`: final word associated with the context

### Example

```json
{
  "id": 0,
  "sentence": "Os fans de GTA están ansiosos polo lanzamento do próximo xogo da serie, cuxo lanzamento pódese demorar algúns anos máis. Os rumores apuntan a que o GTA VI será unha versión moderna de Vice City e contará cun mapa que muda co paso do tempo. Alén diso, existe a posibilidade dunha protagonista feminina, o que trae máis expectativas ao xogo. Mentres agardamos, queda imaxinar o que esa nova aventura nos",
  "last_word": "depara"
}
```

## Data source and creation

This dataset is based on the Portuguese benchmark [CALAME-PT](https://huggingface.co/datasets/NOVA-vision-language/calame-pt) and was translated/adapted into Galician.

The Galician version preserves the same evaluation-oriented structure as the original dataset: each example contains a context and its corresponding final word. The goal of this version is to provide a Galician benchmark for evaluating a model's ability to infer or predict the final word of a context.

## Intended uses

This dataset can be used for:

- evaluation of language models in Galician
- text completion evaluation
- last-word prediction tasks
- low-resource NLP research

## Limitations

- This dataset is a translated/adapted version of the original Portuguese CALAME-PT benchmark.
- It contains 930 examples, so it is intended primarily for evaluation rather than large-scale training.
- Since this is a translated/adapted version, some examples may reflect translation choices or stylistic variation relative to the source dataset.

## Licensing

This dataset follows the same license as the original CALAME-PT dataset: MIT.

## Usage

Example with `datasets`:

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="calame-gl.json")
print(ds["train"][0])
```

Example of accessing the context and final word:

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="calame-gl.json")["train"]
print(ds[0]["sentence"])
print(ds[0]["last_word"])
```

## Acknowledgements

This dataset was compiled within the Nós Project, funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215336.
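The last-word prediction use described in the card could be scored roughly as follows. This is a minimal sketch, not the benchmark's official protocol: the card does not specify a scoring procedure, so exact match after light normalization is an assumption. The names `normalize`, `first_word`, `last_word_accuracy`, and `dummy_predict` are hypothetical helpers introduced here. The first toy example is a shortened form of the card's example, and the second is a placeholder, not taken from the dataset.

```python
import string
from typing import Callable, Dict, Iterable


def normalize(word: str) -> str:
    """Lowercase and strip surrounding whitespace and ASCII punctuation."""
    return word.strip().strip(string.punctuation).lower()


def first_word(text: str) -> str:
    """Return the first whitespace-delimited token of a continuation, or ''."""
    parts = text.split()
    return parts[0] if parts else ""


def last_word_accuracy(
    examples: Iterable[Dict[str, object]],
    predict: Callable[[str], str],
) -> float:
    """Fraction of examples whose predicted first word matches `last_word`.

    `predict` is any function that maps a context string to a continuation;
    in practice it would wrap a language model's generation call.
    """
    examples = list(examples)
    if not examples:
        return 0.0
    hits = sum(
        normalize(first_word(predict(str(ex["sentence"]))))
        == normalize(str(ex["last_word"]))
        for ex in examples
    )
    return hits / len(examples)


# Toy data using the card's field names (`id`, `sentence`, `last_word`).
toy_examples = [
    {
        "id": 0,
        "sentence": "Mentres agardamos, queda imaxinar o que esa nova aventura nos",
        "last_word": "depara",
    },
    # Placeholder entry for illustration only; not from the dataset.
    {"id": 1, "sentence": "placeholder context", "last_word": "palabra"},
]


def dummy_predict(context: str) -> str:
    """Stand-in for a model: always continues with the same word."""
    return "depara."


accuracy = last_word_accuracy(toy_examples, dummy_predict)
print(f"exact-match accuracy: {accuracy:.2f}")
```

To score the real data, the same function could be given the `train` split loaded with `load_dataset` in place of `toy_examples`, with `predict` replaced by a wrapper around the model under evaluation. Note that `string.punctuation` covers ASCII punctuation only, so other marks would need extra handling.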

Provider:

proxectonos
