
sapienzanlp/arc_italian

Source: Hugging Face. Updated: 2025-12-02. Indexed: 2024-07-22.
Download link:
https://hf-mirror.com/datasets/sapienzanlp/arc_italian
Overview:
The ARC - Italian (IT) dataset is an Italian translation of the AI2 Reasoning Challenge (ARC). It contains multiple-choice questions and text completion tasks, and comes in two versions, ARC Challenge and ARC Easy, each with its own training, validation, and test splits. Every sample carries both the original English text and its Italian translation, so the data is fully parallel and supports cross-lingual evaluation. The translations were produced with the open-source tool OBenTO-LLM, which keeps the translation process transparent and reproducible. Each record includes a unique ID, the task type, the original English text, the Italian translation, the answer choices, the translated choices, and the index of the correct answer. The dataset is distributed under the CC BY-SA 4.0 license.
Provider:
sapienzanlp
Summary of original information

ARC - Italian (IT) dataset overview

Dataset details

  • Task category: text generation
  • Languages: Italian, English
  • Dataset size: 1K<n<10K

Dataset versions

  • ARC Challenge:
    • Training set: 1,105 rows
    • Validation set: 292 rows
    • Test set: 1,151 rows
  • ARC Easy:
    • Training set: 2,2193 rows
    • Validation set: 557 rows
    • Test set: 2,322 rows

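Dataset features

  • Contains multiple-choice questions and text completion tasks.
  • The dataset is fully parallel between English and Italian.
  • The translations were produced with the open-source tool 🍱 OBenTO-LLM.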

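Data format

  • id: unique ID of the sample
  • category: task type, either question or text_completion
  • input_text: original English sentence
  • input_text_translation: Italian translation
  • choices: original English choices
  • choice_translations: Italian translations of the choices
  • gold_index: index of the correct answer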
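
The field list above can be turned into a small sanity check. The following is a minimal sketch, not part of the dataset's own tooling: the function name `validate_record` and the exact Python types are assumptions inferred from the field descriptions and the examples on this page, where `gold_index` appears to be zero-based.

```python
from typing import Any

# Expected fields and Python types, inferred from the data format list.
# These types are an assumption, not an official schema.
REQUIRED_FIELDS = {
    "id": str,
    "category": str,
    "input_text": str,
    "input_text_translation": str,
    "choices": list,
    "choice_translations": list,
    "gold_index": int,
}
ALLOWED_CATEGORIES = {"question", "text_completion"}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}")
    if problems:
        # Skip the cross-field checks when the basic shape is already wrong.
        return problems
    if record["category"] not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {record['category']}")
    # A fully parallel record has one translated choice per English choice.
    if len(record["choices"]) != len(record["choice_translations"]):
        problems.append("choices and choice_translations differ in length")
    # gold_index is treated as a zero-based position in the choices list.
    if not 0 <= record["gold_index"] < len(record["choices"]):
        problems.append("gold_index out of range")
    return problems
```

Running this over every row before an evaluation is one way to catch misaligned translations early, since a length mismatch between `choices` and `choice_translations` would silently break cross-lingual comparisons.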

Examples

Question example

```json
{
  "id": "Mercury_SC_407695",
  "category": "question",
  "input_text": "Juan and LaKeisha roll a few objects down a ramp. They want to see which object rolls the farthest. What should they do so they can repeat their investigation?",
  "input_text_translation": "Juan e LaKeisha fanno scivolare alcuni oggetti giù per una rampa. Vogliono vedere quale oggetto scivola più lontano. Cosa dovrebbero fare per ripetere la loro indagine?",
  "choices": [
    "Put the objects in groups.",
    "Change the height of the ramp.",
    "Choose different objects to roll.",
    "Record the details of the investigation."
  ],
  "choice_translations": [
    "Mettere gli oggetti in gruppi.",
    "Cambiare l'altezza della rampa.",
    "Scegliere oggetti diversi da scivolare.",
    "Registrare i dettagli dell'indagine."
  ],
  "gold_index": 3
}
```
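
A record of the question category can be rendered as a lettered multiple-choice prompt in either language. The sketch below is illustrative only: `format_prompt` is a hypothetical helper, and the A/B/C/D lettering is an assumed presentation choice, since the dataset itself stores only the zero-based `gold_index`.

```python
import string


def format_prompt(record, language="it"):
    """Render one record as a lettered prompt; return (prompt, gold_letter)."""
    if language == "it":
        text = record["input_text_translation"]
        options = record["choice_translations"]
    elif language == "en":
        text = record["input_text"]
        options = record["choices"]
    else:
        raise ValueError(f"unsupported language: {language}")
    # One uppercase letter per option: A, B, C, ...
    letters = string.ascii_uppercase[: len(options)]
    lines = [text] + [f"{letter}. {option}" for letter, option in zip(letters, options)]
    # gold_index is zero-based, so it indexes directly into the letters.
    return "\n".join(lines), letters[record["gold_index"]]


# Abridged copy of the question example above (Italian fields only).
example = {
    "input_text_translation": "Cosa dovrebbero fare per ripetere la loro indagine?",
    "choice_translations": [
        "Mettere gli oggetti in gruppi.",
        "Cambiare l'altezza della rampa.",
        "Scegliere oggetti diversi da scivolare.",
        "Registrare i dettagli dell'indagine.",
    ],
    "gold_index": 3,
}
prompt, gold_letter = format_prompt(example, language="it")
print(prompt)
print("Gold answer:", gold_letter)
```

Because the English and Italian fields share the same `gold_index`, the same record can be prompted in both languages and the two answers compared directly.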

Text completion example

```json
{
  "id": "Mercury_7217053",
  "category": "text_completion",
  "input_text": "Biological evolution can occur through all of these except",
  "input_text_translation": "L'evoluzione biologica può avvenire attraverso tutte queste eccezion fatta",
  "choices": [
    "competition.",
    "fossilization.",
    "variation.",
    "adaptation."
  ],
  "choice_translations": [
    "concorrenza.",
    "fossilizzazione.",
    "variazione.",
    "adattamento."
  ],
  "gold_index": 1
}
```
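
For text_completion records, one common way to evaluate a language model is to append each choice to the stem and pick the continuation the model scores highest. The sketch below shows only the bookkeeping, under stated assumptions: `completion_candidates` and `pick_best` are hypothetical helpers, the stem and choice are joined with a single space, and the scores are made-up placeholders standing in for whatever a real model would return.

```python
def completion_candidates(record, language="it"):
    """Join the stem with each choice to form full candidate sentences."""
    if language == "it":
        stem = record["input_text_translation"]
        options = record["choice_translations"]
    elif language == "en":
        stem = record["input_text"]
        options = record["choices"]
    else:
        raise ValueError(f"unsupported language: {language}")
    # Assumption: a single space separates the stem from the continuation.
    return [f"{stem} {option}" for option in options]


def pick_best(scores):
    """Return the index of the highest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


# English fields of the text completion example above.
example = {
    "input_text": "Biological evolution can occur through all of these except",
    "choices": ["competition.", "fossilization.", "variation.", "adaptation."],
    "gold_index": 1,
}
candidates = completion_candidates(example, language="en")

# Placeholder scores, one per candidate; a real setup would obtain these
# from a model (for example, as log-likelihoods).
placeholder_scores = [-3.2, -1.1, -2.5, -4.0]
prediction = pick_best(placeholder_scores)
is_correct = prediction == example["gold_index"]
```

Accuracy over a split is then the fraction of records for which `is_correct` holds.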

License

  • License: CC BY-SA 4.0