Eloquent/HalluciGen-Translation

Name: Eloquent/HalluciGen-Translation
Creator: Eloquent
Published: 2024-11-13 09:04:10
License: cc-by-nc-sa-4.0

Hugging Face, updated 2024-11-13, indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/Eloquent/HalluciGen-Translation


Description:

---
license: cc-by-nc-sa-4.0
language:
- de
- en
- fr
configs:
- config_name: trial
  data_files:
  - split: trial_de_en
    path: de-en/trial.de-en.jsonl
  - split: trial_en_de
    path: de-en/trial.en-de.jsonl
  - split: trial_fr_en
    path: fr-en/trial.fr-en.jsonl
  - split: trial_en_fr
    path: fr-en/trial.en-fr.jsonl
- config_name: test_detection
  data_files:
  - split: test_detection_de_en
    path: de-en/test_detection.de-en.jsonl
  - split: test_detection_en_de
    path: de-en/test_detection.en-de.jsonl
  - split: test_detection_fr_en
    path: fr-en/test_detection.fr-en.jsonl
  - split: test_detection_en_fr
    path: fr-en/test_detection.en-fr.jsonl
- config_name: test_generation
  data_files:
  - split: test_generation_de_en
    path: de-en/test_generation.de-en.jsonl
  - split: test_generation_en_de
    path: de-en/test_generation.en-de.jsonl
  - split: test_generation_fr_en
    path: fr-en/test_generation.fr-en.jsonl
  - split: test_generation_en_fr
    path: fr-en/test_generation.en-fr.jsonl
- config_name: cross_model_evaluation
  sep: ','
  data_files:
  - split: cross_model_evaluation_de_en
    path: de-en/cross_model_evaluation.de-en.jsonl
  - split: cross_model_evaluation_en_de
    path: de-en/cross_model_evaluation.en-de.jsonl
  - split: cross_model_evaluation_fr_en
    path: fr-en/cross_model_evaluation.fr-en.jsonl
  - split: cross_model_evaluation_en_fr
    path: fr-en/cross_model_evaluation.en-fr.jsonl
pretty_name: HalluciGen Translation
size_categories:
- n<1K
---

# Task 2: HalluciGen - Translation

This dataset contains the trial and test splits per language pair for the Translation scenario of the [HalluciGen task](https://docs.google.com/document/d/1yeohpm3YJAXKj9BI2JDXJ3ap9Vi2dnHkA2OsDI94QZ4/edit#heading=h.jtyt8tmnayhb), which is part of the 2024 ELOQUENT lab.

NOTE: A gold-labeled version of the dataset will be released in a new repository.
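
The `configs` front matter follows a regular naming pattern: each split is named after its config and translation direction, and both directions of a language pair share one directory. The sketch below reproduces that pattern without touching the network. The names `CONFIGS`, `DIRECTORY`, `split_name`, and `data_file` are illustrative helpers, not part of the dataset or of the `datasets` library.

```python
# Layout inferred from the `configs` front matter: one directory per
# language pair, one JSONL file per config and translation direction.
CONFIGS = ("trial", "test_detection", "test_generation", "cross_model_evaluation")

# Both directions of a pair live in the same directory.
DIRECTORY = {"de-en": "de-en", "en-de": "de-en", "fr-en": "fr-en", "en-fr": "fr-en"}


def split_name(config: str, langpair: str) -> str:
    """Build the split name, e.g. ("trial", "de-en") -> "trial_de_en"."""
    return f"{config}_{langpair.replace('-', '_')}"


def data_file(config: str, langpair: str) -> str:
    """Build the repository-relative path of the JSONL file for a split."""
    return f"{DIRECTORY[langpair]}/{config}.{langpair}.jsonl"


if __name__ == "__main__":
    for config in CONFIGS:
        for langpair in DIRECTORY:
            print(split_name(config, langpair), "->", data_file(config, langpair))
```

The value returned by `split_name` is what would be passed as the `split` argument of `load_dataset` when only one translation direction is needed.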

#### Dataset schema

- *id*: unique identifier of the example
- *langpair*: the source and target language pair of the example
- *source*: original model input for translation
- *hyp1*: first alternative translation of the source
- *hyp2*: second alternative translation of the source
- *label*: *hyp1* or *hyp2*, based on which of those has been annotated as hallucination
- *type*: hallucination category assigned. Possible values: addition, named-entity, number, conversion, date, tense, negation, gender, pronoun, antonym, natural

#### Trial Data

This is a small list of examples, provided to help the participants get familiar with the task. Each example contains the following fields: *id*, *langpair*, *source*, *hyp1*, *hyp2*, *type*, *label*.

```python
from datasets import load_dataset

# load the trial data for all language pairs
trial_ds = load_dataset("Eloquent/HalluciGen-Translation", name="trial")

# load the trial data only for the German->English pair
trial_ds_de_en = load_dataset("Eloquent/HalluciGen-Translation", name="trial", split="trial_de_en")
```

#### Test data for the detection step

The files "test_detection.langpair.jsonl" contain the test splits for the detection step for the specific *langpair*. Each example contains the following fields: *id*, *langpair*, *source*, *hyp1*, *hyp2*.

```python
from datasets import load_dataset

# load the test data for the detection step for all language pairs
data = load_dataset("Eloquent/HalluciGen-Translation", "test_detection")
```

#### Test data for the generation step

The files "test_generation.langpair.jsonl" contain the test splits for the generation step for the specific *langpair*. Each example contains the following fields: *id*, *langpair*, *source*.

```python
from datasets import load_dataset

# load the test data for the generation step for all language pairs
data = load_dataset("Eloquent/HalluciGen-Translation", "test_generation")
```

#### Test data for the cross-model evaluation of the generation step (released 3 May, 2024)

The files "cross_model_evaluation.langpair.jsonl" contain the test splits for the cross-model evaluation of the generation step for the specific *langpair*. Each example contains the following fields: *id*, *langpair*, *source*, *hyp1*, *hyp2*.

```python
from datasets import load_dataset

# load the test data for the cross-model evaluation of the generation step for all language pairs
data = load_dataset("Eloquent/HalluciGen-Translation", "cross_model_evaluation")
```
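
Because the trial split carries both *label* and *type*, it can be used to sanity-check a detector before running it on the unlabeled test data. The following is a minimal sketch under stated assumptions: the card does not specify the task's official metric, so plain accuracy is used here as a stand-in. The records and predictions are made up for illustration, and `detection_accuracy` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical records carrying only the fields needed for scoring;
# they are NOT taken from the dataset.
records = [
    {"id": 0, "label": "hyp1", "type": "negation"},
    {"id": 1, "label": "hyp2", "type": "number"},
    {"id": 2, "label": "hyp2", "type": "negation"},
]

# Hypothetical detector output: example id -> hypothesis flagged as hallucination.
predictions = {0: "hyp1", 1: "hyp1", 2: "hyp2"}


def detection_accuracy(records, predictions):
    """Return (overall accuracy, accuracy per hallucination type)."""
    total = Counter()
    correct = Counter()
    for rec in records:
        total[rec["type"]] += 1
        # A missing prediction counts as wrong.
        if predictions.get(rec["id"]) == rec["label"]:
            correct[rec["type"]] += 1
    overall = sum(correct.values()) / sum(total.values())
    per_type = {t: correct[t] / total[t] for t in total}
    return overall, per_type


overall, per_type = detection_accuracy(records, predictions)
print(f"overall accuracy: {overall:.2f}")
for hallucination_type, acc in sorted(per_type.items()):
    print(f"{hallucination_type}: {acc:.2f}")
```

Breaking the score down by *type* shows which hallucination categories a detector misses, which a single overall number would hide.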

Provider:

Eloquent

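Summary of original information

Dataset overview

Dataset name

Name: HalluciGen Translation

Dataset contents

Task: translation
Language pairs: German-English, English-German, French-English, English-French
Data file configurations:
- trial: trial data files for the four language pairs
- test_detection: detection test data files for the four language pairs
- test_generation: generation test data files for the four language pairs
- cross_model_evaluation: cross-model evaluation test data files for the four language pairs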

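Dataset structure

Fields:
- id: unique identifier
- langpair: source and target language pair
- source: original input for translation
- hyp1: first alternative translation
- hyp2: second alternative translation
- label: the hypothesis (hyp1 or hyp2) annotated as a hallucination
- type: hallucination category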
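
The field list above can be turned into a lightweight record check. The sketch below assumes the per-config field sets stated in the dataset card (the trial data has all seven fields, the detection and cross-model evaluation data omit *label* and *type*, and the generation data has only *id*, *langpair*, and *source*). `EXPECTED_FIELDS`, `HALLUCINATION_TYPES`, and `validate` are illustrative names, and the sample record is invented.

```python
# Fields each config is described as containing in the dataset card.
EXPECTED_FIELDS = {
    "trial": {"id", "langpair", "source", "hyp1", "hyp2", "type", "label"},
    "test_detection": {"id", "langpair", "source", "hyp1", "hyp2"},
    "test_generation": {"id", "langpair", "source"},
    "cross_model_evaluation": {"id", "langpair", "source", "hyp1", "hyp2"},
}

# Possible values of the `type` field listed in the card.
HALLUCINATION_TYPES = {
    "addition", "named-entity", "number", "conversion", "date", "tense",
    "negation", "gender", "pronoun", "antonym", "natural",
}


def validate(record: dict, config: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    missing = EXPECTED_FIELDS[config] - set(record)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if "label" in record and record["label"] not in {"hyp1", "hyp2"}:
        problems.append(f"unexpected label: {record['label']!r}")
    if "type" in record and record["type"] not in HALLUCINATION_TYPES:
        problems.append(f"unexpected type: {record['type']!r}")
    return problems


# Invented record for illustration only.
sample = {
    "id": 0, "langpair": "de-en", "source": "Das ist kein Test.",
    "hyp1": "This is a test.", "hyp2": "This is not a test.",
    "type": "negation", "label": "hyp1",
}
print(validate(sample, "trial"))
```

Returning a list of problems instead of raising on the first one makes it easy to report every issue in a file in a single pass.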

Dataset size

Category: n<1K

License

License: cc-by-nc-sa-4.0
