
Eloquent/HalluciGen-Translation

Hugging Face · Updated 2024-11-13 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/Eloquent/HalluciGen-Translation
Resource description:
---
license: cc-by-nc-sa-4.0
language:
- de
- en
- fr
configs:
- config_name: trial
  data_files:
  - split: trial_de_en
    path: de-en/trial.de-en.jsonl
  - split: trial_en_de
    path: de-en/trial.en-de.jsonl
  - split: trial_fr_en
    path: fr-en/trial.fr-en.jsonl
  - split: trial_en_fr
    path: fr-en/trial.en-fr.jsonl
- config_name: test_detection
  data_files:
  - split: test_detection_de_en
    path: de-en/test_detection.de-en.jsonl
  - split: test_detection_en_de
    path: de-en/test_detection.en-de.jsonl
  - split: test_detection_fr_en
    path: fr-en/test_detection.fr-en.jsonl
  - split: test_detection_en_fr
    path: fr-en/test_detection.en-fr.jsonl
- config_name: test_generation
  data_files:
  - split: test_generation_de_en
    path: de-en/test_generation.de-en.jsonl
  - split: test_generation_en_de
    path: de-en/test_generation.en-de.jsonl
  - split: test_generation_fr_en
    path: fr-en/test_generation.fr-en.jsonl
  - split: test_generation_en_fr
    path: fr-en/test_generation.en-fr.jsonl
- config_name: cross_model_evaluation
  sep: ','
  data_files:
  - split: cross_model_evaluation_de_en
    path: de-en/cross_model_evaluation.de-en.jsonl
  - split: cross_model_evaluation_en_de
    path: de-en/cross_model_evaluation.en-de.jsonl
  - split: cross_model_evaluation_fr_en
    path: fr-en/cross_model_evaluation.fr-en.jsonl
  - split: cross_model_evaluation_en_fr
    path: fr-en/cross_model_evaluation.en-fr.jsonl
pretty_name: HalluciGen Translation
size_categories:
- n<1K
---

# Task 2: HalluciGen - Translation

This dataset contains the trial and test splits per language pair for the Translation scenario of the [HalluciGen task](https://docs.google.com/document/d/1yeohpm3YJAXKj9BI2JDXJ3ap9Vi2dnHkA2OsDI94QZ4/edit#heading=h.jtyt8tmnayhb), which is part of the 2024 ELOQUENT lab.

NOTE: A gold-labeled version of the dataset will be released in a new repository.

#### Dataset schema

- *id*: unique identifier of the example
- *langpair*: the source and target language pair of the example
- *source*: original model input for translation
- *hyp1*: first alternative translation of the source
- *hyp2*: second alternative translation of the source
- *label*: *hyp1* or *hyp2*, whichever has been annotated as the hallucination
- *type*: hallucination category assigned. Possible values: addition, named-entity, number, conversion, date, tense, negation, gender, pronoun, antonym, natural

#### Trial Data

This is a small set of examples, provided to help participants get familiar with the task. Each example contains the following fields: *id*, *langpair*, *source*, *hyp1*, *hyp2*, *type*, *label*.

```python
from datasets import load_dataset

# Load the trial data for all language pairs
trial_ds = load_dataset("Eloquent/HalluciGen-Translation", name="trial")

# Load the trial data only for the German->English pair
trial_ds_de_en = load_dataset("Eloquent/HalluciGen-Translation", name="trial", split="trial_de_en")
```

#### Test data for the detection step

The files "test_detection.langpair.jsonl" contain the test splits for the detection step, one file per *langpair*. Each example contains the following fields: *id*, *langpair*, *source*, *hyp1*, *hyp2*.

```python
from datasets import load_dataset

# Load the test data for the detection step for all language pairs
data = load_dataset("Eloquent/HalluciGen-Translation", "test_detection")
```

#### Test data for the generation step

The files "test_generation.langpair.jsonl" contain the test splits for the generation step, one file per *langpair*. Each example contains the following fields: *id*, *langpair*, *source*.

```python
from datasets import load_dataset

# Load the test data for the generation step for all language pairs
data = load_dataset("Eloquent/HalluciGen-Translation", "test_generation")
```

#### Test data for the cross-model evaluation of the generation step (released 3 May, 2024)

The files "cross_model_evaluation.langpair.jsonl" contain the test splits for the cross-model evaluation of the generation step, one file per *langpair*. Each example contains the following fields: *id*, *langpair*, *source*, *hyp1*, *hyp2*.

```python
from datasets import load_dataset

# Load the test data for the cross-model evaluation of the generation step for all language pairs
data = load_dataset("Eloquent/HalluciGen-Translation", "cross_model_evaluation")
```

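
For trying out the detection step offline, the following is a minimal sketch of how labeled examples could be scored. It is an illustration under stated assumptions, not the task's evaluation script: the card does not name a metric, so plain accuracy is assumed here; the helper names `read_jsonl`, `detection_accuracy` and `longer_hypothesis` are hypothetical; the three records and the `"de-en"` form of the *langpair* value are invented for the example.

```python
import json


def read_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def detection_accuracy(examples, predict):
    """Fraction of examples where predict(example) equals the gold 'label'.

    Accuracy is an assumed metric; the dataset card does not specify one.
    """
    if not examples:
        raise ValueError("no examples to score")
    correct = sum(1 for ex in examples if predict(ex) == ex["label"])
    return correct / len(examples)


def longer_hypothesis(ex):
    """Toy baseline (not a task baseline): flag the longer hypothesis."""
    return "hyp1" if len(ex["hyp1"]) > len(ex["hyp2"]) else "hyp2"


# Invented records that only mimic the trial schema; they are NOT from the dataset.
SAMPLE = "\n".join(json.dumps(rec) for rec in [
    {"id": 1, "langpair": "de-en", "source": "Er kam um 9 Uhr an.",
     "hyp1": "He arrived at 9 o'clock.",
     "hyp2": "He arrived at 9 o'clock with his brother.",
     "type": "addition", "label": "hyp2"},
    {"id": 2, "langpair": "de-en", "source": "Sie hat drei Hefte gekauft.",
     "hyp1": "She bought thirteen notebooks.",
     "hyp2": "She bought three notebooks.",
     "type": "number", "label": "hyp1"},
    {"id": 3, "langpair": "de-en", "source": "Das Treffen findet nicht statt.",
     "hyp1": "The meeting is not taking place.",
     "hyp2": "The meeting is taking place.",
     "type": "negation", "label": "hyp2"},
])

examples = read_jsonl(SAMPLE)
score = detection_accuracy(examples, longer_hypothesis)
print(f"accuracy of the toy baseline: {score:.2f}")
```

The same `detection_accuracy` function should work on a split loaded with `load_dataset(..., name="trial")`, since each row is then a dict with the same keys. The test splits carry no *label* field, so they cannot be scored this way.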
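Provider:
Eloquent
Summary of original information

Dataset overview

Dataset name

  • Name: HalluciGen Translation

Dataset contents

  • Task: translation
  • Language pairs: German-English, English-German, French-English, English-French
  • Data file configurations:
    • trial: trial data files for the four language pairs
    • test_detection: detection-step test data files for the four language pairs
    • test_generation: generation-step test data files for the four language pairs
    • cross_model_evaluation: cross-model evaluation test data files for the four language pairs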
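
The split names and file paths in the card's configuration follow a regular pattern, which can be sketched as below. The helper names `split_name` and `data_file` are hypothetical; the pattern itself is read off the config block, where both directions of a pair share one directory (`de-en` or `fr-en`).

```python
# Configurations and language pairs as listed on the dataset card.
CONFIGS = ("trial", "test_detection", "test_generation", "cross_model_evaluation")

# Both translation directions of a pair live in the same directory.
PAIR_DIRS = {
    ("de", "en"): "de-en",
    ("en", "de"): "de-en",
    ("fr", "en"): "fr-en",
    ("en", "fr"): "fr-en",
}


def _check(config, src, tgt):
    """Reject configurations or language pairs the card does not list."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if (src, tgt) not in PAIR_DIRS:
        raise ValueError(f"unsupported language pair: {src}->{tgt}")


def split_name(config, src, tgt):
    """Build a split name such as 'trial_de_en'."""
    _check(config, src, tgt)
    return f"{config}_{src}_{tgt}"


def data_file(config, src, tgt):
    """Build a repository-relative path such as 'de-en/trial.de-en.jsonl'."""
    _check(config, src, tgt)
    return f"{PAIR_DIRS[(src, tgt)]}/{config}.{src}-{tgt}.jsonl"


# Enumerate every (split, path) pair implied by the configuration.
all_splits = [
    (split_name(c, s, t), data_file(c, s, t))
    for c in CONFIGS
    for (s, t) in PAIR_DIRS
]
```

A name produced by `split_name` can be passed as the `split=` argument of `load_dataset` together with the matching `name=` configuration.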

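Dataset structure

  • Fields:
    • id: unique identifier
    • langpair: source and target language pair
    • source: original input for translation
    • hyp1: first alternative translation
    • hyp2: second alternative translation
    • label: the translation option annotated as a hallucination
    • type: hallucination category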
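
A record can be checked against this schema with a small validator. The sketch below is illustrative only: `validate_record` is a hypothetical helper, the per-configuration field sets and the allowed *type* values are taken from the dataset card, and the example record is invented (including the `"en-de"` form of the *langpair* value, which is an assumption).

```python
# Allowed hallucination categories, as listed on the dataset card.
HALLUCINATION_TYPES = {
    "addition", "named-entity", "number", "conversion", "date", "tense",
    "negation", "gender", "pronoun", "antonym", "natural",
}

# Fields the card lists for each configuration.
FIELDS_BY_CONFIG = {
    "trial": {"id", "langpair", "source", "hyp1", "hyp2", "type", "label"},
    "test_detection": {"id", "langpair", "source", "hyp1", "hyp2"},
    "test_generation": {"id", "langpair", "source"},
    "cross_model_evaluation": {"id", "langpair", "source", "hyp1", "hyp2"},
}


def validate_record(record, config="trial"):
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    missing = FIELDS_BY_CONFIG[config] - set(record)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if "label" in record and record["label"] not in ("hyp1", "hyp2"):
        problems.append(f"label must be 'hyp1' or 'hyp2', got {record['label']!r}")
    if "type" in record and record["type"] not in HALLUCINATION_TYPES:
        problems.append(f"unknown hallucination type: {record['type']!r}")
    return problems


# Invented record that mimics the trial schema; it is NOT taken from the dataset.
example = {
    "id": 0,
    "langpair": "en-de",
    "source": "The train leaves at noon.",
    "hyp1": "Der Zug fährt mittags ab.",
    "hyp2": "Der Zug fährt um Mitternacht ab.",
    "type": "antonym",
    "label": "hyp2",
}

problems = validate_record(example)
print(problems or "record matches the trial schema")
```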

Dataset size

  • Category: n<1K

License

  • License: cc-by-nc-sa-4.0