digitalfabrik/integreat-qa

Name: digitalfabrik/integreat-qa
Creator: digitalfabrik
Published: 2024-09-27 12:05:40
License: cc-by-4.0

Source: Hugging Face. Updated 2024-09-27. Indexed 2025-04-26.

Download link:

https://hf-mirror.com/datasets/digitalfabrik/integreat-qa

Description:

---
license: cc-by-4.0
task_categories:
- question-answering
task_ids:
- extractive-qa
language:
- de
- en
tags:
- migration
- refugees
- extractive-qa
size_categories:
- n<1K
annotations_creators:
- crowdsourced
source_datasets:
- original
pretty_name: Integreat QA
---

# Dataset

Our dataset consists of 906 diverse QA pairs in German and English. The dataset is extractive, i.e., answers are given as sentence indices (breaking at the newline character `\n`). Questions are automatically generated using an LLM. The answers are manually annotated using voluntary crowdsourcing.

**Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

**Paper:**
- [https://arxiv.org/abs/1806.03822](https://arxiv.org/abs/1806.03822)
- [https://aclanthology.org/2024.konvens-main.25/](https://aclanthology.org/2024.konvens-main.25/)

Our dataset is licensed under [cc-by-4.0](https://choosealicense.com/licenses/cc-by-4.0).

## Properties

A QA pair consists of
- `question` (string): Question
- `context` (string): Full text from the Integreat-App
- `answers` (number[]): Indices of answer sentences

Furthermore, the following properties are present:
- `id` (number): A unique id for the QA pair
- `language` (string): The language of question and context.
- `sourceLanguage` (string | null): If question and context are machine translated, the source language.
- `city` (string): The city the page in the Integreat-App belongs to.
- `pageId` (number): The page id of the page in the Integreat-App.
- `jaccard` (number): The sentence-level inter-annotator agreement of manual answer annotation.
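
Because `answers` holds sentence indices rather than text, a consumer has to split `context` at `\n` to recover the answer sentences. The following is a minimal sketch of that lookup, not code from the dataset authors. It assumes the indices are zero-based, which the card does not state. The `QAPair` type, both helper functions, and the sample record are hypothetical and exist only for illustration. The `jaccard` helper shows the standard Jaccard index over two annotators' index sets; whether the dataset's `jaccard` field is computed exactly this way is also an assumption.

```python
from typing import TypedDict


class QAPair(TypedDict):
    """Hypothetical type covering the three core fields of a QA pair."""
    question: str
    context: str
    answers: list[int]


def answer_sentences(pair: QAPair) -> list[str]:
    """Return the context lines selected by the answer indices."""
    # Assumption: sentences are the newline-separated lines of `context`,
    # and the indices in `answers` are zero-based.
    sentences = pair["context"].split("\n")
    return [sentences[i] for i in pair["answers"]]


def jaccard(a: set[int], b: set[int]) -> float:
    """Standard Jaccard index: size of intersection over size of union."""
    # Assumption: two empty annotations count as full agreement.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up record for illustration only; not taken from the dataset.
example: QAPair = {
    "question": "Where can I register my address?",
    "context": "Registration\nYou register at the citizens' office.\nBring your passport.",
    "answers": [1, 2],
}

print(answer_sentences(example))
print(jaccard({1, 2}, {2, 3}))
```

If the indices turn out to be one-based, the lookup needs `sentences[i - 1]` instead; checking a few records against their questions should settle it.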

Provider:

digitalfabrik
