digitalfabrik/integreat-qa
Hugging Face, updated 2024-09-27, indexed 2025-04-26
Download link:
https://hf-mirror.com/datasets/digitalfabrik/integreat-qa
---
license: cc-by-4.0
task_categories:
- question-answering
task_ids:
- extractive-qa
language:
- de
- en
tags:
- migration
- refugees
- extractive-qa
size_categories:
- n<1K
annotations_creators:
- crowdsourced
source_datasets:
- original
pretty_name: Integreat QA
---
# Dataset
Our dataset consists of 906 diverse QA pairs in German and English.
The dataset is extractive: each answer is given as a list of sentence indices into the context, where the context is split into sentences at the newline character `\n`.
Questions are generated automatically using an LLM.
Answers are annotated manually through voluntary crowdsourcing.
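
As a minimal sketch of this answer format, the snippet below recovers the answer text of a QA pair by splitting `context` at `\n` and selecting the annotated sentences. It assumes that `answers` holds 0-based indices into that split (the card does not say whether indices start at 0 or 1). The function name `answer_text` and the example record are hypothetical and not taken from the dataset.

```python
def answer_text(context: str, answers: list[int]) -> list[str]:
    """Return the answer sentences of one QA pair.

    Assumes sentences are delimited by newline characters and that
    `answers` contains 0-based sentence indices (not confirmed by the card).
    """
    sentences = context.split("\n")
    return [sentences[i] for i in answers]


# Hypothetical example record (not taken from the dataset).
example = {
    "question": "Where can I register my address?",
    "context": "Welcome to the city.\nYou can register at the city office.\nBring your passport.",
    "answers": [1, 2],
}
print(answer_text(example["context"], example["answers"]))
# ['You can register at the city office.', 'Bring your passport.']
```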
**Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

**Paper:**
- [https://arxiv.org/abs/1806.03822](https://arxiv.org/abs/1806.03822)
- [https://aclanthology.org/2024.konvens-main.25/](https://aclanthology.org/2024.konvens-main.25/)

Our dataset is licensed under [cc-by-4.0](https://choosealicense.com/licenses/cc-by-4.0).
## Properties
Each QA pair consists of:
- `question` (string): The question.
- `context` (string): The full text of a page in the Integreat-App.
- `answers` (number[]): Indices of the context sentences that answer the question.
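
The three core fields can be described with a `TypedDict`, and a consumer may want to check that every answer index points at an existing sentence of the context. The sketch below does both. `QAPair` and `invalid_answer_indices` are hypothetical names, and the check again assumes 0-based indices over the context split at `\n`.

```python
from typing import TypedDict


class QAPair(TypedDict):
    """Core fields of one QA pair, as listed in the dataset card."""
    question: str
    context: str
    answers: list[int]  # indices of answer sentences


def invalid_answer_indices(pair: QAPair) -> list[int]:
    """Return the answer indices that do not refer to a sentence of the context.

    Assumes 0-based indices and sentences split at newline characters.
    """
    n_sentences = len(pair["context"].split("\n"))
    return [i for i in pair["answers"] if not 0 <= i < n_sentences]


# Hypothetical record with one out-of-range index.
pair: QAPair = {
    "question": "What should I bring?",
    "context": "Opening hours are 8 to 12.\nBring your passport.",
    "answers": [1, 5],
}
print(invalid_answer_indices(pair))  # [5]
```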

In addition, each QA pair has the following properties:
- `id` (number): A unique ID for the QA pair.
- `language` (string): The language of the question and context.
- `sourceLanguage` (string | null): The source language if the question and context were machine-translated; otherwise `null`.
- `city` (string): The city that the Integreat-App page belongs to.
- `pageId` (number): The ID of the page in the Integreat-App.
- `jaccard` (number): The sentence-level inter-annotator agreement (Jaccard index) of the manual answer annotations.
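
The `jaccard` field reports agreement as a Jaccard index over answer sentences: for two annotators who each selected a set of sentence indices, it is the size of the intersection divided by the size of the union. The sketch below computes this for one pair of annotations. The card does not state how agreement is aggregated over more than two annotators or how two empty selections are scored, so treating two empty sets as full agreement (1.0) is an assumption, and `sentence_jaccard` is a hypothetical name.

```python
def sentence_jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard index |a & b| / |a | b| of two sets of answer-sentence indices.

    Two empty selections are treated as full agreement (an assumption).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Annotator 1 marked sentences {1, 2}; annotator 2 marked {2, 3}.
print(sentence_jaccard({1, 2}, {2, 3}))  # 0.3333333333333333
```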

Provided by: digitalfabrik



