
digitalfabrik/integreat-qa

Hugging Face · Updated 2024-09-27 · Indexed 2025-04-26
Download link:
https://hf-mirror.com/datasets/digitalfabrik/integreat-qa
Resource description:

---
license: cc-by-4.0
task_categories:
- question-answering
task_ids:
- extractive-qa
language:
- de
- en
tags:
- migration
- refugees
- extractive-qa
size_categories:
- n<1K
annotations_creators:
- crowdsourced
source_datasets:
- original
pretty_name: Integreat QA
---

# Dataset

Our dataset consists of 906 diverse QA pairs in German and English. The dataset is extractive: answers are given as indices of sentences in the context, where sentences are delimited by the newline character `\n`. Questions were generated automatically using an LLM; answers were annotated manually through voluntary crowdsourcing.

**Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

**Paper:**
- [https://arxiv.org/abs/1806.03822](https://arxiv.org/abs/1806.03822)
- [https://aclanthology.org/2024.konvens-main.25/](https://aclanthology.org/2024.konvens-main.25/)

Our dataset is licensed under [cc-by-4.0](https://choosealicense.com/licenses/cc-by-4.0).

## Properties

A QA pair consists of:
- `question` (string): The question
- `context` (string): The full text of a page from the Integreat-App
- `answers` (number[]): Indices of the answer sentences in `context`

Furthermore, the following properties are present:
- `id` (number): A unique ID for the QA pair
- `language` (string): The language of the question and context
- `sourceLanguage` (string | null): The source language, if the question and context were machine-translated; otherwise `null`
- `city` (string): The city that the Integreat-App page belongs to
- `pageId` (number): The ID of the page in the Integreat-App
- `jaccard` (number): The sentence-level inter-annotator agreement of the manual answer annotation
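The sketch below illustrates how the `answers` indices relate to `context`, and how a Jaccard-style agreement score between annotators could be computed. It is a minimal Python sketch under stated assumptions: the card does not say whether the indices are 0-based, so the code assumes 0-based indexing, and the helper names (`split_sentences`, `answer_sentences`, `jaccard`, `mean_pairwise_jaccard`) and the example record are hypothetical. The card also does not specify how agreement is aggregated when there are more than two annotators; averaging pairwise Jaccard similarity is one plausible choice, not necessarily the method used to compute the `jaccard` field.

```python
from __future__ import annotations

from itertools import combinations


def split_sentences(context: str) -> list[str]:
    """Split a context into sentences at the newline character, as the card describes."""
    return context.split("\n")


def answer_sentences(context: str, answers: list[int]) -> list[str]:
    """Return the answer sentences; ASSUMES `answers` holds 0-based indices."""
    sentences = split_sentences(context)
    return [sentences[i] for i in answers]


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two annotators' sentence-index sets."""
    union = a | b
    if not union:
        return 1.0  # assumption: two empty annotations count as full agreement
    return len(a & b) / len(union)


def mean_pairwise_jaccard(annotations: list[set[int]]) -> float:
    """Average Jaccard over all annotator pairs (an assumed aggregation, not the card's)."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotations to measure agreement")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical record with invented values, shaped like the fields listed above.
record = {
    "question": "Where can I register my address?",
    "context": "Welcome to the city.\n"
    "You must register your address at the registration office.\n"
    "Bring your passport.",
    "answers": [1, 2],
}

print(answer_sentences(record["context"], record["answers"]))
# → ['You must register your address at the registration office.', 'Bring your passport.']
print(jaccard({1, 2}, {1}))  # → 0.5
```

Because the indices refer to sentences split at `\n`, any model prediction must be mapped back using the same split; a different sentence splitter would shift the indices and make them incomparable with the annotations.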
Provider:
digitalfabrik