
clarin-pl/PUGG_KBQA

Hugging Face | Updated: 2024-08-12 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/clarin-pl/PUGG_KBQA
Resource description:
---
annotations_creators:
- expert-generated
language_creators: []
language:
- pl
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- open-domain-qa
pretty_name: 'PUGG: KBQA dataset for Polish'
tags:
- knowledge graph
- KBQA
- wikipedia
- wikidata
configs:
- config_name: all
  data_files:
  - split: train
    path: '*/train.jsonl'
  - split: test
    path: '*/test.jsonl'
  default: true
- config_name: natural
  data_files:
  - split: train
    path: natural/train.jsonl
  - split: test
    path: natural/test.jsonl
- config_name: template-based
  data_files:
  - split: train
    path: template-based/train.jsonl
  - split: test
    path: template-based/test.jsonl
---

# PUGG: KBQA, MRC, IR Dataset for Polish

## Description

This repository contains the PUGG dataset designed for three NLP tasks in the Polish language:

- KBQA (Knowledge Base Question Answering)
- MRC (Machine Reading Comprehension)
- IR (Information Retrieval)

## Paper

For more detailed information, please refer to our research paper titled:

**"Developing PUGG for Polish: A Modern Approach to KBQA, MRC, and IR Dataset Construction"**

Authored by:

* Albert Sawczyn
* Katsiaryna Viarenich
* Konrad Wojtasik
* Aleksandra Domogała
* Marcin Oleksy
* Maciej Piasecki
* Tomasz Kajdanowicz

**The paper was accepted for ACL 2024 (findings).**

## Repositories

The dataset is available in the following repositories:

* [General](https://huggingface.co/datasets/clarin-pl/PUGG) - contains all tasks (KBQA, MRC, IR*)

For more straightforward usage, the tasks are also available in separate repositories:

* [KBQA](https://huggingface.co/datasets/clarin-pl/PUGG_KBQA) **(this repository)**
* [MRC](https://huggingface.co/datasets/clarin-pl/PUGG_MRC)
* [IR](https://huggingface.co/datasets/clarin-pl/PUGG_IR)

The knowledge graph for KBQA task is available in the following repository:

* [Knowledge Graph](https://huggingface.co/datasets/clarin-pl/PUGG_KG)

Note: If you want to utilize the IR task in the BEIR format (`qrels` in `.tsv` format), please download the [IR](https://huggingface.co/datasets/clarin-pl/PUGG_IR) repository.

## Links

* Code:
  * [Github](https://github.com/CLARIN-PL/PUGG)
* Paper:
  * ACL - TBA
  * [Arxiv](https://arxiv.org/abs/2408.02337)

## Citation

```bibtex
@misc{sawczyn2024developingpuggpolishmodern,
      title={Developing PUGG for Polish: A Modern Approach to KBQA, MRC, and IR Dataset Construction},
      author={Albert Sawczyn and Katsiaryna Viarenich and Konrad Wojtasik and Aleksandra Domogała and Marcin Oleksy and Maciej Piasecki and Tomasz Kajdanowicz},
      year={2024},
      eprint={2408.02337},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2408.02337},
}
```

## Contact

albert.sawczyn@pwr.edu.pl

## Usage

```python
from datasets import load_dataset

# loading all
dataset = load_dataset("clarin-pl/PUGG_KBQA")
# or
dataset = load_dataset("clarin-pl/PUGG_KBQA", "all")
print(dataset)

# loading natural
dataset = load_dataset("clarin-pl/PUGG_KBQA", "natural")
print(dataset)

# loading template-based
dataset = load_dataset("clarin-pl/PUGG_KBQA", "template-based")
print(dataset)
```
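
The `configs` block of the dataset card maps each configuration to JSONL paths: `natural` and `template-based` each point at their own directory, while `all` uses the glob `*/train.jsonl` (and `*/test.jsonl`) to pick up both. The following is a minimal sketch of that mapping for a local copy of the repository, using only the standard library. The names `CONFIG_PATTERNS`, `resolve_files`, and `read_split` are hypothetical helpers introduced for illustration, and the sketch is not a description of how the `datasets` library resolves files internally. It makes no assumption about the record fields and simply parses each line as JSON.

```python
import json
from pathlib import Path

# Split paths copied from the `configs` block of the dataset card.
# The dictionary itself is a hypothetical helper, not part of the dataset.
CONFIG_PATTERNS = {
    "all": {"train": "*/train.jsonl", "test": "*/test.jsonl"},
    "natural": {
        "train": "natural/train.jsonl",
        "test": "natural/test.jsonl",
    },
    "template-based": {
        "train": "template-based/train.jsonl",
        "test": "template-based/test.jsonl",
    },
}


def resolve_files(root, config="all", split="train"):
    """Return the sorted JSONL files matched by a config/split under root."""
    pattern = CONFIG_PATTERNS[config][split]
    return sorted(Path(root).glob(pattern))


def read_split(root, config="all", split="train"):
    """Read every JSON record of a config/split from a local repository copy."""
    records = []
    for path in resolve_files(root, config, split):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    records.append(json.loads(line))
    return records
```

Assuming the repository has been downloaded to a local directory (the path is up to you), `read_split(local_dir, "all", "train")` would return the concatenation of the `natural` and `template-based` training records, which is what the `all` configuration's glob pattern implies.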
Provider:
clarin-pl