lmqg/qa_squadshifts_synthetic

Name: lmqg/qa_squadshifts_synthetic
Creator: lmqg
Published: 2023-01-15 14:25:15
License: cc-by-4.0

Source: Hugging Face. Updated: 2023-01-15. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/lmqg/qa_squadshifts_synthetic


Resource description:

---
license: cc-by-4.0
pretty_name: Synthetic QA dataset on SQuADShifts.
language: en
multilinguality: monolingual
size_categories: 10K<n<100K
source_datasets:
- extended|wikipedia
task_categories:
- question-answering
task_ids:
- extractive-qa
---

# Dataset Card for "lmqg/qa_squadshifts_synthetic"

## Dataset Description

- **Repository:** [https://github.com/asahi417/lm-question-generation](https://github.com/asahi417/lm-question-generation)
- **Paper:** [https://arxiv.org/abs/2210.03992](https://arxiv.org/abs/2210.03992)
- **Point of Contact:** [Asahi Ushio](http://asahiushio.com/)

### Dataset Summary

This is a synthetic QA dataset generated with fine-tuned QG models over [`lmqg/qa_squadshifts`](https://huggingface.co/datasets/lmqg/qa_squadshifts). It was built for the question-answering-based evaluation (QAE) of question generation models proposed by [Zhang and Bansal, 2019](https://aclanthology.org/D19-1253/). The test split is the original validation set of [`lmqg/qa_squadshifts`](https://huggingface.co/datasets/lmqg/qa_squadshifts), on which the model should be evaluated.

### Supported Tasks and Leaderboards

* `question-answering`

### Languages

English (en)

## Dataset Structure

### Data Fields

The data fields are the same among all splits.

#### plain_text

- `id`: a `string` feature holding the example id
- `title`: a `string` feature holding the title of the paragraph
- `context`: a `string` feature holding the paragraph
- `question`: a `string` feature holding the question
- `answers`: a `json` feature holding the answers

### Data Splits

TBA

## Citation Information

```
@inproceedings{ushio-etal-2022-generative,
    title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration",
    author = "Ushio, Asahi and Alva-Manchego, Fernando and Camacho-Collados, Jose",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, U.A.E.",
    publisher = "Association for Computational Linguistics",
}
```
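The dataset summary describes QAE as evaluating a QA model on the test split, but the card does not say how predictions are scored. The sketch below assumes the SQuAD-style exact-match and token-level F1 metrics that are commonly used for extractive QA. The function names `normalize`, `exact_match`, and `token_f1` are hypothetical and are not taken from the `lmqg` codebase.

```python
import re
import string
from collections import Counter

# ASSUMPTION: SQuAD-style answer normalization (lowercase, strip punctuation
# and English articles, collapse whitespace). The card does not specify this.
_PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

With several reference answers per question, a common convention is to take the maximum score over the references. Whether the QAE setup does so is not stated in the card.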

Provider:

lmqg

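Summary of original information

Dataset overview

Basic information

Name: Synthetic QA dataset on SQuADShifts
License: cc-by-4.0
Language: English (en)
Multilinguality: monolingual
Size: 10K<n<100K

Data source

Source dataset: extended from wikipedia

Task type

Task category: question answering
Task ID: extractive-qa

Dataset structure

Data fields:
- id: string, identifier
- title: string, paragraph title
- context: string, paragraph text
- question: string, question
- answers: JSON, answers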
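
As a rough illustration of the fields listed above, the following sketch checks one record for the expected shape. The card only says that `answers` is a JSON feature. The layout assumed here (parallel `text` and `answer_start` lists, as in SQuAD) is an assumption, as are the helper name `check_record` and the made-up example record, which is not taken from the dataset.

```python
from typing import Any, Dict, List

REQUIRED_STRING_FIELDS = ("id", "title", "context", "question")


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty means OK)."""
    problems = []
    for name in REQUIRED_STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"field '{name}' is missing or not a string")

    answers = record.get("answers")
    if not isinstance(answers, dict):
        problems.append("field 'answers' is missing or not a JSON object")
        return problems

    # ASSUMPTION: SQuAD-style layout with parallel lists of answer strings
    # and character offsets into the context.
    texts = answers.get("text", [])
    starts = answers.get("answer_start", [])
    if len(texts) != len(starts):
        problems.append("'text' and 'answer_start' differ in length")
        return problems

    # For extractive QA, every answer should be a literal span of the context.
    context = record.get("context")
    if isinstance(context, str):
        for text, start in zip(texts, starts):
            if context[start:start + len(text)] != text:
                problems.append(f"answer '{text}' not found at offset {start}")
    return problems


# Made-up record for illustration only.
context = "The dataset was published in 2023."
example = {
    "id": "example-0",
    "title": "Example",
    "context": context,
    "question": "When was the dataset published?",
    "answers": {"text": ["2023"], "answer_start": [context.index("2023")]},
}
print(check_record(example))
```

If the actual `answers` layout differs, only the block marked as an assumption needs to change.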

Data splits

Split details: TBA

