
lmqg/qa_squadshifts_synthetic

Source: Hugging Face. Updated 2023-01-15. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/lmqg/qa_squadshifts_synthetic
Resource overview:
---
license: cc-by-4.0
pretty_name: Synthetic QA dataset on SQuADShifts.
language: en
multilinguality: monolingual
size_categories: 10K<n<100K
source_datasets:
- extended|wikipedia
task_categories:
- question-answering
task_ids:
- extractive-qa
---

# Dataset Card for "lmqg/qa_squadshifts_synthetic"

## Dataset Description
- **Repository:** [https://github.com/asahi417/lm-question-generation](https://github.com/asahi417/lm-question-generation)
- **Paper:** [https://arxiv.org/abs/2210.03992](https://arxiv.org/abs/2210.03992)
- **Point of Contact:** [Asahi Ushio](http://asahiushio.com/)

### Dataset Summary
This is a synthetic QA dataset generated with fine-tuned QG models over [`lmqg/qa_squadshifts`](https://huggingface.co/datasets/lmqg/qa_squadshifts). It is made for the question-answering-based evaluation (QAE) of question generation models, as proposed by [Zhang and Bansal, 2019](https://aclanthology.org/D19-1253/). The test split is the original validation set of [`lmqg/qa_squadshifts`](https://huggingface.co/datasets/lmqg/qa_squadshifts), on which the model should be evaluated.

### Supported Tasks and Leaderboards
* `question-answering`

### Languages
English (en)

## Dataset Structure

### Data Fields
The data fields are the same among all splits.

#### plain_text
- `id`: a `string` feature holding the example id
- `title`: a `string` feature holding the title of the paragraph
- `context`: a `string` feature holding the paragraph
- `question`: a `string` feature holding the question
- `answers`: a `json` feature holding the answers

### Data Splits
TBA

## Citation Information
```
@inproceedings{ushio-etal-2022-generative,
    title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration",
    author = "Ushio, Asahi and Alva-Manchego, Fernando and Camacho-Collados, Jose",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, U.A.E.",
    publisher = "Association for Computational Linguistics",
}
```
Provider:
lmqg
Summary of original information

Dataset overview

Basic information

  • Name: Synthetic QA dataset on SQuADShifts
  • License: cc-by-4.0
  • Language: English (en)
  • Multilinguality: monolingual
  • Size: 10K<n<100K

Data source

  • Source dataset: extended from wikipedia

Task type

  • Task category: question answering
  • Task ID: extractive-qa
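
Extractive QA predictions are commonly scored against reference answers with exact match and token-level F1. The sketch below assumes SQuAD-style scoring (lowercasing, stripping punctuation and English articles); the listing itself does not name a metric, and the function names `normalize`, `exact_match`, and `token_f1` are hypothetical helpers, not part of the dataset or of any library.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several reference answers, the usual convention is to take the maximum score over the references and then average over questions.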

Dataset structure

  • Data fields:
    • id: string, identifier
    • title: string, title of the paragraph
    • context: string, paragraph text
    • question: string, question
    • answers: JSON, answers
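
As an illustration of the fields listed above, here is a minimal sketch of one record and a sanity check for it. The listing only says that `answers` is JSON; the `text` / `answer_start` layout used here is an assumption borrowed from SQuAD-style datasets. The record content and the `check_record` helper are hypothetical and are not taken from the dataset.

```python
from typing import Any, Dict, List

STRING_FIELDS = ("id", "title", "context", "question")

context = "The Nile flows north through Egypt into the Mediterranean Sea."
answer = "the Mediterranean Sea"

# Hypothetical record; the text is made up for illustration.
example = {
    "id": "example-0001",
    "title": "Example title",
    "context": context,
    "question": "Into which sea does the Nile flow?",
    # Assumed SQuAD-style layout: parallel lists of texts and character offsets.
    "answers": {"text": [answer], "answer_start": [context.index(answer)]},
}


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name} should be a string")
    answers = record.get("answers")
    if not isinstance(answers, dict):
        problems.append("answers should be a JSON object")
        return problems
    paragraph = record.get("context")
    if not isinstance(paragraph, str):
        return problems
    texts = answers.get("text", [])
    starts = answers.get("answer_start", [])
    if len(texts) != len(starts):
        problems.append("text and answer_start differ in length")
    # Extractive QA: each answer must be a literal span of the context.
    for text, start in zip(texts, starts):
        if paragraph[start:start + len(text)] != text:
            problems.append(f"answer {text!r} not found at offset {start}")
    return problems
```

Calling `check_record(example)` returns an empty list for the record above. A record whose offset does not point at the answer text yields a message naming the mismatched span.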

Data splits

  • Split details: to be announced (TBA)

