
zeaver/multifactor_squad1.1_zhou

Source: Hugging Face. Updated 2023-11-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/zeaver/multifactor_squad1.1_zhou
Resource description:
---
license: mit
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- question-generation
- HotpotQA
size_categories:
- 10K<n<100K
---

# MultiFactor-HotpotQA-SuppFacts

The MultiFactor datasets: the SQuAD1.1-Zhou split [1, 2], as used in the EMNLP 2023 Findings paper [*Improving Question Generation with Multi-level Content Planning*](https://arxiv.org/abs/2310.13512).

## 1. Dataset Details

### 1.1 Dataset Description

Based on the dataset in [2], we add the `p_phrase`, `n_phrase`, and `full answer` attributes to every dataset instance. The full answer is reconstructed with [QA2D](https://github.com/kelvinguu/qanli) [3]. More details are in the paper's GitHub repository: https://github.com/zeaver/MultiFactor.

### 1.2 Dataset Sources

- **Repository:** https://github.com/zeaver/MultiFactor
- **Paper:** [*Improving Question Generation with Multi-level Content Planning*](https://arxiv.org/abs/2310.13512). EMNLP Findings, 2023.

## 2. Dataset Structure

```tex
.
├── dev.json
├── test.json
├── train.json
└── fa_model_inference
    ├── dev.json
    ├── test.json
    └── train.json
```

Each split is a single JSON file, not JSONL. Load it directly with `json.load(f)`. The schema of each dataset instance is:

```json
{
  "context": "the given input context",
  "answer": "the given answer",
  "question": "the corresponding question",
  "p_phrase": "the positive phrases in the given context",
  "n_phrase": "the negative phrases",
  "full answer": "pseudo-gold full answer (q + a -> a declarative sentence)"
}
```

We also provide the *FA_Model*'s inference results in `fa_model_inference/{split}.json`.

## 3. Dataset Card Contact

If you have any questions, feel free to contact me: zehua.xia1999@gmail.com

## Reference

[1] Rajpurkar, Pranav, et al. [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://aclanthology.org/D16-1264/). EMNLP, 2016.

[2] Zhou, Qingyu, et al. [Neural Question Generation from Text: A Preliminary Study](https://arxiv.org/abs/1704.01792). EMNLP, 2017.

[3] Demszky, Dorottya, et al. [Transforming Question Answering Datasets Into Natural Language Inference Datasets](https://arxiv.org/abs/1809.02922). Stanford University. arXiv, 2018.
Provided by:
zeaver
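Summary of the original information

MultiFactor-HotpotQA-SuppFacts

1. Dataset Details

1.1 Dataset Description

The MultiFactor dataset, SQuAD1.1-Zhou Split [1, 2], was presented in EMNLP 2023 Findings: Improving Question Generation with Multi-level Content Planning.

Based on the dataset in [2], we add the p_phrase, n_phrase, and full answer attributes to every dataset instance. The full answer is reconstructed with QA2D [3]. More details are in the paper's GitHub repository: https://github.com/zeaver/MultiFactor.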

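1.2 Dataset Sources

Repository: https://github.com/zeaver/MultiFactor
Paper: Improving Question Generation with Multi-level Content Planning (https://arxiv.org/abs/2310.13512). EMNLP Findings, 2023.

2. Dataset Structure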

    .
    ├── dev.json
    ├── test.json
    ├── train.json
    └── fa_model_inference
        ├── dev.json
        ├── test.json
        └── train.json
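
One way to confirm that a local copy matches the layout above is to enumerate the six expected files and report any that are absent. This is a minimal sketch: `expected_files` and `missing_files` are hypothetical helper names, and `root` is assumed to be whatever directory the dataset was downloaded into.

```python
from pathlib import Path

# Split names taken from the file names in the directory tree.
SPLITS = ("train", "dev", "test")


def expected_files(root):
    """Return the six JSON paths implied by the directory tree."""
    root = Path(root)
    top_level = [root / f"{split}.json" for split in SPLITS]
    fa_inference = [root / "fa_model_inference" / f"{split}.json" for split in SPLITS]
    return top_level + fa_inference


def missing_files(root):
    """Return the expected paths that do not exist under `root`."""
    return [path for path in expected_files(root) if not path.is_file()]
```

Calling `missing_files("path/to/dataset")` returns an empty list when the copy is complete.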

Each split is a single JSON file, not JSONL. Load it directly with json.load(f). The dataset schema is:

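    {
      "context": "the given input context",
      "answer": "the given answer",
      "question": "the corresponding question",
      "p_phrase": "the positive phrases in the given context",
      "n_phrase": "the negative phrases",
      "full answer": "pseudo-gold full answer (q + a -> a declarative sentence)"
    }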
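
The loading advice and the schema can be combined into a short sanity check. The sketch below assumes the top level of each split file is a list of instance objects, which the card does not state explicitly. `load_split` and `find_incomplete` are hypothetical helper names. Only key presence is checked, because the value types are not specified beyond the schema's descriptions.

```python
import json

# Field names copied from the schema; note that "full answer" contains a space.
REQUIRED_KEYS = ("context", "answer", "question", "p_phrase", "n_phrase", "full answer")


def load_split(path):
    """Load one split; the file is a single JSON document, not JSON Lines."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_incomplete(instances):
    """Return (index, missing_keys) for every instance lacking a schema field."""
    problems = []
    for index, instance in enumerate(instances):
        missing = [key for key in REQUIRED_KEYS if key not in instance]
        if missing:
            problems.append((index, missing))
    return problems
```

Because the key `full answer` contains a space, it has to be read with subscript access such as `instance["full answer"]`.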

We also provide the FA_Model's inference results in fa_model_inference/{split}.json.
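
The `{split}` placeholder in that path can be filled in programmatically. The following is a sketch under stated assumptions: `fa_inference_path` and `load_fa_inference` are hypothetical helper names, `root` is the local dataset directory, and the split names are taken from the file names in the directory tree. The internal structure of the inference files is not described here, so the loader returns the parsed JSON as-is for inspection.

```python
import json
from pathlib import Path

# Split names assumed from the file names in the directory tree.
SPLITS = ("train", "dev", "test")


def fa_inference_path(root, split):
    """Build fa_model_inference/{split}.json under `root` for a known split."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    return Path(root) / "fa_model_inference" / f"{split}.json"


def load_fa_inference(root, split):
    """Parse the FA_Model inference file for `split` without assuming its layout."""
    with open(fa_inference_path(root, split), encoding="utf-8") as f:
        return json.load(f)
```

Rejecting unknown split names up front gives a clearer error than a missing-file exception raised later by `open`.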
