zeaver/multifactor_squad1.1_zhou

Name: zeaver/multifactor_squad1.1_zhou
Creator: zeaver
Published: 2023-11-30 12:26:36
License: No description available

Hugging Face · Updated 2023-11-30 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/zeaver/multifactor_squad1.1_zhou

Resource description:

---
license: mit
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- question-generation
- HotpotQA
size_categories:
- 10K<n<100K
---

# MultiFactor-HotpotQA-SuppFacts

The MultiFactor datasets: the SQuAD1.1-Zhou split [1] from the EMNLP 2023 Findings paper [*Improving Question Generation with Multi-level Content Planning*](https://arxiv.org/abs/2310.13512).

## 1. Dataset Details

### 1.1 Dataset Description

This is the SQuAD1.1-Zhou split [1, 2] used in the EMNLP 2023 Findings paper [*Improving Question Generation with Multi-level Content Planning*](https://arxiv.org/abs/2310.13512).

Based on the dataset in [2], we add the `p_phrase`, `n_phrase` and `full answer` attributes to every dataset instance. The full answer is reconstructed with [QA2D](https://github.com/kelvinguu/qanli) [3]. More details are in the paper's GitHub repository: https://github.com/zeaver/MultiFactor.

### 1.2 Dataset Sources

- **Repository:** https://github.com/zeaver/MultiFactor
- **Paper:** [*Improving Question Generation with Multi-level Content Planning*](https://arxiv.org/abs/2310.13512). EMNLP Findings, 2023.

## 2. Dataset Structure

```text
.
├── dev.json
├── test.json
├── train.json
└── fa_model_inference
    ├── dev.json
    ├── test.json
    └── train.json
```

Each split is a single JSON file, not JSON Lines, so load it directly with `json.load(f)`. The dataset schema is:

```json
{
  "context": "the given input context",
  "answer": "the given answer",
  "question": "the corresponding question",
  "p_phrase": "the positive phrases in the given context",
  "n_phrase": "the negative phrases",
  "full answer": "pseudo-gold full answer (q + a -> a declarative sentence)"
}
```

We also provide the *FA_Model*'s inference results in `fa_model_inference/{split}.json`.

## 3. Dataset Card Contact

If you have any questions, feel free to contact me: zehua.xia1999@gmail.com

## Reference

[1] Rajpurkar, Pranav, et al. [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://aclanthology.org/D16-1264/). EMNLP, 2016.

[2] Zhou, Qingyu, et al. [Neural Question Generation from Text: A Preliminary Study](https://arxiv.org/abs/1704.01792). EMNLP, 2017.

[3] Demszky, Dorottya, et al. [Transforming Question Answering Datasets Into Natural Language Inference Datasets](https://arxiv.org/abs/1809.02922). Stanford University. arXiv, 2018.

Provider:

zeaver

Summary of original information

MultiFactor-HotpotQA-SuppFacts

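1. Dataset Details

1.1 Dataset Description

The MultiFactor dataset: the SQuAD1.1-Zhou split [1, 2], from the EMNLP 2023 Findings paper Improving Question Generation with Multi-level Content Planning.

Based on the dataset in [2], the authors add the p_phrase, n_phrase and full answer attributes to every dataset instance. The full answer is reconstructed with QA2D [3]. More details are in the paper's GitHub repository: https://github.com/zeaver/MultiFactor.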

1.2 Dataset Sources

Repository: https://github.com/zeaver/MultiFactor
Paper: Improving Question Generation with Multi-level Content Planning. EMNLP Findings, 2023.

2. Dataset Structure

    .
    ├── dev.json
    ├── test.json
    ├── train.json
    └── fa_model_inference
        ├── dev.json
        ├── test.json
        └── train.json

Each split is a single JSON file, not JSON Lines, so load it directly with json.load(f). The dataset schema is:

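    {
      "context": "the given input context",
      "answer": "the given answer",
      "question": "the corresponding question",
      "p_phrase": "the positive phrases in the given context",
      "n_phrase": "the negative phrases",
      "full answer": "pseudo-gold full answer (q + a -> a declarative sentence)"
    }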
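The loading advice and schema above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the helper names `load_split` and `missing_fields` are hypothetical, and it assumes each split file holds a list of records shaped like the schema, which the card does not state explicitly.

```python
import json
from pathlib import Path

# Field names copied from the schema; note that "full answer" contains a space.
EXPECTED_FIELDS = ("context", "answer", "question", "p_phrase", "n_phrase", "full answer")


def load_split(path):
    """Load one split with a single json.load call (the file is JSON, not JSON Lines)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def missing_fields(record):
    """Return the expected field names that are absent from one record (hypothetical check)."""
    return [name for name in EXPECTED_FIELDS if name not in record]
```

For example, `records = load_split("train.json")` followed by `missing_fields(records[0])` would return an empty list if the first record carries every documented field, assuming the file is a list of records.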

The FA_Model's inference results are also provided in fa_model_inference/{split}.json.
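Because the inference results mirror the split names, a split and its FA_Model output can be located from the same split name. The sketch below assumes the directory layout shown in the file tree; the function names `split_paths` and `load_pair` are hypothetical. It deliberately makes no assumption about the inner structure of the inference files, which the card does not describe.

```python
import json
from pathlib import Path

# Split names taken from the file tree in the dataset card.
SPLITS = ("train", "dev", "test")


def split_paths(root, split):
    """Return (split file, FA_Model inference file) for one split under root."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    root = Path(root)
    return root / f"{split}.json", root / "fa_model_inference" / f"{split}.json"


def load_pair(root, split):
    """Load a split and its FA_Model inference results as parsed JSON, whatever their shape."""
    data_path, fa_path = split_paths(root, split)
    with data_path.open(encoding="utf-8") as f:
        data = json.load(f)
    with fa_path.open(encoding="utf-8") as f:
        fa_results = json.load(f)
    return data, fa_results
```

Whether the two files align record by record is not stated in the card, so any pairing of entries should be checked against the repository before relying on it.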
