zeaver/multifactor_squad1.1_zhou
Source: Hugging Face. Updated 2023-11-30; indexed 2024-03-04.
Download link: https://hf-mirror.com/datasets/zeaver/multifactor_squad1.1_zhou
---
license: mit
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- question-generation
- HotpotQA
size_categories:
- 10K<n<100K
---
# MultiFactor-SQuAD1.1-Zhou
<!-- Provide a quick summary of the dataset. -->
The SQuAD1.1-Zhou split [1, 2] of the MultiFactor datasets, from the EMNLP 2023 Findings paper [*Improving Question Generation with Multi-level Content Planning*](https://arxiv.org/abs/2310.13512).
## 1. Dataset Details
### 1.1 Dataset Description
This dataset is the SQuAD1.1 [1] split of Zhou et al. [2], as used in the EMNLP 2023 Findings paper [*Improving Question Generation with Multi-level Content Planning*](https://arxiv.org/abs/2310.13512).
Based on the dataset in [2], we add the `p_phrase`, `n_phrase` and `full answer` attributes to every dataset instance.
The full answer is reconstructed with [QA2D](https://github.com/kelvinguu/qanli) [3]. More details are in the paper's GitHub repository: https://github.com/zeaver/MultiFactor.
### 1.2 Dataset Sources
<!-- Provide the basic links for the dataset. -->
- **Repository:** https://github.com/zeaver/MultiFactor
- **Paper:** [*Improving Question Generation with Multi-level Content Planning*](https://arxiv.org/abs/2310.13512). EMNLP Findings, 2023.
## 2. Dataset Structure
```tex
.
├── dev.json
├── test.json
├── train.json
└── fa_model_inference
    ├── dev.json
    ├── test.json
    └── train.json
```
Each split is a single JSON file, not JSON Lines, so load it directly with `json.load(f)`. The dataset schema is:
```json
{
  "context": "the given input context",
  "answer": "the given answer",
  "question": "the corresponding question",
  "p_phrase": "the positive phrases in the given context",
  "n_phrase": "the negative phrases",
  "full answer": "pseudo-gold full answer (q + a -> a declarative sentence)"
}
```
We also provide the *FA_Model*'s inference results in `fa_model_inference/{split}.json`.
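The loading advice above can be sketched as follows. This is a minimal example, not code from the MultiFactor repository: `load_split` and `missing_fields` are hypothetical helper names, and the local directory layout is assumed to mirror the tree shown earlier. The card does not state the top-level container of each file or the schema of the `fa_model_inference` files, so the field check is only meant for instances from the main splits.

```python
import json
from pathlib import Path

# Field names copied from the schema above; note that "full answer" contains a space.
EXPECTED_FIELDS = ("context", "answer", "question", "p_phrase", "n_phrase", "full answer")


def load_split(root, split, fa_inference=False):
    """Load one split as a whole JSON document (the files are not JSON Lines).

    Hypothetical helper: `root` is a local copy of the dataset directory.
    With fa_inference=True it reads fa_model_inference/{split}.json instead.
    """
    base = Path(root) / "fa_model_inference" if fa_inference else Path(root)
    with open(base / f"{split}.json", encoding="utf-8") as f:
        return json.load(f)


def missing_fields(instance):
    """Return the schema fields that are absent from one instance (hypothetical helper)."""
    return [name for name in EXPECTED_FIELDS if name not in instance]
```

For example, `load_split("multifactor_squad1.1_zhou", "dev")` would return the parsed content of `dev.json`. If that content turns out to be a list of instances (an assumption), `missing_fields(data[0])` gives a quick check against the schema.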
## 3. Dataset Card Contact
If you have any questions, feel free to contact me at zehua.xia1999@gmail.com.
## References
[1] Rajpurkar, Pranav, et al. [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://aclanthology.org/D16-1264/). EMNLP, 2016.
[2] Zhou, Qingyu, et al. [Neural Question Generation from Text: A Preliminary Study](https://arxiv.org/abs/1704.01792). NLPCC, 2017.
[3] Demszky, Dorottya, et al. [Transforming Question Answering Datasets Into Natural Language Inference Datasets](https://arxiv.org/abs/1809.02922). arXiv, 2018.
Provider: zeaver