RoMQA

Name: RoMQA
Creator: University of Washington, Meta AI
Published: 2022-11-16 01:30:07
License: No description provided

arXiv; updated 2022-11-16; indexed 2024-07-24

Download link:

https://github.com/facebookresearch/romqa


Resource overview:

RoMQA is the first benchmark for robust, multi-evidence, multi-answer question answering (QA), created jointly by the University of Washington and Meta AI. It consists of clusters of questions derived from related constraints mined from the Wikidata knowledge graph, and it evaluates how robust QA models are to variations in those constraints. Compared with prior QA datasets, RoMQA contains more human-written questions that require reasoning over more evidence text and that have, on average, more correct answers. Human annotators also rate RoMQA questions as more natural and more likely to be asked by people. The dataset is built by sampling constraints from the knowledge base, clustering related constraints, sampling implicit constraints to form logical queries, and annotating natural-language questions. RoMQA is intended for testing large language models in zero-shot, few-shot, and fine-tuning settings, with the goal of building more robust QA methods.
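To make the idea of a cluster of related constraint queries concrete, here is a toy sketch. It is not the RoMQA construction pipeline: the entities, relations, and the answer() helper are invented for illustration, and the sketch only shows how adding one constraint to a query changes its set of correct answers.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A constraint is a (relation, object) pair. All names below are hypothetical.
Constraint = Tuple[str, str]

# Toy knowledge graph: entity -> set of constraints that entity satisfies.
TOY_KG: Dict[str, Set[Constraint]] = {
    "alice": {("occupation", "physicist"), ("award", "prize_x"), ("citizenship", "country_a")},
    "bob": {("occupation", "physicist"), ("award", "prize_x")},
    "carol": {("occupation", "mathematician")},
}


def answer(query: FrozenSet[Constraint]) -> Set[str]:
    """Return every entity that satisfies all constraints in the query."""
    return {entity for entity, facts in TOY_KG.items() if query <= facts}


# Two related queries that differ by a single constraint form a small cluster.
base = frozenset({("occupation", "physicist"), ("award", "prize_x")})
narrowed = base | {("citizenship", "country_a")}

for query in (base, narrowed):
    print(sorted(query), "->", sorted(answer(query)))
```

A model that answers the base query correctly but fails on the narrowed one would count as non-robust within this toy cluster, which is the kind of behavior the benchmark is described as measuring.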

Provided by:

University of Washington, Meta AI

Created:

2022-10-26

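Summary of original information

RoMQA dataset overview

Data sources and access

Data generation: the RoMQA dataset can be regenerated from the annotations, Wikidata, and T-REx using the scripts in the dataset_construction directory.
Third-party download: the pre-generated dataset can also be downloaded from a third party as romqa_data.zip.
Extraction and placement: after downloading, extract the archive into the ./data directory. The experiment code assumes by default that this directory contains the correct data files.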
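The extraction step can be sketched in Python as follows. This is a minimal sketch that assumes romqa_data.zip sits in the current directory. The internal layout of the archive is not described here, so the hypothetical extract_dataset helper returns the member names for inspection rather than assuming any particular structure.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: Path, data_dir: Path = Path("data")) -> list:
    """Unpack the archive into data_dir and return the extracted member names."""
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(data_dir)
        return zf.namelist()


# Only runs when the archive is actually present next to the script.
if Path("romqa_data.zip").exists():
    for name in extract_dataset(Path("romqa_data.zip")):
        print(name)
```

If the printed names already start with a top-level data/ folder, extracting into the repository root instead of ./data would avoid a nested data/data directory.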

Dataset structure

Data splits: if you generate the data yourself, place the split files manually in the data/{open,closed,gold} directories.
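Before launching experiments, it may help to confirm that the three setting directories exist. The check below is a small sketch. The missing_split_dirs helper is hypothetical and is not part of the RoMQA repository, and it verifies only the directories, not the files inside them.

```python
from pathlib import Path

# Setting directories named in the text: data/{open,closed,gold}.
SETTINGS = ("open", "closed", "gold")


def missing_split_dirs(data_dir: Path = Path("data")) -> list:
    """Return the setting directories that are absent under data_dir."""
    return [name for name in SETTINGS if not (data_dir / name).is_dir()]


missing = missing_split_dirs()
if missing:
    print("Missing split directories:", ", ".join(missing))
else:
    print("All split directories are present.")
```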

Running experiments

Open setting:
    python train_baselines.py --config-name open --multirun hydra/launcher=slurm hydra.launcher.partition=<partition> model=seq2seq_nl,seq2seq_dpr_nl hydra.launcher.constraint=volta32gb seed=1,2,3,4,5 project=open-1
Closed setting:
    python train_baselines.py --config-name closed --multirun hydra/launcher=slurm hydra.launcher.partition=<partition> model=binary_nl,binary_dpr_nl hydra.launcher.constraint=volta32gb seed=1,2,3,4,5 project=closed-1
Gold evidence setting:
    python train_baselines.py --config-name gold --multirun hydra/launcher=slurm hydra.launcher.partition=<partition> model=binary_gold_sent_nl hydra.launcher.constraint=volta32gb seed=1,2,3,4,5 project=gold-1
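With --multirun, Hydra's default sweeper launches one job per combination of the comma-separated override values. The sketch below mimics that expansion with itertools.product to show how many jobs the open-setting command would start. The expand_sweep helper is a hypothetical illustration and does not call Hydra.

```python
from itertools import product


def expand_sweep(overrides: dict) -> list:
    """Expand comma-separated override values into one dict per job."""
    keys = list(overrides)
    value_lists = [overrides[key].split(",") for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Swept overrides taken from the open-setting command above.
jobs = expand_sweep({"model": "seq2seq_nl,seq2seq_dpr_nl", "seed": "1,2,3,4,5"})
print(len(jobs))  # 10
for job in jobs[:3]:
    print(job)
```

By the same arithmetic, the closed setting sweeps two models over five seeds (10 jobs) and the gold evidence setting sweeps one model over five seeds (5 jobs).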

Submission and evaluation

Submission format: the submitted JSON file should contain each example's id and the model's top-k list of predicted entities.
Validation on the development set:
    python predict.py --fdata data/open/top_20.dev.json.bz2 --fout pred.open.dev.json saves/open-1/sweep/15-seq2seq_dpr_nl-default/
    python evaluation.py --fpred saves/open-1/sweep/15-seq2seq_dpr_nl-default/pred.open.dev.json --fdata data/gold/dev.json.bz2 --fout open.dev.eval.json
Prediction on the test set:
    python predict.py --fdata data/open/top_20.test.noanswer.json.bz2 --fout pred.open.test.json saves/open-1/sweep/15-seq2seq_dpr_nl-default/
    python predict.py --fdata data/closed/top_20.test.noanswer.json.bz2 --fout pred.closed.test.json saves/closed-1/sweep/15-binary_dpr_nl-default/
CodaLab submission and evaluation:
    cl upload pred.open.dev.json
    cl run -n <open_or_closed>dev<my_model_name> -d "<model_name> by <my_name> at <my_affiliation>" --request-docker-image vzhong/romqa:0.1 --request-memory 8g evaluation.py:0x627bae34595e4bf4971197c9cb917f5e pred.json:<my_open_dev_uid> data.json.bz2:0x110deb430b3d46459099462ea65ceaf1 --- python evaluation.py --fpred pred.json --fdata data.json.bz2 --fout results.json
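The submission format described above (an example id plus a top-k list of predicted entities) could be produced along the following lines. This is only a sketch: the field names "id" and "pred", the write_predictions helper, and the sample ids and entities are assumptions made for illustration, so check the repository's predict.py output for the exact schema before submitting.

```python
import json
import tempfile
from pathlib import Path


def write_predictions(ranked: dict, fout: Path, k: int) -> list:
    """Write one record per example, keeping only the top-k ranked entities."""
    # "id" and "pred" are assumed field names, not confirmed by the source.
    records = [{"id": ex_id, "pred": entities[:k]} for ex_id, entities in ranked.items()]
    fout.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


# Hypothetical ranked predictions: example id -> entities, best first.
ranked = {"ex-1": ["entity_a", "entity_b", "entity_c"], "ex-2": ["entity_d"]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pred.open.dev.json"
    write_predictions(ranked, path, k=2)
    print(path.read_text(encoding="utf-8"))
```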

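Submission limits

Submissions per month: each team may submit at most once per month.
Abuse: authors who abuse the submission system will be removed from the leaderboard.

Anonymous submission

Anonymization: to submit anonymously, use anonymous as both the name and the affiliation in CodaLab, and email the authors to obtain the results.

License

Primary license: RoMQA is mainly released under the CC-BY-NC license.
Component licenses: some components are released under other licenses, such as Apache 2.0 and MIT.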
