
Jingmiao/PUZZLEQA

Source: Hugging Face. Updated: 2023-06-28. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Jingmiao/PUZZLEQA
Resource description:
---
language:
- en
license: apache-2.0
---

### Acknowledgements

PUZZLEQA is scraped from the [NPR Sunday Puzzle Official Website](https://www.npr.org/series/4473090/sunday-puzzle) and [NPR Puzzle Synopsis](https://groups.google.com/g/nprpuzzle), a mailing list run by a group of fans that distributed the questions and answers for each week's puzzle. The authors of the dataset cleaned the data and created multiple-choice questions from the questions and answers.

### Creation

The multiple-choice dataset is generated from the PUZZLEQA dataset using the following algorithm.

1. Read the fr_big_exp.tsv.tsv file.
2. Group the rule-question-answer triples from a given Sunday together (so that every question in a group shares the same rule).
3. For each question, randomly select three other answers from the answers on the same Sunday. Shuffle the three selected answers with the correct answer for the given question to obtain four choices for this question.
4. Identify the correct answer for the given question as the "gold" answer.

Recent.tsv is the dataset based on the NPR puzzles from 2023.

### Citation

@inproceedings{zhao2023solving,
  title={Solving and Generating NPR Sunday Puzzles with Large Language Models},
  author={Jingmiao Zhao and Carolyn Jane Anderson},
  year={2023},
  eprint={2306.12255},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Provided by:
Jingmiao
Summary of original information

Dataset overview

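Data sources

  • The data is scraped from the NPR Sunday Puzzle Official Website and the NPR Puzzle Synopsis mailing list.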

Data processing

  • The dataset authors cleaned the data and created multiple-choice questions from the questions and answers.

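Data generation algorithm

  1. Read the fr_big_exp.tsv.tsv file.
  2. Group the question-rule-answer triples from the same Sunday together.
  3. For each question, randomly select three other answers from the same Sunday's answers and shuffle them with the correct answer to form four choices.
  4. Mark the correct answer for the given question as the "gold" answer.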
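
The four steps above can be sketched in Python as follows. This is a minimal illustration and not the authors' actual script: the column names `date`, `rule`, `question`, and `answer`, the header row, the helper names `read_rows` and `build_multiple_choice`, and the decision to skip questions with fewer than three other answers on the same Sunday are all assumptions made for this example.

```python
import csv
import random
from collections import defaultdict


def read_rows(handle):
    """Step 1: read tab-separated rows from an open file handle.

    Assumes a header row naming the (hypothetical) columns
    date, rule, question, answer.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def build_multiple_choice(rows, n_distractors=3, seed=0):
    """Turn rule-question-answer rows into four-choice items."""
    rng = random.Random(seed)

    # Step 2: group the triples by the Sunday they belong to.
    by_sunday = defaultdict(list)
    for row in rows:
        by_sunday[row["date"]].append(row)

    items = []
    for date, group in by_sunday.items():
        for row in group:
            # Step 3: distractors come from the other answers of the same Sunday.
            others = sorted({r["answer"] for r in group if r["answer"] != row["answer"]})
            if len(others) < n_distractors:
                continue  # assumption: skip questions without enough distractors
            choices = rng.sample(others, n_distractors) + [row["answer"]]
            rng.shuffle(choices)
            # Step 4: keep the correct answer as the "gold" answer.
            items.append({
                "date": date,
                "rule": row["rule"],
                "question": row["question"],
                "choices": choices,
                "gold": row["answer"],
            })
    return items
```

Calling `build_multiple_choice(read_rows(open(path, newline="")))` on a file with the assumed layout would return one dictionary per question, each holding four shuffled choices and the gold answer. Fixing `seed` keeps the shuffling reproducible between runs.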

Dataset files

  • Recent.tsv: the dataset based on the NPR puzzles from 2023.

Citation

@inproceedings{zhao2023solving,
  title={Solving and Generating NPR Sunday Puzzles with Large Language Models},
  author={Jingmiao Zhao and Carolyn Jane Anderson},
  year={2023},
  eprint={2306.12255},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
