
jon-tow/okapi_arc_challenge

Source: Hugging Face. Updated 2023-10-24; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/jon-tow/okapi_arc_challenge
Resource description:
---
language:
- ar
- bn
- ca
- da
- de
- es
- eu
- fr
- gu
- hi
- hr
- hu
- hy
- id
- it
- kn
- ml
- mr
- ne
- nl
- pt
- ro
- ru
- sk
- sr
- sv
- ta
- te
- uk
- vi
license: cc-by-nc-4.0
---

# okapi_arc_challenge

Multilingual translation of [AI2's Arc Challenge](https://allenai.org/data/arc) from the paper *"Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback"* ([Lai et al., 2023](https://arxiv.org/abs/2307.16039))

## Dataset Details

### Dataset Description

ARC is a dataset of 7,787 genuine grade-school level, multiple-choice science questions assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. We also include a corpus of over 14 million science sentences relevant to the task and an implementation of three neural baseline models for this dataset. We pose ARC as a challenge to the community.

- **Curated by:** Dac Lai, Viet and Van Nguyen, Chien and Ngo, Nghia Trung and Nguyen, Thuat and Dernoncourt, Franck and Rossi, Ryan A and Nguyen, Thien Huu
- **License:** The datasets are CC BY NC 4.0 (allowing only non-commercial use).

### Dataset Sources

- **Repository:** http://nlp.uoregon.edu/download/okapi-eval/datasets/
- **Paper:** Okapi ([Lai et al., 2023](https://arxiv.org/abs/2307.16039))

## Citation

```bibtex
@article{dac2023okapi,
  title={Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback},
  author={Dac Lai, Viet and Van Nguyen, Chien and Ngo, Nghia Trung and Nguyen, Thuat and Dernoncourt, Franck and Rossi, Ryan A and Nguyen, Thien Huu},
  journal={arXiv e-prints},
  pages={arXiv--2307},
  year={2023}
}
```

```bibtex
@article{Clark2018ThinkYH,
  title={Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
  author={Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
  journal={ArXiv},
  year={2018},
  volume={abs/1803.05457}
}
```
Provided by:
jon-tow
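Summary of original information

okapi_arc_challenge

Dataset details

Dataset description

ARC is a dataset of 7,787 genuine grade-school-level, multiple-choice science questions, assembled to encourage research in advanced question answering. It is partitioned into a Challenge Set and an Easy Set; the Challenge Set contains only questions that both a retrieval-based algorithm and a word co-occurrence algorithm answered incorrectly. The release also includes a corpus of over 14 million science sentences relevant to the task and implementations of three neural baseline models for the dataset.

  • Curated by: Dac Lai, Viet and Van Nguyen, Chien, et al.
  • License: CC BY NC 4.0 (non-commercial use only)

Dataset sources

  • Repository: http://nlp.uoregon.edu/download/okapi-eval/datasets/
  • Paper: Okapi (Lai et al., 2023)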
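
The Challenge/Easy partition rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not the original construction code: the `Question` record, the `split_challenge_easy` function, the two prediction dictionaries, and the sample data are all hypothetical, and treating a missing prediction as a wrong answer is an assumption made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """A multiple-choice question with its gold answer label (hypothetical schema)."""
    qid: str
    answer_key: str


def split_challenge_easy(questions, retrieval_preds, cooccurrence_preds):
    """Partition questions into (challenge, easy).

    A question goes to the Challenge Set only if BOTH baselines answer it
    incorrectly; every other question goes to the Easy Set.
    Assumption: a missing prediction counts as an incorrect answer.
    """
    challenge, easy = [], []
    for q in questions:
        retrieval_wrong = retrieval_preds.get(q.qid) != q.answer_key
        cooccurrence_wrong = cooccurrence_preds.get(q.qid) != q.answer_key
        if retrieval_wrong and cooccurrence_wrong:
            challenge.append(q)
        else:
            easy.append(q)
    return challenge, easy


# Made-up sample data, for illustration only.
questions = [Question("q1", "A"), Question("q2", "C"), Question("q3", "B")]
retrieval_preds = {"q1": "A", "q2": "B", "q3": "D"}      # only q1 correct
cooccurrence_preds = {"q1": "B", "q2": "D", "q3": "B"}   # only q3 correct

challenge, easy = split_challenge_easy(questions, retrieval_preds, cooccurrence_preds)
print([q.qid for q in challenge])  # ['q2']
print([q.qid for q in easy])       # ['q1', 'q3']
```

Only `q2` is missed by both baselines, so it is the only question that lands in the Challenge Set; a question that either baseline gets right stays in the Easy Set.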

