jon-tow/okapi_arc_challenge

Name: jon-tow/okapi_arc_challenge
Creator: jon-tow
Published: 2023-10-24 00:02:35
License: CC BY NC 4.0

Source: Hugging Face. Updated 2023-10-24; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/jon-tow/okapi_arc_challenge

Resource description:

---
language:
- ar
- bn
- ca
- da
- de
- es
- eu
- fr
- gu
- hi
- hr
- hu
- hy
- id
- it
- kn
- ml
- mr
- ne
- nl
- pt
- ro
- ru
- sk
- sr
- sv
- ta
- te
- uk
- vi
license: cc-by-nc-4.0
---

# okapi_arc_challenge

Multilingual translation of [AI2's Arc Challenge](https://allenai.org/data/arc) from the paper *"Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback"* ([Lai et al., 2023](https://arxiv.org/abs/2307.16039))

## Dataset Details

### Dataset Description

ARC is a dataset of 7,787 genuine grade-school level, multiple-choice science questions assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. We also include a corpus of over 14 million science sentences relevant to the task and an implementation of three neural baseline models for this dataset. We pose ARC as a challenge to the community.

- **Curated by:** Dac Lai, Viet and Van Nguyen, Chien and Ngo, Nghia Trung and Nguyen, Thuat and Dernoncourt, Franck and Rossi, Ryan A and Nguyen, Thien Huu
- **License:** The datasets are CC BY NC 4.0 (allowing only non-commercial use).

### Dataset Sources

- **Repository:** http://nlp.uoregon.edu/download/okapi-eval/datasets/
- **Paper:** Okapi ([Lai et al., 2023](https://arxiv.org/abs/2307.16039))

## Citation

```bibtex
@article{dac2023okapi,
  title={Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback},
  author={Dac Lai, Viet and Van Nguyen, Chien and Ngo, Nghia Trung and Nguyen, Thuat and Dernoncourt, Franck and Rossi, Ryan A and Nguyen, Thien Huu},
  journal={arXiv e-prints},
  pages={arXiv--2307},
  year={2023}
}
```

```bibtex
@article{Clark2018ThinkYH,
  title={Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
  author={Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
  journal={ArXiv},
  year={2018},
  volume={abs/1803.05457}
}
```

Provided by:

jon-tow

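Summary of original information

okapi_arc_challenge

Dataset Details

Dataset Description

ARC is a dataset of 7,787 genuine grade-school-level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is split into a Challenge Set and an Easy Set; the Challenge Set contains only questions that both a retrieval-based algorithm and a word co-occurrence algorithm answered incorrectly. It also includes a corpus of over 14 million science sentences relevant to the task, plus implementations of three neural baseline models for the dataset.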
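The Challenge/Easy partition rule described above can be illustrated with a minimal sketch. The `Question` record, its field names, and `split_challenge_easy` are hypothetical and exist only for this illustration; they are not part of the ARC or Okapi release, and the sample records are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Hypothetical record: one question plus the outcome of two baselines."""
    qid: str
    retrieval_correct: bool      # retrieval-based baseline answered correctly?
    cooccurrence_correct: bool   # word co-occurrence baseline answered correctly?


def split_challenge_easy(questions):
    """Partition questions following the rule stated in the description.

    A question goes to the Challenge Set only if BOTH baselines got it
    wrong; every other question goes to the Easy Set.
    """
    challenge, easy = [], []
    for q in questions:
        if not q.retrieval_correct and not q.cooccurrence_correct:
            challenge.append(q)
        else:
            easy.append(q)
    return challenge, easy


# Made-up sample covering all four baseline outcomes.
sample = [
    Question("q1", False, False),
    Question("q2", True, False),
    Question("q3", False, True),
    Question("q4", True, True),
]
challenge, easy = split_challenge_easy(sample)
print([q.qid for q in challenge])  # ['q1']
print([q.qid for q in easy])       # ['q2', 'q3', 'q4']
```

Only `q1`, which both baselines miss, lands in the Challenge Set; a question that either baseline answers correctly is treated as Easy.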

Curated by: Dac Lai, Viet and Van Nguyen, Chien et al.
License: CC BY NC 4.0 (non-commercial use only)

Dataset Sources

Repository: http://nlp.uoregon.edu/download/okapi-eval/datasets/
Paper: Okapi (Lai et al., 2023)
