
SakanaAI/gsm8k-ja-test_250-1319

Source: Hugging Face | Updated: 2024-05-14 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/SakanaAI/gsm8k-ja-test_250-1319
Resource description:
---
license: apache-2.0
---

# gsm8k-ja-test_250-1319

This dataset contains 1069 Japanese math problems and their solutions. It was used for optimizing LLMs in the paper "[Evolutionary Optimization of Model Merging Recipes](https://arxiv.org/abs/2403.13187)".

## Dataset Details

This dataset contains Japanese translations of 1069 math problems and solutions from the [GSM8K](https://huggingface.co/datasets/gsm8k) test set, starting from the 251st of its 1319 examples. The translation was done with `gpt-4-0125-preview`.

We did not use the first 250 examples because they are part of the [MGSM](https://huggingface.co/datasets/juletxara/mgsm) dataset, a well-known multilingual version of GSM8K that includes translations of the first 250 samples of the GSM8K test set. Because we used MGSM for the final evaluations, we translated only the remaining 1069 test samples, so that this dataset does not overlap with MGSM.

### Source Data

* [GSM8K](https://huggingface.co/datasets/gsm8k)

### Models

* [SakanaAI/EvoLLM-JP-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-7B)
* [SakanaAI/EvoLLM-JP-A-v1-7B](https://huggingface.co/SakanaAI/EvoLLM-JP-A-v1-7B)
* [SakanaAI/EvoLLM-JP-v1-10B](https://huggingface.co/SakanaAI/EvoLLM-JP-v1-10B)

## Citation

```
@article{DBLP:journals/corr/abs-2110-14168,
  author       = {Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Mark Chen and Heewoo Jun and Lukasz Kaiser and Matthias Plappert and Jerry Tworek and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman},
  title        = {Training Verifiers to Solve Math Word Problems},
  journal      = {CoRR},
  volume       = {abs/2110.14168},
  year         = {2021},
  url          = {https://arxiv.org/abs/2110.14168},
  eprinttype   = {arXiv},
  eprint       = {2110.14168},
  timestamp    = {Mon, 12 Jun 2023 08:23:44 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2110-14168.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

@article{DBLP:journals/corr/abs-2403-13187,
  author       = {Takuya Akiba and Makoto Shing and Yujin Tang and Qi Sun and David Ha},
  title        = {Evolutionary Optimization of Model Merging Recipes},
  journal      = {CoRR},
  volume       = {abs/2403.13187},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2403.13187},
  doi          = {10.48550/ARXIV.2403.13187},
  eprinttype   = {arXiv},
  eprint       = {2403.13187},
  timestamp    = {Mon, 08 Apr 2024 18:24:51 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2403-13187.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
```
Provided by:
SakanaAI
Summary of the original information

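Dataset overview

Dataset name

gsm8k-ja-test_250-1319

Dataset contents

This dataset contains 1069 Japanese math problems and their solutions.

Intended use

Used for optimizing LLMs in the paper "Evolutionary Optimization of Model Merging Recipes".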

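Dataset details

  • Problem source: 1069 samples from the GSM8K test set, starting from the 251st sample.
  • Translation model: translated with gpt-4-0125-preview.
  • Excluded samples: the first 250 samples were not used because they are already included in the MGSM dataset.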
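The index arithmetic behind these bullets can be sketched in Python. This is a minimal illustration, not SakanaAI's actual preprocessing code: it assumes the GSM8K test examples are held in their original order, and the helper `split_gsm8k_test` and the constant names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Sizes stated in the dataset card.
GSM8K_TEST_SIZE = 1319  # examples in the GSM8K test set
MGSM_PREFIX = 250       # first 250 test examples, which MGSM also uses


def split_gsm8k_test(examples: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: split the GSM8K test set into the MGSM-overlapping
    prefix and the remainder that gsm8k-ja-test_250-1319 was built from.

    Assumes `examples` is in the original GSM8K test-set order.
    """
    if len(examples) != GSM8K_TEST_SIZE:
        raise ValueError(f"expected {GSM8K_TEST_SIZE} examples, got {len(examples)}")
    mgsm_part = list(examples[:MGSM_PREFIX])  # 0-based indices 0..249
    remaining = list(examples[MGSM_PREFIX:])  # 0-based indices 250..1318
    return mgsm_part, remaining


if __name__ == "__main__":
    ids = list(range(GSM8K_TEST_SIZE))  # integer stand-ins for real examples
    mgsm_part, remaining = split_gsm8k_test(ids)
    print(len(mgsm_part), len(remaining))  # 250 1069
    print(remaining[0] + 1)                # 251 (1-based position of first kept example)
```

Slicing at 0-based index 250 keeps the 251st example onward, so 1319 - 250 = 1069 examples remain, and none of them fall inside the 250-example prefix shared with MGSM.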

Source data

  • Original dataset: GSM8K

Related models

  • SakanaAI/EvoLLM-JP-v1-7B
  • SakanaAI/EvoLLM-JP-A-v1-7B
  • SakanaAI/EvoLLM-JP-v1-10B

License

apache-2.0
