
Simulated Operand Dataset

Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/ChenEmmaL/imitation_abacus
Description:
This dataset contains sample operands for training reinforcement learning (RL) agents to solve arithmetic problems, specifically multi-digit addition and subtraction, with operand ranges that extend beyond the training interval. The agents' performance is evaluated across different operand ranges, with particular emphasis on out-of-distribution (OOD) generalization. Each range contains 100,000 samples, and the task is to solve arithmetic problems.
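To make the per-range layout concrete, the following is a minimal sketch of how operand pairs for one in-distribution range and one OOD range could be sampled. It is not the authors' generation procedure: the range bounds, the uniform sampling, and the names `sample_operand_pairs`, `RANGES`, and `solve` are all assumptions made for illustration. Only the figure of 100,000 samples per range comes from the description above.

```python
import random

SAMPLES_PER_RANGE = 100_000  # stated in the dataset description

# Hypothetical operand ranges; the real interval bounds are not given here.
RANGES = {
    "train": (0, 999),      # assumed in-distribution interval
    "ood": (1000, 9999),    # assumed out-of-distribution interval
}


def sample_operand_pairs(low, high, n, seed=0):
    """Draw n (a, b) operand pairs uniformly from the closed interval [low, high]."""
    rng = random.Random(seed)
    return [(rng.randint(low, high), rng.randint(low, high)) for _ in range(n)]


def solve(a, b, op):
    """Ground-truth answer for an addition or subtraction problem."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    raise ValueError(f"unsupported operator: {op}")


# One list of operand pairs per range, each with its own seed.
dataset = {
    name: sample_operand_pairs(low, high, SAMPLES_PER_RANGE, seed=i)
    for i, (name, (low, high)) in enumerate(RANGES.items())
}
```

An agent trained on the `train` pairs could then be scored separately on each range by comparing its outputs with `solve(a, b, op)`, which keeps in-distribution and OOD accuracy apart.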
Provided by:
Authors of the paper