
Nemotron-RL-knowledge-openqa

ModelScope Community · Updated 2026-01-02 · Indexed 2025-11-22
Download link:
https://modelscope.cn/datasets/nv-community/Nemotron-RL-knowledge-openqa
Resource summary:
## Dataset Description:
The Nemotron-RL-knowledge-openQA is a multi-domain synthetic dataset containing knowledge-based questions. It is built from unstructured sources such as books and articles and consists of question–answer pairs that require short responses. The dataset covers a wide range of domains, including physics, biology, mathematics, computer science, engineering, chemistry, law, and others.

This dataset is released as part of NVIDIA [NeMo Gym](https://github.com/NVIDIA-NeMo/Gym), a framework for building reinforcement learning environments to train large language models. NeMo Gym contains a growing collection of training environments and datasets to enable Reinforcement Learning from Verifiable Reward (RLVR).

NeMo Gym is an open-source library within the [NVIDIA NeMo framework](https://github.com/NVIDIA-NeMo/), NVIDIA's GPU-accelerated, end-to-end training framework for large language models (LLMs), multimodal models, and speech models.

This dataset is part of the [Nemo Gym Collection](https://huggingface.co/collections/nvidia/nemo-gym).

This dataset is ready for commercial use.

## Dataset Owner(s):
NVIDIA Corporation

## Dataset Creation Date:
Oct 10, 2025

## License/Terms of Use:
CC-BY 4.0

## Intended Usage:
To be used with [NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) for post-training LLMs.

## Dataset Characterization
Data Collection Method<br>
* [Automated] <br>

Labeling Method<br>
* [Synthetic] <br>

## Dataset Format
Text only; compatible with [NeMo Gym](https://github.com/NVIDIA-NeMo/Gym).

## Dataset Quantification
* Number of records: 135,987 (question, answer) tuples
* Features per record: 5
* Total data storage: 50.5 MB

## Reference(s):
[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)

## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
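RLVR environments reward a policy only when its output can be checked automatically. For short-answer records like these, one plausible checker is normalized exact match. The sketch below is a minimal illustration under stated assumptions, not NeMo Gym's actual verifier: the `answer` field name is hypothetical (the card lists five features per record but does not name them), the sample record is made up, and the normalization rules (lowercasing, stripping punctuation and English articles) are borrowed from common open-QA evaluation practice.

```python
import json
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_reward(prediction: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def score_record(jsonl_line: str, prediction: str) -> float:
    """Score one model answer against one dataset record.

    ASSUMPTION: the record is a JSON object with a hypothetical "answer" field;
    check the real schema of the five per-record features before use.
    """
    record = json.loads(jsonl_line)
    return exact_match_reward(prediction, record["answer"])


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from the dataset.
    line = '{"question": "What is the SI unit of force?", "answer": "The newton"}'
    print(score_record(line, "newton."))  # 1.0
    print(score_record(line, "joule"))    # 0.0
```

Exact match is strict: a correct but differently phrased answer such as "1 N" scores 0.0 against "The newton", so a production environment may need a more lenient matcher or a model-based judge.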

Provider: maas
Created: 2025-11-15