dolphin-r1

ModelScope community, updated 2026-04-28, indexed 2025-02-01
Download link:
https://modelscope.cn/datasets/AI-ModelScope/dolphin-r1
Description:
# Dolphin R1 🐬

An Apache-2.0 dataset curated by [Eric Hartford](https://huggingface.co/ehartford) and [Cognitive Computations](https://huggingface.co/cognitivecomputations)

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)

Discord: https://discord.gg/cognitivecomputations

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/hdAvdwZiJaLbGmvSZ3wTT.png" width="600" />

## Sponsors

Our appreciation goes to the generous sponsors of Dolphin R1, without whom this dataset could not exist.

- [Dria](https://dria.co) https://x.com/driaforall - Inference Sponsor (DeepSeek)
- [Chutes](https://chutes.ai) https://x.com/rayon_labs - Inference Sponsor (Flash)
- [Crusoe Cloud](https://crusoe.ai/) - Compute Sponsor
- [Andreessen Horowitz](https://a16z.com/) - provided the [grant](https://a16z.com/supporting-the-open-source-ai-community/) that originally launched Dolphin

## Overview

We created an 800k-sample dataset similar in composition to the one used to train the DeepSeek-R1 Distill models.

### Dataset Composition

- 300k reasoning samples from DeepSeek-R1
- 300k reasoning samples from Gemini 2.0 Flash Thinking
- 200k samples of Dolphin chat

The purpose of this dataset is to train R1-style reasoning models.
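The composition listed above can be checked with a little arithmetic. The following is a minimal sketch, not part of the dataset's tooling: the dictionary keys are illustrative labels rather than official subset names, and the counts are the rounded figures quoted in the card. It computes each source's share of the 800k total and scales the mix to a smaller sample budget while keeping the same proportions.

```python
from fractions import Fraction

# Rounded sample counts as quoted in the dataset card.
# The keys are illustrative labels (an assumption), not official subset names.
COMPOSITION = {
    "deepseek_r1_reasoning": 300_000,
    "gemini_flash_thinking_reasoning": 300_000,
    "dolphin_chat": 200_000,
}


def mixture_shares(composition):
    """Return each source's exact share of the total as a Fraction."""
    total = sum(composition.values())
    return {name: Fraction(count, total) for name, count in composition.items()}


def samples_for_budget(composition, budget):
    """Scale the mix to `budget` samples, rounding each source down."""
    shares = mixture_shares(composition)
    return {name: int(share * budget) for name, share in shares.items()}


if __name__ == "__main__":
    print("total samples:", sum(COMPOSITION.values()))
    for name, share in mixture_shares(COMPOSITION).items():
        print(f"{name}: {float(share):.1%}")
```

Under these figures, reasoning samples make up three quarters of the mix and Dolphin chat the remaining quarter. A call such as `samples_for_budget(COMPOSITION, 80_000)` would give the per-source counts for a one-tenth subsample.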
Provider:
maas
Created:
2025-01-30