dolphin-r1
ModelScope Community | Updated 2026-04-28 | Added 2025-02-01
Download link:
https://modelscope.cn/datasets/AI-ModelScope/dolphin-r1
Description:
# Dolphin R1 🐬
An Apache-2.0-licensed dataset curated by [Eric Hartford](https://huggingface.co/ehartford) and [Cognitive Computations](https://huggingface.co/cognitivecomputations).
Discord: https://discord.gg/cognitivecomputations
<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/hdAvdwZiJaLbGmvSZ3wTT.png" width="600" />
## Sponsors
We thank the generous sponsors of Dolphin R1, without whom this dataset could not exist.
- [Dria](https://dria.co) (https://x.com/driaforall): Inference Sponsor (DeepSeek)
- [Chutes](https://chutes.ai) (https://x.com/rayon_labs): Inference Sponsor (Flash)
- [Crusoe Cloud](https://crusoe.ai/): Compute Sponsor
- [Andreessen Horowitz](https://a16z.com/): provided the [grant](https://a16z.com/supporting-the-open-source-ai-community/) that originally launched Dolphin
## Overview
We created an 800k-sample dataset similar in composition to the one used to train the DeepSeek-R1 Distill models.
### Dataset Composition
- 300k reasoning samples from DeepSeek-R1
- 300k reasoning samples from Gemini 2.0 Flash Thinking
- 200k samples of Dolphin chat
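The composition above implies a mixture in which reasoning traces make up three quarters of the data. The short sketch below makes that arithmetic explicit; it assumes the rounded counts listed above (the exact per-subset sizes may differ slightly), and the dictionary keys are illustrative labels, not official subset names.

```python
# Rounded sample counts as listed in this dataset card (assumption: exact sizes may differ).
COMPOSITION = {
    "DeepSeek-R1 reasoning": 300_000,
    "Gemini 2.0 Flash Thinking reasoning": 300_000,
    "Dolphin chat": 200_000,
}


def mixture_fractions(counts):
    """Return the total sample count and each source's share of the mixture."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


total, fractions = mixture_fractions(COMPOSITION)
print(total)  # 800000
for name, frac in fractions.items():
    # Prints 37.5%, 37.5%, and 25.0% for the three sources, in order.
    print(f"{name}: {frac:.1%}")
```

Reasoning samples (DeepSeek-R1 plus Gemini 2.0 Flash Thinking) therefore account for 600k of the 800k samples, or 75% of the mixture.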
The purpose of this dataset is to train R1-style reasoning models.
Provider:
maas
Created:
2025-01-30