marin-community/open-thoughts-4-5000-math-kimi-k2pt5-annotated-32768-tokens
Hugging Face | Updated 2026-04-09 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/marin-community/open-thoughts-4-5000-math-kimi-k2pt5-annotated-32768-tokens
Description:
---
dataset_info:
  features:
  - name: row_id
    dtype: int64
  - name: instruction_seed
    dtype: string
  - name: _source
    dtype: string
  - name: gpt41_mini_response
    dtype: string
  - name: __original_row_idx
    dtype: int64
  - name: length
    dtype: int64
  - name: ms_id
    dtype: int64
  - name: generated_text
    dtype: string
  - name: final_answer
    dtype: string
  - name: complete_responses_count
    dtype: int64
  - name: kimi_k2pt5_generated_text
    dtype: string
  splits:
  - name: train
    num_examples: 40000
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# open-thoughts-4-5000-math-kimi-k2pt5-annotated-32768-tokens
Math reasoning responses generated by **Kimi K2.5** (moonshotai/Kimi-K2.5) via a Together AI dedicated instance.
## Overview
- **Total rows:** 40,000
- **Unique prompts:** 5,000 (each with 8 response annotations)
- **Source prompts:** marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted
- **Generation model:** moonshotai/Kimi-K2.5
- **Max tokens:** 32,768
- **Temperature:** 0.8
- **Tokenizer used for stats:** Qwen/Qwen2.5-3B
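The following is a minimal sketch of loading the data with the Hugging Face `datasets` library. The repository ID comes from this card; the helper name `load_train_split` and the constants are hypothetical and only used for illustration. The import is done lazily so that the module can be imported without `datasets` installed.

```python
# Repository ID as given in this card.
REPO_ID = "marin-community/open-thoughts-4-5000-math-kimi-k2pt5-annotated-32768-tokens"

# Figures from the Overview list: 5,000 prompts, 8 responses each.
NUM_PROMPTS = 5_000
SAMPLES_PER_PROMPT = 8
EXPECTED_ROWS = NUM_PROMPTS * SAMPLES_PER_PROMPT  # 40,000


def load_train_split(streaming: bool = False):
    """Load the single `train` split (hypothetical helper).

    With streaming=True the rows are iterated lazily instead of being
    downloaded in full, which may help given the long responses.
    """
    # Imported here so this module has no hard dependency on `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train", streaming=streaming)
```

A non-streaming call such as `ds = load_train_split()` should yield a dataset whose `num_rows` equals `EXPECTED_ROWS`, assuming the hosted files match the counts stated above.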
## Statistics
| Metric | Value |
|--------|-------|
| Avg tokens per response | 22,460 |
| Median tokens per response | 22,040 |
| Responses with `<think>` tag | 100.0% |
| Complete responses (has `</think>` + `\boxed{...}`) | 32,473/40,000 (81.2%) |
| Truncated responses | 7,527/40,000 (18.8%) |
| Empty responses | 0 |
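The completeness figures above can be approximated with a simple string check. This is a sketch, not the authors' script: it assumes "complete" means a closing `</think>` tag followed somewhere by `\boxed{`, and that every non-empty response failing that test counts as truncated. The function names are hypothetical.

```python
from collections import Counter


def classify_response(text: str) -> str:
    """Return 'empty', 'complete', or 'truncated' for one response."""
    if not text.strip():
        return "empty"
    # Assumption: the \boxed{...} answer must appear after </think>.
    _, closing_tag, answer = text.partition("</think>")
    if closing_tag and "\\boxed{" in answer:
        return "complete"
    return "truncated"


def summarize(responses):
    """Map each label to (count, percentage rounded to one decimal)."""
    counts = Counter(classify_response(r) for r in responses)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}
```

Applied to the `kimi_k2pt5_generated_text` column, `summarize` would be expected to land near the 81.2% / 18.8% split in the table if the assumptions hold; small differences would suggest the original criterion differed.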
## Columns
| Column | Description |
|--------|-------------|
| `row_id` | Sequential identifier (0-39999) |
| `instruction_seed` | The math problem prompt |
| `kimi_k2pt5_generated_text` | Kimi K2.5 generated response (with `<think>...</think>` reasoning trace) |
| `ms_id` | Math seed ID -- groups all 8 responses for the same prompt |
| `_source` | Source dataset identifier |
| `gpt41_mini_response` | GPT-4.1 mini reference response |
| `length` | Response length |
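Because `ms_id` groups the responses for one prompt, a common first step is to collect them per prompt. The sketch below works on any iterable of row dictionaries with the column names listed above; `group_by_prompt` is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_prompt(rows):
    """Collect Kimi K2.5 responses per `ms_id`.

    `rows` is any iterable of dicts that have the `ms_id` and
    `kimi_k2pt5_generated_text` keys described in the Columns table.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["ms_id"]].append(row["kimi_k2pt5_generated_text"])
    return dict(groups)
```

On the full split each list would be expected to hold 8 responses, per the Overview section.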
## Response Format
Each response in the `kimi_k2pt5_generated_text` column follows this format:
`<think>[model's reasoning trace]</think>[final answer, typically containing \boxed{...}]`
Truncated responses (those that hit the 32,768-token limit) may be missing the closing `</think>` tag, the `\boxed{...}` answer, or both.
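A response in this format can be split into its reasoning trace and final answer, and the boxed answer extracted with brace matching so that nested LaTeX such as `\frac{1}{2}` survives. This is an illustrative sketch with hypothetical function names; it takes the last `\boxed{` in the answer, which is an assumption about where the final result sits.

```python
def split_response(text: str):
    """Return (reasoning, answer); answer is None if </think> is missing."""
    body = text[len("<think>"):] if text.startswith("<think>") else text
    reasoning, closing_tag, answer = body.partition("</think>")
    return reasoning, (answer if closing_tag else None)


def last_boxed(answer: str):
    """Return the contents of the last \\boxed{...}, or None if absent."""
    marker = "\\boxed{"
    start = answer.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in answer[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed, e.g. the response was cut off
```

For a truncated response, `split_response` returns `None` as the answer, so callers can skip extraction instead of parsing a partial reasoning trace.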
## Construction
Each of the 5,000 math prompts was sent to Kimi K2.5 eight times (`n=8`) via a Together AI dedicated instance, with `max_tokens=32768` and `temperature=0.8`. The model's reasoning trace (from the `message.reasoning` API field) is wrapped in `<think>...</think>` tags.
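The wrapping step might look like the sketch below. The card does not describe how cut-off generations were assembled, so the `finished` flag and the choice to leave the tag open for a generation that stopped mid-reasoning are assumptions, and `wrap_reasoning` is a hypothetical name; no API call is made here.

```python
def wrap_reasoning(reasoning: str, content: str, finished: bool = True) -> str:
    """Combine a reasoning trace and final content into one response string.

    `reasoning` stands for the value of the `message.reasoning` field and
    `content` for the regular message text.
    """
    if finished or content:
        return f"<think>{reasoning}</think>{content}"
    # Assumption: a generation cut off mid-reasoning keeps the tag open,
    # which matches the note that truncated rows may lack </think>.
    return f"<think>{reasoning}"
```

With this layout, a response that ran out of tokens before producing any final content is distinguishable from a finished one by the missing closing tag alone.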
Provider:
marin-community



