marin-community/open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens
Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/marin-community/open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens
Resource description:
---
dataset_info:
  features:
  - name: row_id
    dtype: int64
  - name: instruction_seed
    dtype: string
  - name: _source
    dtype: string
  - name: gpt41_mini_response
    dtype: string
  - name: __original_row_idx
    dtype: int64
  - name: length
    dtype: int64
  - name: ms_id
    dtype: int64
  - name: generated_text
    dtype: string
  - name: final_answer
    dtype: string
  - name: complete_responses_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 60577594
    num_examples: 1024
  download_size: 18689160
  dataset_size: 60577594
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens
Math reasoning responses generated by **Qwen3-32B** ([Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B)).
## Overview
- **Total rows:** 1,024
- **Unique prompts:** 128 (each with 8 response annotations)
- **Source prompts:** [marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted)
- **Prompt alignment:** Exact `instruction_seed` match to [marin-community/open-thoughts-4-128-math-kimi-k2pt5-annotated-32768-tokens](https://huggingface.co/datasets/marin-community/open-thoughts-4-128-math-kimi-k2pt5-annotated-32768-tokens)
- **Generation model:** [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B)
- **Max tokens:** 32,768
- **Temperature:** 0.8
- **Tokenizer used for stats:** Qwen/Qwen2.5-3B
## Statistics
| Metric | Value |
|--------|-------|
| Avg tokens per response | 17,088 |
| Median tokens per response | 14,557 |
| Responses with `<think>` tag | 1020/1024 (99.6%) |
| Complete responses (has `</think>` + `\boxed{...}`) | 910/1024 (88.9%) |
| Truncated responses | 110/1024 (10.7%) |
| Empty responses | 4/1024 (0.4%) |
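The three response categories in the table can be approximated with a small classifier. This is a minimal sketch, not the script that produced the numbers above: the function names are hypothetical, the `\boxed{` check looks anywhere in the text, and treating every non-empty, non-complete response as truncated is an assumption.

```python
from collections import Counter


def classify_response(text):
    """Return 'empty', 'complete', or 'truncated' for one generated_text value."""
    if text is None or not text.strip():
        return "empty"
    # "Complete" as described in the table: a closing </think> tag plus a \boxed{ answer.
    if "</think>" in text and "\\boxed{" in text:
        return "complete"
    # Assumption: any other non-empty response is counted as truncated.
    return "truncated"


def summarize(texts):
    """Map each category to (count, percentage rounded to one decimal)."""
    total = len(texts)
    if total == 0:
        return {}
    counts = Counter(classify_response(t) for t in texts)
    return {
        label: (counts[label], round(100 * counts[label] / total, 1))
        for label in ("complete", "truncated", "empty")
    }
```

Passing the list of `generated_text` values to `summarize` gives one `(count, percentage)` pair per category, which can be compared against the table.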
## Columns
| Column | Description |
|--------|-------------|
| `row_id` | Row identifier preserved from the source dataset |
| `instruction_seed` | The math problem prompt |
| `generated_text` | Qwen3-32B generated response with a `<think>...</think>` reasoning trace when present |
| `ms_id` | Math seed ID, groups all 8 responses for the same prompt |
| `_source` | Source dataset identifier |
| `gpt41_mini_response` | GPT-4.1 mini reference response |
| `__original_row_idx` | Row index from the pre-reformatted source pipeline |
| `length` | Length metadata carried over from the source dataset |
| `final_answer` | Extracted final answer when present |
| `complete_responses_count` | Number of complete responses in the source n=8 group for the prompt |
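Because `ms_id` groups the responses for one prompt, a common first step is to bucket rows by that column and confirm each bucket has 8 entries. The sketch below assumes rows are plain dictionaries keyed by the column names above; `group_by_prompt` and `odd_sized_groups` are hypothetical helper names.

```python
from collections import defaultdict


def group_by_prompt(rows):
    """Bucket rows (dicts with an 'ms_id' key) by prompt, preserving row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["ms_id"]].append(row)
    return dict(groups)


def odd_sized_groups(groups, expected=8):
    """Return the ms_id values whose group size differs from `expected`."""
    return [ms_id for ms_id, members in groups.items() if len(members) != expected]
```

If the description above holds, grouping the full split should yield 128 groups and `odd_sized_groups` should return an empty list.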
## Response Format
Each response in the `generated_text` column generally follows this format:
```text
<think>
[model reasoning trace]
</think>
[final answer, typically containing \boxed{...}]
```
Of the 1,024 responses, 1,020 include an opening `<think>` tag; the remaining 4 are empty.
Truncated responses may be missing the closing `</think>` tag, the `\boxed{...}` answer, or both.
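To work with this format, the reasoning trace can be separated from the answer text and the last `\boxed{...}` payload pulled out. The following is an illustrative sketch only: `split_response` and `extract_boxed` are hypothetical helpers, and nothing here is claimed to match how the `final_answer` column was actually extracted.

```python
def split_response(text):
    """Return (reasoning, answer_text); answer_text is '' when </think> is missing."""
    body = text.strip()
    if body.startswith("<think>"):
        body = body[len("<think>"):]
    reasoning, closing_tag, answer = body.partition("</think>")
    if not closing_tag:
        # No closing tag (e.g. a truncated response): everything is reasoning.
        return reasoning.strip(), ""
    return reasoning.strip(), answer.strip()


def extract_boxed(answer_text):
    r"""Return the contents of the last \boxed{...}, or None if absent or unbalanced."""
    marker = "\\boxed{"
    start = answer_text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    collected = []
    for ch in answer_text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(collected)
        collected.append(ch)
    # Braces never closed, which can happen when generation hit the token limit.
    return None
```

Brace depth is tracked explicitly so that nested LaTeX such as `\boxed{\frac{1}{2}}` is returned whole rather than cut at the first closing brace.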
## Construction
Created by taking the first 1,024 rows of [marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted) without shuffling.
The `instruction_seed` sequence was checked against the Kimi K2.5 128-prompt reference dataset and matched exactly across all 1,024 rows.
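The alignment check described above amounts to a position-by-position comparison of two `instruction_seed` sequences. A minimal sketch, assuming both sequences are already available as Python lists of strings; `first_mismatch` is a hypothetical helper, not the script used to build the dataset.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel so a length difference shows up as a mismatch


def first_mismatch(seeds_a, seeds_b):
    """Return the first index where the sequences differ, or None if identical."""
    pairs = zip_longest(seeds_a, seeds_b, fillvalue=_MISSING)
    for index, (left, right) in enumerate(pairs):
        if left != right:
            return index
    return None
```

A return value of `None` for the two 1,024-element sequences corresponds to the exact match reported above; any integer points at the first row to inspect.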
Provider:
marin-community



