marin-community/open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens-n8-reformatted
Source: Hugging Face. Updated 2026-03-19; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/marin-community/open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens-n8-reformatted
Resource description:
---
dataset_info:
  features:
  - name: row_id
    dtype: int64
  - name: instruction_seed
    dtype: string
  - name: _source
    dtype: string
  - name: gpt41_mini_response
    dtype: string
  - name: __original_row_idx
    dtype: int64
  - name: length
    dtype: int64
  - name: ms_id
    dtype: int64
  - name: generated_text
    dtype: string
  - name: final_answer
    dtype: string
  - name: complete_responses_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 60576957
    num_examples: 1024
  download_size: 18689160
  dataset_size: 60576957
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens-n8-reformatted
A 1,024-sample subset of [marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted), containing the first 1,024 rows in their original order.
## Overview
- **Total rows:** 1,024
- **Unique prompts:** 128 (each with 8 response annotations, hence n8)
- **Source:** OpenThoughts-4 math problems annotated by Qwen3-32B
- **Max sequence length:** 32,768 tokens
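As a minimal sketch, the subset can probably be loaded with the Hugging Face `datasets` library. The repository ID below is taken from the download link; the helper name `load_subset` and the constants are illustrative, and the call assumes `datasets` is installed and the Hub is reachable.

```python
# Sketch only: assumes the `datasets` package is installed and the Hub is reachable.
REPO_ID = (
    "marin-community/"
    "open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens-n8-reformatted"
)
N_PROMPTS = 128      # unique prompts, per the overview
N_RESPONSES = 8      # responses per prompt (the "n8" in the name)
EXPECTED_ROWS = N_PROMPTS * N_RESPONSES  # 128 * 8 = 1,024


def load_subset():
    """Download the single `train` split and check its row count."""
    from datasets import load_dataset  # imported lazily; needs network access

    ds = load_dataset(REPO_ID, split="train")
    if len(ds) != EXPECTED_ROWS:
        raise ValueError(f"expected {EXPECTED_ROWS} rows, got {len(ds)}")
    return ds
```

Calling `load_subset()` returns a `datasets.Dataset`; indexing it, for example `ds[0]["instruction_seed"]`, would give the first prompt.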
## Columns
| Column | Description |
|--------|-------------|
| `row_id` | Sequential identifier (0–1023) |
| `instruction_seed` | The math problem prompt |
| `generated_text` | Qwen3-32B generated response |
| `ms_id` | Math seed ID — groups all 8 responses for the same prompt |
| `final_answer` | Extracted final answer |
| `_source` | Source dataset identifier |
| `__original_row_idx` | Row index in the pre-reformatted dataset |
| `gpt41_mini_response` | GPT-4.1 mini reference response |
| `length` | Response length |
| `complete_responses_count` | Number of complete responses for this prompt |
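To illustrate how `ms_id` ties rows together, the sketch below groups rows by `ms_id` and picks the most common `final_answer` within each group. Majority voting is only an illustrative use of these columns, not something the dataset card prescribes, and the toy rows are hypothetical (the real dataset has 8 rows per `ms_id`).

```python
from collections import Counter, defaultdict


def group_by_prompt(rows):
    """Map each ms_id to the list of rows that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["ms_id"]].append(row)
    return dict(groups)


def majority_answer(group):
    """Return the most frequent final_answer in a group and its share of the group."""
    counts = Counter(row["final_answer"] for row in group)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(group)


# Hypothetical toy rows carrying only the two columns used here.
toy_rows = [
    {"ms_id": 0, "final_answer": "42"},
    {"ms_id": 0, "final_answer": "42"},
    {"ms_id": 0, "final_answer": "41"},
    {"ms_id": 1, "final_answer": "7"},
]
groups = group_by_prompt(toy_rows)
```

Exact string comparison of `final_answer` is naive: equivalent answers written differently (for example `1/2` and `0.5`) would be counted separately, so a real pipeline would likely normalize answers first.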
## Construction
This subset was extracted by taking the first 1,024 rows of the parent dataset, without shuffling. All 128 prompts were verified to have exactly 8 responses each, and the `row_id` column was reset to 0–1023.
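The checks described above can be sketched as follows. This is not the script used to build the dataset; `check_subset` and `make_toy_rows` are hypothetical helpers, and the sketch assumes that `ms_id` uniquely identifies a prompt, as the column description suggests.

```python
from collections import Counter


def check_subset(rows, n_prompts=128, n_responses=8):
    """Raise AssertionError if rows violate the invariants stated in the card."""
    rows = list(rows)
    assert len(rows) == n_prompts * n_responses, "unexpected row count"
    # row_id is described as reset to a sequential 0..N-1 range.
    ids = [r["row_id"] for r in rows]
    assert ids == list(range(len(rows))), "row_id is not sequential"
    # Every prompt (identified here by ms_id) should have n_responses rows.
    per_prompt = Counter(r["ms_id"] for r in rows)
    assert len(per_prompt) == n_prompts, "unexpected number of distinct ms_id values"
    assert set(per_prompt.values()) == {n_responses}, "uneven responses per prompt"
    return True


def make_toy_rows(n_prompts, n_responses):
    """Build hypothetical rows with the same layout: responses stored contiguously."""
    return [
        {"row_id": i, "ms_id": i // n_responses}
        for i in range(n_prompts * n_responses)
    ]
```

With the real data, `check_subset(ds)` on the loaded `train` split should pass if the card's description holds. On a `datasets.Dataset`, the first-1,024-rows step would presumably correspond to something like `parent.select(range(1024))`.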
Provider:
marin-community



