marin-community/open-thoughts-4-128-math-qwen3-4b-annotated-32768-tokens

Name: marin-community/open-thoughts-4-128-math-qwen3-4b-annotated-32768-tokens
Creator: marin-community
Published: 2026-04-08 22:45:57
License: No description available

Hugging Face | Updated 2026-04-08 | Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/marin-community/open-thoughts-4-128-math-qwen3-4b-annotated-32768-tokens

Resource description:

---
dataset_info:
  features:
  - name: row_id
    dtype: int64
  - name: instruction_seed
    dtype: string
  - name: _source
    dtype: string
  - name: gpt41_mini_response
    dtype: string
  - name: __original_row_idx
    dtype: int64
  - name: length
    dtype: int64
  - name: ms_id
    dtype: int64
  - name: generated_text
    dtype: string
  - name: final_answer
    dtype: string
  - name: complete_responses_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 76226919
    num_examples: 1024
  download_size: 16120994
  dataset_size: 76226919
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# open-thoughts-4-128-math-qwen3-4b-annotated-32768-tokens

Math reasoning responses generated by **Qwen3-4B** ([Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)).

## Overview

- **Total rows:** 1,024
- **Unique prompts:** 128 (each with 8 response annotations)
- **Source prompts:** [marin-community/open-thoughts-4-30k-math-qwen3-4b-annotated-32768-tokens-n8-reformatted](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-math-qwen3-4b-annotated-32768-tokens-n8-reformatted)
- **Prompt alignment:** Exact `instruction_seed` match to [marin-community/open-thoughts-4-128-math-kimi-k2pt5-annotated-32768-tokens](https://huggingface.co/datasets/marin-community/open-thoughts-4-128-math-kimi-k2pt5-annotated-32768-tokens)
- **Generation model:** [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
- **Max tokens:** 32,768
- **Temperature:** 0.8
- **Tokenizer used for stats:** Qwen/Qwen2.5-3B

## Statistics

| Metric | Value |
|--------|-------|
| Avg tokens per response | 20,625 |
| Median tokens per response | 19,402 |
| Responses with `<think>` tag | 1024/1024 (100.0%) |
| Complete responses (has `</think>` + `\boxed{...}`) | 745/1024 (72.8%) |
| Truncated responses | 279/1024 (27.2%) |
| Empty responses | 0/1024 (0.0%) |

## Columns

| Column | Description |
|--------|-------------|
| `row_id` | Row identifier preserved from the source dataset |
| `instruction_seed` | The math problem prompt |
| `generated_text` | Qwen3-4B generated response with a `<think>...</think>` reasoning trace |
| `ms_id` | Math seed ID, groups all 8 responses for the same prompt |
| `_source` | Source dataset identifier |
| `gpt41_mini_response` | GPT-4.1 mini reference response |
| `__original_row_idx` | Row index from the pre-reformatted source pipeline |
| `length` | Length metadata carried over from the source dataset |
| `final_answer` | Extracted final answer when present |
| `complete_responses_count` | Number of complete responses in the source n=8 group for the prompt |

## Response Format

Each response in the `generated_text` column generally follows this format:

```text
<think>
[model reasoning trace]
</think>

[final answer, typically containing \boxed{...}]
```

This model emits an opening `<think>` tag for the reasoning trace. Truncated responses may be missing the closing `</think>` tag, the `\boxed{...}` answer, or both.

## Construction

Created by taking the first 1,024 rows of [marin-community/open-thoughts-4-30k-math-qwen3-4b-annotated-32768-tokens-n8-reformatted](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-math-qwen3-4b-annotated-32768-tokens-n8-reformatted) without shuffling. The `instruction_seed` sequence was checked against the Kimi K2.5 128-prompt reference dataset and matched exactly across all 1,024 rows.
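
## Example: checking completeness per prompt

The card defines a complete response as one containing both `</think>` and `\boxed{...}`, and says `ms_id` groups the 8 responses for a prompt. The sketch below shows one way to apply that rule and count complete responses per `ms_id`. It is not the pipeline that produced the card's statistics: the helper names (`is_complete`, `last_boxed`, `complete_counts_by_prompt`, `load_rows`) and the `sample_rows` data are hypothetical, and the brace-matching extractor is an assumption that may not agree with how `final_answer` was actually derived.

```python
from collections import defaultdict

REPO_ID = "marin-community/open-thoughts-4-128-math-qwen3-4b-annotated-32768-tokens"


def load_rows():
    """Load the train split with the Hugging Face `datasets` library (needs network)."""
    from datasets import load_dataset  # imported lazily so the helpers work without it
    return load_dataset(REPO_ID, split="train")


def is_complete(text):
    """Assumed reading of the card's rule: closing </think> and a \\boxed{ both present."""
    return "</think>" in text and "\\boxed{" in text


def last_boxed(text):
    """Return the contents of the last \\boxed{...}, honoring nested braces.

    Returns None if there is no \\boxed{ or its braces never balance
    (which can happen when a response is truncated mid-answer).
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None


def complete_counts_by_prompt(rows):
    """Map each ms_id to the number of its responses that pass is_complete."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["ms_id"]] += int(is_complete(row["generated_text"]))
    return dict(counts)


# Hypothetical rows that mimic the documented response format.
sample_rows = [
    {"ms_id": 0, "generated_text": "<think>\nwork\n</think>\n\nAnswer: \\boxed{\\frac{1}{2}}"},
    {"ms_id": 0, "generated_text": "<think>\nstill reasoning when the token limit hit"},
    {"ms_id": 1, "generated_text": "<think>\nx\n</think>\n\n\\boxed{7}"},
]

if __name__ == "__main__":
    print(complete_counts_by_prompt(sample_rows))
    print(last_boxed(sample_rows[0]["generated_text"]))
```

On the real data, `complete_counts_by_prompt(load_rows())` could be compared against the `complete_responses_count` column. The two are not guaranteed to agree, since that column is described as counting over the source n=8 group and the exact completeness check used there is not shown in the card.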

Provider:

marin-community
