marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8-reformatted

Name: marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8-reformatted
Creator: marin-community
Published: 2026-03-09 04:09:23
License: No description available

Source: Hugging Face. Updated 2026-03-09. Indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8-reformatted

Description:

---
dataset_info:
  features:
  - name: row_id
    dtype: int64
  - name: id
    dtype: string
  - name: instruction_seed
    dtype: string
  - name: output
    dtype: string
  - name: source
    dtype: string
  - name: license
    dtype: string
  - name: dataset
    dtype: string
  - name: difficulty
    dtype: int64
  - name: solution
    dtype: string
  - name: index
    dtype: string
  - name: _source
    dtype: string
  - name: difficulty_reasoning
    dtype: string
  - name: __original_row_idx
    dtype: int64
  - name: ms_id
    dtype: int64
  - name: generated_text
    dtype: string
  - name: final_answer
    dtype: string
  - name: complete_responses_count
    dtype: int64
  splits:
  - name: train
    num_examples: 239704
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for Open-Thoughts-4-30K-Code-Qwen3-30B-A3B-Thinking-2507-Annotated-32768-Tokens-N8-Reformatted

## Overview

This dataset is a reformatted version of [marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8).

The original dataset contained 29,963 samples, each with 8 responses generated by the same model with different random seeds (stored in the `generated_text`, `generated_text2`, ..., `generated_text8` columns). This reformatted version expands each response into its own row, resulting in **29,963 x 8 = 239,704 samples** with a single `generated_text` column.

The rows are ordered so that the `ms_id` ordering matches the reference dataset [marin-community/open-thoughts-4-30k-code-qwen3-32b-annotated](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-code-qwen3-32b-annotated). All 8 responses for a given prompt appear contiguously (e.g., rows 0-7 share the same prompt, rows 8-15 share the next prompt, and so on).

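The wide-to-long expansion described in the overview can be sketched as follows. This is a minimal illustration, not the script used to build the dataset: the helper names `response_columns`, `expand_responses`, and `prompt_index` are hypothetical, and the sketch assumes `row_id` is assigned sequentially in output order, which is consistent with the contiguous layout described above but is not confirmed by the card.

```python
from typing import Any, Dict, Iterable, Iterator, List


def response_columns(n: int = 8) -> List[str]:
    """Column names of the wide layout: generated_text, generated_text2, ..."""
    return ["generated_text"] + [f"generated_text{i}" for i in range(2, n + 1)]


def expand_responses(
    rows: Iterable[Dict[str, Any]], n: int = 8
) -> Iterator[Dict[str, Any]]:
    """Yield one output row per response, keeping responses to a prompt contiguous."""
    cols = response_columns(n)
    row_id = 0  # assumption: row_id simply counts output rows from 0
    for row in rows:
        # Every field that is not a response column is copied to each output row.
        shared = {k: v for k, v in row.items() if k not in cols}
        for col in cols:
            yield {**shared, "row_id": row_id, "generated_text": row[col]}
            row_id += 1


def prompt_index(row_id: int, n: int = 8) -> int:
    """Position of the prompt that a long-format row belongs to (rows 0-7 -> 0)."""
    return row_id // n
```

With 29,963 input rows and `n=8`, the generator yields 239,704 rows, and `prompt_index` maps row IDs 0-7 to prompt 0, 8-15 to prompt 1, and so on.
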
## Generation Details

- **Model:** [Qwen/Qwen3-30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507)
- **Temperature:** 0.8
- **Max Output Tokens:** 32768
- **Responses per prompt:** 8 (different random seeds)

## Dataset Statistics

- **Number of Samples:** 239,704
- **Number of Unique Prompts:** 29,963
- **Responses per Prompt:** 8

## Dataset Structure

| Column | Description |
|--------|-------------|
| `row_id` | A unique row identifier (0 to 239,703) |
| `instruction_seed` | The original code problem/question text without chat formatting |
| `_source` | The origin dataset (e.g., `ai2-adapt-dev/opencode-2-code`); tracks data provenance |
| `output` | Reference output from the source dataset |
| `__original_row_idx` | The row index from the original source dataset before filtering/processing |
| `ms_id` | A sample identifier shared across the 8 responses for the same prompt |
| `generated_text` | A response including chain-of-thought with `<think>` tags, generated by Qwen3-30B-A3B-Thinking-2507 |
| `final_answer` | The extracted final answer from `\boxed{...}` after the `</think>` token, or `N/A` if the response is incomplete |
| `complete_responses_count` | Number of complete responses (0-8) for this prompt; a response is complete if it contains `</think>` followed by a valid `\boxed{...}` |

## Related Datasets

- [open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8): Original dataset with 8 response columns per row
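
The completeness rule behind `final_answer` and `complete_responses_count` can be illustrated with a short sketch. The card does not define what makes a `\boxed{...}` "valid" or which occurrence is extracted, so the code below makes two assumptions: a box is valid when its braces balance, and the last `\boxed{` after the first `</think>` is the one used. The function names are hypothetical and this is not the annotation code used for the dataset.

```python
from typing import List

THINK_END = "</think>"
BOXED_OPEN = "\\boxed{"


def extract_final_answer(text: str) -> str:
    """Return the content of \\boxed{...} found after </think>, else "N/A"."""
    _, sep, tail = text.partition(THINK_END)
    if not sep:
        return "N/A"  # the reasoning block was never closed
    start = tail.rfind(BOXED_OPEN)  # assumption: use the last box in the answer
    if start == -1:
        return "N/A"
    begin = start + len(BOXED_OPEN)
    depth = 1
    # Walk forward and match braces so nested groups like \frac{1}{2} stay intact.
    for i in range(begin, len(tail)):
        if tail[i] == "{":
            depth += 1
        elif tail[i] == "}":
            depth -= 1
            if depth == 0:
                return tail[begin:i]
    return "N/A"  # unbalanced braces: treated as not valid


def count_complete(responses: List[str]) -> int:
    """Number of responses (0 to len(responses)) that yield a final answer."""
    return sum(extract_final_answer(r) != "N/A" for r in responses)
```

Applied to the 8 rows that share one `ms_id`, `count_complete` would produce a value in the 0-8 range, matching the range given for `complete_responses_count`.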

Provider:

marin-community