
marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8-reformatted

Source: Hugging Face | Updated: 2026-03-09 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8-reformatted
Resource description:
---
dataset_info:
  features:
  - name: row_id
    dtype: int64
  - name: id
    dtype: string
  - name: instruction_seed
    dtype: string
  - name: output
    dtype: string
  - name: source
    dtype: string
  - name: license
    dtype: string
  - name: dataset
    dtype: string
  - name: difficulty
    dtype: int64
  - name: solution
    dtype: string
  - name: index
    dtype: string
  - name: _source
    dtype: string
  - name: difficulty_reasoning
    dtype: string
  - name: __original_row_idx
    dtype: int64
  - name: ms_id
    dtype: int64
  - name: generated_text
    dtype: string
  - name: final_answer
    dtype: string
  - name: complete_responses_count
    dtype: int64
  splits:
  - name: train
    num_examples: 239704
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for Open-Thoughts-4-30K-Code-Qwen3-30B-A3B-Thinking-2507-Annotated-32768-Tokens-N8-Reformatted

## Overview

This dataset is a reformatted version of [marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8).

The original dataset contained 29,963 samples, each with 8 responses generated by the same model with different random seeds (stored in `generated_text`, `generated_text2`, ..., `generated_text8` columns). This reformatted version expands each response into its own row, resulting in **29,963 x 8 = 239,704 samples** with a single `generated_text` column.

The rows are ordered so that the `ms_id` ordering matches the reference dataset [marin-community/open-thoughts-4-30k-code-qwen3-32b-annotated](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-code-qwen3-32b-annotated). All 8 responses for a given prompt appear contiguously (e.g., rows 0-7 share the same prompt, rows 8-15 share the next prompt, and so on).
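The wide-to-long expansion described in the overview can be sketched as follows. This is an illustrative reconstruction, not the script used to build the dataset: the function name `expand_responses` is hypothetical, the input rows are assumed to be plain dictionaries already sorted in the desired `ms_id` order, and `row_id` is assumed to be assigned sequentially in output order.

```python
from typing import Any, Dict, Iterable, List

# Response columns of the original wide layout, as named in the card:
# `generated_text`, `generated_text2`, ..., `generated_text8`.
RESPONSE_COLUMNS = ["generated_text"] + [f"generated_text{i}" for i in range(2, 9)]


def expand_responses(wide_rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn each wide row (8 response columns) into 8 contiguous long rows.

    Hypothetical helper: assumes `wide_rows` is already in the target order.
    """
    long_rows: List[Dict[str, Any]] = []
    for wide in wide_rows:
        # Fields shared by every response to the same prompt (e.g. ms_id).
        shared = {k: v for k, v in wide.items() if k not in RESPONSE_COLUMNS}
        for column in RESPONSE_COLUMNS:
            row = dict(shared)
            row["generated_text"] = wide[column]  # single response column
            row["row_id"] = len(long_rows)        # assumed: 0, 1, 2, ... in output order
            long_rows.append(row)
    return long_rows
```

With this layout, the 8 responses for the prompt at position `k` occupy rows `8 * k` through `8 * k + 7`, which matches the contiguity described above.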
## Generation Details

- **Model:** [Qwen/Qwen3-30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507)
- **Temperature:** 0.8
- **Max Output Tokens:** 32768
- **Responses per prompt:** 8 (different random seeds)

## Dataset Statistics

- **Number of Samples:** 239,704
- **Number of Unique Prompts:** 29,963
- **Responses per Prompt:** 8

## Dataset Structure

| Column | Description |
|--------|-------------|
| `row_id` | A unique row identifier (0 to 239,703) |
| `instruction_seed` | The original code problem/question text without chat formatting |
| `_source` | The origin dataset (e.g., `ai2-adapt-dev/opencode-2-code`); tracks data provenance |
| `output` | Reference output from the source dataset |
| `__original_row_idx` | The row index from the original source dataset before filtering/processing |
| `ms_id` | A unique sample identifier (shared across the 8 responses for the same prompt) |
| `generated_text` | A response including chain-of-thought with `<think>` tags, generated by Qwen3-30B-A3B-Thinking-2507 |
| `final_answer` | The extracted final answer from `\boxed{...}` after the `</think>` token, or `N/A` if the response is incomplete |
| `complete_responses_count` | Number of complete responses (0-8) for this prompt; a response is complete if it contains `</think>` followed by a valid `\boxed{...}` |

## Related Datasets

- [open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-code-qwen3-30b-a3B-thinking-2507-annotated-32768-tokens-n8): Original dataset with 8 response columns per row
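The `final_answer` and `complete_responses_count` columns follow a rule that is easy to sketch in code. The card does not publish the extraction script, so the example below is only one plausible reading of it. The function names are hypothetical, "valid" is assumed to mean brace-balanced, and when several `\boxed{...}` expressions follow `</think>` the last one is assumed to be the answer.

```python
from collections import Counter
from typing import Any, Dict, Iterable

END_THINK = "</think>"
BOXED = "\\boxed{"
INCOMPLETE = "N/A"


def extract_final_answer(generated_text: str) -> str:
    """Return the content of a `\\boxed{...}` found after `</think>`, else "N/A".

    Assumptions (not confirmed by the card): the first `</think>` ends the
    reasoning, the last `\\boxed{` after it holds the answer, and a valid
    box is one whose braces balance.
    """
    pos = generated_text.find(END_THINK)
    if pos == -1:
        return INCOMPLETE  # reasoning never closed, e.g. hit the token limit
    tail = generated_text[pos + len(END_THINK):]
    start = tail.rfind(BOXED)
    if start == -1:
        return INCOMPLETE
    begin = start + len(BOXED)
    depth = 1  # already inside the opening brace of \boxed{
    for i in range(begin, len(tail)):
        if tail[i] == "{":
            depth += 1
        elif tail[i] == "}":
            depth -= 1
            if depth == 0:
                return tail[begin:i]
    return INCOMPLETE  # unbalanced braces: treat as incomplete


def count_complete_responses(rows: Iterable[Dict[str, Any]]) -> Counter:
    """Count complete responses per `ms_id` (0 to 8 when all 8 rows are given)."""
    counts: Counter = Counter()
    for row in rows:
        complete = extract_final_answer(row["generated_text"]) != INCOMPLETE
        counts[row["ms_id"]] += int(complete)
    return counts
```

Scanning braces by depth rather than using a regular expression keeps nested expressions such as `\boxed{\frac{1}{2}}` intact.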
Provider:
marin-community