
marin-community/open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens-n8-reformatted

Source: Hugging Face. Updated 2026-03-19. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/marin-community/open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens-n8-reformatted
Resource description:
---
dataset_info:
  features:
  - name: row_id
    dtype: int64
  - name: instruction_seed
    dtype: string
  - name: _source
    dtype: string
  - name: gpt41_mini_response
    dtype: string
  - name: __original_row_idx
    dtype: int64
  - name: length
    dtype: int64
  - name: ms_id
    dtype: int64
  - name: generated_text
    dtype: string
  - name: final_answer
    dtype: string
  - name: complete_responses_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 60576957
    num_examples: 1024
  download_size: 18689160
  dataset_size: 60576957
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# open-thoughts-4-128-math-qwen3-32b-annotated-32768-tokens-n8-reformatted

A 1,024-sample subset of [marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted](https://huggingface.co/datasets/marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted), containing the first 1,024 rows in their original order.

## Overview

- **Total rows:** 1,024
- **Unique prompts:** 128 (each with 8 response annotations, hence n8)
- **Source:** OpenThoughts-4 math problems annotated by Qwen3-32B
- **Max sequence length:** 32,768 tokens

## Columns

| Column | Description |
|--------|-------------|
| `row_id` | Sequential identifier (0–1023) |
| `instruction_seed` | The math problem prompt |
| `generated_text` | Qwen3-32B generated response |
| `ms_id` | Math seed ID; groups all 8 responses for the same prompt |
| `final_answer` | Extracted final answer |
| `_source` | Source dataset identifier |
| `__original_row_idx` | Row index in the pre-reformatted dataset |
| `gpt41_mini_response` | GPT-4.1 mini reference response |
| `length` | Response length |
| `complete_responses_count` | Number of complete responses for this prompt |

## Construction

Extracted by taking the first 1,024 rows of the parent dataset (no shuffling). Verified that all 128 prompts have exactly 8 responses each. The `row_id` column was reset to 0–1023.
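
The card does not show how the construction check was run. The following is a minimal sketch of one way to do it, assuming rows are available as dictionaries keyed by the column names above. The helper name `check_grouping` and the synthetic rows are hypothetical and stand in for real data.

```python
from collections import Counter


def check_grouping(rows, responses_per_prompt=8):
    """Return True if row_id runs 0..len(rows)-1 in order and every
    ms_id occurs exactly `responses_per_prompt` times."""
    counts = Counter(row["ms_id"] for row in rows)
    ids_sequential = [row["row_id"] for row in rows] == list(range(len(rows)))
    groups_complete = all(c == responses_per_prompt for c in counts.values())
    return ids_sequential and groups_complete


# Synthetic stand-in rows (hypothetical values, not real dataset content):
# 1,024 rows where each consecutive block of 8 shares one ms_id.
rows = [{"row_id": i, "ms_id": i // 8} for i in range(1024)]

num_prompts = len({row["ms_id"] for row in rows})
ok = check_grouping(rows)
print(num_prompts, ok)
```

With the real data, the same function could presumably be applied to the `train` split after loading it (for example with the `datasets` library's `load_dataset`), since each row exposes `row_id` and `ms_id`.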
Provider:
marin-community