
marin-community/open-thoughts-4-5000-math-kimi-k2pt5-annotated-32768-tokens

Source: Hugging Face. Updated 2026-04-09. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/marin-community/open-thoughts-4-5000-math-kimi-k2pt5-annotated-32768-tokens
Resource description:
---
dataset_info:
  features:
  - name: row_id
    dtype: int64
  - name: instruction_seed
    dtype: string
  - name: _source
    dtype: string
  - name: gpt41_mini_response
    dtype: string
  - name: __original_row_idx
    dtype: int64
  - name: length
    dtype: int64
  - name: ms_id
    dtype: int64
  - name: generated_text
    dtype: string
  - name: final_answer
    dtype: string
  - name: complete_responses_count
    dtype: int64
  - name: kimi_k2pt5_generated_text
    dtype: string
  splits:
  - name: train
    num_examples: 40000
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# open-thoughts-4-5000-math-kimi-k2pt5-annotated-32768-tokens

Math reasoning responses generated by **Kimi K2.5** (moonshotai/Kimi-K2.5) via a Together AI dedicated instance.

## Overview

- **Total rows:** 40,000
- **Unique prompts:** 5,000 (each with 8 response annotations)
- **Source prompts:** marin-community/open-thoughts-4-30k-math-qwen3-32b-annotated-32768-tokens-n8-reformatted
- **Generation model:** moonshotai/Kimi-K2.5
- **Max tokens:** 32,768
- **Temperature:** 0.8
- **Tokenizer used for stats:** Qwen/Qwen2.5-3B

## Statistics

| Metric | Value |
|--------|-------|
| Avg tokens per response | 22,460 |
| Median tokens per response | 22,040 |
| Responses with `<think>` tag | 100.0% |
| Complete responses (has `</think>` + `\boxed{...}`) | 32,473/40,000 (81.2%) |
| Truncated responses | 7,527/40,000 (18.8%) |
| Empty responses | 0 |

## Columns

| Column | Description |
|--------|-------------|
| `row_id` | Sequential identifier (0-39999) |
| `instruction_seed` | The math problem prompt |
| `kimi_k2pt5_generated_text` | Kimi K2.5 generated response (with `<think>...</think>` reasoning trace) |
| `ms_id` | Math seed ID -- groups all 8 responses for the same prompt |
| `_source` | Source dataset identifier |
| `gpt41_mini_response` | GPT-4.1 mini reference response |
| `length` | Response length |

## Response Format

Each response in the `kimi_k2pt5_generated_text` column follows this format:

    <think>[model's reasoning trace]</think>[final answer, typically containing \boxed{...}]

Responses that are truncated (hit the 32,768 token limit) may be missing the closing `</think>` tag and/or the `\boxed{...}` answer.

## Construction

Generated by sending each of the 5,000 math prompts to Kimi K2.5 8 times (n=8) via a Together AI dedicated instance, with `max_tokens=32768` and `temperature=0.8`. The model's reasoning trace (from the `message.reasoning` API field) is wrapped in `<think>...</think>` tags.
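
The completeness criterion from the statistics (a closing `</think>` tag plus a `\boxed{...}` answer) and the grouping of responses by `ms_id` can be sketched in a few lines of Python. This is a minimal illustration, not the script the dataset authors used: the helper names `split_response`, `is_complete`, and `complete_counts` are hypothetical, and requiring `\boxed{` to appear after `</think>` (rather than anywhere in the text) is an assumption, since the card does not say where the check looks.

```python
import re
from collections import defaultdict

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"
# Assumption: any "\boxed{" in the final answer counts as a boxed answer.
BOXED = re.compile(r"\\boxed\s*\{")


def split_response(text):
    """Return (reasoning, answer); answer is None if </think> is missing."""
    body = text[len(THINK_OPEN):] if text.startswith(THINK_OPEN) else text
    reasoning, sep, answer = body.partition(THINK_CLOSE)
    return reasoning, (answer if sep else None)


def is_complete(text):
    """True when the response closes its reasoning and boxes an answer."""
    _, answer = split_response(text)
    return answer is not None and BOXED.search(answer) is not None


def complete_counts(rows):
    """Count complete responses per prompt, keyed by ms_id."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["ms_id"]] += is_complete(row["kimi_k2pt5_generated_text"])
    return dict(counts)


# Toy rows standing in for dataset records (made-up text, not real data).
rows = [
    {"ms_id": 1, "kimi_k2pt5_generated_text": "<think>2+2</think>So \\boxed{4}"},
    {"ms_id": 1, "kimi_k2pt5_generated_text": "<think>still reasoning"},
    {"ms_id": 2, "kimi_k2pt5_generated_text": "<think>maybe \\boxed{3}"},
]
print(complete_counts(rows))
```

On the real data, `rows` could be any iterable of records with these two columns, for example the `train` split loaded with the `datasets` library. Counts produced this way may not match the published 32,473/40,000 figure exactly if the authors' check differs from the assumption above.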
Provided by:
marin-community