WithinUsAI/seed_ai_150k_package

Source: Hugging Face | Updated: 2026-04-16 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/WithinUsAI/seed_ai_150k_package
Resource description:
---
license: apache-2.0
tags:
- WithInUsAi
- agent
- Gss1147
---

---

📘 Dataset Card: seed_ai_150k_package

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
pretty_name: Seed AI 150K Recursive Training Package
size_categories:
- 100K<n<1M
tags:
- synthetic-data
- instruction-tuning
- reasoning
- system-design
- llm-training
- recursive-training
---

# 🧠 Seed AI 150K Recursive Training Package

## Dataset Overview

The **Seed AI 150K Recursive Training Package** is a large-scale synthetic instruction dataset designed to support **LLM fine-tuning for structured reasoning, system-level thinking, and iterative reasoning behaviors**.

It contains **150,000 training samples**, split into three conceptual reasoning layers:

- **Mindset Layer (50K)**
- **Mindframe Process Layer (50K)**
- **Recursive Reasoning Layer (50K)**

The dataset is designed to improve:

- structured reasoning consistency
- system-level abstraction ability
- multi-step decomposition behavior
- reflective and iterative reasoning patterns

---

## 📊 Dataset Structure

Each record follows a simple instruction-response format:

```json
{
  "instruction": "...",
  "response": "...",
  "metadata": {
    "type": "mindset | mindframe | recursive_reasoning",
    "domain": "...",
    "index": 0,
    "timestamp": "ISO-8601"
  }
}
```

---

## 🧩 Dataset Components

### 1. Mindset Layer (50K)

Focuses on:

- system-level thinking
- abstraction of real-world computing systems
- constraint-based reasoning

Example behavior:

- understanding systems as interacting components
- identifying inputs, outputs, and constraints

---

### 2. Mindframe Process Layer (50K)

Focuses on:

- step-by-step reasoning
- structured decomposition
- analytical workflows

Encourages models to:

- break down complex systems
- follow ordered reasoning steps
- reduce unstructured responses

---

### 3. Recursive Reasoning Layer (50K)

Focuses on:

- self-correction patterns
- iterative refinement of explanations
- multi-stage reasoning improvement

Important: This does NOT create autonomous intelligence loops. It teaches refinement-style reasoning patterns.

---

## 🧠 Intended Use

This dataset is intended for:

- Supervised fine-tuning (SFT)
- Instruction tuning of LLMs
- Reasoning behavior improvement
- System design education tasks
- Synthetic data augmentation pipelines

---

## ⚙️ Example Use Case

```python
from datasets import load_dataset

dataset = load_dataset("GODsStrongestSoldier/seed_ai_150k_package")
print(dataset["train"][0])
```

---

## ⚠️ Limitations

This dataset has important limitations:

- It is fully synthetic (not ground-truth factual data)
- It does not guarantee factual correctness
- It does not provide external verification sources (RAG not included)
- It does not produce autonomous or recursive intelligence
- Outputs reflect structured reasoning patterns, not real-world validation

Models trained on this dataset should be combined with:

- retrieval systems (RAG)
- factual verification pipelines
- evaluation benchmarks

---

## 📦 Dataset Size

- Total samples: 150,000
- Format: JSONL
- Encoding: UTF-8
- Structure: instruction / response / metadata

---

## 📚 Recommended Training Setup

For best results:

- Use supervised fine-tuning (SFT)
- Combine with real-world corpora (Wikipedia, arXiv)
- Add retrieval augmentation (RAG)
- Use evaluation filtering (truthfulness + reasoning score)

---

## 📌 Citation

If you use this dataset, please cite:

```bibtex
@dataset{seedai150k,
  title={Seed AI 150K Recursive Training Package},
  author={GODsStrongestSoldier},
  year={2026},
  url={https://huggingface.co/datasets/GODsStrongestSoldier/seed_ai_150k_package}
}
```

---

## 🚀 Final Note

This dataset is part of a broader research direction into structured reasoning augmentation for large language models.

It is not a standalone intelligence system, but a training layer.
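
---

## Checking Records Against the Schema

Because the records are synthetic, it can help to confirm that each one matches the instruction / response / metadata layout described under Dataset Structure before training on it. The following is a minimal sketch, not part of the dataset's own tooling. The helper names `validate_record` and `summarize` are hypothetical, the sample lines are invented for illustration, and the check assumes that `type` takes exactly one of the three layer names and that `timestamp` parses with Python's `datetime.fromisoformat`.

```python
import json
from collections import Counter
from datetime import datetime

# Layer names taken from the "type" field in the card's schema (assumed exhaustive).
LAYER_TYPES = {"mindset", "mindframe", "recursive_reasoning"}


def validate_record(record):
    """Return a list of problems found in one record (empty if it looks valid)."""
    problems = []
    for key in ("instruction", "response"):
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty {key}")
    meta = record.get("metadata")
    if not isinstance(meta, dict):
        problems.append("missing metadata")
        return problems
    if meta.get("type") not in LAYER_TYPES:
        problems.append(f"unknown type: {meta.get('type')!r}")
    if not isinstance(meta.get("index"), int):
        problems.append("index is not an integer")
    try:
        # Note: before Python 3.11, fromisoformat accepts only a subset of ISO-8601.
        datetime.fromisoformat(str(meta.get("timestamp")))
    except ValueError:
        problems.append("timestamp is not ISO-8601")
    return problems


def summarize(jsonl_lines):
    """Count valid records per layer and collect the line numbers of invalid ones."""
    counts = Counter()
    invalid = []
    for lineno, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            invalid.append(lineno)
            continue
        if not isinstance(record, dict) or validate_record(record):
            invalid.append(lineno)
        else:
            counts[record["metadata"]["type"]] += 1
    return counts, invalid


# Invented sample lines for illustration only; they are not taken from the dataset.
sample_lines = [
    json.dumps({
        "instruction": "Describe a cache as a system.",
        "response": "Inputs are requests, outputs are stored values.",
        "metadata": {"type": "mindset", "domain": "systems", "index": 0,
                     "timestamp": "2026-04-16T00:00:00+00:00"},
    }),
    json.dumps({
        "instruction": "Explain load balancing.",
        "response": "",  # empty response should be flagged
        "metadata": {"type": "mindframe", "domain": "systems", "index": 1,
                     "timestamp": "2026-04-16T00:00:00+00:00"},
    }),
    "not valid json",
]

counts, invalid = summarize(sample_lines)
print(dict(counts), invalid)
```

If the published files load as described, each item from `dataset["train"]` should be a dictionary that can be passed to `validate_record` directly. A per-layer count near 50,000 for each of the three types would then match the split stated in the overview.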
Provider:
WithinUsAI