
horelulus/DeepSeek_0528_8B_Legal_Distill

Source: Hugging Face. Updated 2026-03-28. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/horelulus/DeepSeek_0528_8B_Legal_Distill
Description:
---
license: cc-by-4.0
task_categories:
- text-generation
language:
- en
tags:
- legal
- distillation
- grpo
- deepseek
- rlhf
- reinforcement-learning
---

# ⚖️ DeepSeek-0528-8B Legal Distill Dataset

This repository contains a high-density trajectory dataset generated during the **GRPO (Group Relative Policy Optimization)** training of the **DeepSeek-8B** architecture. It is specifically optimized for advanced **Knowledge Distillation** and structural legal reasoning. 🚀

## 💡 The Concept: "Log-as-Distillation"

Traditional training often treats logs as temporary metadata. This dataset flips that script. By capturing the **multi-generation groups** produced during GRPO and pairing them with **complex, multi-dimensional reward scores**, we provide a ready-made "map" of model behavior. 🗺️

Instead of just seeing the "best" answer, you see the variety of attempts the model made and exactly why certain outputs were favored over others. This allows for:

* 🎯 **Precision Filtering:** Users can extract only the highest-scoring reasoning paths.
* 📉 **Negative Constraint Learning:** Analyze low-scoring outputs to understand and prevent common legal hallucinations.

## 🛠️ Complex Scoring Architecture

Unlike simpler models, the **DeepSeek-0528-8B** run utilized a sophisticated reward ensemble to evaluate each generation:

1. **Legal Accuracy Reward:** Measures alignment with statutory references and regulatory language. 📜
2. **Structural Format Reward:** Ensures the model adheres to strict markdown or JSON schemas required for legal tech integration. 🏗️
3. **Logical Consistency Reward:** Evaluates the internal "Chain of Thought" (CoT) for contradictions. 🧠
4. **Length & Verbosity Penalty:** Incentivizes concise, high-impact legal advice. ✂️

## ☁️ Cloud-to-Cloud (C2C) Pipeline

This dataset was built using a seamless, automated workflow:

* **Infrastructure:** Orchestrated across high-performance cloud platforms. ⚡
* **Direct Sync:** A specialized pipeline pulls the base weights and pushes the resulting trajectory logs directly to **Hugging Face** in real-time. 🔄
* **Integrity:** Developed using legitimate developer methods, ensuring high-quality data lineage and zero-noise acquisition. ✅

## 📂 Dataset Structure

Each entry includes:

* **Prompt:** The legal inquiry or regulatory task.
* **Generation Group:** A collection of $N$ completions sampled for relative advantage.
* **Weighted Rewards:** A detailed breakdown of the multi-complex scores for each completion.
* **Model Metadata:** Checkpoint information from the DeepSeek-0528-8B training run.

## 🧪 Use Cases

* **Student Distillation:** Train smaller models (1B–3B) to mimic the 8B model's complex reasoning. 🎓
* **RLHF Research:** Test new reward functions against pre-existing model trajectories. 🔬
* **Legal RAG Refinement:** Improve the "reasoning" step in Retrieval-Augmented Generation pipelines. 🔍

## 📜 License & Attribution

This dataset is licensed under the **Creative Commons Attribution 4.0 International (CC BY 4.0)**. 📝

### Attribution

1. **Dataset Curator:** Azzindani (via Hugging Face Datasets).
2. **Base Architecture:** DeepSeek-AI.

---

**Disclaimer:** *These generations are byproducts of an experimental RL run. Users should perform their own safety and fact-checking audits before deploying distilled models in production legal environments.* ⚠️

---
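
The card describes each entry as a prompt, a group of $N$ completions, and a per-completion reward breakdown, and it suggests filtering for the highest-scoring paths and studying the lowest-scoring ones. The following is a minimal sketch of how that could look, not the curator's actual pipeline. The field names (`prompt`, `completions`, `text`, `rewards`), the reward keys, the weights, and the toy record are all hypothetical, since the card does not publish the column names or how the components are weighted. The advantage uses the common GRPO normalization, `(r - mean) / std`, computed here with the population standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical weights: the card names four reward components but does not
# say how they are combined, so these numbers are illustrative only.
WEIGHTS = {
    "legal_accuracy": 0.4,
    "format": 0.2,
    "consistency": 0.3,
    "length_penalty": 0.1,
}


def total_reward(rewards, weights=WEIGHTS):
    """Weighted sum of the per-component scores for one completion."""
    return sum(weights[name] * rewards[name] for name in weights)


def group_advantages(totals, eps=1e-8):
    """Group-relative advantage for each completion: (r - mean) / (std + eps)."""
    mu = mean(totals)
    sigma = pstdev(totals)
    return [(r - mu) / (sigma + eps) for r in totals]


def best_and_worst(record):
    """Return (prompt, best completion text, worst completion text)."""
    totals = [total_reward(c["rewards"]) for c in record["completions"]]
    # Sort on the score only, so ties never fall back to comparing dicts.
    ranked = sorted(zip(totals, record["completions"]), key=lambda pair: pair[0])
    return record["prompt"], ranked[-1][1]["text"], ranked[0][1]["text"]


# Toy record with made-up scores, shaped after the "Dataset Structure" list.
example = {
    "prompt": "Summarize the notice requirements for lease termination.",
    "completions": [
        {"text": "answer A", "rewards": {"legal_accuracy": 0.9, "format": 1.0,
                                         "consistency": 0.8, "length_penalty": -0.1}},
        {"text": "answer B", "rewards": {"legal_accuracy": 0.2, "format": 1.0,
                                         "consistency": 0.5, "length_penalty": -0.5}},
        {"text": "answer C", "rewards": {"legal_accuracy": 0.6, "format": 0.0,
                                         "consistency": 0.7, "length_penalty": 0.0}},
    ],
}

totals = [total_reward(c["rewards"]) for c in example["completions"]]
advantages = group_advantages(totals)
prompt, chosen, rejected = best_and_worst(example)
```

The `chosen` text is a candidate for the "Precision Filtering" use case, while a `(prompt, chosen, rejected)` triple is one plausible shape for the "Negative Constraint Learning" use case. Before applying this to the real data, check the repository's actual column names and reward fields and adjust the keys accordingly.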

Provider:
horelulus