five

financial-economics-reasoning

收藏
魔搭社区2025-12-05 更新2025-11-03 收录
下载链接:
https://modelscope.cn/datasets/AI-ModelScope/financial-economics-reasoning
下载链接
链接失效反馈
官方服务:
资源简介:
# Model Card ## 📌 Summary financial-economics-reasoning dataset was constructed using advanced **Inference Distillation** techniques. We employed the **qwen-3-235b-a22b-thinking-2507** model as the Teacher Model to process the open-source **BAAI/IndustryInstruction_Finance-Economics** dataset, which contains **122,378** bilingual (Chinese-English) entries in finance, economics, and business. Unlike standard distillation datasets that only provide final answers, this dataset **retains the full reasoning chains (CoT) distilled from the Teacher Model**. For each instruction, the Teacher Model performed step-by-step reasoning and generated a complete reasoning trace along with the final distilled output. The distillation process aims to: - Filter out low-quality content - Enhance knowledge density and accuracy - Provide **reasoning-rich data** that supports training models with improved interpretability and reasoning ability - Enable downstream applications in finance, economics, and business with higher reliability --- ## ⚙️ Parameters - **Context Window:** 32,768 - **Temperature:** Default recommended setting of the model --- ## 📂 Data Source - **Original Dataset:** [BAAI/IndustryInstruction_Finance-Economics] - **Languages:** Chinese & English - **Domain:** Finance, Economics, Business --- ## 🛠️ Construction Method 1. **Teacher Model Selection:** qwen-3-235b-a22b-thinking-2507 2. **Step-by-Step Reasoning:** Each instruction was processed with deep reasoning 3. **Reasoning Chain Distillation:** The complete reasoning trace (not just the final answer) was distilled and preserved 4. **Quality Enhancement:** Removed redundant or low-quality content, improving both accuracy and reasoning clarity --- ## 🎯 Intended Use - Training or fine-tuning dialogue models in finance, economics, and business - Research on **reasoning distillation** and interpretability in large language models - Improving downstream performance in tasks such as Q&A, summarization, and reasoning-intensive applications --- ## ⚠️ Limitations & Considerations - Domain-specific: limited to finance, economics, and business - Distilled reasoning chains may inherit biases or errors from the Teacher Model - Not suitable for high-stakes fields such as medicine or law without further validation - Must comply with the license of the original dataset --- ## 📜 License - Follows the license of **BAAI/IndustryInstruction_Finance-Economics** - Distilled dataset is intended for research and non-commercial use; commercial use may require additional authorization --- ## 🔮 Future Work - Extend reasoning distillation to other domains (e.g., law, healthcare, technology) - Explore multi-teacher distillation for diversity and robustness - Incorporate automated evaluation metrics to ensure reliability of reasoning chains

# 模型卡片(Model Card) ## 📌 摘要(Summary) 金融经济推理数据集(financial-economics-reasoning dataset)采用先进的**推理蒸馏(Inference Distillation)**技术构建。我们选用**qwen-3-235b-a22b-thinking-2507**模型作为教师模型(Teacher Model),对开源的**BAAI/IndustryInstruction_Finance-Economics**数据集进行处理,该数据集包含122,378条金融、经济与商业领域的中英双语条目。 与仅提供最终答案的标准蒸馏数据集不同,本数据集**保留了从教师模型中蒸馏得到的完整推理链(Chain of Thought,CoT)**。针对每条指令,教师模型均会执行逐步推理,并生成完整的推理轨迹与最终蒸馏输出结果。 本次蒸馏流程旨在达成以下目标: - 过滤低质量内容 - 提升知识密度与准确性 - 提供**富含推理过程的数据**,助力训练具备更强可解释性与推理能力的模型 - 为金融、经济与商业领域的下游应用提供更高可靠性支撑 --- ## ⚙️ 参数(Parameters) - **上下文窗口(Context Window)**:32,768 - **温度系数(Temperature)**:采用模型推荐的默认参数设置 --- ## 📂 数据来源(Data Source) - **原始数据集**:[BAAI/IndustryInstruction_Finance-Economics] - **语言**:中文与英文 - **应用领域**:金融、经济、商业 --- ## 🛠️ 构建流程(Construction Method) 1. **教师模型选型**:qwen-3-235b-a22b-thinking-2507 2. **逐步推理**:对每条指令执行深度推理处理 3. **推理链蒸馏**:提取并保留完整的推理轨迹(而非仅最终答案) 4. **质量优化**:剔除冗余或低质量内容,提升结果准确性与推理清晰度 --- ## 🎯 预期用途(Intended Use) - 用于金融、经济与商业领域对话模型的训练或微调 - 开展大语言模型(Large Language Model,LLM)领域的**推理蒸馏**与可解释性相关研究 - 提升问答、摘要与高推理复杂度应用等下游任务的性能 --- ## ⚠️ 局限性与注意事项(Limitations & Considerations) - 领域局限性:仅适用于金融、经济与商业领域 - 蒸馏得到的推理链可能继承教师模型的偏见与错误 - 未经进一步验证的情况下,不适用于医疗、法律等高风险领域 - 需遵守原始数据集的许可协议 --- ## 📜 许可协议(License) - 遵循**BAAI/IndustryInstruction_Finance-Economics**的许可协议 - 本蒸馏数据集仅供研究与非商业用途;商业使用需获得额外授权 --- ## 🔮 未来工作(Future Work) - 将推理蒸馏技术拓展至其他领域(如法律、医疗、科技) - 探索多教师蒸馏技术以提升结果多样性与鲁棒性 - 引入自动化评估指标以保障推理链的可靠性
提供机构:
maas
创建时间:
2025-10-17
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作