financial-economics-reasoning
收藏魔搭社区2025-12-05 更新2025-11-03 收录
下载链接:
https://modelscope.cn/datasets/AI-ModelScope/financial-economics-reasoning
下载链接
链接失效反馈官方服务:
资源简介:
# Model Card
## 📌 Summary
financial-economics-reasoning dataset was constructed using advanced **Inference Distillation** techniques. We employed the **qwen-3-235b-a22b-thinking-2507** model as the Teacher Model to process the open-source **BAAI/IndustryInstruction_Finance-Economics** dataset, which contains **122,378** bilingual (Chinese-English) entries in finance, economics, and business.
Unlike standard distillation datasets that only provide final answers, this dataset **retains the full reasoning chains (CoT) distilled from the Teacher Model**. For each instruction, the Teacher Model performed step-by-step reasoning and generated a complete reasoning trace along with the final distilled output.
The distillation process aims to:
- Filter out low-quality content
- Enhance knowledge density and accuracy
- Provide **reasoning-rich data** that supports training models with improved interpretability and reasoning ability
- Enable downstream applications in finance, economics, and business with higher reliability
---
## ⚙️ Parameters
- **Context Window:** 32,768
- **Temperature:** Default recommended setting of the model
---
## 📂 Data Source
- **Original Dataset:** [BAAI/IndustryInstruction_Finance-Economics]
- **Languages:** Chinese & English
- **Domain:** Finance, Economics, Business
---
## 🛠️ Construction Method
1. **Teacher Model Selection:** qwen-3-235b-a22b-thinking-2507
2. **Step-by-Step Reasoning:** Each instruction was processed with deep reasoning
3. **Reasoning Chain Distillation:** The complete reasoning trace (not just the final answer) was distilled and preserved
4. **Quality Enhancement:** Removed redundant or low-quality content, improving both accuracy and reasoning clarity
---
## 🎯 Intended Use
- Training or fine-tuning dialogue models in finance, economics, and business
- Research on **reasoning distillation** and interpretability in large language models
- Improving downstream performance in tasks such as Q&A, summarization, and reasoning-intensive applications
---
## ⚠️ Limitations & Considerations
- Domain-specific: limited to finance, economics, and business
- Distilled reasoning chains may inherit biases or errors from the Teacher Model
- Not suitable for high-stakes fields such as medicine or law without further validation
- Must comply with the license of the original dataset
---
## 📜 License
- Follows the license of **BAAI/IndustryInstruction_Finance-Economics**
- Distilled dataset is intended for research and non-commercial use; commercial use may require additional authorization
---
## 🔮 Future Work
- Extend reasoning distillation to other domains (e.g., law, healthcare, technology)
- Explore multi-teacher distillation for diversity and robustness
- Incorporate automated evaluation metrics to ensure reliability of reasoning chains
# 模型卡片(Model Card)
## 📌 摘要(Summary)
金融经济推理数据集(financial-economics-reasoning dataset)采用先进的**推理蒸馏(Inference Distillation)**技术构建。我们选用**qwen-3-235b-a22b-thinking-2507**模型作为教师模型(Teacher Model),对开源的**BAAI/IndustryInstruction_Finance-Economics**数据集进行处理,该数据集包含122,378条金融、经济与商业领域的中英双语条目。
与仅提供最终答案的标准蒸馏数据集不同,本数据集**保留了从教师模型中蒸馏得到的完整推理链(Chain of Thought,CoT)**。针对每条指令,教师模型均会执行逐步推理,并生成完整的推理轨迹与最终蒸馏输出结果。
本次蒸馏流程旨在达成以下目标:
- 过滤低质量内容
- 提升知识密度与准确性
- 提供**富含推理过程的数据**,助力训练具备更强可解释性与推理能力的模型
- 为金融、经济与商业领域的下游应用提供更高可靠性支撑
---
## ⚙️ 参数(Parameters)
- **上下文窗口(Context Window)**:32,768
- **温度系数(Temperature)**:采用模型推荐的默认参数设置
---
## 📂 数据来源(Data Source)
- **原始数据集**:[BAAI/IndustryInstruction_Finance-Economics]
- **语言**:中文与英文
- **应用领域**:金融、经济、商业
---
## 🛠️ 构建流程(Construction Method)
1. **教师模型选型**:qwen-3-235b-a22b-thinking-2507
2. **逐步推理**:对每条指令执行深度推理处理
3. **推理链蒸馏**:提取并保留完整的推理轨迹(而非仅最终答案)
4. **质量优化**:剔除冗余或低质量内容,提升结果准确性与推理清晰度
---
## 🎯 预期用途(Intended Use)
- 用于金融、经济与商业领域对话模型的训练或微调
- 开展大语言模型(Large Language Model,LLM)领域的**推理蒸馏**与可解释性相关研究
- 提升问答、摘要与高推理复杂度应用等下游任务的性能
---
## ⚠️ 局限性与注意事项(Limitations & Considerations)
- 领域局限性:仅适用于金融、经济与商业领域
- 蒸馏得到的推理链可能继承教师模型的偏见与错误
- 未经进一步验证的情况下,不适用于医疗、法律等高风险领域
- 需遵守原始数据集的许可协议
---
## 📜 许可协议(License)
- 遵循**BAAI/IndustryInstruction_Finance-Economics**的许可协议
- 本蒸馏数据集仅供研究与非商业用途;商业使用需获得额外授权
---
## 🔮 未来工作(Future Work)
- 将推理蒸馏技术拓展至其他领域(如法律、医疗、科技)
- 探索多教师蒸馏技术以提升结果多样性与鲁棒性
- 引入自动化评估指标以保障推理链的可靠性
提供机构:
maas
创建时间:
2025-10-17



