R1 Financial Reasoning Chain-of-Thought Dataset 500K
Source: ModelScope community (魔搭社区) | Updated 2026-05-30 | Indexed 2025-02-22
Download link:
https://modelscope.cn/datasets/IngeniusAI/Finance_R1-Distill_Data
Resource description:
# Finance R1 Distill Dataset
This project provides a chain-of-thought (CoT) dataset for complex financial-domain questions. It uses CoT distillation to extract financial domain knowledge from a large-scale corpus.
## Data Sources
- Built on Finance-Instruct-500k, a financial-domain instruction and dialogue dataset
- Chain-of-thought reasoning is distilled with the DeepSeek-R1 large language model (LLM)
- The source dataset contains more than 500,000 high-quality financial entries
- Distillation is ongoing; roughly 2,385+ entries have been distilled so far
## Original Data Characteristics
The Finance-Instruct-500k dataset integrates multiple high-quality financial datasets and covers the following task types:
- Financial QA and reasoning
- Entity recognition and sentiment analysis
- Multi-turn dialogue and instruction examples
- XBRL tagging and named entity recognition
- Multilingual natural language processing tasks
Main data sources:
- BAAI/IndustryInstruction_Finance-Economics
- Josephgflowers/Financial-NER-NLP
- Sujet-Finance-Instruct-177k
- Other high-quality financial domain datasets
## Data Format
Each record contains the following fields:
```json
{
  "id": "Unique identifier ID",
  "user_input": "Original financial question",
  "reasoning_content": "LLM's chain-of-thought reasoning process",
  "answer_r1": "Final answer result",
  "created_by": "Ingenius_AI",
  "contact": "Ingenius AI Official Account"
}
```
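
The record layout above can be checked when reading the data. The following is a minimal sketch, assuming each record arrives as one JSON object per line; the class name `DistilledRecord`, the helper `parse_record`, and the choice of which fields to treat as required are illustrative assumptions, not part of the dataset's tooling.

```python
import json
from dataclasses import dataclass, fields

# Field names are taken from the record layout above. Treating these four
# as required is an assumption made for this illustration.
REQUIRED_FIELDS = ("id", "user_input", "reasoning_content", "answer_r1")


@dataclass
class DistilledRecord:
    """Hypothetical container for one record; types are not enforced at runtime."""
    id: str
    user_input: str
    reasoning_content: str
    answer_r1: str
    created_by: str = ""
    contact: str = ""


def parse_record(line: str) -> DistilledRecord:
    """Parse one JSON line and verify that the core fields are present."""
    raw = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    # Ignore keys that are not part of the documented layout.
    known_names = {f.name for f in fields(DistilledRecord)}
    return DistilledRecord(**{k: v for k, v in raw.items() if k in known_names})


if __name__ == "__main__":
    sample = json.dumps({
        "id": "demo-1",
        "user_input": "What is a bond's yield to maturity?",
        "reasoning_content": "(reasoning text)",
        "answer_r1": "(answer text)",
    })
    print(parse_record(sample).user_input)
```

Failing fast on a missing `reasoning_content` or `answer_r1` is one reasonable choice here, since those two fields are what distinguish a distilled record from the raw question.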
## Technical Framework
1. Data Preprocessing:
   - Load raw financial QA data from JSON input files
   - Process data in batches (1000 entries per batch) to control scale
2. Chain of Thought Distillation:
   - Use the DeepSeek-R1 LLM for reasoning
   - Generate detailed reasoning processes (reasoning_content) for each question
   - Generate final answers (answer_r1)
3. Data Storage:
   - Save processed results in JSONL format
   - Store data in batch-separated files for convenient management of large-scale datasets
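
The three steps above can be sketched as a small batching loop. This is an illustration under stated assumptions, not the project's actual pipeline: the input is assumed to be a JSON array of objects with `id` and `user_input` keys, the output file naming (`batch_00000.jsonl`) is invented, and `fake_model` is a stand-in for a real DeepSeek-R1 call, whose client code is not described in the source.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List, Tuple

BATCH_SIZE = 1000  # batch size stated in the steps above


def load_questions(path: Path) -> List[dict]:
    """Step 1: load raw QA data (assumed to be a JSON array of objects)."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def batched(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def distill_to_jsonl(
    questions: List[dict],
    out_dir: Path,
    ask_model: Callable[[str], Tuple[str, str]],
    batch_size: int = BATCH_SIZE,
) -> List[Path]:
    """Steps 2 and 3: query the model per question, write one JSONL file per batch."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for batch_no, batch in enumerate(batched(questions, batch_size)):
        path = out_dir / f"batch_{batch_no:05d}.jsonl"  # hypothetical naming scheme
        with path.open("w", encoding="utf-8") as fh:
            for item in batch:
                reasoning, answer = ask_model(item["user_input"])
                record = {
                    "id": item["id"],
                    "user_input": item["user_input"],
                    "reasoning_content": reasoning,
                    "answer_r1": answer,
                }
                # ensure_ascii=False keeps non-ASCII text readable in the file.
                fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        written.append(path)
    return written


def fake_model(question: str) -> Tuple[str, str]:
    """Placeholder for a DeepSeek-R1 call; returns (reasoning, answer)."""
    return f"(reasoning about: {question})", "(answer)"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = [{"id": str(i), "user_input": f"question {i}"} for i in range(5)]
        for p in distill_to_jsonl(demo, Path(tmp), fake_model, batch_size=2):
            print(p.name)
```

Writing each batch to its own file means an interrupted run loses at most one batch, which fits the stated goal of keeping a large, continuously growing dataset manageable.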
## Dataset Characteristics
- Focuses on specialized financial-domain questions
- Contains detailed chain-of-thought reasoning for each question
- Generated via LLM distillation, with controllable quality
- Supports financial QA, reasoning, multi-turn dialogue, and related tasks
- Continuously updated and expanded
## Download Method
Dataset metadata and data files are listed on the "Dataset Files" page.
You can download the dataset via the following methods:
:modelscope-code[]{type="sdk"}
:modelscope-code[]{type="git"}
## Citation
If you use this dataset, please cite it in the following format:
```bibtex
@dataset{ingeniusai2025finance,
  title={Finance_R1-Distill_Data},
  author={IngeniusAI},
  year={2025},
  publisher={ModelScope}
}
```
## Contact Information
Follow the "Ingenius AI" official account for the latest project updates.
## License
This dataset is open-sourced under the Apache License 2.0, for academic research use only.
Provider: maas
Created: 2025-02-13
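Background overview
This is a chain-of-thought reasoning dataset for the financial domain. It is built on the Finance-Instruct-500k financial instruction and dialogue data, with reasoning chains distilled by the DeepSeek-R1 large language model. The source data contains more than 500,000 high-quality entries, of which roughly 2,385+ have been distilled so far. The dataset focuses on specialized financial questions, provides a detailed reasoning chain and a final answer for each, supports financial QA, reasoning, and multi-turn dialogue tasks, is continuously updated, and is intended for academic research.
The summary above was collected and generated by 遇见数据集.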



