DianJin-R1-Data

Name: DianJin-R1-Data
Creator: maas
Published: 2026-05-15 20:20:17
License: No description available

Source: ModelScope community; updated 2026-05-15; indexed 2025-04-26

Download link:

https://modelscope.cn/datasets/tongyi_dianjin/DianJin-R1-Data

Description:

## DianJin-R1-Data

<div align="center">
  <img alt="image" src="https://raw.githubusercontent.com/aliyun/qwen-dianjin/refs/heads/master/images/dianjin_logo.png">
  <p align="center">
    <a href="https://tongyi.aliyun.com/dianjin">Qwen DianJin Platform</a> |
    <a href="https://github.com/aliyun/qwen-dianjin">GitHub</a> |
    <a href="https://modelscope.cn/organization/tongyi_dianjin">ModelScope</a>
  </p>
</div>

### Introduction

We propose DianJin-R1, a novel framework that enhances financial reasoning in LLMs through reasoning-augmented supervision and reinforcement learning. Central to our approach is DianJin-R1-Data, a high-quality dataset constructed from CFLUE, FinQA, and a proprietary compliance corpus (Chinese Compliance Check, CCC), combining diverse financial reasoning scenarios with verified annotations. We adopt a structured training paradigm in which models learn, through supervised fine-tuning, to generate reasoning steps followed by a final answer. To further improve reasoning quality, we use Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that incorporates dual reward signals for output structure and answer accuracy.

We open-source the enhanced versions of the CFLUE and FinQA datasets. Due to sensitivity concerns, however, the CCC scenario data will not be made publicly available.

<div align="center">
  <img alt="image" src="https://github.com/aliyun/qwen-dianjin/blob/master/DianJin-R1/images/2-step-training.png?raw=true">
</div>

#### CFLUE

CFLUE is an open-source Chinese benchmark designed to assess the performance of LLMs on a variety of natural language processing (NLP) tasks within the financial domain. Our enhanced version of CFLUE includes two parts:

**Multiple-Choice Questions** ($CFLUE_{MCQ}$): We use DeepSeek-R1, a model known for its strong reasoning capabilities, to generate a chain-of-thought (CoT) along with a predicted answer for each question. We then compare each predicted answer with the gold answer and keep only the instances answered correctly to construct this dataset.

**Open-ended Questions** ($CFLUE_{OE}$): First, we use GPT-4o to convert each multiple-choice question of CFLUE into an open-ended format. Then, we use DeepSeek-R1 to generate a chain-of-thought (CoT) along with a predicted answer. Finally, we employ GPT-4o as a verifier to assess two aspects of the generated output: (1) whether the predicted answer matches the gold answer, and (2) whether the generated reasoning is consistent with the reference explanation $e_i$. If both conditions are satisfied, we retain the instance as a valid reasoning sample.

#### FinQA

FinQA is an open-source English benchmark containing 8,281 financial question-answer pairs that require numerical reasoning over financial reports.

Unlike the instances in CFLUE, the QA pairs in FinQA are already in an open-ended format. We use DeepSeek-R1 to generate a chain-of-thought (CoT) along with a predicted answer, then use GPT-4o to verify the answers and keep only the correct ones to construct this dataset.
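The filtering rules described above for $CFLUE_{MCQ}$ and $CFLUE_{OE}$ can be illustrated with a minimal sketch. This is not the authors' pipeline. The `Candidate` record, its field names, and the `normalize_choice` helper are hypothetical, and the two verifier checks (performed by GPT-4o in the description above) are stood in for by plain callables. The sketch also assumes that multiple-choice answers consist only of option letters.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    """One generated sample. All field names are hypothetical."""
    question: str
    gold_answer: str
    cot: str               # chain-of-thought produced by the reasoning model
    predicted_answer: str  # answer produced alongside the CoT


def normalize_choice(answer: str) -> str:
    """Canonicalize an option-letter answer, e.g. ' b, a ' becomes 'AB'.

    Assumes the string contains only option letters and separators.
    """
    letters = {ch for ch in answer.upper() if ch.isalpha()}
    return "".join(sorted(letters))


def filter_mcq(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep multiple-choice samples whose prediction equals the gold answer."""
    return [
        c for c in candidates
        if normalize_choice(c.predicted_answer) == normalize_choice(c.gold_answer)
    ]


def filter_open_ended(
    candidates: Iterable[Candidate],
    answer_matches: Callable[[Candidate], bool],
    reasoning_consistent: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Keep open-ended samples only when both verifier checks pass.

    The two callables are placeholders for the verifier model's judgments.
    """
    return [
        c for c in candidates
        if answer_matches(c) and reasoning_consistent(c)
    ]


if __name__ == "__main__":
    samples = [
        Candidate("q1", "A", "reasoning text", "a"),
        Candidate("q2", "BC", "reasoning text", "C"),
    ]
    # Only q1 survives: the q2 prediction misses option B.
    print([c.question for c in filter_mcq(samples)])
```

Passing the verifier in as callables keeps the retention rule (both conditions must hold) separate from whichever model performs the judgment.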

Provider:

maas

Created:

2025-04-18
