Jinfu Large Model Pre-training and Reinforcement Learning Dataset (金赋大模型预训练强化学习数据集)

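Name: Jinfu Large Model Pre-training and Reinforcement Learning Dataset
Creator: Guangdong Jinfu Technology Co., Ltd. (广东金赋科技股份有限公司)
Published: 2025-06-05 00:00:00
License: No description available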

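Source: Guangdong Provincial Data Intellectual Property Evidence Deposit and Registration Platform (广东省数据知识产权存证登记平台). Updated 2025-06-05; indexed 2025-07-05.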

Download link:

https://data.gpic.gd.cn/dataStorage/credentialInfo.jhtml?no=20250644000005458

Resource description:

The dataset contains five fields: id, questions, query, db_name, and schema. The id field is the primary key and is used to look up records. The questions field is a string holding the natural-language text of a training question; it feeds the natural language processing part of training. The query field holds the SQL code that answers the question, db_name names the database the question targets, and schema describes that database's structure.

During training, the model generates SQL from the text in questions. For sample validation, the generated SQL is compared against the reference query to assess each sample's relevance and its impact on learning. High-quality data is then selected by db_name for reinforcement learning of the model.
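The description does not say how generated SQL is compared with the reference query, nor how schema is encoded. The following is a minimal sketch of one common approach, execution matching, under two loud assumptions: that schema is SQLite-compatible DDL, and that a sample counts as a match when both statements return the same rows. The record contents, the helper names (run_sql, execution_match, select_by_db), and the database name hr_demo are all hypothetical and not taken from the dataset.

```python
import sqlite3

# Hypothetical record with the five fields named in the description.
# ASSUMPTION: `schema` is SQLite-compatible DDL (plus demo rows); the real
# dataset may encode the schema differently.
SAMPLE = {
    "id": 1,
    "questions": "How many employees work in sales?",
    "query": "SELECT COUNT(*) FROM employees WHERE dept = 'sales'",
    "db_name": "hr_demo",
    "schema": (
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);"
        "INSERT INTO employees VALUES (1, 'Ann', 'sales'), (2, 'Bob', 'ops');"
    ),
}


def run_sql(schema, sql):
    """Run `sql` on a fresh in-memory database built from `schema`.

    Returns the result rows in a stable order, or None if anything fails.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        rows = conn.execute(sql).fetchall()
        return sorted(rows, key=repr)
    except sqlite3.Error:
        return None
    finally:
        conn.close()


def execution_match(sample, generated_sql):
    """Return 1.0 if the generated SQL yields the same rows as the reference
    query, else 0.0. ASSUMPTION: this stands in for the unspecified
    'relevance' check; it is not the dataset authors' metric."""
    expected = run_sql(sample["schema"], sample["query"])
    actual = run_sql(sample["schema"], generated_sql)
    return 1.0 if expected is not None and expected == actual else 0.0


def select_by_db(samples, db_names):
    """Keep only the samples whose db_name is in the chosen set."""
    return [s for s in samples if s["db_name"] in db_names]


good = execution_match(SAMPLE, "SELECT COUNT(id) FROM employees WHERE dept = 'sales'")
bad = execution_match(SAMPLE, "SELECT COUNT(*) FROM employees")
broken = execution_match(SAMPLE, "SELEC nonsense")
kept = select_by_db([SAMPLE], {"hr_demo"})
```

Here good evaluates to 1.0, while bad and broken evaluate to 0.0. A score like this could serve as a per-sample signal before filtering by db_name, but the actual scoring and selection criteria used for this dataset are not documented on this page.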

Provider:

Guangdong Jinfu Technology Co., Ltd. (广东金赋科技股份有限公司)

Created:

2025-06-05

Collected Summary

Dataset Introduction