synthetic_text_to_sql-ShareGPT
ModelScope community. Updated 2026-01-06; indexed 2025-03-22.
Download link:
https://modelscope.cn/datasets/mlabonne/synthetic_text_to_sql-ShareGPT
Description:
# synthetic_text_to_sql
ShareGPT version of [gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql), generated with the following code:
```python
from datasets import load_dataset, DatasetDict
# Load the dataset
dataset = load_dataset('gretelai/synthetic_text_to_sql', split='all')
def format_sample(sample):
    # Build a two-turn ShareGPT conversation from one dataset row
    conversations = [
        {
            "from": "human",
            "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}"
        },
        {
            "from": "gpt",
            "value": f"{sample['sql']}\n\n{sample['sql_explanation']}"
        }
    ]
    return {"conversations": conversations}
dataset = dataset.map(format_sample, remove_columns=dataset.column_names)
dataset = DatasetDict({'train': dataset})
```
In other words, the `sql_context` and `sql_prompt` fields are concatenated to form the user instruction, and the `sql` and `sql_explanation` fields are concatenated to form the answer.
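To make the resulting layout concrete, here is a minimal sketch that applies the same concatenation to a single row without loading the dataset. The row contents are invented for illustration and are not taken from `gretelai/synthetic_text_to_sql`; only the four field names come from the script above.

```python
def format_sample(sample):
    """Build a ShareGPT-style record from one row (same logic as the script above)."""
    return {
        "conversations": [
            {"from": "human", "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}"},
            {"from": "gpt", "value": f"{sample['sql']}\n\n{sample['sql_explanation']}"},
        ]
    }


# Hypothetical row, invented for illustration; not an actual dataset entry.
example_row = {
    "sql_context": "CREATE TABLE orders (id INT, total DECIMAL);",
    "sql_prompt": "What is the total value of all orders?",
    "sql": "SELECT SUM(total) FROM orders;",
    "sql_explanation": "This query sums the total column of the orders table.",
}

record = format_sample(example_row)

# Print each turn so the human/gpt split is visible.
for turn in record["conversations"]:
    print(f"[{turn['from']}]\n{turn['value']}\n")
```

The human turn holds the schema followed by the question, and the gpt turn holds the query followed by its explanation, each pair separated by a blank line.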
Ideally, we'd want to steer the model's answer by providing the explanations first.
However, they're not phrased in a way that would make sense if they appeared before the code, which is why I decided to append them after the `sql` field.
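For comparison, the explanation-first ordering could be sketched as follows. This variant is hypothetical and is not what the published dataset uses; the function name `format_sample_explanation_first` and the sample row are assumptions made for illustration only.

```python
def format_sample_explanation_first(sample):
    """Hypothetical variant: the explanation precedes the SQL in the answer."""
    return {
        "conversations": [
            {"from": "human", "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}"},
            # Only this line differs from the original: the two fields are swapped.
            {"from": "gpt", "value": f"{sample['sql_explanation']}\n\n{sample['sql']}"},
        ]
    }


# Hypothetical row, invented for illustration; not an actual dataset entry.
row = {
    "sql_context": "CREATE TABLE orders (id INT, total DECIMAL);",
    "sql_prompt": "What is the total value of all orders?",
    "sql": "SELECT SUM(total) FROM orders;",
    "sql_explanation": "This query sums the total column of the orders table.",
}

answer = format_sample_explanation_first(row)["conversations"][1]["value"]
print(answer)
```

With an explanation worded like the one in this made-up row, the answer opens with "This query..." before any query has been shown, which illustrates the phrasing problem described above.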
Let me know if you think a different format would be better.
Provider:
maas
Created:
2025-03-18



