
synthetic_text_to_sql-ShareGPT

ModelScope community. Updated 2026-01-06, indexed 2025-03-22.
Download link:
https://modelscope.cn/datasets/mlabonne/synthetic_text_to_sql-ShareGPT
Resource description:
# synthetic_text_to_sql

ShareGPT version of [gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql), created using the following code:

```python
from datasets import load_dataset, DatasetDict

# Load the dataset
dataset = load_dataset('gretelai/synthetic_text_to_sql', split='all')

def format_sample(sample):
    # Build one two-turn ShareGPT conversation per source row
    conversations = [
        {
            "from": "human",
            "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}"
        },
        {
            "from": "gpt",
            "value": f"{sample['sql']}\n\n{sample['sql_explanation']}"
        }
    ]
    return {"conversations": conversations}

# Replace all original columns with the single "conversations" column
dataset = dataset.map(format_sample, remove_columns=dataset.column_names)
dataset = DatasetDict({'train': dataset})
```

This means that the `sql_context` and `sql_prompt` fields are concatenated to form the user instruction, and the `sql` and `sql_explanation` fields are concatenated to form the answer.

Ideally, we'd want to steer the model's answer by providing the explanations first. However, they're not phrased in a way that would make sense if they appeared before the code, which is why I decided to append them after the `sql` field.

Let me know if you think a different format would be better.
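To see what a single converted row looks like without downloading the dataset, here is a minimal sketch that applies the same `format_sample` logic to one record. The record's values are invented for illustration (only the four field names come from the card), so treat the output as an example of the shape, not as real dataset content.

```python
# Hypothetical record: the keys match the columns used by format_sample,
# but the values are made up for this illustration.
sample = {
    "sql_context": "CREATE TABLE users (id INT, name TEXT);",
    "sql_prompt": "How many users are there?",
    "sql": "SELECT COUNT(*) FROM users;",
    "sql_explanation": "This query counts all rows in the users table.",
}


def format_sample(sample):
    # Same concatenation as in the card: context + prompt for the human turn,
    # SQL + explanation for the gpt turn, each pair separated by a blank line.
    conversations = [
        {
            "from": "human",
            "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}",
        },
        {
            "from": "gpt",
            "value": f"{sample['sql']}\n\n{sample['sql_explanation']}",
        },
    ]
    return {"conversations": conversations}


formatted = format_sample(sample)

# Print each turn with its speaker label
for turn in formatted["conversations"]:
    print(f"[{turn['from']}]")
    print(turn["value"])
    print()
```

Each row ends up as a two-turn conversation: the schema and question in the `human` turn, and the query followed by its explanation in the `gpt` turn.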

Provider:
maas
Created:
2025-03-18