
mlabonne/synthetic_text_to_sql-ShareGPT

Source: Hugging Face | Updated: 2024-04-17 | Indexed: 2024-04-19
Download link:
https://hf-mirror.com/datasets/mlabonne/synthetic_text_to_sql-ShareGPT
Resource description:

---
dataset_info:
  features:
  - name: conversations
    list:
    - name: from
      dtype: string
    - name: value
      dtype: string
  splits:
  - name: train
    num_bytes: 78494888
    num_examples: 105851
  download_size: 31275284
  dataset_size: 78494888
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# synthetic_text_to_sql

ShareGPT version of [gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) using the following code:

```python
from datasets import load_dataset, DatasetDict

# Load the dataset
dataset = load_dataset('gretelai/synthetic_text_to_sql', split='all')

def format_sample(sample):
    # Build a two-turn ShareGPT conversation from one source record
    conversations = [
        {
            "from": "human",
            "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}"
        },
        {
            "from": "gpt",
            "value": f"{sample['sql']}\n\n{sample['sql_explanation']}"
        }
    ]
    return {"conversations": conversations}

# Replace the original columns with the single `conversations` column
dataset = dataset.map(format_sample, remove_columns=dataset.column_names)
dataset = DatasetDict({'train': dataset})
```

This means that the `sql_context` and `sql_prompt` fields are concatenated to form the user instruction, and the `sql` and `sql_explanation` fields are concatenated to form the answer.

Ideally, we'd want to steer the model's answer by providing the explanations first. However, they're not phrased in a way that would make sense if they appeared before the code, which is why I decided to append them after the `sql` field. Let me know if you think another format would be better.
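
To make the concatenation concrete, here is a minimal sketch that applies the same `format_sample` function to a single hand-written record, with no download required. The record contents (the `users` table, the question, and the explanation) are hypothetical and were made up for illustration; only the four field names come from the code above.

```python
def format_sample(sample):
    """Turn one source record into a two-turn ShareGPT conversation."""
    conversations = [
        {"from": "human", "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}"},
        {"from": "gpt", "value": f"{sample['sql']}\n\n{sample['sql_explanation']}"},
    ]
    return {"conversations": conversations}

# Hypothetical record: the values are invented, only the keys match the source dataset.
sample = {
    "sql_context": "CREATE TABLE users (id INT, name TEXT);",
    "sql_prompt": "How many users are there?",
    "sql": "SELECT COUNT(*) FROM users;",
    "sql_explanation": "This query counts all rows in the users table.",
}

formatted = format_sample(sample)

# Print each turn so the blank-line separator between the joined fields is visible.
for turn in formatted["conversations"]:
    print(f"[{turn['from']}]")
    print(turn["value"])
    print()
```

The human turn holds the schema followed by the question, and the gpt turn holds the query followed by its explanation, each pair separated by one blank line.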
Provided by:
mlabonne
Summary of original information

Dataset overview

Dataset name

synthetic_text_to_sql

Dataset features

  • Name: conversations
    • Sub-features:
      • Name: from
        • Data type: string
      • Name: value
        • Data type: string
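
The feature layout above (a `conversations` list whose items each carry a string `from` and a string `value`) can be checked with a small validator. This is only a sketch: the helper name `is_valid_record` and the two example records are assumptions introduced here, not part of the dataset or of any library.

```python
def is_valid_record(record):
    """Return True if `record` (a dict) has a `conversations` list of {from, value} string pairs."""
    conversations = record.get("conversations")
    if not isinstance(conversations, list):
        return False
    for turn in conversations:
        # Each turn must be a dict with exactly the two documented keys.
        if not isinstance(turn, dict) or set(turn) != {"from", "value"}:
            return False
        # Both sub-features are declared with dtype string.
        if not all(isinstance(turn[key], str) for key in ("from", "value")):
            return False
    return True

# Hypothetical records used only to exercise the check.
good_record = {
    "conversations": [
        {"from": "human", "value": "CREATE TABLE t (id INT);\n\nCount the rows."},
        {"from": "gpt", "value": "SELECT COUNT(*) FROM t;\n\nCounts all rows in t."},
    ]
}
bad_record = {"conversations": [{"from": "human", "value": 42}]}

print(is_valid_record(good_record))
print(is_valid_record(bad_record))
```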

Dataset splits

  • Split name: train
    • Number of examples: 105851
    • Data size: 78494888 bytes

Download information

  • Download size: 31275284 bytes
  • Dataset size: 78494888 bytes
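
As a quick sanity check on these figures, the sketch below derives the average size of one example and the ratio of download size to dataset size. The three constants are copied from the metadata above; the variable names are illustrative.

```python
# Figures copied from the dataset metadata above.
NUM_EXAMPLES = 105_851
DATASET_SIZE_BYTES = 78_494_888
DOWNLOAD_SIZE_BYTES = 31_275_284

# Average number of bytes per example in the train split.
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"Average size per example: {avg_bytes_per_example:.0f} bytes")
print(f"Download size / dataset size: {download_ratio:.2f}")
```

This works out to roughly 742 bytes per example, with the download at about 40% of the dataset size, presumably because the data files are stored in a compressed form.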

Configuration

  • Config name: default
    • Data files:
      • Split: train
        • Path: data/train-*