mlabonne/synthetic_text_to_sql-ShareGPT

Name: mlabonne/synthetic_text_to_sql-ShareGPT
Creator: mlabonne
Published: 2024-04-17 10:38:33
License: Not specified

Source: Hugging Face (updated 2024-04-17, indexed 2024-04-19)

Download link:

https://hf-mirror.com/datasets/mlabonne/synthetic_text_to_sql-ShareGPT


Resource summary:

---
dataset_info:
  features:
  - name: conversations
    list:
    - name: from
      dtype: string
    - name: value
      dtype: string
  splits:
  - name: train
    num_bytes: 78494888
    num_examples: 105851
  download_size: 31275284
  dataset_size: 78494888
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# synthetic_text_to_sql

ShareGPT version of [gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql), created with the following code:

```python
from datasets import load_dataset, DatasetDict

# Load every split of the source dataset as a single Dataset
dataset = load_dataset('gretelai/synthetic_text_to_sql', split='all')

def format_sample(sample):
    # Build a two-turn ShareGPT conversation from one source row
    conversations = [
        {
            "from": "human",
            "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}"
        },
        {
            "from": "gpt",
            "value": f"{sample['sql']}\n\n{sample['sql_explanation']}"
        }
    ]
    return {"conversations": conversations}

# Replace the original columns with the single `conversations` column
dataset = dataset.map(format_sample, remove_columns=dataset.column_names)
dataset = DatasetDict({'train': dataset})
```

This means that the `sql_context` and `sql_prompt` fields are concatenated to form the user instruction, and the `sql` and `sql_explanation` fields are concatenated to form the answer.

Ideally, we'd want to steer the model's answer by providing the explanation first. However, the explanations are not phrased in a way that would make sense before the code, which is why I decided to append them after the `sql` field. Let me know if you think another format would work better.
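To make the resulting record shape concrete, the sketch below applies the same `format_sample` logic to one hand-written row, with no download required. The field values are hypothetical placeholders, not rows from gretelai/synthetic_text_to_sql; only the four field names come from the conversion script.

```python
def format_sample(sample):
    # Same logic as the conversion script: context + prompt become the
    # "human" turn, SQL + explanation become the "gpt" turn.
    conversations = [
        {
            "from": "human",
            "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}",
        },
        {
            "from": "gpt",
            "value": f"{sample['sql']}\n\n{sample['sql_explanation']}",
        },
    ]
    return {"conversations": conversations}


# Hypothetical source row using the four fields the script reads.
sample = {
    "sql_context": "CREATE TABLE users (id INT, name TEXT);",
    "sql_prompt": "How many users are there?",
    "sql": "SELECT COUNT(*) FROM users;",
    "sql_explanation": "Counts all rows in the users table.",
}

result = format_sample(sample)

# Print each turn with its speaker tag.
for turn in result["conversations"]:
    print(f"[{turn['from']}]")
    print(turn["value"])
    print()
```

Each turn's `value` is two blocks separated by a blank line, so the explanation always follows the SQL in the answer turn.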

Provided by:

mlabonne

Original information summary

Dataset overview

Dataset name

synthetic_text_to_sql

Dataset features

Name: conversations
- Sub-features:
  - Name: from
    - Data type: string
  - Name: value
    - Data type: string
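According to this schema, each row holds one `conversations` list whose items have two string fields, `from` and `value`. A minimal stdlib-only check of that shape might look like the following; `is_valid_row` is a hypothetical helper written for illustration, not part of the `datasets` library, and the sample rows are made up.

```python
def is_valid_row(row):
    """Return True if `row` matches the schema: a `conversations` list of
    dicts that each have exactly the string fields `from` and `value`."""
    turns = row.get("conversations")
    if not isinstance(turns, list):
        return False
    for turn in turns:
        # Each turn must be a dict with exactly the two schema keys.
        if not isinstance(turn, dict) or set(turn) != {"from", "value"}:
            return False
        # Both fields are declared with dtype string.
        if not all(isinstance(turn[key], str) for key in ("from", "value")):
            return False
    return True


# Hypothetical rows: one well-formed, one with a non-string value.
good = {"conversations": [{"from": "human", "value": "How many users?"},
                          {"from": "gpt", "value": "SELECT COUNT(*) FROM users;"}]}
bad = {"conversations": [{"from": "human", "value": 42}]}

print(is_valid_row(good))
print(is_valid_row(bad))
```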

Dataset splits

Split name: train
- Number of examples: 105851
- Data size: 78494888 bytes

Download information

Download size: 31275284 bytes
Dataset size: 78494888 bytes
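The size figures above allow a quick back-of-the-envelope check: dividing the dataset size by the number of examples gives the average bytes per row, and dividing the download size by the dataset size shows how much smaller the downloaded data files are. These are derived estimates, not figures published with the dataset.

```python
# Figures copied from the dataset card above.
num_examples = 105_851
dataset_size = 78_494_888   # bytes
download_size = 31_275_284  # bytes

# Average size of one example in the prepared dataset.
avg_bytes_per_example = dataset_size / num_examples

# Download size expressed as a fraction of the dataset size.
download_ratio = download_size / dataset_size

print(f"Average size per example: {avg_bytes_per_example:.1f} bytes")
print(f"Download size as a share of dataset size: {download_ratio:.1%}")
```

With these numbers, each example averages roughly 742 bytes, and the download is about 40% of the dataset size.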

Configuration

Config name: default
- Data files:
  - Split: train
    - Path: data/train-*
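The `data/train-*` path is a glob pattern: files in `data/` whose names start with `train-` are assigned to the train split. The sketch below uses Python's `fnmatch` to show which names such a pattern would select. The file names are hypothetical examples, not a listing of the actual repository, and the Hub's own pattern resolution may differ in details such as directory handling.

```python
from fnmatch import fnmatch

pattern = "data/train-*"

# Hypothetical file names; the real repository's shards may be named differently.
candidates = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the names the train-split pattern would pick up.
matched = [name for name in candidates if fnmatch(name, pattern)]
print(matched)
```

Because the pattern ends in `*`, new train shards are picked up without changing the configuration.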
