TeeA/ViText2SQL_CoT_ChartGPT

Name: TeeA/ViText2SQL_CoT_ChartGPT
Creator: TeeA
Published: 2024-05-15 08:55:09
License: No description available

Source: Hugging Face. Updated 2024-05-15. Indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/TeeA/ViText2SQL_CoT_ChartGPT

Resource description:

---
dataset_info:
- config_name: default
  features:
  - name: db_id
    dtype: string
  - name: query
    dtype: string
  - name: query_toks
    sequence: string
  - name: query_toks_no_value
    sequence: string
  - name: question
    dtype: string
  - name: question_toks
    sequence: string
  - name: sql
    dtype: string
  - name: schema
    dtype: string
  - name: gemini_response
    dtype: string
  - name: chatgpt_response
    dtype: string
  - name: chatgpt_cot
    dtype: string
  splits:
  - name: train
    num_bytes: 28103144
    num_examples: 6831
  - name: validation
    num_bytes: 3420032
    num_examples: 954
  - name: test
    num_bytes: 4680728
    num_examples: 1908
  download_size: 5441382
  dataset_size: 36203904
- config_name: word-level
  features:
  - name: db_id
    dtype: string
  - name: query
    dtype: string
  - name: query_toks
    sequence: string
  - name: query_toks_no_value
    sequence: string
  - name: question
    dtype: string
  - name: question_toks
    sequence: string
  - name: sql
    dtype: string
  - name: schema
    dtype: string
  - name: chatgpt_response
    dtype: string
  - name: chatgpt_cot
    dtype: string
  splits:
  - name: train
    num_bytes: 27997364
    num_examples: 6831
  - name: validation
    num_bytes: 3405221
    num_examples: 954
  - name: test
    num_bytes: 4651874
    num_examples: 1908
  download_size: 5456923
  dataset_size: 36054459
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
- config_name: word-level
  data_files:
  - split: train
    path: word-level/train-*
  - split: validation
    path: word-level/validation-*
  - split: test
    path: word-level/test-*
---

The dataset provides two configurations: default and word-level. Both contain the features db_id, query, query_toks, query_toks_no_value, question, question_toks, sql, schema, chatgpt_response, and chatgpt_cot (ChatGPT's chain of thought). Only the default configuration also includes gemini_response. Each configuration is divided into training, validation, and test splits with 6831, 954, and 1908 examples respectively. The two configurations have identical example counts but differ slightly in byte size and are stored under different file paths (data/ for default, word-level/ for word-level).
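As a rough sketch of how the two configurations might be loaded, the snippet below uses `load_dataset` from the Hugging Face `datasets` library. The repository ID, configuration names, and column names are taken from the metadata above. The helper names `expected_columns` and `load_config` are hypothetical and exist only for illustration, and `load_config` assumes network access and an installed `datasets` package.

```python
# Columns shared by both configurations, as listed in the dataset metadata.
SHARED_COLUMNS = {
    "db_id", "query", "query_toks", "query_toks_no_value",
    "question", "question_toks", "sql", "schema",
    "chatgpt_response", "chatgpt_cot",
}

# Columns that appear in only one configuration (illustrative lookup table).
EXTRA_COLUMNS = {"default": {"gemini_response"}, "word-level": set()}

REPO_ID = "TeeA/ViText2SQL_CoT_ChartGPT"


def expected_columns(config_name):
    """Return the set of column names the metadata lists for a configuration."""
    return SHARED_COLUMNS | EXTRA_COLUMNS[config_name]


def load_config(config_name="default", split="train"):
    """Load one split of one configuration (assumes network access)."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)
```

A call such as `load_config("word-level", "validation")` would then be expected to return the 954-example validation split, whose columns can be compared against `expected_columns("word-level")`.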

Provider:

TeeA

Summary of original information

Dataset overview

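Configuration name: default

Features:
- db_id: string
- query: string
- query_toks: sequence of strings
- query_toks_no_value: sequence of strings
- question: string
- question_toks: sequence of strings
- sql: string
- schema: string
- gemini_response: string
- chatgpt_response: string
- chatgpt_cot: string
Splits:
- Training set: 6831 examples, 28103144 bytes
- Validation set: 954 examples, 3420032 bytes
- Test set: 1908 examples, 4680728 bytes
Download size: 5441382 bytes
Dataset size: 36203904 bytes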

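Configuration name: word-level

Features:
- db_id: string
- query: string
- query_toks: sequence of strings
- query_toks_no_value: sequence of strings
- question: string
- question_toks: sequence of strings
- sql: string
- schema: string
- chatgpt_response: string
- chatgpt_cot: string
Splits:
- Training set: 6831 examples, 27997364 bytes
- Validation set: 954 examples, 3405221 bytes
- Test set: 1908 examples, 4651874 bytes
Download size: 5456923 bytes
Dataset size: 36054459 bytes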
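The split statistics above can be cross-checked with a little arithmetic. The sketch below copies the figures from the listing and confirms that, for each configuration, the per-split byte counts add up to the reported dataset size. The helper names are hypothetical and chosen only for this illustration.

```python
# Split statistics copied from the listing: split -> (num_examples, num_bytes).
SPLITS = {
    "default": {
        "train": (6831, 28103144),
        "validation": (954, 3420032),
        "test": (1908, 4680728),
    },
    "word-level": {
        "train": (6831, 27997364),
        "validation": (954, 3405221),
        "test": (1908, 4651874),
    },
}

# Reported total dataset size in bytes for each configuration.
DATASET_SIZE = {"default": 36203904, "word-level": 36054459}


def total_examples(config_name):
    """Sum the example counts over all splits of a configuration."""
    return sum(n for n, _ in SPLITS[config_name].values())


def total_bytes(config_name):
    """Sum the byte counts over all splits of a configuration."""
    return sum(b for _, b in SPLITS[config_name].values())


def split_fractions(config_name):
    """Return each split's share of the configuration's examples."""
    total = total_examples(config_name)
    return {name: n / total for name, (n, _) in SPLITS[config_name].items()}


if __name__ == "__main__":
    for name in SPLITS:
        matches = total_bytes(name) == DATASET_SIZE[name]
        print(name, total_examples(name), total_bytes(name), matches)
```

Both configurations total 9693 examples, which works out to roughly a 70/10/20 split across training, validation, and test.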
