TeeA/ViText2SQL_CoT_ChartGPT
Hugging Face | Updated 2024-05-15 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/TeeA/ViText2SQL_CoT_ChartGPT
Description:
---
dataset_info:
- config_name: default
  features:
  - name: db_id
    dtype: string
  - name: query
    dtype: string
  - name: query_toks
    sequence: string
  - name: query_toks_no_value
    sequence: string
  - name: question
    dtype: string
  - name: question_toks
    sequence: string
  - name: sql
    dtype: string
  - name: schema
    dtype: string
  - name: gemini_response
    dtype: string
  - name: chatgpt_response
    dtype: string
  - name: chatgpt_cot
    dtype: string
  splits:
  - name: train
    num_bytes: 28103144
    num_examples: 6831
  - name: validation
    num_bytes: 3420032
    num_examples: 954
  - name: test
    num_bytes: 4680728
    num_examples: 1908
  download_size: 5441382
  dataset_size: 36203904
- config_name: word-level
  features:
  - name: db_id
    dtype: string
  - name: query
    dtype: string
  - name: query_toks
    sequence: string
  - name: query_toks_no_value
    sequence: string
  - name: question
    dtype: string
  - name: question_toks
    sequence: string
  - name: sql
    dtype: string
  - name: schema
    dtype: string
  - name: chatgpt_response
    dtype: string
  - name: chatgpt_cot
    dtype: string
  splits:
  - name: train
    num_bytes: 27997364
    num_examples: 6831
  - name: validation
    num_bytes: 3405221
    num_examples: 954
  - name: test
    num_bytes: 4651874
    num_examples: 1908
  download_size: 5456923
  dataset_size: 36054459
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
- config_name: word-level
  data_files:
  - split: train
    path: word-level/train-*
  - split: validation
    path: word-level/validation-*
  - split: test
    path: word-level/test-*
---
The dataset provides two configurations, default and word-level. Both contain the features db_id, query, query_toks, query_toks_no_value, question, question_toks, sql, schema, chatgpt_response, and chatgpt_cot (ChatGPT's chain of thought, CoT); only the default configuration also includes gemini_response. Each configuration is split into training, validation, and test sets with 6831, 954, and 1908 examples respectively. The word-level configuration has the same example counts as the default one, but it is stored under different file paths (word-level/ rather than data/) and its byte sizes differ slightly.
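As a minimal sketch of how the two configurations differ, the snippet below restates the feature names from the metadata and computes which columns appear in one configuration but not the other. The `only_in` and `load_split` helpers are hypothetical names introduced here for illustration. `load_split` additionally assumes that the Hugging Face `datasets` library is installed and that the repository is reachable under the ID shown in the title; neither is stated on this page.

```python
REPO_ID = "TeeA/ViText2SQL_CoT_ChartGPT"

# Feature names copied from the dataset metadata, per configuration.
FEATURES = {
    "default": [
        "db_id", "query", "query_toks", "query_toks_no_value",
        "question", "question_toks", "sql", "schema",
        "gemini_response", "chatgpt_response", "chatgpt_cot",
    ],
    "word-level": [
        "db_id", "query", "query_toks", "query_toks_no_value",
        "question", "question_toks", "sql", "schema",
        "chatgpt_response", "chatgpt_cot",
    ],
}


def only_in(config, other):
    """Return feature names present in `config` but missing from `other`."""
    return [name for name in FEATURES[config] if name not in FEATURES[other]]


def load_split(config="default", split="train"):
    """Hypothetical helper: load one split (assumes `datasets` and network access)."""
    # Imported lazily so the comparison above runs without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, config, split=split)


if __name__ == "__main__":
    print(only_in("default", "word-level"))  # ['gemini_response']
    print(only_in("word-level", "default"))  # []
```

Code that reads `gemini_response` should therefore select the default configuration, or guard against the column being absent.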
Provider:
TeeA
Summary of original information
Dataset overview
Configuration: default
Features:
- db_id: string
- query: string
- query_toks: sequence of strings
- query_toks_no_value: sequence of strings
- question: string
- question_toks: sequence of strings
- sql: string
- schema: string
- gemini_response: string
- chatgpt_response: string
- chatgpt_cot: string
Splits:
- Training set: 6831 examples, 28103144 bytes
- Validation set: 954 examples, 3420032 bytes
- Test set: 1908 examples, 4680728 bytes
Download size: 5441382 bytes
Dataset size: 36203904 bytes
Configuration: word-level
Features:
- db_id: string
- query: string
- query_toks: sequence of strings
- query_toks_no_value: sequence of strings
- question: string
- question_toks: sequence of strings
- sql: string
- schema: string
- chatgpt_response: string
- chatgpt_cot: string
Splits:
- Training set: 6831 examples, 27997364 bytes
- Validation set: 954 examples, 3405221 bytes
- Test set: 1908 examples, 4651874 bytes
Download size: 5456923 bytes
Dataset size: 36054459 bytes
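The split figures above can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the metadata and confirms that the per-split byte counts add up to the reported dataset size in each configuration; the `STATS` table and the `summarize` helper are illustrative names, not part of the dataset.

```python
# (num_examples, num_bytes) per split, copied from the metadata above.
STATS = {
    "default": {
        "splits": {
            "train": (6831, 28103144),
            "validation": (954, 3420032),
            "test": (1908, 4680728),
        },
        "dataset_size": 36203904,
    },
    "word-level": {
        "splits": {
            "train": (6831, 27997364),
            "validation": (954, 3405221),
            "test": (1908, 4651874),
        },
        "dataset_size": 36054459,
    },
}


def summarize(config):
    """Return (total examples, total bytes, share of examples per split)."""
    splits = STATS[config]["splits"]
    total_examples = sum(n for n, _ in splits.values())
    total_bytes = sum(b for _, b in splits.values())
    shares = {name: n / total_examples for name, (n, _) in splits.items()}
    return total_examples, total_bytes, shares


if __name__ == "__main__":
    for config, info in STATS.items():
        examples, size, shares = summarize(config)
        # The split byte counts should sum to the reported dataset size.
        assert size == info["dataset_size"]
        rounded = {name: round(share, 3) for name, share in shares.items()}
        print(config, examples, size, rounded)
```

Both configurations contain 9693 examples in total, and the test set is exactly twice the size of the validation set.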



