hiltch/pandas-create-context
Source: Hugging Face. Updated 2023-12-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/hiltch/pandas-create-context
Resource description:
---
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
- table-question-answering
language:
- en
tags:
- pandas
- code
- NLP
- text-to-pandas
- context-pandas
- spider
- wikisql
pretty_name: pandas-create-context
size_categories:
- n<1K
---
#### Overview
This dataset is built from [sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context), which itself builds on [WikiSQL](https://huggingface.co/datasets/wikisql) and [Spider](https://huggingface.co/datasets/spider).
I used GPT-4 to translate the SQL schemas into pandas DataFrame initialization statements and the SQL queries into pandas queries.
There are 862 examples, each consisting of a natural language question, a pandas DataFrame creation statement, and a pandas query that answers the question using the creation statement as context. This dataset was built with text-to-pandas LLMs in mind.
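As a rough illustration of how one record might be turned into a prompt/completion pair for a text-to-pandas model, here is a minimal sketch. The `build_prompt` helper and the prompt wording are assumptions made for this example; the dataset itself does not prescribe a template.

```python
def build_prompt(question: str, context: str) -> str:
    """Format one record as a model prompt (hypothetical template, not part of the dataset)."""
    return (
        "Given the following pandas DataFrame definition:\n"
        f"{context}\n\n"
        f"Write a pandas expression that answers: {question}\n"
        "Answer: "
    )


# One record copied from the dataset's random sample.
record = {
    "question": "What was the lowest # of total votes?",
    "context": "df = pd.DataFrame(columns=['_number_of_total_votes'])",
    "answer": "df['_number_of_total_votes'].min()",
}

prompt = build_prompt(record["question"], record["context"])
completion = record["answer"]  # the target text the model should produce
print(prompt + completion)
```

The `context` field plays the role that a `CREATE TABLE` statement plays in sql-create-context: it tells the model which columns exist so it does not have to guess names.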
#### TODO
- Transform further examples from sql-create-context
- Manually fix some examples that don't make sense
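One possible way to narrow down which examples need manual attention is a structural check that the `answer` parses as Python and only indexes columns declared in `context`. The sketch below is an assumption about how such triage could look, not a tool shipped with the dataset; the helper names are hypothetical, and it cannot detect answers that are valid but semantically wrong.

```python
import ast  # requires Python 3.9+ (subscript slices are plain expressions)


def column_names(context: str) -> set[str]:
    """Collect string literals passed via columns=[...] in the creation statement(s)."""
    names = set()
    for node in ast.walk(ast.parse(context)):
        if isinstance(node, ast.keyword) and node.arg == "columns":
            for elt in ast.walk(node.value):
                if isinstance(elt, ast.Constant) and isinstance(elt.value, str):
                    names.add(elt.value)
    return names


def unknown_columns(context: str, answer: str) -> set[str]:
    """Return string subscripts used in the answer that the context never declares.

    Raises SyntaxError if the answer is not a valid Python expression.
    """
    used = set()
    for node in ast.walk(ast.parse(answer, mode="eval")):
        if isinstance(node, ast.Subscript):
            key = node.slice
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                used.add(key.value)
    return used - column_names(context)


context = "df = pd.DataFrame(columns=['election', '_number_of_candidates_nominated'])"
good = "df[df['_number_of_candidates_nominated'] == 262]['election'].count()"
bad = "df['votes'].min()"  # hypothetical answer that references an undeclared column

print(unknown_columns(context, good))
print(unknown_columns(context, bad))
```

An empty result only means the column references line up; whether the query actually answers the question still needs a human reader.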
Random sample:
```json
[
  {
    "question": "What is the election year when the # of candidates nominated was 262?",
    "context": "df = pd.DataFrame(columns=['election', '_number_of_candidates_nominated'])",
    "answer": "df[df['_number_of_candidates_nominated'] == 262]['election'].count()"
  },
  {
    "question": "What was the lowest # of total votes?",
    "context": "df = pd.DataFrame(columns=['_number_of_total_votes'])",
    "answer": "df['_number_of_total_votes'].min()"
  }
]
```
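To see how the `context` and `answer` fields fit together, the following sketch runs one sample end to end. Because `context` only declares column names, the rows here are made-up values added purely for illustration. `exec` and `eval` are used for brevity and should only be applied to strings you trust.

```python
import pandas as pd

# One record copied from the random sample.
sample = {
    "question": "What was the lowest # of total votes?",
    "context": "df = pd.DataFrame(columns=['_number_of_total_votes'])",
    "answer": "df['_number_of_total_votes'].min()",
}

# Run the creation statement in an isolated namespace that only exposes pandas.
namespace = {"pd": pd}
exec(sample["context"], namespace)
schema_df = namespace["df"]  # empty DataFrame carrying only the column names

# Hypothetical rows (not from the dataset) so that the query has data to work on.
namespace["df"] = pd.DataFrame(
    {"_number_of_total_votes": [1200, 845, 3020]},
    columns=schema_df.columns,
)

# Evaluate the pandas query against the populated DataFrame.
result = eval(sample["answer"], namespace)
print(result)  # 845
```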
This dataset contains 862 examples, each consisting of a natural language question, a pandas DataFrame creation statement, and a pandas query that uses the creation statement as context. The examples were generated by using GPT-4 to translate SQL schemas and queries into pandas, and are intended for text-to-pandas language models.
Provider:
hiltch
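Summary of original information
Dataset overview
Basic information
- License: cc-by-4.0
- Task categories:
  - Text generation
  - Question answering
  - Table question answering
- Language: English
- Tags:
  - pandas
  - code
  - NLP
  - text-to-pandas
  - context-pandas
  - spider
  - wikisql
- Dataset name: pandas-create-context
- Dataset size: n<1K
Detailed description
- The dataset is built from sql-create-context, which in turn builds on WikiSQL and Spider.
- GPT-4 was used to translate the SQL schemas into pandas DataFrame initialization statements and the SQL queries into pandas queries.
- It contains 862 examples, each with a natural language question, a pandas DataFrame creation statement, and a pandas query that answers the question using the creation statement as context.
- The dataset is intended for text-to-pandas LLMs.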



