
recastai/sql-create-context-chatml

Source: Hugging Face. Updated 2024-03-13; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/recastai/sql-create-context-chatml
Resource description:
---
license: cc-by-4.0
dataset_info:
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  splits:
  - name: train
    num_bytes: 78885727
    num_examples: 78577
  download_size: 7507566
  dataset_size: 78885727
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
task_categories:
- text2text-generation
language:
- en
tags:
- text-to-sql
- chatml
pretty_name: 'sql-create-context-chatml '
size_categories:
- 10K<n<100K
---

## Dataset Summary

This dataset was created by **Re:cast AI** to convert the existing dataset [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) into a [chatml](https://huggingface.co/docs/transformers/main/en/chat_templating)-friendly format for use in SFT tasks with pretrained models.

## Dataset Structure

```python
messages = [
    {'content': "You are a powerful text-to-SQL AI assistant that helps users ... etc.", 'role': 'system'},
    {'content': '(Optional) Context information is below ... etc.', 'role': 'user'},
    {'content': 'SELECT COUNT(*) FROM head WHERE age > 56', 'role': 'assistant'}
]
```

## Annotation Process

The example below shows how the dataset was created. You can adapt it to convert the original author's dataset into a form suited to your needs.

```python
from datasets import load_dataset

INSTRUCTIONS = """You are a powerful text-to-SQL AI assistant that helps users interact with SQL databases. Your job is to answer questions about a database. You are given a user question or command and (optional) context regarding one or more tables.
You must output the SQL query that answers the question.
Some rules to follow:
1. Never directly reference the given context in your answer.
2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or 'The answer to the user's query...' or anything along those lines.
3. You only respond with valid SQL to the user's query."""


def process_chatml_fn(example):
    # User turn: table context followed by the natural-language question.
    user_content = (
        "(Optional) Context information is below.\n"
        "----------------\n"
        f"{example['context']}\n"
        "----------------\n"
        "Given the context information and not prior knowledge, answer the following query.\n"
        f"{example['question']}\n"
    )
    # Assistant turn: the reference SQL answer.
    assistant_content = f"{example['answer']}"
    message = [
        {"role": "system", "content": INSTRUCTIONS},
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": assistant_content}
    ]
    return message


ds = load_dataset("b-mc2/sql-create-context", split="train")
# Conform to chatml format: keep only the new "messages" column.
ds = ds.map(lambda x: {"messages": process_chatml_fn(x)}, remove_columns=ds.column_names)
```

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("recastai/sql-create-context-chatml")
```
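The card stores each example as a list of role/content messages but does not show the serialized text a model would see. The sketch below renders such a list with the commonly used ChatML delimiters `<|im_start|>` and `<|im_end|>`. The helper name `to_chatml`, the `add_generation_prompt` flag, and the shortened sample messages are illustrative assumptions, not part of the dataset; in practice the tokenizer's own chat template decides the exact layout.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render a list of {'role', 'content'} dicts as ChatML-style text.

    Hypothetical helper: a real training pipeline would normally rely on
    the chat template shipped with the model's tokenizer instead.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>role\ncontent<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Shortened, made-up sample in the same shape as a dataset record.
example = [
    {"role": "system", "content": "You are a powerful text-to-SQL AI assistant."},
    {"role": "user", "content": "How many heads are older than 56?"},
    {"role": "assistant", "content": "SELECT COUNT(*) FROM head WHERE age > 56"},
]

rendered = to_chatml(example)
print(rendered)
```

For inference, passing only the system and user messages with `add_generation_prompt=True` leaves the assistant turn open for the model to complete.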
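Provider:
recastai
Summary of original information

Dataset overview

Basic information

  • License: cc-by-4.0
  • Dataset name: sql-create-context-chatml
  • Dataset size: 78,885,727 bytes
  • Download size: 7,507,566 bytes
  • Training split: 78,577 examples, 78,885,727 bytes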
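As a quick sanity check on these figures, the sketch below derives the average size of one example and the ratio between the dataset size and the download size. The constants are copied from the dataset card; the variable names are illustrative.

```python
# Figures copied from the dataset card metadata.
NUM_BYTES = 78_885_727       # dataset_size / num_bytes of the train split
DOWNLOAD_BYTES = 7_507_566   # download_size
NUM_EXAMPLES = 78_577        # num_examples of the train split

# Average size of one example, in bytes.
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# How much larger the loaded dataset is than the downloaded files.
size_ratio = NUM_BYTES / DOWNLOAD_BYTES

print(f"average bytes per example: {avg_bytes_per_example:.1f}")
print(f"dataset size / download size: {size_ratio:.1f}")
```

Each example averages roughly 1 KB, and the download is about a tenth of the reported dataset size, which is plausible given that every record repeats the same system prompt.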

Data structure

  • Features:
    • messages:
      • content (string)
      • role (string)
  • Splits:
    • train: 78,577 examples
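The schema above can be checked on a single record with a small validator. This is a minimal sketch: the function name `is_valid_record` is hypothetical, and the allowed role set is an assumption taken from the three roles shown in the dataset card's example, since the schema itself only says `role` is a string.

```python
# Assumption: only the three roles shown in the dataset card occur.
ALLOWED_ROLES = {"system", "user", "assistant"}


def is_valid_record(record):
    """Return True if record['messages'] is a non-empty list of
    {'content': str, 'role': str} dicts with a known role."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        if not isinstance(message.get("content"), str):
            return False
        if message.get("role") not in ALLOWED_ROLES:
            return False
    return True


sample = {
    "messages": [
        {"content": "You are a powerful text-to-SQL AI assistant.", "role": "system"},
        {"content": "How many heads are older than 56?", "role": "user"},
        {"content": "SELECT COUNT(*) FROM head WHERE age > 56", "role": "assistant"},
    ]
}
print(is_valid_record(sample))
```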

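Configuration

  • Default configuration:
    • Data file path: data/train-*

Task and language

  • Task category: text2text-generation
  • Language: English
  • Tags: text-to-sql, chatml
  • Pretty name: sql-create-context-chatml
  • Size category: 10K<n<100K

Dataset creation

  • Purpose: extend the b-mc2/sql-create-context dataset so that it suits SFT tasks in ChatML format.
  • Method: each original example's context, question, and answer are wrapped into system, user, and assistant messages using a fixed prompt template.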
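Because the user message is built from a fixed template, the original context and question can be recovered from it. The sketch below assumes the template shown in the dataset card's annotation code (a 16-dash separator line around the context, then a fixed instruction sentence before the question). The function name `split_user_content` and the sample schema and question are illustrative, and the split would fail if a context itself contained the separator line.

```python
SEPARATOR = "----------------\n"
QUERY_PREFIX = (
    "Given the context information and not prior knowledge, "
    "answer the following query.\n"
)


def split_user_content(user_content):
    """Recover (context, question) from a templated user message.

    Assumes the layout: header, separator, context, separator,
    instruction sentence, question.
    """
    _header, context, rest = user_content.split(SEPARATOR, 2)
    if rest.startswith(QUERY_PREFIX):
        rest = rest[len(QUERY_PREFIX):]
    return context.rstrip("\n"), rest.rstrip("\n")


# Illustrative user message built with the same template.
user_content = (
    "(Optional) Context information is below.\n"
    + SEPARATOR
    + "CREATE TABLE head (age INTEGER)\n"
    + SEPARATOR
    + QUERY_PREFIX
    + "How many heads of the departments are older than 56 ?\n"
)

context, question = split_user_content(user_content)
print(context)
print(question)
```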