
iamplus/Instruction_Tuning

Source: Hugging Face. Updated 2023-05-22; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/iamplus/Instruction_Tuning
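Overview:
This dataset bundles multiple sub-datasets, mainly covering instruction tuning, article summarization, email replies, email thread summarization, model failure cases, identity, code generation, role play, biology, chemistry, physics, math, finance Q&A, translation, and automatic chain-of-thought (COT) reasoning. Its sources include the ChatGPT API, the GPT-4 API, and several public datasets. The specific datasets are: IAMAI's seed tasks, an instruction tuning dataset, an article summarization dataset, an email summarization dataset, an email reply dataset, an email thread summarization dataset, a model failure case dataset, an identity dataset, a ChatGPT prompt dataset, the Stanford Alpaca instruction tuning dataset, a code generation dataset, the ColossalChat instruction tuning dataset, the Laion high-quality instruction tuning dataset, the human-created Databricks Dolly instruction tuning dataset, a GPT-4 instruction dataset, GPT-4 role-play datasets, biology, chemistry, physics, and math instruction datasets, a finance Q&A instruction dataset, a translation instruction dataset, and automatic chain-of-thought (COT) instruction datasets.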
Provider:
iamplus
Original information summary

Dataset overview

Main datasets

  1. iamai_seed_tasks_v1.csv

    • Content: IAMAI's seed tasks, Version 1
    • Size: 879
  2. iamai_v1.csv

    • Content: Instruction Tuning Dataset collected using seeds from iamai_seed_tasks_v1.csv and the ChatGPT API for both prompts and outputs
    • Size: ~248k
  3. iamai_summarization_v1.csv

    • Content: Article Summarization dataset (both prompts and outputs) collected using the ChatGPT API
    • Size: ~1.2k
  4. iamai_email_summarization.csv

    • Content: Email Summarization dataset (both prompts and outputs) collected using the ChatGPT API
    • Size: ~14k
  5. iamai_email_reply_v1.csv

    • Content: Instruction Tuning Dataset for Email Replying, using the ChatGPT API for both prompts and outputs (reply emails)
    • Size: ~14k
  6. iamai_email_threads.csv

    • Content: Instruction Tuning Dataset for Email Thread Summarization, using the ChatGPT API for both prompts and outputs (thread summaries)
    • Size: ~17.5k
  7. iamai_failures_v1.csv

    • Content: Instruction Tuning Dataset collected from failures of the model manojpreveen/gpt-neoxt-20b-v6, with outputs from the ChatGPT API
    • Size: ~10.7k
  8. iamai_identity.csv

    • Content: Instruction Identity dataset focused on the i.am+ organization
    • Model name: i.am.ai
    • Organization name: iam+
    • Size: ~900
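
The files listed above can be fetched one at a time. The following is a minimal sketch, not an official loader: it assumes each CSV sits at the root of the repository on the `main` branch and that the mirror serves files through the usual Hugging Face `resolve/main` path. The helper names `csv_url` and `load_csv` are hypothetical, and no column names are assumed because the source page does not list them.

```python
from urllib.parse import quote

# Repository identifier taken from the page title.
REPO_ID = "iamplus/Instruction_Tuning"


def csv_url(filename: str, base: str = "https://hf-mirror.com") -> str:
    """Build a direct download URL for one CSV file.

    Assumption: the file lives at the repository root on the `main` branch.
    """
    return f"{base}/datasets/{REPO_ID}/resolve/main/{quote(filename)}"


def load_csv(filename: str):
    """Download one CSV into a pandas DataFrame (requires network access)."""
    import pandas as pd  # imported lazily so csv_url works without pandas

    return pd.read_csv(csv_url(filename))


if __name__ == "__main__":
    # Print the URL only; call load_csv("iamai_identity.csv") to download.
    print(csv_url("iamai_identity.csv"))
```

Starting with a small file such as iamai_identity.csv (~900 rows) is a cheap way to inspect the actual columns before downloading the larger files.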

Other related datasets

  1. chat_gpt_v2.csv

    • Content: Clean unique prompts collected from external datasets, with outputs from the ChatGPT API
    • Size: ~23.8k
  2. stanford_alpaca_it_v3.csv

    • Content: Instruction Tuning Set with inputs from an external set and outputs from the ChatGPT API
    • Size: ~51.5k
  3. stanford_alpaca_it_v4.csv

    • Content: Instruction Tuning Set with inputs from an external set and outputs from the GPT-4 API
    • Size: ~51.5k
  4. code_alpaca.csv

    • Content: Instruction Tuning Set generated the Alpaca way for the coding domain, with inputs from an external set and outputs from the ChatGPT API
    • Size: ~20k
  5. ColossalChat.csv

    • Content: Instruction Tuning Set (English) with inputs from an external set and outputs from the ChatGPT API
    • Size: ~52k
  6. unified_chip2.csv

    • Content: High Quality Instruction Tuning Set by Laion with Python Programming questions split across various programming languages and outputs from the ChatGPT API
    • Size: ~210k
  7. databricks-dolly.csv

    • Content: High Quality Human-created Instruction Tuning Dataset by Databricks
    • Size: ~15k
  8. gpt4_instruct.csv

    • Content: Instruction dataset with outputs from GPT-4
    • Size: ~18k
  9. gpt4_roleplay.csv

    • Content: Instruction Roleplay dataset with outputs from GPT-4
    • Size: ~3k
  10. gpt4_roleplay_v2.csv

    • Content: Instruction Roleplay Supplemental dataset with outputs from GPT-4
    • Size: ~7.2k
  11. camel_biology.csv

    • Content: Instruction dataset on the Biology domain with outputs from GPT-4
    • Size: ~20k
  12. camel_chemistry.csv

    • Content: Instruction dataset on the Chemistry domain with outputs from GPT-4
    • Size: ~20k
  13. camel_physics.csv

    • Content: Instruction dataset on the Physics domain with outputs from GPT-4
    • Size: ~20k
  14. camel_math.csv

    • Content: Instruction dataset on the Math domain with outputs from GPT-4
    • Size: ~50k
  15. FiQA_google.csv

    • Content: Instruction Tuning dataset on the Finance domain with prompts collected from an external dataset and outputs from the ChatGPT API
    • Size: ~7k
  16. COIG_translate_en.csv

    • Content: Instruction Tuning dataset with prompts collected from an external dataset and outputs from the ChatGPT API
    • Size: ~66.2k
  17. synthetic_instruct.csv

    • Content: Instruction Tuning dataset with prompts collected from an external dataset and outputs from the ChatGPT API
    • Size: ~33.1k
  18. FLAN_auto_cot.csv

    • Content: Instruction Tuning dataset (mainly focused on Math COT) with prompts collected from an external dataset and outputs from the ChatGPT API
    • Size: ~8.7k
  19. FLAN_cot_data.csv

    • Content: Instruction Tuning COT dataset (from FLAN) with prompts collected from an external dataset and outputs from the ChatGPT API
    • Size: ~73.4k
  20. LaMini_instruction.csv

    • Content: Instruction Tuning dataset with prompts from various existing prompt resources and outputs created using the ChatGPT API
    • Size: ~2.58M
  21. alpaca_evol_instruct_70k.csv

    • Content: Instruction Tuning dataset, the training data of WizardLM
    • Size: ~70k
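
The sizes above are approximate labels such as ~23.8k or ~2.58M. As a rough illustration of how they add up, the sketch below converts each label to an integer and sums the 21 entries of this second list. The function name `parse_size` is hypothetical, and the result is only as precise as the rounded labels on the page.

```python
def parse_size(label: str) -> int:
    """Convert a size label such as '~248k', '~2.58M' or '879' to an integer."""
    text = label.strip().lstrip("~")
    multipliers = {"k": 1_000, "M": 1_000_000}
    suffix = text[-1]
    if suffix in multipliers:
        # round() guards against floating-point noise, e.g. 2.58 * 1_000_000
        return round(float(text[:-1]) * multipliers[suffix])
    return int(text)


# Size labels copied from the list above, in the same order (items 1 to 21).
OTHER_SIZES = [
    "~23.8k", "~51.5k", "~51.5k", "~20k", "~52k", "~210k", "~15k",
    "~18k", "~3k", "~7.2k", "~20k", "~20k", "~20k", "~50k",
    "~7k", "~66.2k", "~33.1k", "~8.7k", "~73.4k", "~2.58M", "~70k",
]

total = sum(parse_size(label) for label in OTHER_SIZES)

if __name__ == "__main__":
    print(f"Approximate rows across the other related datasets: {total:,}")
```

LaMini_instruction.csv dominates the total, so any sampling scheme that mixes these files by raw row count would be weighted heavily toward it.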