iamplus/Instruction_Tuning

Name: iamplus/Instruction_Tuning
Creator: iamplus
Published: 2023-05-22 09:13:04
License: No description available

Source: Hugging Face. Updated 2023-05-22. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/iamplus/Instruction_Tuning
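
As a minimal sketch of how a single file from this listing might be fetched, the snippet below builds a per-file URL from the download link and streams the first few CSV rows using only the standard library. The `resolve/<revision>/<filename>` path layout and the `main` revision are assumptions based on the usual Hugging Face URL scheme; the helper names `file_url` and `iter_rows` are hypothetical, and the column names of each CSV are not stated in this listing, so rows are returned as plain dictionaries.

```python
import csv
import io
import urllib.request
from typing import Dict, Iterator

# Download link taken from the listing above.
BASE_URL = "https://hf-mirror.com/datasets/iamplus/Instruction_Tuning"


def file_url(filename: str, revision: str = "main") -> str:
    """Build a direct URL for one file (path layout is an assumption)."""
    return f"{BASE_URL}/resolve/{revision}/{filename}"


def iter_rows(filename: str, limit: int = 5) -> Iterator[Dict[str, str]]:
    """Stream up to `limit` rows of a CSV file as dictionaries.

    Requires network access; column names come from the file's header row.
    """
    with urllib.request.urlopen(file_url(filename)) as response:
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        for index, row in enumerate(csv.DictReader(text)):
            if index >= limit:
                break
            yield row


if __name__ == "__main__":
    print(file_url("iamai_seed_tasks_v1.csv"))
```

Calling `iter_rows("iamai_seed_tasks_v1.csv")` would then yield the first rows of the seed-task file, provided the mirror serves it at that path.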

Resource overview:

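This dataset comprises multiple sub-datasets, mainly covering instruction tuning, article summarization, email replies, email thread summarization, model failure cases, identity, code generation, role-play, biology, chemistry, physics, mathematics, financial question answering, translation, and automatic chain-of-thought (CoT) reasoning. The data comes from the ChatGPT API, the GPT-4 API, and several public datasets. The sub-datasets are: the IAMAI seed tasks, an instruction tuning dataset, an article summarization dataset, an email summarization dataset, an email reply dataset, an email thread summarization dataset, a model failure case dataset, an identity dataset, a ChatGPT prompt dataset, the Stanford Alpaca instruction tuning dataset, a code generation dataset, the ColossalChat instruction tuning dataset, the Laion high-quality instruction tuning dataset, the Databricks Dolly human-created instruction tuning dataset, a GPT-4 instruction dataset, a GPT-4 role-play dataset, biology, chemistry, physics, and math instruction datasets, a financial question-answering instruction dataset, a translation instruction dataset, and automatic chain-of-thought (CoT) instruction datasets.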

Provider:

iamplus

Summary of original information

Dataset overview

Main datasets

iamai_seed_tasks_v1.csv
- Content: IAMAI's seed tasks, Version 1
- Size: 879
iamai_v1.csv
- Content: Instruction tuning dataset collected using seeds from iamai_seed_tasks_v1.csv, with both prompts and outputs from the ChatGPT API
- Size: ~248k
iamai_summarization_v1.csv
- Content: Article summarization dataset, with both prompts and outputs collected using the ChatGPT API
- Size: ~1.2k
iamai_email_summarization.csv
- Content: Email summarization dataset, with both prompts and outputs collected using the ChatGPT API
- Size: ~14k
iamai_email_reply_v1.csv
- Content: Instruction tuning dataset for email replying, with both prompts and outputs (reply emails) from the ChatGPT API
- Size: ~14k
iamai_email_threads.csv
- Content: Instruction tuning dataset for email thread summarization, with both prompts and outputs (thread summaries) from the ChatGPT API
- Size: ~17.5k
iamai_failures_v1.csv
- Content: Instruction tuning dataset collected from failures of the model manojpreveen/gpt-neoxt-20b-v6, with outputs from the ChatGPT API
- Size: ~10.7k
iamai_identity.csv
- Content: Instruction identity dataset focused on the i.am+ organization
- Model name: i.am.ai
- Organization name: iam+
- Size: ~900
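
The sizes in this listing are shorthand strings such as `~248k` or `~900`. As a rough illustration of how they could be turned into numbers and totalled, the sketch below parses the eight main-dataset sizes listed above. The helper `parse_size` and the dictionary `MAIN_SIZES` are hypothetical names introduced here, and the listing does not say what unit the sizes use (presumably rows or examples), so the total is only an approximation of whatever that unit is.

```python
import re

# Suffix multipliers for the shorthand used in the listing ("k", "M").
_MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000}

# Sizes copied from the main-dataset entries above (unit not stated there).
MAIN_SIZES = {
    "iamai_seed_tasks_v1.csv": "879",
    "iamai_v1.csv": "~248k",
    "iamai_summarization_v1.csv": "~1.2k",
    "iamai_email_summarization.csv": "~14k",
    "iamai_email_reply_v1.csv": "~14k",
    "iamai_email_threads.csv": "~17.5k",
    "iamai_failures_v1.csv": "~10.7k",
    "iamai_identity.csv": "~900",
}


def parse_size(text: str) -> int:
    """Convert a size string like '~17.5k' or '879' into an integer."""
    match = re.fullmatch(r"~?\s*([\d.,]+)\s*([kKmM]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    number = float(match.group(1).replace(",", ""))
    # round() guards against floating-point noise such as 10.7 * 1000.
    return round(number * _MULTIPLIERS[match.group(2).lower()])


total = sum(parse_size(size) for size in MAIN_SIZES.values())
print(total)  # 307179, an approximate total since most inputs are rounded
```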

Other related datasets

chat_gpt_v2.csv
- Content: Clean, unique prompts collected from external datasets, with outputs from the ChatGPT API
- Size: ~23.8k
stanford_alpaca_it_v3.csv
- Content: Instruction tuning set with inputs from an external set and outputs from the ChatGPT API
- Size: ~51.5k
stanford_alpaca_it_v4.csv
- Content: Instruction tuning set with inputs from an external set and outputs from the GPT-4 API
- Size: ~51.5k
code_alpaca.csv
- Content: Instruction tuning set generated the Alpaca way for the coding domain, with inputs from an external set and outputs from the ChatGPT API
- Size: ~20k
ColossalChat.csv
- Content: Instruction tuning set (English) with inputs from an external set and outputs from the ChatGPT API
- Size: ~52k
unified_chip2.csv
- Content: High-quality instruction tuning set by Laion, with Python programming questions split across various programming languages and outputs from the ChatGPT API
- Size: ~210k
databricks-dolly.csv
- Content: High-quality, human-created instruction tuning dataset by Databricks
- Size: ~15k
gpt4_instruct.csv
- Content: Instruction dataset with outputs from GPT-4
- Size: ~18k
gpt4_roleplay.csv
- Content: Instruction role-play dataset with outputs from GPT-4
- Size: ~3k
gpt4_roleplay_v2.csv
- Content: Supplemental instruction role-play dataset with outputs from GPT-4
- Size: ~7.2k
camel_biology.csv
- Content: Instruction dataset on the biology domain with outputs from GPT-4
- Size: ~20k
camel_chemistry.csv
- Content: Instruction dataset on the chemistry domain with outputs from GPT-4
- Size: ~20k
camel_physics.csv
- Content: Instruction dataset on the physics domain with outputs from GPT-4
- Size: ~20k
camel_math.csv
- Content: Instruction dataset on the math domain with outputs from GPT-4
- Size: ~50k
FiQA_google.csv
- Content: Instruction tuning dataset on the finance domain, with prompts collected from an external dataset and outputs from the ChatGPT API
- Size: ~7k
COIG_translate_en.csv
- Content: Instruction tuning dataset with prompts collected from an external dataset and outputs from the ChatGPT API
- Size: ~66.2k
synthetic_instruct.csv
- Content: Instruction tuning dataset with prompts collected from an external dataset and outputs from the ChatGPT API
- Size: ~33.1k
FLAN_auto_cot.csv
- Content: Instruction tuning dataset (mainly focused on math CoT), with prompts collected from an external dataset and outputs from the ChatGPT API
- Size: ~8.7k
FLAN_cot_data.csv
- Content: Instruction tuning CoT dataset (from FLAN), with prompts collected from an external dataset and outputs from the ChatGPT API
- Size: ~73.4k
LaMini_instruction.csv
- Content: Instruction tuning dataset with prompts drawn from various existing prompt resources and outputs created using the ChatGPT API
- Size: ~2.58M
alpaca_evol_instruct_70k.csv
- Content: Instruction tuning dataset; the training data of WizardLM
- Size: ~70k
