iamplus/Instruction_Tuning
Hugging Face | Updated 2023-05-22 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/iamplus/Instruction_Tuning
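Overview:
This dataset comprises multiple sub-datasets covering instruction tuning, article summarization, email replies, email thread summarization, model failure cases, identity, code generation, role-play, biology, chemistry, physics, mathematics, financial question answering, translation, and automatic chain-of-thought (COT) reasoning. Data sources include the ChatGPT API, the GPT-4 API, and several public datasets. The sub-datasets include: IAMAI's seed tasks, an instruction tuning dataset, an article summarization dataset, an email summarization dataset, an email reply dataset, an email thread summarization dataset, a model failure case dataset, an identity dataset, a ChatGPT prompt dataset, the Stanford Alpaca instruction tuning dataset, a code generation dataset, the ColossalChat instruction tuning dataset, Laion's high-quality instruction tuning dataset, the human-created Databricks Dolly instruction tuning dataset, a GPT-4 instruction dataset, GPT-4 role-play datasets, biology, chemistry, physics, and math instruction datasets, a financial question-answering instruction dataset, a translation instruction dataset, and automatic chain-of-thought (COT) instruction datasets.
Provider: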
iamplus
Summary of original information
Dataset overview
Main datasets
- iamai_seed_tasks_v1.csv
  - Content: IAMAI's seed tasks, Version 1
  - Size: 879
- iamai_v1.csv
  - Content: Instruction Tuning Dataset collected using seeds from iamai_seed_tasks_v1.csv and the ChatGPT API for both prompts and outputs
  - Size: ~248k
- iamai_summarization_v1.csv
  - Content: Article Summarization dataset (both prompts and outputs) collected using the ChatGPT API
  - Size: ~1.2k
- iamai_email_summarization.csv
  - Content: Email Summarization dataset (both prompts and outputs) collected using the ChatGPT API
  - Size: ~14k
- iamai_email_reply_v1.csv
  - Content: Instruction Tuning Dataset for Email Replying; the ChatGPT API was used for both prompts and outputs (reply emails)
  - Size: ~14k
- iamai_email_threads.csv
  - Content: Instruction Tuning Dataset for Email Thread Summarization; the ChatGPT API was used for both prompts and outputs (thread summaries)
  - Size: ~17.5k
- iamai_failures_v1.csv
  - Content: Instruction Tuning Dataset collected from failures of the model manojpreveen/gpt-neoxt-20b-v6, with outputs from the ChatGPT API
  - Size: ~10.7k
- iamai_identity.csv
  - Content: Instruction Identity dataset focused on the i.am+ organization
  - Model name: i.am.ai
  - Organization name: iam+
  - Size: ~900
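The size figures above use shorthand such as `~248k`. As a minimal sketch, the snippet below converts those strings to approximate integers and sums them for the main datasets. The helper name `parse_size` and the `MAIN_DATASETS` mapping are hypothetical names introduced here for illustration; the listing does not state the unit of the size figures, so the total is only a rough count in whatever unit the listing uses.

```python
from decimal import Decimal

# Suffix multipliers used in the listing ("k" = thousand, "M" = million).
SUFFIXES = {"k": 1_000, "M": 1_000_000}

# Size strings copied from the listing above (hypothetical mapping name).
MAIN_DATASETS = {
    "iamai_seed_tasks_v1.csv": "879",
    "iamai_v1.csv": "~248k",
    "iamai_summarization_v1.csv": "~1.2k",
    "iamai_email_summarization.csv": "~14k",
    "iamai_email_reply_v1.csv": "~14k",
    "iamai_email_threads.csv": "~17.5k",
    "iamai_failures_v1.csv": "~10.7k",
    "iamai_identity.csv": "~900",
}


def parse_size(text: str) -> int:
    """Convert a size string such as '~17.5k' to an approximate integer."""
    cleaned = text.strip().lstrip("~")
    multiplier = 1
    if cleaned and cleaned[-1] in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    # Decimal avoids binary floating-point rounding on values like 10.7.
    return int(Decimal(cleaned) * multiplier)


def total_size(sizes: dict) -> int:
    """Sum the approximate sizes of all files in the mapping."""
    return sum(parse_size(value) for value in sizes.values())


if __name__ == "__main__":
    for name, size in MAIN_DATASETS.items():
        print(f"{name}: {parse_size(size)}")
    print("approximate total:", total_size(MAIN_DATASETS))
```

Because most figures are rounded (the `~` prefix), the sum is an estimate rather than an exact count.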
Other related datasets
- chat_gpt_v2.csv
  - Content: Clean, unique prompts collected from external datasets, with outputs from the ChatGPT API
  - Size: ~23.8k
- stanford_alpaca_it_v3.csv
  - Content: Instruction Tuning Set with inputs from an external set and outputs from the ChatGPT API
  - Size: ~51.5k
- stanford_alpaca_it_v4.csv
  - Content: Instruction Tuning Set with inputs from an external set and outputs from the GPT-4 API
  - Size: ~51.5k
- code_alpaca.csv
  - Content: Instruction Tuning Set generated the Alpaca way for the coding domain, with inputs from an external set and outputs from the ChatGPT API
  - Size: ~20k
- ColossalChat.csv
  - Content: Instruction Tuning Set (English) with inputs from an external set and outputs from the ChatGPT API
  - Size: ~52k
- unified_chip2.csv
  - Content: High Quality Instruction Tuning Set by Laion, with Python programming questions split across various programming languages and outputs from the ChatGPT API
  - Size: ~210k
- databricks-dolly.csv
  - Content: High Quality, human-created Instruction Tuning Dataset by Databricks
  - Size: ~15k
- gpt4_instruct.csv
  - Content: Instruction dataset with outputs from GPT-4
  - Size: ~18k
- gpt4_roleplay.csv
  - Content: Instruction Roleplay dataset with outputs from GPT-4
  - Size: ~3k
- gpt4_roleplay_v2.csv
  - Content: Instruction Roleplay Supplemental dataset with outputs from GPT-4
  - Size: ~7.2k
- camel_biology.csv
  - Content: Instruction dataset on the Biology domain with outputs from GPT-4
  - Size: ~20k
- camel_chemistry.csv
  - Content: Instruction dataset on the Chemistry domain with outputs from GPT-4
  - Size: ~20k
- camel_physics.csv
  - Content: Instruction dataset on the Physics domain with outputs from GPT-4
  - Size: ~20k
- camel_math.csv
  - Content: Instruction dataset on the Math domain with outputs from GPT-4
  - Size: ~50k
- FiQA_google.csv
  - Content: Instruction Tuning dataset on the Finance domain, with prompts collected from an external dataset and outputs from the ChatGPT API
  - Size: ~7k
- COIG_translate_en.csv
  - Content: Instruction Tuning dataset with prompts collected from an external dataset and outputs from the ChatGPT API
  - Size: ~66.2k
- synthetic_instruct.csv
  - Content: Instruction Tuning dataset with prompts collected from an external dataset and outputs from the ChatGPT API
  - Size: ~33.1k
- FLAN_auto_cot.csv
  - Content: Instruction Tuning dataset (mainly focused on Math COT) with prompts collected from an external dataset and outputs from the ChatGPT API
  - Size: ~8.7k
- FLAN_cot_data.csv
  - Content: Instruction Tuning COT dataset (from FLAN) with prompts collected from an external dataset and outputs from the ChatGPT API
  - Size: ~73.4k
- LaMini_instruction.csv
  - Content: Instruction Tuning dataset with prompts drawn from various existing prompt resources and outputs created using the ChatGPT API
  - Size: ~2.58M
- alpaca_evol_instruct_70k.csv
  - Content: Instruction Tuning dataset (training data of WizardLM)
  - Size: ~70k
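To work with a single file from the list, one option is to build a direct URL from the download link above and read the CSV with pandas. The sketch below is only an illustration under stated assumptions: it assumes each file sits at the repository root on a `main` branch and that the mirror serves the usual Hugging Face `resolve/<revision>/<filename>` path. The helper names `file_url` and `load_file` are hypothetical, and the listing does not document the CSV column names, so inspect them after loading.

```python
from urllib.parse import quote

# Base URL taken from the download link in this listing.
MIRROR_BASE = "https://hf-mirror.com/datasets/iamplus/Instruction_Tuning"


def file_url(filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for one file.

    Assumption: the file lives at the repository root and the mirror
    follows the Hugging Face 'resolve/<revision>/<filename>' layout.
    """
    return f"{MIRROR_BASE}/resolve/{quote(revision, safe='')}/{quote(filename)}"


def load_file(filename: str, nrows=None):
    """Read one CSV into a DataFrame (requires pandas and network access)."""
    import pandas as pd  # imported lazily so file_url works without pandas

    return pd.read_csv(file_url(filename), nrows=nrows)


if __name__ == "__main__":
    print(file_url("iamai_identity.csv"))
    # Example (downloads data): preview a few rows and check the columns.
    # df = load_file("iamai_identity.csv", nrows=5)
    # print(df.columns.tolist())
```

Passing `nrows` keeps the preview small, which is useful for the larger files such as LaMini_instruction.csv.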



