
botp/shibing624_alpaca-zh

Source: Hugging Face. Updated 2024-05-29; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/botp/shibing624_alpaca-zh
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 32150579
    num_examples: 48818
  download_size: 35100559
  dataset_size: 32150579
license: cc-by-4.0
language:
- zh
pretty_name: Instruction Tuning with GPT-4
size_categories:
- 10K<n<100K
task_categories:
- text-generation
tags:
- gpt
- alpaca
- fine-tune
- instruct-tune
- instruction
---

# Dataset Description

- **Project Page:** https://instruction-tuning-with-gpt-4.github.io
- **Repo:** https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM
- **Paper:** https://arxiv.org/abs/2304.03277

# Dataset Card for "alpaca-zh"

This dataset contains about 50,000 self-instruct examples generated with GPT-4, following the Alpaca method.

Dataset from https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM

It is the Chinese dataset from https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/alpaca_gpt4_data_zh.json

# Usage and License Notices

The data is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use), and models trained using the dataset should not be used outside of research purposes.

Train a model with the alpaca-zh dataset: https://github.com/shibing624/textgen

# English Dataset

[Found here](https://huggingface.co/datasets/c-s-ale/alpaca-gpt4-data)

# Citation

```
@article{peng2023gpt4llm,
  title={Instruction Tuning with GPT-4},
  author={Baolin Peng and Chunyuan Li and Pengcheng He and Michel Galley and Jianfeng Gao},
  journal={arXiv preprint arXiv:2304.03277},
  year={2023}
}
```
Provided by:
botp
Summary of original information

Dataset overview

Basic information

  • Name: Instruction Tuning with GPT-4
  • Language: Chinese (zh)
  • Size category: 10K<n<100K
  • Task category: text generation
  • Tags: gpt, alpaca, fine-tune, instruct-tune, instruction
  • License: CC-BY-4.0

Dataset structure

  • Features:
    • instruction: string
    • input: string
    • output: string
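
A minimal sketch of how one row with these three string fields could be represented and turned into a prompt. The `AlpacaRecord` type, the `build_prompt` helper, the prompt layout, and the sample row are all hypothetical and chosen for illustration; the dataset card does not prescribe a template.

```python
from typing import TypedDict


class AlpacaRecord(TypedDict):
    """One dataset row: the three string fields listed in the card."""
    instruction: str
    input: str
    output: str


def build_prompt(record: AlpacaRecord) -> str:
    """Join the instruction and the optional input into one prompt string.

    The layout is an assumed template for illustration only.
    """
    instruction = record["instruction"].strip()
    context = record["input"].strip()
    if context:
        return f"Instruction:\n{instruction}\n\nInput:\n{context}\n\nResponse:\n"
    # Rows with an empty `input` field skip the Input section entirely.
    return f"Instruction:\n{instruction}\n\nResponse:\n"


# Made-up example row, not taken from the dataset.
example: AlpacaRecord = {
    "instruction": "将下面的句子翻译成英文。",
    "input": "今天天气很好。",
    "output": "The weather is nice today.",
}
prompt = build_prompt(example)
```

During fine-tuning, the `output` field would typically be appended to the prompt as the target text.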

Dataset splits

  • Train:
    • Number of examples: 48,818
    • Size in bytes: 32,150,579
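
The train split could be loaded along the following lines. This is a sketch that assumes the third-party `datasets` package is installed and that the Hugging Face Hub is reachable; `load_alpaca_zh` and `check_row_count` are hypothetical helper names, while the repository ID and the expected row count come from this page.

```python
DATASET_ID = "botp/shibing624_alpaca-zh"   # repository ID as listed on this page
EXPECTED_TRAIN_EXAMPLES = 48_818            # num_examples from the dataset card


def load_alpaca_zh(split: str = "train"):
    """Load one split from the Hugging Face Hub (needs network access)."""
    # Imported lazily so that defining this helper does not require `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def check_row_count(num_rows: int, expected: int = EXPECTED_TRAIN_EXAMPLES) -> bool:
    """Return True when a loaded split has the row count stated in the card."""
    return num_rows == expected


# Usage (commented out because it downloads data):
# ds = load_alpaca_zh()
# print(check_row_count(len(ds)))
# print(ds[0]["instruction"])
```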

Dataset size

  • Download size: 35,100,559 bytes
  • Dataset size: 32,150,579 bytes
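
As a rough sanity check, the figures above can be converted into more readable units. The sketch below only does arithmetic on the numbers reported in the card; the variable and function names are illustrative.

```python
NUM_EXAMPLES = 48_818        # train split row count from the card
DATASET_BYTES = 32_150_579   # dataset_size from the card
DOWNLOAD_BYTES = 35_100_559  # download_size from the card


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


avg_bytes_per_example = DATASET_BYTES / NUM_EXAMPLES
dataset_mib = to_mib(DATASET_BYTES)
download_mib = to_mib(DOWNLOAD_BYTES)

print(f"average bytes per example: {avg_bytes_per_example:.1f}")
print(f"dataset size: {dataset_mib:.2f} MiB, download size: {download_mib:.2f} MiB")
```

The average works out to roughly 659 bytes per example, which is consistent with short instruction and response pairs.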

Dataset usage

  • Intended use: research
  • License: CC BY NC 4.0 (non-commercial use only)