GPT-4 Self-Instruct Dataset
Source: arXiv | Updated: 2024-03-06 | Indexed: 2024-06-21
Download link:
https://github.com/hitoshizuku7/awesome-Ja-self-instruct
Resource overview:
The GPT-4 Self-Instruct Dataset is a high-quality Japanese instruction dataset developed at Kyoto University for supervised fine-tuning of large language models (LLMs). It contains 52,000 instructions. A small set of English instructions was first translated into Japanese and localized, and GPT-4 was then used to generate diverse instruction data automatically from that set. The creation process therefore has three stages: translation, localization editing, and automatic generation. The dataset is intended to address the shortage of resources for non-English languages in LLM development, Japanese in particular.
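The automatic generation stage can be pictured as a bootstrapping loop: sample a few instructions from the current pool, ask a model to continue the list, and keep only candidates that are not near-duplicates of what is already there. The following is a minimal sketch of that general idea, not the dataset authors' pipeline. The function names, the `generate` callback (a stand-in for a GPT-4 call), the 0.7 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
import random
from difflib import SequenceMatcher
from typing import Callable, List


def is_novel(candidate: str, pool: List[str], threshold: float = 0.7) -> bool:
    """Return True if the candidate is not too similar to any pooled instruction.

    SequenceMatcher is an illustrative stand-in for whatever similarity
    filter a real pipeline would use; the threshold is an assumption.
    """
    return all(
        SequenceMatcher(None, candidate, existing).ratio() < threshold
        for existing in pool
    )


def build_prompt(examples: List[str]) -> str:
    """Format sampled instructions as a numbered list ending with an open slot."""
    lines = [f"{i}. {text}" for i, text in enumerate(examples, start=1)]
    lines.append(f"{len(examples) + 1}.")
    return "\n".join(lines)


def self_instruct(
    seeds: List[str],
    generate: Callable[[str], List[str]],  # hypothetical wrapper around a model call
    target_size: int,
    shots: int = 3,
    max_rounds: int = 1000,
    seed: int = 0,
) -> List[str]:
    """Grow an instruction pool from seed instructions until target_size is reached."""
    rng = random.Random(seed)
    pool = list(seeds)
    for _ in range(max_rounds):
        if len(pool) >= target_size:
            break
        # Sample in-context examples from the current pool.
        examples = rng.sample(pool, k=min(shots, len(pool)))
        for candidate in generate(build_prompt(examples)):
            candidate = candidate.strip()
            # Keep only non-empty candidates that pass the similarity filter.
            if candidate and len(pool) < target_size and is_novel(candidate, pool):
                pool.append(candidate)
    return pool
```

In this sketch, `seeds` would hold the translated and localized Japanese instructions, and `generate` would wrap a real model request that returns a list of candidate instruction strings. `max_rounds` bounds the loop in case the generator keeps producing near-duplicates.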
Provider:
Kyoto University
Created:
2024-03-06



