DavidLanz/alpaca-tw-input-output-52k

Name: DavidLanz/alpaca-tw-input-output-52k
Creator: DavidLanz
Published: 2023-09-25 00:56:10
License: No description available

Hugging Face · Updated 2023-09-25 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/DavidLanz/alpaca-tw-input-output-52k

Resource overview:

---
task_categories:
- text-generation
- conversational
- question-answering
language:
- en
size_categories:
- 10K<n<100K
license: cc-by-4.0
tags:
- gpt3
- alpaca
- instruction-finetuning
---

# Dataset Card for "alpaca-tw-input-output-52k"

This dataset contains instruction-following data generated by GPT-3.5 using Alpaca prompts, intended for fine-tuning LLMs.

The dataset was originally shared in this repository: https://github.com/ntunlplab/traditional-chinese-alpaca. This is just a wrapper for compatibility with Hugging Face's datasets library.

## Dataset structure

It contains 52K instruction-following records generated by GPT-3.5 using the same prompts as in Alpaca. The dataset has the same format as the Alpaca data, except that the output is generated by GPT-3.5:

- `instruction`: `str`, describes the task the model should perform. Each of the 52K instructions is unique.
- `input`: `str`, optional context or input for the task.
- `output`: `str`, the answer to the instruction as generated by `GPT-3.5`.

## Difference with the original Alpaca dataset

The original Alpaca dataset used text-davinci-003 to complete the prompts. This dataset uses those same prompts, but generates the completions with GPT-3.5. As a result, the responses are generally longer and of higher quality. Here is an example:

#### Example from Alpaca-GPT3:

```python
{
    'instruction': '辨識那一個平台與其他不同。',
    'input': 'Twitter, Instagram, Telegram',
    'output': '在Twitter、Instagram和Telegram之間，Telegram是與其他兩者最不同的平台。'
}
```

## Licensing Information

The dataset is available under the [Creative Commons NonCommercial (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/legalcode).
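The record layout described in the dataset card (`instruction`, optional `input`, `output`) maps naturally onto an Alpaca-style training prompt. The following is a minimal sketch of that mapping, not the dataset author's own code. It assumes the prompt wording published with Stanford Alpaca, and the helper names `build_prompt` and `load_records` are hypothetical. The `train` split name is also an assumption, since the card does not list its splits.

```python
from typing import Dict

# Prompt wording as published with Stanford Alpaca (assumed to apply here;
# the dataset card does not prescribe a template).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

# Sample record copied from the dataset card.
SAMPLE = {
    "instruction": "辨識那一個平台與其他不同。",
    "input": "Twitter, Instagram, Telegram",
    "output": "在Twitter、Instagram和Telegram之間，Telegram是與其他兩者最不同的平台。",
}


def build_prompt(record: Dict[str, str]) -> str:
    """Format one record as a prompt; `input` is optional and may be empty."""
    context = (record.get("input") or "").strip()
    if context:
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=context
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


def load_records(split: str = "train"):
    """Hypothetical loader: needs the `datasets` package and network access.

    The split name "train" is an assumption, not confirmed by the card.
    """
    from datasets import load_dataset

    return load_dataset("DavidLanz/alpaca-tw-input-output-52k", split=split)


# The training target for SAMPLE would be build_prompt(SAMPLE) + SAMPLE["output"].
prompt = build_prompt(SAMPLE)
```

Keeping the template selection in one function means records with an empty `input` field get the shorter prompt without leaving a blank `### Input:` section in the training text.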

Provider:

DavidLanz

Summary of original information