melvindave/alpaca-cleaned

Name: melvindave/alpaca-cleaned
Creator: melvindave
Published: 2025-12-14 12:53:22
License: not specified

Source: Hugging Face. Updated: 2025-12-14. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/melvindave/alpaca-cleaned

Description:

This is a cleaned version of the original Alpaca dataset released by Stanford. The following issues were identified in the original release and fixed in this dataset: hallucinations (instructions that reference internet data, causing GPT-3 to hallucinate answers), merged instructions, empty outputs, empty code examples, instructions to generate images, N/A outputs, inconsistent use of the input field, wrong answers, nonsensical or unclear instructions, and extraneous escape and control characters. The original Alpaca dataset consists of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine, designed for instruction-tuning language models so that they follow instructions better.
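A few of the issues listed above (empty outputs, N/A outputs, stray control characters) can be checked mechanically. The following is a minimal sketch, not the procedure used to build this dataset. It assumes each record is a dict with `instruction`, `input`, and `output` string fields, as in the original Alpaca release; the helper name `flag_record` and the sample records are hypothetical.

```python
import re

# Control characters other than tab, newline, and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

# Assumed field names, following the original Alpaca release.
FIELDS = ("instruction", "input", "output")


def flag_record(record):
    """Return a list of issue labels for one record (hypothetical helper)."""
    issues = []
    output = record.get("output", "")
    if not output.strip():
        issues.append("empty_output")
    elif output.strip().lower() == "n/a":
        issues.append("na_output")
    if any(CONTROL_CHARS.search(record.get(field, "")) for field in FIELDS):
        issues.append("control_chars")
    return issues


# Made-up records for illustration only; not taken from the dataset.
sample = [
    {"instruction": "Name a color.", "input": "", "output": "Blue"},
    {"instruction": "Summarize the article.", "input": "", "output": ""},
    {"instruction": "Give the population.", "input": "", "output": "N/A"},
    {"instruction": "Say hi.\x07", "input": "", "output": "Hi"},
]
flags = [flag_record(r) for r in sample]
```

Issues such as hallucinated answers, wrong answers, or unclear instructions cannot be caught by checks like these and would need manual or model-assisted review.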

Provided by:

melvindave
