melvindave/alpaca-cleaned
Source: Hugging Face. Updated: 2025-12-14. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/melvindave/alpaca-cleaned
Description:
This is a cleaned version of the original Alpaca dataset released by Stanford. The following issues were identified in the original release and fixed in this dataset: hallucinations (instructions that reference internet data, causing GPT-3 to hallucinate answers), merged instructions, empty outputs, empty code examples, instructions to generate images, N/A outputs, inconsistent use of the input field, wrong answers, nonsensical or unclear instructions, and extraneous escape and control characters. The original Alpaca dataset consists of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine, and is intended for instruction-tuning language models so that they follow instructions better.
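A few of the issue types listed above are mechanical enough to check for automatically. The following is a minimal sketch, not the method used to produce this dataset: it assumes records follow the standard Alpaca schema (`instruction`, `input`, `output`), and the function name `find_issues`, the issue labels, and the set of N/A spellings are hypothetical choices made for illustration.

```python
import re

# Assumption: each record is a dict with the standard Alpaca keys
# "instruction", "input" and "output". The checks are illustrative only.

# ASCII control characters, excluding tab, newline and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

# Hypothetical spellings treated as an "N/A" output.
NA_OUTPUTS = {"n/a"}


def find_issues(record):
    """Return a list of issue labels found in one Alpaca-style record."""
    issues = []
    output = str(record.get("output", ""))

    if not output.strip():
        issues.append("empty_output")
    elif output.strip().lower() in NA_OUTPUTS:
        issues.append("na_output")

    # Look for stray control characters in any of the three fields.
    text = " ".join(
        str(record.get(key, "")) for key in ("instruction", "input", "output")
    )
    if CONTROL_CHARS.search(text):
        issues.append("control_chars")

    return issues


if __name__ == "__main__":
    sample = {"instruction": "Name a color.", "input": "", "output": "N/A"}
    print(find_issues(sample))
```

Semantic problems such as hallucinated or wrong answers cannot be detected with checks like these and would need manual or model-assisted review.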
Provider:
melvindave



