ritikwq/databricks-dolly-15k
Source: Hugging Face | Updated: 2025-12-13 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/ritikwq/databricks-dolly-15k
Description:
`databricks-dolly-15k` is an open-source dataset of more than 15,000 instruction-following records written by thousands of Databricks employees, intended to help large language models exhibit ChatGPT-like interactivity. The records span eight instruction categories, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. The dataset may be used for any purpose, academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License. Employees created the records by following specific guidelines, and some categories draw on reference text from Wikipedia. The dataset is in American English. Known limitations include biases and errors inherited from Wikipedia, and some annotators may not be native English speakers.
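The record structure described above can be sketched as follows. This is a minimal, standard-library-only sketch that assumes each record has the fields `instruction`, `context`, `response`, and `category`, as in the original Databricks release; this mirror's schema is not confirmed here, so check it before relying on these names. The prompt template, the helper names `format_prompt` and `category_counts`, and the sample records are all hypothetical and for illustration only.

```python
from collections import Counter
from typing import TypedDict


class DollyRecord(TypedDict):
    """One record; field names ASSUMED from the original Databricks release."""
    instruction: str
    context: str   # optional reference text (e.g. a Wikipedia passage); may be empty
    response: str
    category: str  # instruction category label, e.g. "closed_qa"


def format_prompt(record: DollyRecord) -> str:
    """Build a simple instruction-tuning prompt (hypothetical template)."""
    parts = [f"### Instruction:\n{record['instruction'].strip()}"]
    if record["context"].strip():
        # Only include a context section when the record carries reference text.
        parts.append(f"### Context:\n{record['context'].strip()}")
    parts.append(f"### Response:\n{record['response'].strip()}")
    return "\n\n".join(parts)


def category_counts(records: list[DollyRecord]) -> Counter:
    """Count how many records fall into each instruction category."""
    return Counter(r["category"] for r in records)


if __name__ == "__main__":
    # Hypothetical toy records, NOT taken from the dataset.
    sample: list[DollyRecord] = [
        {"instruction": "List three ideas for a rainy-day activity.", "context": "",
         "response": "Board games, baking, and reading.", "category": "brainstorming"},
        {"instruction": "When did the bridge open?",
         "context": "The bridge opened to traffic in 1937.",
         "response": "1937.", "category": "closed_qa"},
    ]
    print(category_counts(sample))
    print(format_prompt(sample[1]))
```

Keeping the context optional lets categories that use no reference text, such as brainstorming, skip the empty section, while context-based categories such as closed QA keep their source passage next to the instruction.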
Provided by:
ritikwq



