DollyTails-12K

ModelScope community · updated 2025-11-12 · added 2025-02-08
Download link:
https://modelscope.cn/datasets/PKU-Alignment/DollyTails-12K
Description:
# Dataset Card for DollyTails-12K

## Dataset Summary

DollyTails-12K is an instruction-following dataset built around a System 2 (O1-like) thinking paradigm. Its prompts are derived from [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k), and the accompanying thoughts and answers were annotated by GPT-4o. After careful filtering and screening, the final dataset contains 12K question-answer pairs.

Each task contains 4.93 reasoning steps on average, and the number of steps is capped at 7 to avoid the unnecessary training overhead of overly long samples.

You can use this dataset for supervised fine-tuning (SFT) of a large language model (LLM) to obtain a model with a System 2-like reasoning paradigm. For the full training code, see [Align-Anything](https://github.com/PKU-Alignment/align-anything).

## Usage

```python
from datasets import load_dataset

# Load the training and validation splits from the Hugging Face Hub
train_dataset = load_dataset('PKU-Alignment/DollyTails-12K', split='train')
val_dataset = load_dataset('PKU-Alignment/DollyTails-12K', split='validation')
```

## Citation

```bibtex
@inproceedings{ji2024align,
  title={Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback},
  author={Jiaming Ji and Jiayi Zhou and Hantao Lou and Boyuan Chen and Donghai Hong and Xuyao Wang and Wenqi Chen and Kaile Wang and Rui Pan and Jiahao Li and Mohan Wang and Josef Dai and Tianyi Qiu and Hua Xu and Dong Li and Weipeng Chen and Jun Song and Bo Zheng and Yaodong Yang},
  year={2024},
  url={https://arxiv.org/abs/2412.15838}
}
```

```bibtex
@online{DatabricksBlog2023DollyV2,
  author = {Mike Conover and Matt Hayes and Ankit Mathur and Jianwei Xie and Jun Wan and Sam Shah and Ali Ghodsi and Patrick Wendell and Matei Zaharia and Reynold Xin},
  title = {Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM},
  year = {2023},
  url = {https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm},
  urldate = {2023-06-30}
}
```
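The summary reports an average of 4.93 reasoning steps per task and a cap of 7 steps. The sketch below shows one way such statistics could be computed. It assumes, hypothetically, that each record stores its reasoning as a list of step strings under a `thoughts` key; the real DollyTails-12K schema may use different field names. It also assumes that records above the cap are dropped rather than truncated, since the card does not say which was done. The toy records are illustrative only and are not taken from the dataset.

```python
# A minimal sketch of step-count filtering and averaging, NOT the authors' pipeline.
# ASSUMPTION: each record keeps its reasoning steps as a list under a hypothetical
# "thoughts" key; the actual dataset schema may differ.
from statistics import mean
from typing import Any

MAX_STEPS = 7  # step cap stated in the dataset card


def count_steps(record: dict[str, Any]) -> int:
    """Return the number of reasoning steps in a record (hypothetical schema)."""
    return len(record.get("thoughts", []))


def filter_by_step_cap(
    records: list[dict[str, Any]], cap: int = MAX_STEPS
) -> list[dict[str, Any]]:
    """Keep records with at least one step and at most `cap` steps.

    ASSUMPTION: over-long samples are dropped; the card does not say whether
    they were dropped or truncated.
    """
    return [r for r in records if 0 < count_steps(r) <= cap]


def average_steps(records: list[dict[str, Any]]) -> float:
    """Mean number of reasoning steps per record (raises StatisticsError if empty)."""
    return mean(count_steps(r) for r in records)


if __name__ == "__main__":
    # Toy records for illustration only.
    toy = [
        {"prompt": "a", "thoughts": ["step"] * 3, "answer": "x"},
        {"prompt": "b", "thoughts": ["step"] * 5, "answer": "y"},
        {"prompt": "c", "thoughts": ["step"] * 9, "answer": "z"},  # exceeds the cap
    ]
    kept = filter_by_step_cap(toy)
    print(f"kept {len(kept)} of {len(toy)} records, average steps = {average_steps(kept)}")
```

To apply this to the real data, you would load a split with `load_dataset` as in the Usage section and adapt `count_steps` to whatever field actually holds the reasoning steps.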

Provider:
maas
Created:
2025-02-07