chimbiwide/databricks-filtered-1024
收藏Hugging Face2025-12-14 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/chimbiwide/databricks-filtered-1024
下载链接
链接失效反馈官方服务:
资源简介:
该数据集是通过过滤databricks-thinking数据集创建的,仅包含少于1024个标记的对话。目的是解决大型模型生成冗长推理痕迹的问题,这些痕迹对于小型模型来说难以学习。数据集专注于创造性任务和开放式问答问题,选择了databricks-dolly数据集因其人工策划的内容。选用Qwen3-14b模型生成推理痕迹,因其能够无需大量提示即可生成简洁且逻辑性强的痕迹,速度快且知识容量足够。数据集包含提示和答案的示例,展示了模型的推理和回答格式。
This dataset was created by filtering the databricks-thinking dataset to include only conversations shorter than 1024 tokens. It aims to address the issue of large models generating lengthy reasoning traces that smaller models struggle to learn from. The focus is on creative tasks and open-ended QA questions, with the databricks-dolly dataset selected for its human-curated content. The Qwen3-14b model was chosen for generating reasoning traces due to its ability to produce concise and logical traces without extensive prompting, its speed, and its knowledge capacity. The dataset includes examples of prompts and answers, demonstrating the models reasoning and response format.
提供机构:
chimbiwide



