uv-scripts/synthetic-data
Hugging Face, updated 2025-08-05, indexed 2025-08-09
Download link:
https://hf-mirror.com/datasets/uv-scripts/synthetic-data
Description:
CoT-Self-Instruct generates high-quality synthetic training data using the Chain-of-Thought Self-Instruct method. It produces diverse prompts for reasoning tasks (such as math problems and logic puzzles) and for general instruction tasks (such as creative writing and coding analysis). It works in three steps: it analyzes seed examples, generates new examples of similar quality and complexity, and filters the outputs with automatic quality metrics.
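The seed, generate, and filter loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual uv-scripts implementation: the names `build_prompt`, `majority_share`, and `generate_filtered`, the prompt wording, and the answer-agreement filter with its 0.6 threshold are all assumptions made for the example. The `generate` and `answer` callables stand in for whatever language model calls a real pipeline would use.

```python
import random
from collections import Counter
from typing import Callable, List, Optional


def build_prompt(seeds: List[str]) -> str:
    """Hypothetical prompt: reason about the seeds, then write one new task."""
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(seeds))
    return (
        "Here are example tasks:\n"
        f"{numbered}\n"
        "Think step by step about what makes them good, "
        "then write one new task of similar quality and complexity."
    )


def majority_share(answers: List[str]) -> float:
    """Fraction of sampled answers that match the most common answer."""
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def generate_filtered(
    seed_pool: List[str],
    generate: Callable[[str], str],   # assumed: prompt -> new task text
    answer: Callable[[str], str],     # assumed: task text -> one sampled answer
    n_candidates: int = 4,
    n_seeds: int = 2,
    n_votes: int = 5,
    min_share: float = 0.6,           # assumed quality threshold
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Generate candidates from random seeds and keep the consistent ones."""
    rng = rng or random.Random(0)
    kept = []
    for _ in range(n_candidates):
        seeds = rng.sample(seed_pool, n_seeds)
        candidate = generate(build_prompt(seeds))
        # Sample several answers; keep the task only if they mostly agree.
        votes = [answer(candidate) for _ in range(n_votes)]
        if majority_share(votes) >= min_share:
            kept.append(candidate)
    return kept
```

Answer agreement is only one possible automatic quality metric, and it suits tasks with a single checkable answer. Open-ended instruction tasks would need a different score, for example one from a reward model.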
Provider:
uv-scripts



