Align-Anything-Instruction-100K

Name: Align-Anything-Instruction-100K
Creator: maas
Published: 2025-12-04 16:22:37
License: No description provided

ModelScope community · Updated 2025-12-04 · Indexed 2025-02-08

Download link:

https://modelscope.cn/datasets/PKU-Alignment/Align-Anything-Instruction-100K

Resource description:

# Dataset Card for Align-Anything-Instruction-100K

[[🏠 Homepage](https://github.com/PKU-Alignment/align-anything)] [[🤗 Instruction-Dataset-100K(en)](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-Instruction-100K)] [[🤗 Instruction-Dataset-100K(zh)](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-Instruction-100K-zh)] [[🤗 Align-Anything Datasets](https://huggingface.co/datasets/PKU-Alignment/align-anything/)]

## Highlights

<div class="col-md-12">
<ul>
<li><b>Data sources:</b> <a href="https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-QA" target="_blank">PKU-SafeRLHF QA</a>, <a href="https://huggingface.co/datasets/knkarthick/dialogsum" target="_blank">DialogSum</a>, <a href="https://ai.meta.com/research/publications/towards-empathetic-open-domain-conversation-models-a-new-benchmark-and-dataset" target="_blank">Empathetic</a>, <a href="https://github.com/XueFuzhao/InstructionWild" target="_blank">Instruction-Wild</a>, and <a href="https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json" target="_blank">Alpaca</a>.</li>
<li><b>100K QA pairs:</b> By leveraging GPT-4 to annotate meticulously refined instructions, we obtain 105,333 QA pairs.</li>
</ul>
</div>

## Dataset Summary

This dataset is a sibling project of [Align-Anything](https://github.com/PKU-Alignment/align-anything). We provide a high-quality instruction-following dataset consisting of 100K question-answer entries, annotated and refined by GPT-4. Our prompts are sourced from multiple public datasets such as [PKU-SafeRLHF Dataset QA](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-QA), [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum), [Empathetic Dataset](https://ai.meta.com/research/publications/towards-empathetic-open-domain-conversation-models-a-new-benchmark-and-dataset), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json), and [InstructionWild](https://github.com/XueFuzhao/InstructionWild).
Each prompt is refined by GPT-4 under expert demonstration and specific guidelines, followed by GPT-4's annotation of the responses. This comprehensive and fine-grained pipeline results in a high-quality instruction-following dataset.

## Dataset Comparison

### Detailed Results

We visualize our prompt distribution and compare it with the widely used instruction-following dataset [Alpaca-52K](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). Our dataset covers a broader range of prompt types and includes task types such as text summarization and sentiment analysis.

<div align="center">
<img src="vs.png" width="70%"/>
</div>

We train several base models using both Align-Anything-Instruction-100K (subsampled to 52K entries) and Alpaca-52K. We evaluate the fine-tuned models on the [Just-Eval](https://huggingface.co/datasets/re-align/just-eval-instruct) benchmark, assessing the responses across five dimensions: helpfulness, clarity, factuality, depth, and engagement. The models demonstrate excellent performance in all five dimensions.

<div align="center">
<img src="performance.png" width="70%"/>
</div>

## Evaluation Details

### Just-Eval Overview

[Just-Eval](https://huggingface.co/datasets/re-align/just-eval-instruct) aggregates prompts from multiple sources to assess a model's instruction-following capabilities, including [AlpacaEval](https://huggingface.co/datasets/tatsu-lab/alpaca_eval), [LIMA-test](https://huggingface.co/datasets/GAIR/lima/viewer/plain_text/test), [MT-bench](https://huggingface.co/datasets/HuggingFaceH4/mt_bench_prompts), [Anthropic red-teaming](https://huggingface.co/datasets/Anthropic/hh-rlhf/tree/main/red-team-attempts), and [MaliciousInstruct](https://github.com/Princeton-SysML/Jailbreak_LLM/blob/main/data/MaliciousInstruct.txt).
We use the 800 instructions that focus on problem solving and do not consider the safety of responses, following the benchmark guidelines outlined [here](https://allenai.github.io/re-align/just_eval.html).

### Evaluation Criteria

We adopt the same evaluation criteria as the [JustEval Benchmark](https://allenai.github.io/re-align/index.html), detailed as follows:

<div class="col-md-12">
<ul>
<li><b>Helpfulness:</b> Evaluates how well the response addresses the given query or question and assists the user. A good response is highly relevant and helpful.</li>
<li><b>Clarity:</b> Assesses the logical flow and coherence of the response. A good response is well-structured, with ideas presented clearly and coherently.</li>
<li><b>Factuality:</b> Assesses the accuracy of the information presented in the response. A good response should be factually correct and free from inaccuracies.</li>
<li><b>Depth:</b> Evaluates the thoroughness and detail of the response. A good response should be comprehensive and in-depth.</li>
<li><b>Engagement:</b> Assesses how engaging and natural the response sounds in a conversational context. A good response should feel engaging and have a human-like tone.</li>
</ul>
</div>

## Usage

To load our dataset, use the `load_dataset()` function as follows:

```python
from datasets import load_dataset

dataset = load_dataset("PKU-Alignment/Align-Anything-Instruction-100K")
```
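
The comparison described earlier trains on a 52K sample of this dataset. The card does not say how that sample was drawn, so the following is only a minimal sketch of one plausible approach: uniform sampling without replacement with a fixed seed. The helper names `sample_indices` and `load_sampled_subset`, the seed value, and the `"train"` split name are assumptions made for illustration, not details confirmed by the authors.

```python
import random


def sample_indices(n_total, k, seed=42):
    """Pick k distinct row indices uniformly at random, without replacement."""
    if not 0 <= k <= n_total:
        raise ValueError(f"cannot sample {k} rows from {n_total}")
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return sorted(rng.sample(range(n_total), k))


def load_sampled_subset(k=52_000, split="train", seed=42):
    """Load the dataset and keep a random k-row subset.

    Assumption: a split named "train" exists; check the dataset viewer first.
    """
    # Imported lazily so sample_indices() works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(
        "PKU-Alignment/Align-Anything-Instruction-100K", split=split
    )
    # Dataset.select keeps only the rows at the given indices.
    return dataset.select(sample_indices(len(dataset), k, seed))
```

Calling `load_sampled_subset()` downloads the data and returns the subset. Fixing the seed makes the subset reproducible across runs, which matters when comparing models trained on equal-sized datasets.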


Provider:

maas

Created:

2025-02-07
