
Align-Anything-Instruction-100K

ModelScope community, updated 2025-12-04, indexed 2025-02-08
Download link:
https://modelscope.cn/datasets/PKU-Alignment/Align-Anything-Instruction-100K
Resource description:
# Dataset Card for Align-Anything-Instruction-100K

[[🏠 Homepage](https://github.com/PKU-Alignment/align-anything)] [[🤗 Instruction-Dataset-100K(en)](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-Instruction-100K)] [[🤗 Instruction-Dataset-100K(zh)](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-Instruction-100K-zh)] [[🤗 Align-Anything Datasets](https://huggingface.co/datasets/PKU-Alignment/align-anything/)]

## Highlights

<div class="col-md-12">
<ul>
<li><b>Data sources:</b> <a href="https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-QA" target="_blank">PKU-SafeRLHF QA</a>, <a href="https://huggingface.co/datasets/knkarthick/dialogsum" target="_blank">DialogSum</a>, <a href="https://ai.meta.com/research/publications/towards-empathetic-open-domain-conversation-models-a-new-benchmark-and-dataset" target="_blank">Empathetic</a>, <a href="https://github.com/XueFuzhao/InstructionWild" target="_blank">Instruction-Wild</a>, and <a href="https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json" target="_blank">Alpaca</a>.</li>
<li><b>100K QA pairs:</b> By leveraging GPT-4 to annotate meticulously refined instructions, we obtain 105,333 QA pairs.</li>
</ul>
</div>

## Dataset Summary

This dataset is a sibling project of [Align-Anything](https://github.com/PKU-Alignment/align-anything).

We provide a high-quality instruction-following dataset consisting of 100K question-answer entries, annotated and refined by GPT-4. Our prompts are sourced from multiple public datasets: [PKU-SafeRLHF Dataset QA](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-QA), [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum), [Empathetic Dataset](https://ai.meta.com/research/publications/towards-empathetic-open-domain-conversation-models-a-new-benchmark-and-dataset), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json), and [InstructionWild](https://github.com/XueFuzhao/InstructionWild). Each prompt is refined by GPT-4 under expert demonstration and specific guidelines, after which GPT-4 annotates the responses. This comprehensive and fine-grained pipeline results in a high-quality instruction-following dataset.

## Dataset Comparison

### Detailed Results

We visualize our prompt distribution and compare it with the widely used instruction-following dataset [Alpaca-52K](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). Our dataset covers a broader range of prompt types and includes task types such as text summarization and sentiment analysis.

<div align="center">
  <img src="vs.png" width="70%"/>
</div>

We train several base models on both Align-Anything-Instruction-100K (sampled down to 52K entries) and Alpaca-52K. We evaluate the fine-tuned models on the [Just-Eval](https://huggingface.co/datasets/re-align/just-eval-instruct) benchmark, assessing the responses across five dimensions: helpfulness, clarity, factuality, depth, and engagement. The models demonstrate excellent performance in all dimensions.

<div align="center">
  <img src="performance.png" width="70%"/>
</div>

## Evaluation Details

### Just-Eval Overview

[Just-Eval](https://huggingface.co/datasets/re-align/just-eval-instruct) draws prompts from multiple sources to assess a model's instruction-following capabilities, including [AlpacaEval](https://huggingface.co/datasets/tatsu-lab/alpaca_eval), [LIMA-test](https://huggingface.co/datasets/GAIR/lima/viewer/plain_text/test), [MT-bench](https://huggingface.co/datasets/HuggingFaceH4/mt_bench_prompts), [Anthropic red-teaming](https://huggingface.co/datasets/Anthropic/hh-rlhf/tree/main/red-team-attempts), and [MaliciousInstruct](https://github.com/Princeton-SysML/Jailbreak_LLM/blob/main/data/MaliciousInstruct.txt). We use the 800 instructions that focus on problem-solving and do not consider the safety of responses, following the benchmark guidelines outlined [here](https://allenai.github.io/re-align/just_eval.html).

### Evaluation Criteria

We adopt the same evaluation criteria as the [JustEval Benchmark](https://allenai.github.io/re-align/index.html), detailed as follows:

<div class="col-md-12">
<ul>
<li><b>Helpfulness:</b> Evaluates how well the response addresses the given query or question and assists the user. A good response is highly relevant and helpful.</li>
<li><b>Clarity:</b> Assesses the logical flow and coherence of the response. A good response is well-structured, with ideas presented clearly and coherently.</li>
<li><b>Factuality:</b> Assesses the accuracy of the information presented in the response. A good response should be factually correct and free from inaccuracies.</li>
<li><b>Depth:</b> Evaluates the thoroughness and detail of the response. A good response should be comprehensive and in-depth.</li>
<li><b>Engagement:</b> Assesses how engaging and natural the response sounds in a conversational context. A good response should feel engaging and have a human-like tone.</li>
</ul>
</div>

## Usage

To load our dataset, use the `load_dataset()` function as follows:

```python
from datasets import load_dataset

dataset = load_dataset("PKU-Alignment/Align-Anything-Instruction-100K")
```
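The comparison described in this card trains on a 52K sample of the dataset. The card does not say how that sample was drawn, so the following is only a minimal sketch of one reproducible way to do it. The helper names `sample_indices` and `load_52k_subset`, the seed value, uniform sampling without replacement, and the use of the first available split are all assumptions made for illustration, not the authors' documented procedure.

```python
import random


def sample_indices(num_rows: int, k: int, seed: int = 42) -> list[int]:
    """Return k distinct row indices drawn uniformly at random, sorted ascending."""
    if k > num_rows:
        raise ValueError(f"cannot sample {k} rows from {num_rows}")
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return sorted(rng.sample(range(num_rows), k))


def load_52k_subset(seed: int = 42):
    """Download the dataset and keep a 52,000-row random subset (needs network access)."""
    # Imported lazily so the index helper above works without the `datasets` package.
    from datasets import load_dataset

    dataset = load_dataset("PKU-Alignment/Align-Anything-Instruction-100K")
    # Assumption: the split name is not stated in the card, so take the first split.
    split_name = next(iter(dataset))
    split = dataset[split_name]
    return split.select(sample_indices(len(split), 52_000, seed))


if __name__ == "__main__":
    # Offline demonstration on a small synthetic range of row indices.
    print(sample_indices(num_rows=10, k=3, seed=0))
```

Fixing the seed keeps the subset identical across runs, which matters when comparing models trained on it against a baseline of the same size such as Alpaca-52K.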

Provider:
maas
Created:
2025-02-07