Align-Anything-Instruction-100K
ModelScope community. Updated 2025-12-04. Indexed 2025-02-08.
Download link:
https://modelscope.cn/datasets/PKU-Alignment/Align-Anything-Instruction-100K
# Dataset Card for Align-Anything-Instruction-100K
[[🏠 Homepage](https://github.com/PKU-Alignment/align-anything)]
[[🤗 Instruction-Dataset-100K(en)](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-Instruction-100K)]
[[🤗 Instruction-Dataset-100K(zh)](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-Instruction-100K-zh)]
[[🤗 Align-Anything Datasets](https://huggingface.co/datasets/PKU-Alignment/align-anything/)]
## Highlights
<div class="col-md-12">
<ul>
<li><b>Data sources:</b>
<a href="https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-QA" target="_blank">PKU-SafeRLHF QA</a>,
<a href="https://huggingface.co/datasets/knkarthick/dialogsum" target="_blank">DialogSum</a>,
<a href="https://ai.meta.com/research/publications/towards-empathetic-open-domain-conversation-models-a-new-benchmark-and-dataset" target="_blank">Empathetic</a>,
<a href="https://github.com/XueFuzhao/InstructionWild" target="_blank">Instruction-Wild</a>,
and <a href="https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json" target="_blank">Alpaca</a>. </li>
<li><b>100K QA pairs:</b> By leveraging GPT-4 to annotate meticulously refined instructions, we obtain 105,333 QA pairs. </li>
</ul>
</div>
## Dataset Summary
This dataset is a sibling project of [Align-Anything](https://github.com/PKU-Alignment/align-anything).
We provide a high-quality instruction-following dataset of roughly 100K question-answer entries (105,333 in total), annotated and refined by GPT-4. Our prompts are sourced from multiple public datasets such as [PKU-SafeRLHF Dataset QA](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-QA), [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum), [Empathetic Dataset](https://ai.meta.com/research/publications/towards-empathetic-open-domain-conversation-models-a-new-benchmark-and-dataset), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json), and [InstructionWild](https://github.com/XueFuzhao/InstructionWild). Each prompt is first refined by GPT-4 under expert demonstration and specific guidelines; GPT-4 then annotates the responses.
## Dataset Comparison
### Detailed Results
We visualize our prompt distribution and compare it with the widely used instruction-following dataset [Alpaca-52K](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). Our dataset covers a broader range of prompt types and includes additional task types such as text summarization and sentiment analysis.
<div align="center">
<img src="vs.png" width="70%"/>
</div>
We fine-tune several base models on Align-Anything-Instruction-100K (a 52K sample) and on Alpaca-52K, then evaluate the fine-tuned models on the [Just-Eval](https://huggingface.co/datasets/re-align/just-eval-instruct) benchmark, scoring responses along five dimensions: helpfulness, clarity, factuality, depth, and engagement. The models perform well on all five dimensions, as shown in the figure below.
<div align="center">
<img src="performance.png" width="70%"/>
</div>
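The 52K subsample mentioned above can be drawn reproducibly. The following is a minimal sketch, assuming uniform random sampling without replacement and a fixed seed; this card does not state how the 52K subset was actually selected, and `sample_indices`, the seed value, and the exact sample size of 52,000 are illustrative assumptions.

```python
import random

TOTAL_PAIRS = 105_333   # QA pairs reported in the Highlights section
SAMPLE_SIZE = 52_000    # assumed size of the "52K" subset (illustrative)


def sample_indices(total: int, k: int, seed: int = 0) -> list[int]:
    """Return k distinct row indices in [0, total), sorted, reproducible for a given seed."""
    if k > total:
        raise ValueError("sample size exceeds dataset size")
    rng = random.Random(seed)  # local RNG, leaves the global random state untouched
    return sorted(rng.sample(range(total), k))


indices = sample_indices(TOTAL_PAIRS, SAMPLE_SIZE)
print(len(indices))
```

With the Hugging Face `datasets` library, indices like these could be passed to `Dataset.select(indices)` on a loaded split to materialize the subset.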
## Evaluation Details
### Just-Eval Overview
[Just-Eval](https://huggingface.co/datasets/re-align/just-eval-instruct) combines prompts from several sources to assess a model's instruction-following capabilities, including [AlpacaEval](https://huggingface.co/datasets/tatsu-lab/alpaca_eval), [LIMA-test](https://huggingface.co/datasets/GAIR/lima/viewer/plain_text/test), [MT-bench](https://huggingface.co/datasets/HuggingFaceH4/mt_bench_prompts), [Anthropic red-teaming](https://huggingface.co/datasets/Anthropic/hh-rlhf/tree/main/red-team-attempts), and [MaliciousInstruct](https://github.com/Princeton-SysML/Jailbreak_LLM/blob/main/data/MaliciousInstruct.txt).
We use the 800 instructions that focus on problem solving and do not evaluate the safety of responses, following the benchmark guidelines outlined [here](https://allenai.github.io/re-align/just_eval.html).
### Evaluation Criteria
We adopt the same evaluation criteria as the [Just-Eval benchmark](https://allenai.github.io/re-align/index.html), detailed as follows:
<div class="col-md-12">
<ul>
<li><b>Helpfulness:</b> Evaluates how well the response addresses the given query or question and assists the user. A good response is highly relevant and helpful.</li>
<li><b>Clarity:</b> Assesses the logical flow and coherence of the response. A good response is well-structured, with ideas presented clearly and coherently.</li>
<li><b>Factuality:</b> Assesses the accuracy of the information presented in the response. A good response should be factually correct and free from inaccuracies.</li>
<li><b>Depth:</b> Evaluates the thoroughness and detail of the response. A good response should be comprehensive and in-depth.</li>
<li><b>Engagement:</b> Assesses how engaging and natural the response sounds in a conversational context. A good response should feel engaging and have a human-like tone.</li>
</ul>
</div>
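One way to summarize ratings on these five dimensions is to average each dimension over the evaluated responses. The sketch below assumes every response has already received a numeric score per dimension (for example, 1 to 5 from a judge model); the score scale, the `mean_scores` helper, and the example ratings are hypothetical and are not results from this dataset.

```python
from statistics import mean

DIMENSIONS = ("helpfulness", "clarity", "factuality", "depth", "engagement")


def mean_scores(ratings: list[dict[str, float]]) -> dict[str, float]:
    """Average each dimension over all rated responses."""
    if not ratings:
        raise ValueError("no ratings to aggregate")
    return {dim: mean(r[dim] for r in ratings) for dim in DIMENSIONS}


# Hypothetical ratings for two responses, for illustration only.
example = [
    {"helpfulness": 4, "clarity": 5, "factuality": 4, "depth": 3, "engagement": 4},
    {"helpfulness": 5, "clarity": 4, "factuality": 4, "depth": 4, "engagement": 3},
]
summary = mean_scores(example)
print(summary)
```

Reporting one mean per dimension keeps the five criteria separate, which matches how the comparison figure presents them, instead of collapsing them into a single score.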
## Usage
To load our dataset, use the `load_dataset()` function as follows:
```python
from datasets import load_dataset
dataset = load_dataset("PKU-Alignment/Align-Anything-Instruction-100K")
```
Provider: maas
Created: 2025-02-07



