
Abbey4799/Complex-Instructions-DPO

Hugging Face | Updated 2024-04-23 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/Abbey4799/Complex-Instructions-DPO
Resource description:
---
license: apache-2.0
---

# From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models

[![Github](https://img.shields.io/static/v1?logo=github&style=flat&color=pink&label=github&message=meowpass/FollowComplexInstruction)](https://github.com/meowpass/FollowComplexInstruction) [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97-huggingface-yellow)](https://huggingface.co/datasets/Abbey4799/Complex-Instructions-DPO)

Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models".

We systematically study **how to enhance the ability of LLMs to follow complex instructions**, addressing the following research questions:

- ***What training data* is effective in enhancing complex constraint-following abilities?**
  - Training with *compositional data* can generally enhance models' ability to follow complex instructions.
  - Training with *atomic data* (mostly with 1 constraint) can *generally decrease performance* compared to the backbone model for instructions with more than 1 constraint.
  - Training with compositional data (instructions with multiple constraints) can *better generalize to lower-level complex instructions* (instructions with fewer constraints).
  - Training with compositional data can even *generalize to compositions of out-of-domain constraints*.
- ***How to obtain* high-quality compositional data?**
  - Outputs *generated by weaker LLMs and then refined by advanced LLMs* (Discrimination) significantly outperform outputs generated directly by advanced LLMs (Generation).
- ***How to effectively utilize* the data obtained through the discrimination-based method?**
  - We introduce *a reinforcement learning fine-tuning (RLFT) based method* that leverages both positive and negative samples to improve complex instruction following.
  - We conduct extensive experiments to demonstrate the effectiveness of our methods in terms of *overall performance, training efficiency, and generalization abilities under four settings*.

We offer training data for our contrastive method, which includes:

- `prompt`: Synthesized complex instructions with 3 to 5 constraints from [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval).
- `constraint`: The constraint not followed by the `reject` output.
- `reject`: Model output that fails to meet some of the constraints in the complex instruction.
- `chosen`: Model output that meets all constraints in the complex instruction after `Teacher Correction`, as proposed in our paper.
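The Discrimination route (weaker-model outputs refined by a stronger model) and the `Teacher Correction` step suggest how a single record with these four fields could be assembled. The sketch below is one simple reading under stated assumptions, not the authors' pipeline: constraints are modeled as programmatic checkers in the spirit of IFEval's verifiable instructions, the weaker model's output is checked against each one, and a stronger "teacher" revises the output for the first failed constraint. `Constraint`, `build_record`, `toy_teacher`, and the `teacher_correct` callable are hypothetical names; a real teacher would be an advanced LLM, not the string fix used here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Constraint:
    """Hypothetical verifiable constraint: a name plus a programmatic checker."""
    name: str
    check: Callable[[str], bool]


# Hypothetical teacher signature: (prompt, flawed_output, failed_constraint) -> revised output.
Teacher = Callable[[str, str, Constraint], str]


def build_record(prompt: str, constraints: List[Constraint],
                 student_output: str, teacher_correct: Teacher) -> Optional[dict]:
    """Build one prompt/constraint/reject/chosen record, or return None.

    Assumption: the first constraint the weaker model's output violates is
    recorded, and the pair is kept only if the teacher's revision satisfies
    every constraint.
    """
    failed = next((c for c in constraints if not c.check(student_output)), None)
    if failed is None:
        return None  # Output already follows all constraints: no negative sample.
    corrected = teacher_correct(prompt, student_output, failed)
    if not all(c.check(corrected) for c in constraints):
        return None  # Revision still violates a constraint: discard the pair.
    return {"prompt": prompt, "constraint": failed.name,
            "reject": student_output, "chosen": corrected}


# Toy usage: two IFEval-style checks and a stand-in "teacher".
constraints = [
    Constraint("lowercase", lambda s: s == s.lower()),
    Constraint("no_commas", lambda s: "," not in s),
]


def toy_teacher(prompt: str, output: str, failed: Constraint) -> str:
    # Stand-in for an advanced LLM; a real teacher would rewrite via prompting.
    return output.lower().replace(",", "")


rec = build_record("Describe a cat in lowercase without commas.",
                   constraints, "A cat is soft, warm", toy_teacher)
print(rec)
```

Here the weaker output fails `lowercase` first, so that name is stored in `constraint`, the original text becomes `reject`, and the teacher's revision `"a cat is soft warm"` becomes `chosen`.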
Provider:
Abbey4799
Summary of source information

Dataset overview

Dataset name

  • Complex-Instructions-DPO

Dataset source

  • Released as part of the official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models".

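Dataset contents

  • prompt: synthesized complex instructions containing 3 to 5 constraints, derived from IFEval.
  • constraint: the constraint that the reject output fails to satisfy.
  • reject: a model output that fails to satisfy some of the constraints in the complex instruction.
  • chosen: a model output that satisfies all constraints after Teacher Correction, the method proposed in the paper.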
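Because each record pairs a `chosen` output with a `reject` output for the same `prompt`, it maps directly onto the prompt/chosen/rejected triples that preference-optimization trainers consume; Hugging Face TRL's `DPOTrainer`, for example, expects a `rejected` column. The sketch below renames that field and tallies which constraints are violated most often. The helper names and the two toy records are illustrative assumptions, not data taken from the dataset.

```python
from collections import Counter
from typing import Dict, Iterable


def to_preference_pair(record: Dict[str, str]) -> Dict[str, str]:
    """Map a prompt/constraint/reject/chosen record to prompt/chosen/rejected."""
    return {
        "prompt": record["prompt"],
        "chosen": record["chosen"],
        # This dataset calls the dispreferred output `reject`; many DPO trainers expect `rejected`.
        "rejected": record["reject"],
    }


def violated_constraint_counts(records: Iterable[Dict[str, str]]) -> Counter:
    """Count how often each constraint is the one the `reject` output broke."""
    return Counter(r["constraint"] for r in records)


# Toy records for illustration only (not from the dataset).
records = [
    {"prompt": "Answer in lowercase and use no commas.", "constraint": "no_commas",
     "reject": "yes, it works", "chosen": "yes it works"},
    {"prompt": "Reply in under five words with no commas.", "constraint": "no_commas",
     "reject": "sure, here you go", "chosen": "sure here you go"},
]

pairs = [to_preference_pair(r) for r in records]
print(pairs[0])
print(violated_constraint_counts(records).most_common(1))
```

With the Hugging Face `datasets` library, the same function could be applied via `Dataset.map(to_preference_pair, remove_columns=["reject"])`, which keeps the `constraint` column available for per-constraint analysis.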

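Intended use

  • Studying how to improve the ability of large language models to follow complex instructions, in particular through a reinforcement learning fine-tuning (RLFT) method that uses both positive and negative samples.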
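The page describes the training method only as RLFT that uses positive and negative samples, while the dataset name points to Direct Preference Optimization (DPO). Assuming the standard DPO objective, each record contributes the loss -log sigmoid(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]), where y_w is `chosen`, y_l is `reject`, pi_theta is the model being trained, and pi_ref is a frozen reference model. The sketch computes this for one pair from summed sequence log-probabilities; the function names and the default beta = 0.1 are assumptions, not settings reported by the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (standard formulation; beta is an assumed default).

    Each argument is the summed log-probability of a full response given the prompt.
    """
    # Policy-to-reference log-ratio per response (the DPO implicit reward, up to the factor beta).
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


# Policy identical to the reference model.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy moves probability toward `chosen` and away from `reject`.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

When the policy matches the reference, the loss is log 2 (about 0.693); raising the chosen log-ratio or lowering the rejected one pushes it toward 0, which is how the negative `reject` samples contribute a training signal alongside the positive `chosen` ones.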

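Research findings

  • Extensive experiments validate the effectiveness of the method in terms of overall performance, training efficiency, and generalization ability under four settings.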

Dataset license

  • Apache-2.0