pengxiang/OwLore_Dataset
Source: Hugging Face | Updated: 2024-05-28 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/pengxiang/OwLore_Dataset
Resource description:
# OwLore
This repository provides the datasets used in the paper, such as `mmlu_auxiliary_train`, `boolq`, and `gsm8k`.
Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore) is a novel memory-efficient LLM fine-tuning approach that improves fine-tuning performance by combining layerwise sampling with gradient low-rank training.
<div align="center">
<img src="https://github.com/pixeli99/OwLore/assets/46072190/fb60054b-7af1-4aa0-9cc8-329c0f96d093" alt="Image 2" style="width: 900px; margin: 0 auto;">
</div>
## Abstract
The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning approach inspired by the layerwise outlier distribution of LLMs, which dynamically samples pre-trained layers to fine-tune instead of adding extra adapters. We first interpret the outlier phenomenon through the lens of Heavy-Tailed Self-Regularization theory (HT-SR), discovering that layers with more outliers tend to be more heavy-tailed and consequently better trained. Inspired by this finding, OwLore strategically assigns higher sampling probabilities to layers with more outliers to better leverage the knowledge stored in pre-trained LLMs. To further reduce the memory demands of fine-tuning, we integrate gradient low-rank projection into our approach, which allows each layer to be trained efficiently in a low-rank manner. By combining the efficiency of low-rank training with optimal layerwise sampling, OwLore significantly improves the memory-performance trade-off in LLM fine-tuning. Our extensive experiments across various architectures, including LLaMa2, LLaMa3, and Mistral, demonstrate that OwLore consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OwLore allows us to fine-tune LLaMa2-7B with only 21GB of memory.
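In HT-SR, a layer's heavy-tailedness is usually quantified by fitting a power law p(λ) ∝ λ^(-α) to the tail of the eigenvalue spectrum of its weight correlation matrix; a smaller α means a heavier tail. The minimal sketch below uses the standard continuous power-law maximum-likelihood estimator and assumes the eigenvalues are already available (in practice they could come from, e.g., `numpy.linalg.eigvalsh(W.T @ W)`). The function name `power_law_alpha` and the fixed tail cutoff `x_min` are illustrative assumptions; the cutoff is normally chosen by a separate goodness-of-fit search, and this is not the OwLore authors' code.

```python
import math
import random


def power_law_alpha(eigenvalues, x_min):
    """Continuous power-law MLE for p(x) ~ x**(-alpha) on the tail x >= x_min.

    alpha_hat = 1 + n / sum(ln(x_i / x_min)) over the n tail values.
    A smaller alpha indicates a heavier tail.
    (Hypothetical helper for illustration, not part of OwLore.)
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in eigenvalues if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if len(tail) < 2 or log_sum == 0.0:
        raise ValueError("not enough distinct tail values to fit a power law")
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    rng = random.Random(0)
    # Pareto samples with P(X > x) = x**(-a) have density exponent alpha = a + 1.
    heavy = [rng.paretovariate(1.0) for _ in range(20000)]  # true alpha = 2
    light = [rng.paretovariate(3.0) for _ in range(20000)]  # true alpha = 4
    print(power_law_alpha(heavy, 1.0), power_law_alpha(light, 1.0))
```

The demo draws synthetic Pareto samples whose true exponents are 2 and 4, so the estimates should land near those values, with the heavier-tailed sample giving the smaller α.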
Provided by:
pengxiang
Summary of original information
OwLore Dataset
Overview
The OwLore dataset collects the datasets used in the paper, including mmlu_auxiliary_train, boolq, gsm8k, and others.
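Method
OwLore is a novel memory-efficient LLM fine-tuning method that improves fine-tuning performance through layerwise sampling and gradient low-rank training. Inspired by the layerwise outlier distribution of LLMs, it dynamically samples pre-trained layers to fine-tune instead of adding extra adapters.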
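Gradient low-rank projection, as popularized by GaLore, keeps optimizer state in a small projected space: for a layer gradient G (m x n), take P as its top-r left singular vectors, run the optimizer on R = PᵀG (r x n), and project the update back with P. The minimal sketch below illustrates that idea under several assumptions: matrices are plain lists of rows, P is approximated by orthogonal iteration instead of a full SVD, and simple momentum stands in for Adam. All function names are hypothetical and this is not the OwLore implementation; GaLore-style methods typically refresh P only every few hundred steps.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def gram_schmidt(vectors):
    """Return an orthonormal basis for the given vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            dot = sum(x * y for x, y in zip(w, q))
            w = [x - dot * y for x, y in zip(w, q)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            raise ValueError("gradient has rank lower than r")
        basis.append([x / norm for x in w])
    return basis


def top_r_left_subspace(g, r, iters=50, seed=0):
    """Approximate the top-r left singular vectors of g (m x n) as an m x r
    matrix P, via orthogonal iteration on g g^T (hypothetical helper)."""
    rng = random.Random(seed)
    ggt = matmul(g, transpose(g))  # m x m
    q_cols = [[rng.gauss(0.0, 1.0) for _ in range(len(g))] for _ in range(r)]
    for _ in range(iters):
        z = matmul(ggt, transpose(q_cols))  # m x r
        q_cols = gram_schmidt(transpose(z))  # r orthonormal vectors of length m
    return transpose(q_cols)


def low_rank_momentum_step(w, g, p, momentum, lr=0.01, beta=0.9):
    """One update W <- W - lr * P @ M, where the momentum buffer M lives in the
    projected r x n space (R = P^T G) instead of the full m x n space."""
    r_grad = matmul(transpose(p), g)  # r x n projected gradient
    if momentum is None:
        momentum = [[0.0] * len(r_grad[0]) for _ in r_grad]
    momentum = [[beta * m + x for m, x in zip(m_row, x_row)]
                for m_row, x_row in zip(momentum, r_grad)]
    update = matmul(p, momentum)  # project the update back to m x n
    new_w = [[wi - lr * ui for wi, ui in zip(w_row, u_row)]
             for w_row, u_row in zip(w, update)]
    return new_w, momentum
```

The memory saving comes from the buffer shape: the momentum (or, in practice, Adam moments) is r x n rather than m x n, so for small r the per-layer optimizer state shrinks roughly by a factor of m / r.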
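Theoretical basis
OwLore builds on Heavy-Tailed Self-Regularization (HT-SR) theory, finding that layers with more outliers tend to be more heavy-tailed and are therefore better trained. Accordingly, OwLore strategically assigns higher sampling probabilities to layers with more outliers to better exploit the knowledge stored in the pre-trained LLM.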
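The outlier-weighted sampling rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the OwLore code: a layer's outlier score is taken to be the fraction of weights whose magnitude exceeds `m` times the layer's mean magnitude (OwLore's actual score may also use activation statistics), sampling probabilities are proportional to these scores, and `k` distinct layers are drawn without replacement. The function names and the multiplier `m` are hypothetical.

```python
import random


def outlier_ratio(weight_rows, m=5.0):
    """Fraction of entries whose magnitude exceeds m times the layer's mean magnitude.

    Hypothetical weight-only outlier score; the multiplier m is an illustrative
    assumption, not a value taken from OwLore.
    """
    flat = [abs(x) for row in weight_rows for x in row]
    mean = sum(flat) / len(flat)
    return sum(1 for x in flat if x > m * mean) / len(flat)


def sampling_probs(ratios):
    """Normalize per-layer outlier ratios into sampling probabilities."""
    total = sum(ratios)
    if total == 0:
        return [1.0 / len(ratios)] * len(ratios)  # no outliers anywhere: sample uniformly
    return [r / total for r in ratios]


def sample_layers(probs, k, rng):
    """Draw k distinct layer indices without replacement, weighted by probs."""
    if k > len(probs):
        raise ValueError("cannot sample more layers than exist")
    remaining = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(k):
        if sum(weights) <= 0:
            weights = [1.0] * len(weights)  # only zero-probability layers are left
        idx = rng.choices(range(len(remaining)), weights=weights)[0]
        chosen.append(remaining.pop(idx))
        weights.pop(idx)
    return sorted(chosen)
```

In a training loop one might, for example, resample the active layers periodically and freeze all unsampled layers in between; the resampling period and the number of active layers are assumptions here, not values from the paper.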
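Experimental results
OwLore was evaluated extensively on several architectures, including LLaMa2, LLaMa3, and Mistral, and consistently outperforms the baseline methods, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% gain on MMLU, and a 10% gain on MT-Bench, while being more memory efficient. OwLore can fine-tune LLaMa2-7B with only 21GB of memory.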



