pengxiang/OwLore_Dataset

Name: pengxiang/OwLore_Dataset
Creator: pengxiang
Published: 2024-05-28 18:23:25
License: not specified

Hugging Face | Updated 2024-05-28 | Indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/pengxiang/OwLore_Dataset


Description:

# OwLore

The datasets used in the paper, such as `mmlu_auxiliary_train`, `boolq`, `gsm8k`, etc.

Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore) is a novel memory-efficient LLM fine-tuning approach that enhances fine-tuning performance by using layerwise sampling and gradient low-rank training.

<div align="center">
<img src="https://github.com/pixeli99/OwLore/assets/46072190/fb60054b-7af1-4aa0-9cc8-329c0f96d093" alt="Image 2" style="width: 900px; margin: 0 auto;">
</div>

## Abstract

The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning approach, inspired by the layerwise outlier distribution of LLMs, which dynamically samples pre-trained layers to fine-tune instead of adding additional adaptors. We first interpret the outlier phenomenon through the lens of Heavy-Tailed Self-Regularization theory (HT-SR), discovering that layers with more outliers tend to be more heavy-tailed and consequently better trained. Inspired by this finding, OwLore strategically assigns higher sampling probabilities to layers with more outliers to better leverage the knowledge stored in pre-trained LLMs. To further mitigate the memory demands of fine-tuning, we integrate gradient low-rank projection into our approach, which allows each layer to be trained efficiently in a low-rank manner. By combining the efficiency of low-rank training with optimal layerwise sampling, OwLore significantly improves the memory-performance trade-off in LLM fine-tuning.

Our extensive experiments across various architectures, including LLaMa2, LLaMa3, and Mistral, demonstrate that OwLore consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OwLore allows us to fine-tune LLaMa2-7B with only 21GB of memory.

Provided by:

pengxiang

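Summary of original information

OwLore Dataset

Overview

The OwLore dataset collects the datasets used in the paper, including mmlu_auxiliary_train, boolq, gsm8k, and others.

Method

OwLore is a novel memory-efficient LLM fine-tuning method that improves fine-tuning performance through layerwise sampling and gradient low-rank training. Inspired by the layerwise outlier distribution of LLMs, it dynamically samples pre-trained layers to fine-tune instead of adding extra adapters.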
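To make the gradient low-rank training step more concrete, here is a minimal sketch of one common way to project a gradient into a low-rank subspace. It assumes an SVD-based projection onto the top singular directions of the gradient; the listing itself does not specify how OwLore builds its projection, so the function names `low_rank_project` and `project_back`, the matrix shape, and the rank are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def low_rank_project(grad, rank):
    """Compress a gradient matrix into a rank-`rank` subspace.

    Assumption: the subspace is spanned by the top left singular
    vectors of the gradient itself (an SVD-based choice).
    Returns (P, R) where P has orthonormal columns and R = P^T @ grad.
    """
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]          # (m, rank) orthonormal basis
    R = P.T @ grad           # (rank, n) compressed gradient
    return P, R


def project_back(P, R):
    """Map the compressed gradient back to the full parameter shape."""
    return P @ R


# Hypothetical example: a 64 x 32 gradient compressed to rank 4.
rng = np.random.default_rng(0)
grad = rng.standard_normal((64, 32))
P, R = low_rank_project(grad, rank=4)
approx = project_back(P, R)
```

In this sketch, optimizer state would only need to be kept for the small matrix `R` (here 4 x 32 instead of 64 x 32), which is where the memory saving of low-rank gradient training would come from.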

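Theoretical basis

OwLore builds on Heavy-Tailed Self-Regularization (HT-SR) theory, through which the authors find that layers with more outliers tend to be more heavy-tailed and consequently better trained. Motivated by this, OwLore strategically assigns higher sampling probabilities to layers with more outliers, to make better use of the knowledge stored in the pre-trained LLM.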
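The idea of giving layers with more outliers a higher sampling probability can be sketched as follows. This is only an illustration: it assumes each layer already has a non-negative outlier score and that the probability is simply proportional to that score. The listing does not give the exact scoring or normalization used by OwLore, so `sampling_probabilities`, `sample_layers`, and the example scores are hypothetical.

```python
import random


def sampling_probabilities(outlier_scores):
    """Turn per-layer outlier scores into sampling probabilities.

    Assumption: probability is proportional to the score.
    """
    total = sum(outlier_scores)
    if total <= 0:
        raise ValueError("at least one outlier score must be positive")
    return [score / total for score in outlier_scores]


def sample_layers(outlier_scores, k, rng):
    """Pick k distinct layer indices, weighted by outlier score."""
    if k > len(outlier_scores):
        raise ValueError("k cannot exceed the number of layers")
    remaining = list(range(len(outlier_scores)))
    weights = list(outlier_scores)
    chosen = []
    for _ in range(k):
        # Weighted draw among the layers not yet selected.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return sorted(chosen)


# Hypothetical scores for a 6-layer model; higher means more outliers.
scores = [0.5, 1.0, 4.0, 2.0, 0.5, 2.0]
probs = sampling_probabilities(scores)
active_layers = sample_layers(scores, k=2, rng=random.Random(0))
```

In a training loop, a draw like `active_layers` would be repeated periodically so that only the sampled layers are unfrozen and updated during each interval, while the remaining layers stay frozen.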

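Experimental results

OwLore was evaluated extensively across multiple architectures, including LLaMa2, LLaMa3, and Mistral. The results show that OwLore consistently outperforms baseline methods, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a 10% improvement on MT-Bench, while being more memory efficient. OwLore makes it possible to fine-tune LLaMa2-7B with only 21GB of memory.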
