GEM025/GEM_Arsenal
Source: Hugging Face · Updated: 2025-03-15 · Indexed: 2025-11-01
Download link:
https://hf-mirror.com/datasets/GEM025/GEM_Arsenal
Resource description:
---
license: apache-2.0
---
# GEM_Testing_Arsenal
Welcome to ***GEM_Testing_Arsenal***, where groundbreaking research meets practical power! This repository unveils a novel architecture for On-Device Language Models (ODLMs), straight from our paper, ["Fragile Mastery: are domain-specific trade-offs undermining On-Device Language Models?"](./link_to_be_insterted) (link to be added). With just a few lines of code, our custom `gem_trainer.py` script lets you train ODLMs with improved accuracy while tracking accuracy and loss as training progresses.
---
## Highlights:
- **Next-Level ODLMs**: Boosts accuracy with a new architecture from our research.
- **Easy Training**: Call `run_gem_pipeline` to train on your dataset in minutes.
- **Live Metrics**: Get accuracy and loss results as training unfolds.
- **Flexible Design**: Works with any compatible Hugging Face dataset (e.g., Banking77, IMDB): plug and play!
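For readers new to the two metrics reported during training, here is what "accuracy" and "loss" usually mean for a classification run like the ones in the Quick Start. This is a minimal, generic sketch (softmax cross-entropy averaged over a batch, plus top-1 accuracy), not the code inside `gem_trainer.py`; the function name `batch_metrics` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def batch_metrics(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (mean cross-entropy loss, accuracy) for one batch.

    Generic illustration only; this is NOT the GEM implementation.
    """
    if len(logits) != len(labels) or not labels:
        raise ValueError("logits and labels must be non-empty and the same length")
    total_loss = 0.0
    correct = 0
    for row, label in zip(logits, labels):
        # Log-sum-exp with max subtraction for numerical stability
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        # Cross-entropy for one example = -log softmax(row)[label]
        total_loss += log_norm - row[label]
        # Top-1 prediction: index of the largest logit (first one wins ties)
        pred = max(range(len(row)), key=row.__getitem__)
        correct += int(pred == label)
    n = len(labels)
    return total_loss / n, correct / n


loss, acc = batch_metrics([[2.0, 0.0], [0.0, 0.0]], [0, 1])
print(f"loss={loss:.4f} acc={acc:.2f}")
```

Averaging these per-batch values over an epoch gives the kind of running accuracy/loss numbers a training loop typically reports.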
---
## Prerequisites:
To dive in, you’ll need:
- **Python** `3.8+`
- Required libraries (see the [Quick Start](#quick-start) below 👇)
- **Git** *(to clone the repo)*
---
## Quick Start:
1. **Clone the repository:**
```bash
git clone https://huggingface.co/datasets/GEM025/GEM_Arsenal
cd GEM_Arsenal
```
2. **Install Dependencies:**
```pwsh
pip install -r requirements.txt
```
3. **Train Your Model:**
Create a new Python file in the cloned `GEM_Arsenal` folder (so that `gem_trainer` can be imported) and run code like this:
```python
from datasets import load_dataset
from gem_trainer import run_gem_pipeline
# Load a dataset (e.g., Banking77); to use another one, just replace the name here
dataset = load_dataset("banking77")
# Train the ODLM (num_classes must match the dataset's label count: 77 for Banking77)
results = run_gem_pipeline(dataset, num_classes=77)
print(results)  # See accuracy and loss
```
> ***Boom—your ODLM is training with boosted accuracy!***
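When you swap in another dataset, `num_classes` has to change with it (77 for Banking77, 2 for IMDB). A small helper like the hypothetical `infer_num_classes` below can read that number off the data instead of hard-coding it. It is a sketch that assumes the labels live in a column named `label` (true for both example datasets); it prefers the `num_classes` that Hugging Face `datasets` stores on `ClassLabel` features and otherwise falls back to counting distinct label values, which undercounts if a class is missing from the split.

```python
from typing import Any


def infer_num_classes(split: Any, label_column: str = "label") -> int:
    """Number of classes in `label_column` of one dataset split.

    Assumption: labels are stored in a column named `label_column`.
    """
    # Hugging Face `datasets` splits expose `.features`; ClassLabel features
    # carry an explicit `num_classes`. Plain dicts have no `.features`.
    features = getattr(split, "features", None) or {}
    declared = getattr(features.get(label_column), "num_classes", None)
    if declared is not None:
        return int(declared)
    # Fallback: count distinct label values (misses classes absent from this split)
    return len(set(split[label_column]))


# Works on a plain column dict too:
print(infer_num_classes({"text": ["a", "b", "c"], "label": [0, 1, 1]}))  # 2
```

With the Quick Start example, you would then call `run_gem_pipeline(dataset, num_classes=infer_num_classes(dataset["train"]))`.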
---
## Running on Colab/Kaggle?
It's much the same as the local run; in a notebook cell, shell commands are prefixed with `!`.
```python
""" This is very recommended to run for clean ouput during trains...
import warnings
warnings.filterwarnings('ignore')
"""
#@ Step 1: Clone the github repo
! git clone https://huggingface.co/datasets/GEM025/GEM_Arsenal
#@ Step 2: Install all requirements
!pip install -r /content/GEM_Arsenal/requirements.txt #! For colab
"""
@! For kaggle:
!pip install -r /kaggle/working/GEM_Arsenal/requirements.txt
"""
#@ Step 3: Add repo to path
import sys
sys.path.append('/content/GEM_Arsenal') #! Or /kaggle/working/GEM_Arsenal (for kaggle)
#@ Step 4: Import and run function
from gem_trainer import run_gem_pipeline
from datasets import load_dataset
#@ Rest of the code as above
dataset = load_dataset("imdb")
result = run_gem_pipeline(dataset, num_classes=2, num_epochs=2)
print(result)
```
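If you switch between Colab and Kaggle often, you can pick the repo path automatically instead of editing it by hand. A minimal sketch, assuming the repo was cloned into the notebook's default working directory (`/content` on Colab, `/kaggle/working` on Kaggle) and using the existence of `/kaggle/working` as a heuristic for "running on Kaggle"; `gem_repo_dir` is a hypothetical helper, not part of the repo.

```python
import os
import sys
from typing import Callable


def gem_repo_dir(is_dir: Callable[[str], bool] = os.path.isdir) -> str:
    """Guess where GEM_Arsenal was cloned in a Colab or Kaggle notebook.

    Heuristic (assumption): Kaggle notebooks have /kaggle/working;
    otherwise assume Colab. `is_dir` is injectable for off-platform testing.
    """
    if is_dir("/kaggle/working"):
        return "/kaggle/working/GEM_Arsenal"
    return "/content/GEM_Arsenal"


repo = gem_repo_dir()
if repo not in sys.path:
    sys.path.append(repo)  # lets `import gem_trainer` resolve
print(repo)
```

In a notebook cell, IPython expands `{repo}` inside shell commands, so the install step can be written once as `!pip install -r {repo}/requirements.txt`.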
---
## Customizing Training:
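`run_gem_pipeline` keeps it simple, but you can tweak it! Some settings can be passed directly as keyword arguments, such as `num_epochs` (used in the Colab example above). For others, such as batch size, dive into [`gem_trainer.py`](./gem_trainer.py) and adjust them to fit your needs.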
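Because `num_epochs` can be passed as a keyword, a quick way to compare settings is to loop over a few values. Minimal sketch: `sweep_epochs` is a hypothetical helper that takes the pipeline function as an argument, passes only the `num_classes` and `num_epochs` keywords shown in this README, and stores whatever each run returns without assuming its format.

```python
from typing import Any, Callable, Dict, Iterable


def sweep_epochs(
    pipeline: Callable[..., Any],
    dataset: Any,
    num_classes: int,
    epoch_values: Iterable[int],
) -> Dict[int, Any]:
    """Run `pipeline` once per epoch setting and collect the raw results."""
    results: Dict[int, Any] = {}
    for epochs in epoch_values:
        # Only keywords that appear in this README are passed through.
        results[epochs] = pipeline(dataset, num_classes=num_classes, num_epochs=epochs)
    return results


# Usage (requires the repo on your Python path):
# from gem_trainer import run_gem_pipeline
# runs = sweep_epochs(run_gem_pipeline, dataset, num_classes=2, epoch_values=[1, 2, 3])
# for epochs, result in runs.items():
#     print(epochs, result)
```

Taking the pipeline as a parameter keeps the helper independent of `gem_trainer` internals and easy to test with a stand-in function.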
---
## Contributing 💓
Got ideas to make this even better? We’re all ears!
- Fork the repo.
- Branch off (`git checkout -b your-feature`).
- Submit a pull request with your magic.
---
Provided by: GEM025



