GEM025/GEM_Arsenal

Name: GEM025/GEM_Arsenal
Creator: GEM025
Published: 2025-03-15 17:02:36
License: Apache-2.0 (declared in the dataset card; the listing itself gives no description)

Source: Hugging Face
Updated: 2025-03-15
Indexed: 2025-11-01

Download link:

https://hf-mirror.com/datasets/GEM025/GEM_Arsenal

Description:

---
license: apache-2.0
---

# GEM_Testing_Arsenal

Welcome to ***GEM_Testing_Arsenal***, where groundbreaking research meets practical power! This repository introduces a novel architecture for On-Device Language Models (ODLMs), taken straight from our paper, ["Fragile Mastery: are domain-specific trade-offs undermining On-Device Language Models?"](./link_to_be_insterted). With just a few lines of code, our custom `gem_trainer.py` script lets you train ODLMs built on this architecture for higher accuracy, tracking accuracy and loss as you go.

---

## Highlights:

- **Next-Level ODLMs**: Boosts accuracy with a new architecture from our research.
- **Easy Training**: Call `run_gem_pipeline` to train on your dataset in minutes.
- **Live Metrics**: Get accuracy and loss results as training unfolds.
- **Flexible Design**: Works with any compatible dataset. Plug and play!

---

## Prerequisites:

To dive in, you'll need:

- **Python** `3.8+`
- Required libraries (see [Quick Start](#quick-start) below 👇)
- **Git** *(to clone the repo)*

---

## Quick Start:

1. **Clone the repository:**

   ```bash
   git clone https://huggingface.co/datasets/GEM025/GEM_Arsenal
   ```

2. **Install dependencies** (from inside the cloned folder):

   ```pwsh
   cd GEM_Arsenal
   pip install -r requirements.txt
   ```

3. **Train your model:** Create a new Python file inside the cloned `GEM_Arsenal` folder (so that `gem_trainer` is importable) and run code like this:

   ```python
   from datasets import load_dataset
   from gem_trainer import run_gem_pipeline

   # Load a dataset (e.g., Banking77); just replace the dataset name here
   dataset = load_dataset("banking77")

   # Train the ODLM (num_classes must match the dataset: Banking77 has 77 intents)
   results = run_gem_pipeline(dataset, num_classes=77)
   print(results)  # See accuracy and loss
   ```

> ***Boom! Your ODLM is training with boosted accuracy!***

---

## Running on Colab/Kaggle?

It's very similar to the local run.

```python
# Recommended for cleaner output during training:
# import warnings
# warnings.filterwarnings('ignore')

#@ Step 1: Clone the repository
!git clone https://huggingface.co/datasets/GEM025/GEM_Arsenal

#@ Step 2: Install all requirements
!pip install -r /content/GEM_Arsenal/requirements.txt  #! For Colab
# For Kaggle: !pip install -r /kaggle/working/GEM_Arsenal/requirements.txt

#@ Step 3: Add the repo to the path
import sys
sys.path.append('/content/GEM_Arsenal')  #! Or /kaggle/working/GEM_Arsenal (for Kaggle)

#@ Step 4: Import and run the function
from gem_trainer import run_gem_pipeline
from datasets import load_dataset

#@ Rest of the code as above
dataset = load_dataset("imdb")
result = run_gem_pipeline(dataset, num_classes=2, num_epochs=2)
print(result)
```

---

## Customizing Training:

`run_gem_pipeline` keeps things simple, but you can tweak it! You can pass the number of epochs directly (`num_epochs`, as in the Colab example). For batch size or other settings, dive into [`gem_trainer.py`](./gem_trainer.py) and adjust them to fit your needs.

---

## Contributing 💓

Got ideas to make this even better? We're all ears!

- Fork the repo.
- Branch off (`git checkout -b your-feature`).
- Submit a pull request with your magic.

---
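
Quick Start step 3 hard-codes `num_classes=77` for Banking77, and the Colab example uses `num_classes=2` for IMDb. When you swap in another dataset, this number has to change too. If the label column is a Hugging Face `ClassLabel`, `dataset["train"].features["label"].num_classes` already holds it. The dependency-free sketch below is an alternative that counts label ids directly. The helper `infer_num_classes` is hypothetical and not part of `gem_trainer.py`. It assumes integer labels `0..K-1` under a `label` column and treats negative ids as unlabeled; IMDb's `unsupervised` split, for example, uses `-1`.

```python
from typing import Iterable, Mapping, Sequence


def infer_num_classes(
    splits: Mapping[str, Iterable[Mapping]],
    label_key: str = "label",
    use_splits: Sequence[str] = ("train", "test"),
) -> int:
    """Count distinct class ids across the labeled splits of a dataset.

    Hypothetical helper. Assumes integer labels 0..K-1; negative ids mark unlabeled rows.
    """
    seen = set()
    for name in use_splits:
        if name not in splits:
            continue  # e.g. a dataset that has no "test" split
        for row in splits[name]:
            label = row[label_key]
            if label >= 0:  # skip unlabeled rows such as IMDb's -1
                seen.add(label)
    if not seen:
        raise ValueError("no labeled examples found")
    missing = sorted(set(range(max(seen) + 1)) - seen)
    if missing:
        raise ValueError(f"label ids are not contiguous; missing {missing}")
    return len(seen)


# Toy stand-in shaped like a DatasetDict: split name -> rows of {"text", "label"}
toy = {
    "train": [{"text": "great film", "label": 1}, {"text": "dull", "label": 0}],
    "test": [{"text": "loved it", "label": 1}],
    "unsupervised": [{"text": "no label", "label": -1}],
}
print(infer_num_classes(toy))  # 2
```

Iterating a split of a `DatasetDict` yields one dict per row, so the same call should also work on the object returned by `load_dataset`, as in `run_gem_pipeline(dataset, num_classes=infer_num_classes(dataset))`. Because it walks every row in Python, it can be slow on large datasets. The `ClassLabel` attribute avoids that cost.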

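The Colab/Kaggle steps hard-code `/content/GEM_Arsenal` and `/kaggle/working/GEM_Arsenal`, and re-running the `sys.path.append` cell adds the same entry again. The sketch below instead picks whichever clone location exists and adds it to `sys.path` only once. The helper name `add_gem_repo_to_path` and its search order are assumptions for illustration, not part of the repository.

```python
import os
import sys
from typing import Sequence

# Assumed search order: Kaggle's working dir, Colab's default dir, then the current dir.
DEFAULT_BASES = ("/kaggle/working", "/content", ".")


def add_gem_repo_to_path(repo_name: str = "GEM_Arsenal",
                         bases: Sequence[str] = DEFAULT_BASES) -> str:
    """Find the cloned repo under one of `bases` and append it to sys.path once.

    Hypothetical helper; returns the absolute path that was found.
    """
    for base in bases:
        path = os.path.abspath(os.path.join(base, repo_name))
        if os.path.isdir(path):
            if path not in sys.path:  # avoid duplicates when a notebook cell is re-run
                sys.path.append(path)
            return path
    raise FileNotFoundError(
        f"{repo_name} not found under {list(bases)}; did the git clone step run?"
    )


# In a notebook, after the clone step:
# repo_dir = add_gem_repo_to_path()
# from gem_trainer import run_gem_pipeline
```

Checking directories rather than environment variables keeps the sketch simple. It also works locally when the notebook runs from the folder that contains the clone.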
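
The Customizing Training section notes that some settings live inside `gem_trainer.py`. `num_epochs` is accepted as a keyword argument (see the Colab example), but other options such as batch size may not be. One way to check before editing the file is to inspect the function's signature with Python's standard `inspect` module. The helper `split_overrides` below is hypothetical. The `fake_pipeline` stand-in only mimics the call pattern from the README examples and is not the real `run_gem_pipeline`.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_overrides(func: Callable, **overrides: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split `overrides` into keyword arguments `func` accepts and ones it does not."""
    params = inspect.signature(func).parameters.values()
    takes_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params)
    keyword_names = {
        p.name for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    accepted, rejected = {}, {}
    for name, value in overrides.items():
        target = accepted if (name in keyword_names or takes_var_kw) else rejected
        target[name] = value
    return accepted, rejected


def fake_pipeline(dataset, num_classes, num_epochs=3):
    """Stand-in with the call pattern used in the README examples (not the real one)."""
    return {"num_classes": num_classes, "num_epochs": num_epochs}


accepted, rejected = split_overrides(fake_pipeline, num_epochs=5, batch_size=32)
print(accepted)  # {'num_epochs': 5}
print(rejected)  # {'batch_size': 32}
print(fake_pipeline(None, num_classes=2, **accepted))  # {'num_classes': 2, 'num_epochs': 5}
```

With the real function, `split_overrides(run_gem_pipeline, num_epochs=5, batch_size=32)` would show which settings can be passed directly. Anything left in `rejected` would need to be changed inside `gem_trainer.py`, as the README suggests.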

Provided by:

GEM025