
GEM025/GEM_Arsenal

Source: Hugging Face. Updated 2025-03-15; indexed 2025-11-01.
Download link:
https://hf-mirror.com/datasets/GEM025/GEM_Arsenal
Description:
---
license: apache-2.0
---

# GEM_Testing_Arsenal

Welcome to ***GEM_Testing_Arsenal***, where groundbreaking research meets practical power! This repository presents a novel architecture for On-Device Language Models (ODLMs), straight from our paper, ["Fragile Mastery: are domain-specific trade-offs undermining On-Device Language Models?"](./link_to_be_insterted). With just a few lines of code, our custom `gem_trainer.py` script lets you train ODLMs built on this architecture, tracking accuracy and loss as you go.

---

## Highlights:

- **Next-Level ODLMs**: Boosts accuracy with a new architecture from our research.
- **Easy Training**: Call `run_gem_pipeline` to train on your dataset in minutes.
- **Live Metrics**: Get accuracy and loss results as training unfolds.
- **Flexible Design**: Works with any compatible dataset. Plug and play!

---

## Prerequisites:

To dive in, you'll need:

- **Python** `3.8+`
- Required libraries (go through [quick start](#quick-start) below 👇)
- **Git** *(to clone the repo)*

---

## Quick Start:

1. **Clone the repository:**

   ```bash
   git clone https://huggingface.co/datasets/GEM025/GEM_Arsenal
   ```

2. **Install Dependencies:**

   ```bash
   cd GEM_Arsenal
   pip install -r requirements.txt
   ```

3. **Train Your Model:**

   Create a new Python file inside the cloned directory (so that `gem_trainer` can be imported) and run code like:

   ```python
   from datasets import load_dataset
   from gem_trainer import run_gem_pipeline

   # Load a dataset (e.g., Banking77); replace the name here to use another dataset
   dataset = load_dataset("banking77")

   # Train the ODLM; num_classes must match the number of labels in the dataset
   results = run_gem_pipeline(dataset, num_classes=77)

   print(results)  # See accuracy and loss
   ```

> ***Boom, your ODLM is training with boosted accuracy!***

---

## Running on Colab/Kaggle?

It works much the same as a local run.

```python
# Optional but recommended for cleaner output during training:
# import warnings
# warnings.filterwarnings('ignore')

# Step 1: Clone the Hugging Face repo
!git clone https://huggingface.co/datasets/GEM025/GEM_Arsenal

# Step 2: Install all requirements (Colab path)
!pip install -r /content/GEM_Arsenal/requirements.txt
# On Kaggle, use this instead:
# !pip install -r /kaggle/working/GEM_Arsenal/requirements.txt

# Step 3: Add the repo to the import path
import sys
sys.path.append('/content/GEM_Arsenal')  # On Kaggle: '/kaggle/working/GEM_Arsenal'

# Step 4: Import and run the pipeline
from gem_trainer import run_gem_pipeline
from datasets import load_dataset

# The rest is the same as the local example
dataset = load_dataset("imdb")
result = run_gem_pipeline(dataset, num_classes=2, num_epochs=2)
print(result)
```

---

## Customizing Training:

`run_gem_pipeline` keeps it simple, but you can tweak it! The Colab example above passes `num_epochs` directly; for batch size or other settings, dive into [`gem_trainer.py`](./gem_trainer.py) and adjust them to fit your needs.

---

## Contributing 💓

Got ideas to make this even better? We're all ears!

- Fork the repo.
- Branch off (`git checkout -b your-feature`).
- Submit a pull request with your magic.

---
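Both training examples hard-code `num_classes` (77 for Banking77, 2 for IMDB), so swapping in another dataset also means updating that number. As a minimal sketch, the value can be derived from the labels instead. `infer_num_classes` below is a hypothetical helper, not part of `gem_trainer.py`, and the toy list only stands in for a real label column.

```python
from collections import Counter
from typing import Hashable, Iterable


def infer_num_classes(labels: Iterable[Hashable]) -> int:
    """Count the distinct label values (hypothetical helper, not in gem_trainer.py)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels supplied")
    return len(counts)


# Toy labels standing in for a real label column.
toy_labels = [0, 1, 1, 0, 2, 2, 2]
num_classes = infer_num_classes(toy_labels)
print(num_classes)  # prints 3
```

With a loaded dataset, the result could be passed as `run_gem_pipeline(dataset, num_classes=infer_num_classes(dataset["train"]["label"]))`. This assumes the dataset has a `train` split and a `label` column, and that every class appears at least once in that split.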

Provider:
GEM025