OPTIMIZING LLAMA 3.2 1B USING QUANTIZATION TECHNIQUES USING BITSANDBYTES FOR EFFICIENT AI DEPLOYMENT

Indexed by NIAID Data Ecosystem, 2026-05-02

Download link:

https://zenodo.org/records/15194422


Resource description:

Large Language Models (LLMs) have transformed natural language processing, achieving state-of-the-art performance on a wide range of tasks. However, their high computational and memory requirements pose significant challenges for deployment, especially on resource-constrained hardware. In this paper, we conduct a controlled experiment to optimize the LLaMA 3.2 1B model with post-training quantization techniques implemented in the bitsandbytes library. We evaluate multiple precision settings (BF16, FP16, INT8, and INT4) and compare their tradeoffs in accuracy, throughput, latency, and resource utilization. Accuracy benchmarks are run on a workstation GPU (NVIDIA T1000), and performance benchmarks on a cloud-based GPU (NVIDIA T4 on Google Colab). Our findings show that lower-precision quantization can significantly reduce memory usage and improve throughput with minimal impact on model accuracy, providing useful insights for efficient AI deployment in production environments.
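As a rough illustration of the memory side of these tradeoffs, the sketch below (a) estimates weight-storage memory at each precision setting named above and (b) applies symmetric absmax INT8 quantization to a small weight vector, the basic per-vector scaling idea that bitsandbytes' 8-bit method builds on. This is a hypothetical, dependency-free sketch, not the authors' code. The parameter count (about 1.24 billion) is an assumption, and the sketch ignores activations, the KV cache, the outlier handling that LLM.int8() adds, and the per-block scale constants that 4-bit formats store.

```python
"""Hypothetical sketch: precision vs. weight memory, and absmax INT8 quantization.

Not the paper's code and not the bitsandbytes API; pure Python for illustration.
"""

# Bits per stored weight for each precision setting named in the abstract.
BITS_PER_WEIGHT = {"BF16": 16, "FP16": 16, "INT8": 8, "INT4": 4}

# Assumption: LLaMA 3.2 1B has roughly 1.24 billion parameters.
ASSUMED_PARAMS = 1_240_000_000


def weight_memory_gib(n_params: int, bits: int) -> float:
    """Weight storage only, in GiB. Ignores activations, KV cache,
    and quantization metadata such as scale factors."""
    return n_params * bits / 8 / 2**30


def absmax_quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    absmax = max(abs(w) for w in weights)
    scale = 127 / absmax if absmax > 0 else 1.0
    # Round to the nearest integer code, then clamp to the signed INT8 range.
    q = [max(-127, min(127, round(w * scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from INT8 codes and the scale."""
    return [v / scale for v in q]


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_memory_gib(ASSUMED_PARAMS, bits):.2f} GiB")

    w = [0.12, -0.53, 0.07, 0.91, -0.30]
    q, s = absmax_quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    max_err = max(abs(a - b) for a, b in zip(w, w_hat))
    print("codes:", q)
    print(f"max reconstruction error: {max_err:.4f}")
```

Halving the bits halves weight storage, and the rounding error per weight is bounded by half a quantization step (0.5 / scale), which is why a single large outlier (a large absmax) degrades precision for the rest of the vector. In practice, bitsandbytes is commonly driven through Hugging Face Transformers' `BitsAndBytesConfig` (for example `load_in_8bit=True` or `load_in_4bit=True`) rather than by hand-written code like this.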

Created:

2025-04-11
