LLaMA Models Performance Comparison

Name: LLaMA Models Performance Comparison
Creator: Authors of the paper
License: No description available

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/ggerganov/llama.cpp

Description:

This dataset compares two optimized group-wise 4-bit quantization kernels (Q4_0_8_8 and Q4_0_4_8) with the baseline 4-bit kernel in LLaMA.cpp (Q4_0) on inference throughput and accuracy. Accuracy is measured by perplexity (PPL). The dataset also includes throughput at different batch sizes and a comparison between kernels that use the advanced MMLA operations and kernels that do not. Measurements cover multiple processor configurations: Graviton2, Graviton3, and Graviton4. The purpose of the dataset is to quantify the difference in inference throughput between the optimized kernels and the LLaMA.cpp baseline kernel.
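As a rough illustration of the two metrics named in the description, the sketch below computes perplexity from per-token log-probabilities and a throughput speedup ratio. The function names and all input values are hypothetical placeholders chosen for illustration; they are not taken from the dataset or from LLaMA.cpp.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def speedup(baseline_tokens_per_s, optimized_tokens_per_s):
    """Throughput ratio of an optimized kernel over a baseline kernel."""
    if baseline_tokens_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return optimized_tokens_per_s / baseline_tokens_per_s


# Placeholder inputs (assumptions, not measured results):
# a model that assigns probability 0.25 to every token has perplexity near 4.
uniform_log_probs = [math.log(0.25)] * 8
print(f"PPL: {perplexity(uniform_log_probs):.3f}")

# Hypothetical tokens-per-second figures for a baseline and an optimized kernel.
print(f"Speedup: {speedup(10.0, 25.0):.2f}x")
```

A lower perplexity indicates that the quantized model predicts the evaluation text better, so comparing PPL between kernels shows whether a throughput gain comes at an accuracy cost.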

Provided by:

Authors of the paper
