
LLaMA Models Performance Comparison

arXiv, indexed 2025-09-30
Download link:
https://github.com/ggerganov/llama.cpp
Description:
This dataset compares two optimized grouped 4-bit quantization kernels (Q4_0_8_8 and Q4_0_4_8) against the baseline 4-bit kernel in LLaMA.cpp (Q4_0) on inference throughput and accuracy, with accuracy measured by perplexity (PPL). It also includes throughput data across different batch sizes and a comparison between kernels that use advanced MMLA operations and kernels that do not. Measurements cover multiple processor configurations: Graviton2, Graviton3, and Graviton4. The purpose of the dataset is to compare the inference throughput of the optimized kernels with that of the LLaMA.cpp baseline.
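
For readers unfamiliar with the two metrics named in the description, the following is a minimal sketch of how perplexity and throughput are conventionally computed. The function names and sample inputs are hypothetical and chosen for illustration only; they are not taken from the dataset or from LLaMA.cpp.

```python
import math


def perplexity(token_logprobs):
    """Perplexity is exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def throughput(num_tokens, elapsed_seconds):
    """Throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


def speedup(optimized_tps, baseline_tps):
    """Ratio of optimized-kernel throughput to baseline-kernel throughput."""
    return optimized_tps / baseline_tps


# Hypothetical inputs: a model that assigns probability 0.25 to every token
# has perplexity 4, since exp(-log(0.25)) = 4.
sample_logprobs = [math.log(0.25)] * 8
print(perplexity(sample_logprobs))
print(throughput(128, 2.0))
```

Lower perplexity indicates that quantization has cost less accuracy, so a kernel comparison of this kind typically looks for higher throughput at near-identical perplexity.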
Provided by:
Authors of the paper