"Beyond the GPU: Efficiency, Limitations, and Future Trends in FPGA LLM Inference"

Name: "Beyond the GPU: Efficiency, Limitations, and Future Trends in FPGA LLM Inference"
Creator: IEEE DataPort
Published: 2026-01-08 19:13:21
License: No description available

DataCite Commons | Updated 2026-01-08 | Indexed 2026-05-03

Download link:

https://ieee-dataport.org/documents/beyond-gpu-efficiency-limitations-and-future-trends-fpga-llm-inference

Resource description:

"Large language model (LLM) inference is a rapidly growing class of computer workload, with over 100 GW of compute capacity expected to come online in the next 5 years. The most popular chips used for LLM inference are graphics processing units (GPUs), which are expensive and power-intensive. We present a review of the latest literature on LLM inference using field-programmable gate arrays (FPGAs), which are chips that can be programmatically optimized for inference tasks. We find that current FPGA implementations can achieve inference speeds similar to those of GPUs while using up to 80 percent less energy. We also find that FPGA performance is primarily constrained by memory bandwidth, which limits FPGAs to smaller models. We discuss these limitations, analyze upcoming FPGA products that may overcome them, and consider the implications of our findings for the future of LLM inference hardware."

Provider:

IEEE DataPort

Created:

2026-01-08