Llama3 405B model

Name: Llama3 405B model
Creator: Meta
License: No description available

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai

Resource description:

This entry describes a dense Transformer model used for long-context large language model (LLM) inference, with a context window of up to 128K tokens. The model uses row-wise quantized FP8 weights and is optimized for high parallel efficiency across multiple GPUs. In terms of scale, it supports prefill over a 1-million-token context and performs well in benchmarks run on 1 to 16 nodes. The target tasks are long-context prefill and decoding scaled across parallel devices.
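The row-wise FP8 weight scaling mentioned above can be illustrated with a minimal sketch. It is not taken from the linked FBGEMM code: the function names are hypothetical, the format is assumed to be FP8 E4M3 (largest finite value 448), and the final cast to an 8-bit type is left out, so the round trip here is lossless where a real kernel would introduce rounding error.

```python
from typing import List

# Assumption: FP8 E4M3 format, whose largest finite value is 448.
FP8_E4M3_MAX = 448.0

Matrix = List[List[float]]


def rowwise_scales(weights: Matrix) -> List[float]:
    """Return one scale per row so the row's largest magnitude maps to FP8_E4M3_MAX."""
    scales = []
    for row in weights:
        amax = max((abs(x) for x in row), default=0.0)
        # An all-zero (or empty) row gets scale 1.0 to avoid dividing by zero.
        scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
    return scales


def scale_rows(weights: Matrix, scales: List[float]) -> Matrix:
    """Divide each row by its scale and clamp to the FP8 range.

    A real kernel would cast the result to an 8-bit float here; this sketch
    keeps Python floats, so no rounding error is modelled.
    """
    return [
        [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x / s)) for x in row]
        for row, s in zip(weights, scales)
    ]


def dequantize_rows(scaled: Matrix, scales: List[float]) -> Matrix:
    """Multiply each row by its scale to recover approximate original values."""
    return [[q * s for q in row] for row, s in zip(scaled, scales)]


if __name__ == "__main__":
    w = [[0.5, -2.0], [0.0, 0.0], [100.0, 25.0]]
    s = rowwise_scales(w)
    q = scale_rows(w, s)
    print("scales:", s)
    print("scaled:", q)
    print("restored:", dequantize_rows(q, s))
```

Keeping one scale per row, rather than one per tensor, means a row with small values is not forced to share the dynamic range of a row with large values, at the cost of storing one extra number per row.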

Provider: