Qwen 3 model series
arXiv, indexed 2025-09-30
Download link:
https://github.com/Ledzy/StreamBP
Resource description:
This dataset evaluates StreamBP, a memory-efficient and exact backpropagation method, on the Qwen 3 model series. StreamBP also applies to other causal language models such as Llama, Mistral, and Gemma. The evaluation tasks, run at the scale of a single A800-80GB GPU, cover backpropagation cost, training cost, and distributed training.
Provided by:
Authors of the paper



