Automatic compilation mechanism for dynamic horizontal fusion on accelerators
Source: 中国科学数据 | Updated: 2026-03-25 | Indexed: 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.1360/SSI-2025-0283
Description:
Accelerators such as graphics processing units (GPUs) have been widely adopted to accelerate compute-intensive tasks like deep learning, owing to their powerful parallel processing capabilities. To fully exploit the hardware potential of GPUs, system-level optimization techniques are crucial, among which kernel fusion has become a widely used approach in mainstream frameworks. Horizontal fusion enables parallel scheduling of multiple independent kernels, thereby improving resource utilization. However, existing horizontal fusion methods are typically designed for static computation graphs and struggle to support tasks with dynamic branching structures, such as mixture-of-experts (MoE) models, where inputs are dynamically routed to different sub-networks at runtime, making it difficult to predefine fused kernels. To address this challenge, we propose Fluxer, an automated horizontal fusion technique tailored for dynamic branches. Fluxer
Created:
2025-10-15



